
Hello.

I can't find a way to get the text inside a sentence in Python. This is the code:

 

import re

# Read the whole file; "with" closes it automatically
with open("weather.txt", mode="r", encoding="utf8") as f:
    content = f.read()

pattern = re.compile(r"[0-9]")  # TO BE CHANGED!
result = pattern.findall(content)

print(result)
print(len(result))

I can't get the right pattern for re.compile. For example, I need the program to tell me the items inside the ( ).

 

https://linustechtips.com/topic/1430008-python-regex-issue/

10 minutes ago, FreQUENCY said:

I need to get the program to tell me the items inside the ( )

So in weather.txt you have numbers within () that you want to match, i.e. (123)?

The regex would be:

\([0-9]+\)

The backslashes make the regex match literal ( and ), since parentheses have a special meaning (grouping) in regex.
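A minimal sketch of that pattern dropped into the original script, assuming a made-up sample string in place of the real contents of weather.txt (which isn't shown in the thread):

```python
import re

# Hypothetical sample text standing in for the contents of weather.txt
content = "Station (123) logged (45) readings today"

# \( and \) match literal parentheses; [0-9]+ matches one or more digits
pattern = re.compile(r"\([0-9]+\)")
result = pattern.findall(content)

print(result)       # ['(123)', '(45)']
print(len(result))  # 2
```

With no capture groups in the pattern, findall returns the whole match, parentheses included.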

 

 


3 minutes ago, C2dan88 said:

The regex becomes a bit more complex

\(([0-9]+(\.[0-9]+)?) C\)

This interactive website will help to explain it

https://regex101.com/r/vI6O1w/1

For example, the items inside (-1.7 C) will be output as ('0.7', '.7'). Something is off with the decimals. I will look into the link in the meantime.
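A likely explanation, sketched with a made-up sample line since weather.txt isn't shown: when a pattern has more than one capture group, re.findall returns a tuple of all the groups for each match, and (\.[0-9]+) is a second group. Also, [0-9]+ can't match a minus sign, so (-1.7 C) isn't matched at all; a ('0.7', '.7') tuple would come from a (0.7 C) elsewhere in the file.

```python
import re

# Hypothetical sample; the real weather.txt format is not shown in the thread
content = "Low (-1.7 C), high (0.7 C)"

# Two capture groups: the whole number, and the decimal part on its own
pattern = re.compile(r"\(([0-9]+(\.[0-9]+)?) C\)")

# "-1.7" is skipped because "-" is not in [0-9];
# each remaining match comes back as a tuple of both groups
print(pattern.findall(content))  # [('0.7', '.7')]
```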


Just now, FreQUENCY said:

For example, the items inside (-1.7 C) will be output as ('0.7', '.7'). Something is off with the decimals. I will look into the link in the meantime.

\((-?[0-9]+(?:\.[0-9]+)?) C\)

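Use the pattern above to include negatives. Because the decimal part is now a non-capturing group (?:...), you should get only one output per match instead of a tuple. Don't forget these are strings; you'll need to convert them to floating-point numbers (use float(x)).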
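A short sketch of that pattern plus the float conversion, assuming a made-up sample string in place of weather.txt:

```python
import re

# Hypothetical sample text; the real weather.txt is not shown in the thread
content = "Low (-1.7 C), high (24.3 C), average (5 C)"

# -? allows an optional minus sign; (?:...) groups the decimal part
# without capturing it, so findall returns one string per match
pattern = re.compile(r"\((-?[0-9]+(?:\.[0-9]+)?) C\)")

matches = pattern.findall(content)          # ['-1.7', '24.3', '5']
temperatures = [float(x) for x in matches]  # [-1.7, 24.3, 5.0]
print(temperatures)
```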

If you found my answer to your post helpful, be sure to react or mark it as solution 😄


4 minutes ago, FreQUENCY said:

@C2dan88 It works, but I need to tweak it to get the C too. The result is:

['-1.7', '0.7', '1.4', '1.5', '1.7', '1.9', '24.3', '24.2', '23.8', '23.3', '23.1', '22.8', '22.7', '22.6']
14

Is your way the "correct" way or is it a workaround for this case?

Then simply extend the capture group to include the 'C'

\(([+-]?[0-9]+(?:\.[0-9]+)? C)\)
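A quick sketch of the extended capture group, again with a made-up sample string; the only change is that the group now closes after " C", and [+-]? also accepts an explicit plus sign:

```python
import re

# Hypothetical sample text standing in for weather.txt
content = "Low (-1.7 C), high (+24.3 C)"

# The single capture group now ends after " C", so the unit is kept
pattern = re.compile(r"\(([+-]?[0-9]+(?:\.[0-9]+)? C)\)")

print(pattern.findall(content))  # ['-1.7 C', '+24.3 C']
```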

 



6 minutes ago, JogerJ said:

Then simply extend the capture group to include the 'C'

 

Works like a charm. I picked a course that teaches regex too, and I am so lost.

I need to figure out how to:

  • get the word that comes right before the date
  • find the words that start with a capital letter and are at least 3 characters long.

@JogerJ Can you please explain to me in plain English how this is formed, so I can understand it?

\(([+-]?[0-9]+(?:\.[0-9]+)? C)\)

 

 


Well, I don't like doing other people's homework, but this should provide a clearer breakdown: https://regexr.com/6l8cg
You should at least familiarize yourself with the cheatsheet so you know what you can do with regex, then work your way from there.
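The same breakdown can also live in the code itself. As a sketch (the name TEMPERATURE is only illustrative), Python's re.VERBOSE flag ignores whitespace in the pattern and allows # comments, so each piece of the pattern asked about can be annotated:

```python
import re

# Same pattern as in the thread, written with re.VERBOSE. Whitespace in
# the pattern is ignored in this mode, so the literal space before C is
# escaped with a backslash.
TEMPERATURE = re.compile(
    r"""
    \(                # a literal opening parenthesis
    (                 # start of capture group 1 (what findall returns)
      [+-]?           #   an optional plus or minus sign
      [0-9]+          #   one or more digits (the whole-number part)
      (?:\.[0-9]+)?   #   optionally a literal dot and more digits,
                      #   grouped without capturing
      \ C             #   a space, then the letter C
    )                 # end of capture group 1
    \)                # a literal closing parenthesis
    """,
    re.VERBOSE,
)

print(TEMPERATURE.findall("Low (-1.7 C)"))  # ['-1.7 C']
```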
I'll provide some hints:

  • Getting the word before the date:
    The pattern for the word goes in the first capture group, followed by a non-capturing group for the date.
  • Capitalised 3+ character words:
    The first letter is uppercase, followed by 2 or more lowercase letters (or also uppercase letters or even digits, depending on your definition of "word").
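A sketch of both hints, with a loud assumption: the thread never shows how dates are written in weather.txt, so a DD/MM/YYYY format and the sample sentence below are made up. "Word" here means \w+ for the first hint and letters only for the second.

```python
import re

# Hypothetical sample; the DD/MM/YYYY date format is an assumption
text = "Reported on 12/03/2022 by Athens station, then Tuesday 13/03/2022 was Dry"

# Hint 1: the word goes in the first capture group, followed by a
# non-capturing group for the date, so findall returns only the word
word_before_date = re.compile(r"(\w+) (?:\d{2}/\d{2}/\d{4})")
print(word_before_date.findall(text))  # ['on', 'Tuesday']

# Hint 2: an uppercase letter followed by 2 or more lowercase letters;
# \b marks word boundaries so only whole words match
capitalised = re.compile(r"\b[A-Z][a-z]{2,}\b")
print(capitalised.findall(text))  # ['Reported', 'Athens', 'Tuesday', 'Dry']
```

Swap [a-z] for [A-Za-z0-9] in the second pattern if your definition of "word" allows uppercase letters or digits after the first letter.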

Give it a shot, and I'll help correct any mistakes you might make.



  • 1 month later...

I just want to drop this here.

 

When learning regular expressions, please, please, please read about "greediness", particularly in relation to .*, and please, please, please learn to "anchor" your regexps. This is by far the most common problem with regular expressions in the wild.
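A minimal illustration of greediness, using parentheses as earlier in the thread (the sample string is made up):

```python
import re

text = "(first) and (second)"

# Greedy: .* grabs as much as it can, so one match spans from the
# first "(" to the last ")"
greedy = re.findall(r"\(.*\)", text)

# Lazy: .*? takes as little as it can, stopping at the first ")"
lazy = re.findall(r"\(.*?\)", text)

print(greedy)  # ['(first) and (second)']
print(lazy)    # ['(first)', '(second)']
```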

 

Example:

Make sure this is our domain:    .*.ourdomain.com

 

Nope. That's not what that does. In fact, these match it:

www.yourdomain.com

lets-hack.ourdomain.com.xackorspz.org

Make sure this is our domain:    ourdomain\.com$

 

Better. The .* is pointless, and the $ now forces this to match only strings that end with the literal ourdomain.com. It's also simpler.
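A quick sketch running both patterns from the post against the two example hosts, using re.search (which finds a match anywhere in the string unless the pattern itself is anchored). The $ rejects the hack host, but www.yourdomain.com still passes, because it too ends in ourdomain.com. Closing that gap needs the match to start at a dot or at the beginning of the string.

```python
import re

# The two example hosts from the post above
hack = "lets-hack.ourdomain.com.xackorspz.org"
lookalike = "www.yourdomain.com"

unanchored = re.compile(r".*.ourdomain.com")   # unescaped dots match any character
end_anchored = re.compile(r"ourdomain\.com$")  # literal dot, anchored to the end

for host in (lookalike, hack):
    print(host, bool(unanchored.search(host)), bool(end_anchored.search(host)))
# www.yourdomain.com True True
# lets-hack.ourdomain.com.xackorspz.org True False
```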

 

I would recommend taking any code that uses a regular expression, pulling the regexp bit out into a function, and unit testing the sh1t out of it. Give it a list of things, both positive and negative tests. Send it round the team and make it a challenge: can you break it?
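A sketch of that advice with Python's unittest, built around a hypothetical is_our_domain() helper. The pattern is an assumed stricter version: re.fullmatch anchors both ends (unlike $, which in Python also matches just before a trailing newline), and only whole labels ending in a dot may come before ourdomain.com. Host names are assumed to be lowercase.

```python
import re
import unittest

# Hypothetical helper: the host must be ourdomain.com itself or a
# subdomain of it. fullmatch requires the whole string to match.
# Assumes lowercase host names.
OUR_DOMAIN = re.compile(r"(?:[a-z0-9-]+\.)*ourdomain\.com")


def is_our_domain(host: str) -> bool:
    return OUR_DOMAIN.fullmatch(host) is not None


class IsOurDomainTest(unittest.TestCase):
    def test_accepts_our_hosts(self):
        for host in ["ourdomain.com", "www.ourdomain.com", "a.b.ourdomain.com"]:
            with self.subTest(host=host):
                self.assertTrue(is_our_domain(host))

    def test_rejects_lookalikes(self):
        for host in [
            "www.yourdomain.com",
            "lets-hack.ourdomain.com.xackorspz.org",
            "ourdomainXcom",
            "ourdomain.com\n",
        ]:
            with self.subTest(host=host):
                self.assertFalse(is_our_domain(host))
```

Run it with python -m unittest followed by the module name. Every time someone on the team breaks the pattern, their input goes into the negative list.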

