Python regex issue

FreQUENCY · May 9, 2022

Hello.

I can't find a way to extract the text inside a sentence in Python. This is the code:

import re

# Read the whole file; the with block closes it automatically
with open("weather.txt", mode="r", encoding="utf8") as f:
    content = f.read()

pattern = re.compile(r"[0-9]")  # TO BE CHANGED!
result = pattern.findall(content)

print(result)
print(len(result))

I can't get the proper pattern for re.compile. For example, I need to get the program to tell me the items inside the ( ).

C2dan88 · May 9, 2022

10 minutes ago, FreQUENCY said:

I need to get the program to tell me the items inside the ( )

So in weather.txt you have numbers within () that you want to match, e.g. (123)?

Regex would be

\([0-9]+\)

The backslashes make the regex match a literal ( and ), since parentheses otherwise have a special meaning in regex.
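
A quick way to check this pattern, as a minimal sketch. The sample string is made up for illustration (the real contents of weather.txt are not shown in the thread). It suggests the pattern only matches parentheses that contain nothing but digits:

```python
import re

# Hypothetical sample text; not taken from the real weather.txt
sample = "Station (123) reported (24.2 C) at noon"

# Raw string so the backslashes reach the regex engine unchanged
pattern = re.compile(r"\([0-9]+\)")
matches = pattern.findall(sample)

print(matches)  # ['(123)']
```

Anything with a decimal point or a unit inside the parentheses, such as (24.2 C), is skipped by this pattern.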

FreQUENCY · May 9, 2022

@C2dan88 The result I get is []. Inside the parentheses are items like (24.2 C) or (25.2 C).

C2dan88 · May 9, 2022

The regex becomes a bit more complex

\(([0-9]+(\.[0-9]+)?) C\)

This interactive website will help to explain it

https://regex101.com/r/vI6O1w/1

FreQUENCY · May 9, 2022

3 minutes ago, C2dan88 said:
The regex becomes a bit more complex
\(([0-9]+(\.[0-9]+)?) C\)
This interactive website will help to explain it

https://regex101.com/r/vI6O1w/1

For example, the items inside (-1.7 C) will be output as ('0.7', '.7'). There is something with the decimals. I will look into the link in the meantime.

JogerJ · May 9, 2022

Just now, FreQUENCY said:

For example, the items inside (-1.7 C) will be output as ('0.7', '.7'). There is something with the decimals. I will look into the link in the meantime.

\((-?[0-9]+(?:\.[0-9]+)?) C\)

Use the pattern above to include negatives. With this you should get only one output per match. Don't forget these are strings, so you'll need to convert them to numbers (use "float(x)").
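
A minimal sketch of that conversion step, assuming a made-up sample string in the "(value C)" format described in the thread:

```python
import re

# Hypothetical sample text; not taken from the real weather.txt
sample = "Low (-1.7 C), high (24.3 C)"

# findall returns the contents of the single capture group as strings
raw = re.findall(r"\((-?[0-9]+(?:\.[0-9]+)?) C\)", sample)

# Convert each captured string to a float so it can be used in arithmetic
temperatures = [float(x) for x in raw]

print(raw)           # ['-1.7', '24.3']
print(temperatures)  # [-1.7, 24.3]
```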

C2dan88 · May 9, 2022

3 minutes ago, FreQUENCY said:

will be output as ('0.7', '.7')

Oops, I forgot the non-capturing group.

I also added an optional match for + or -:

\(([+-]?[0-9]+(?:\.[0-9]+)?) C\)
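
To see why the tuples appeared, here is a small comparison on a made-up sample string. When a pattern has more than one capturing group, re.findall returns one tuple per match. Turning the inner group into (?:...) leaves a single group, so each match comes back as a plain string:

```python
import re

# Hypothetical sample text; not taken from the real weather.txt
sample = "Low (-1.7 C), high (0.7 C)"

# Two capturing groups and no sign: one tuple per match, and (-1.7 C) is skipped
with_inner_group = re.findall(r"\(([0-9]+(\.[0-9]+)?) C\)", sample)

# Inner group made non-capturing, optional sign added: one string per match
fixed = re.findall(r"\(([+-]?[0-9]+(?:\.[0-9]+)?) C\)", sample)

print(with_inner_group)  # [('0.7', '.7')]
print(fixed)             # ['-1.7', '0.7']
```

On this sample the earlier pattern does not match (-1.7 C) at all, because nothing in it allows a sign after the opening parenthesis. The ('0.7', '.7') tuple comes from (0.7 C).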

FreQUENCY · May 9, 2022

@C2dan88 Works, but I need to tweak it to get the C too. The result is

['-1.7', '0.7', '1.4', '1.5', '1.7', '1.9', '24.3', '24.2', '23.8', '23.3', '23.1', '22.8', '22.7', '22.6']
14

Is your way the "correct" way or is it a workaround for this case?

JogerJ · May 9, 2022

4 minutes ago, FreQUENCY said:
@C2dan88 Works, but I need to tweak it to get the C too. The result is
['-1.7', '0.7', '1.4', '1.5', '1.7', '1.9', '24.3', '24.2', '23.8', '23.3', '23.1', '22.8', '22.7', '22.6']
14
Is your way the "correct" way or is it a workaround for this case?

Then simply extend the capture group to include the 'C'

\(([+-]?[0-9]+(?:\.[0-9]+)? C)\)

FreQUENCY · May 9, 2022

6 minutes ago, JogerJ said:

Then simply extend the capture group to include the 'C'

Works like a charm. I picked a course that teaches regex too and I am so lost.

I need to figure out how to:

get the word that comes before the date
find the words that start with a capital letter and are at least 3 characters long.

@JogerJ Can you please explain to me in plain English how this is formed, so I can understand it?

\(([+-]?[0-9]+(?:\.[0-9]+)? C)\)

JogerJ · May 9, 2022

Well, I don't like doing other people's homework, but this should provide a clearer breakdown: https://regexr.com/6l8cg
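
A piece-by-piece breakdown can also be written directly in Python using the re.VERBOSE flag, which lets a pattern carry whitespace and comments. This is only a sketch of the pattern from the thread, run on a made-up sample string:

```python
import re

temperature = re.compile(
    r"""
    \(              # a literal opening parenthesis
    (               # start of capture group 1, the part findall returns
      [+-]?         #   an optional plus or minus sign
      [0-9]+        #   one or more digits
      (?:           #   start of a non-capturing group
        \.[0-9]+    #     a literal dot followed by one or more digits
      )?            #   end of that group; the whole group is optional
      [ ]C          #   a space, then the letter C
    )               # end of capture group 1
    \)              # a literal closing parenthesis
    """,
    re.VERBOSE,
)

# Hypothetical sample text; not taken from the real weather.txt
sample = "Low (-1.7 C), high (24.3 C)"
readings = temperature.findall(sample)

print(readings)  # ['-1.7 C', '24.3 C']
```

In verbose mode a bare space in the pattern is ignored, which is why the space before C is written as [ ].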
You should at least familiarize yourself with the cheatsheet so you know what you can do with regex, then work your way with it.
I'll provide some hints:

Getting the word before the date:
The pattern for the word goes in the first capture group, followed by a non-capturing group containing the date
Capitalised 3+ char words:
The first letter is uppercase, followed by 2 or more lowercase letters (or also uppercase letters or even numbers, depending on the definition of "word")

Give it a shot and I'll help correct any mistakes you might make
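
For readers following along, here is one possible reading of the two hints, as a rough sketch rather than a definitive answer. The thread never shows the date format, so DD/MM/YYYY is purely an assumption, as are the sample string and the choice to treat a "word" as an uppercase letter followed by lowercase letters only:

```python
import re

# Hypothetical sample text; the DD/MM/YYYY date format is an assumption
sample = "Athens recorded on Monday 09/05/2022 a high of (24.3 C)"

# Group 1 captures the word; the date follows in a non-capturing group
word_before_date = re.findall(r"(\w+)\s+(?:\d{2}/\d{2}/\d{4})", sample)

# An uppercase letter followed by two or more lowercase letters
capitalised = re.findall(r"\b[A-Z][a-z]{2,}\b", sample)

print(word_before_date)  # ['Monday']
print(capitalised)       # ['Athens', 'Monday']
```

A different date format or a looser definition of "word" would need a different pattern.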

PaulCam · June 15, 2022

I just want to drop this here.

When learning regular expressions, please, please, please read about "greediness", particularly in relation to .* and please, please, please learn to "anchor" your regexps. This is by far the most common problem with regular expressions in the wild.

Example:

Make sure this is our domain: .*.ourdomain.com

Nope. That's not what that does. In fact, these match it:

www.yourdomain.com

lets-hack.ourdomain.com.xackorspz.org

Make sure this is our domain: ourdomain\.com$

Better. The .* is pointless and the $ forces this now to match only strings which end with the literal ourdomain.com. It's also simpler.
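
A small comparison of the two patterns on the hosts listed above, as a sketch. The third pattern is not from the post: it is an assumed stricter variant that also requires the start of the string or a dot right before ourdomain, because www.yourdomain.com still ends with the literal text ourdomain.com.

```python
import re

hosts = [
    "www.ourdomain.com",
    "www.yourdomain.com",
    "lets-hack.ourdomain.com.xackorspz.org",
]

loose = re.compile(r".*.ourdomain.com")         # first pattern from the post
end_anchored = re.compile(r"ourdomain\.com$")   # second pattern from the post
# Assumption: a stricter variant, not part of the original post
strict = re.compile(r"(?:^|\.)ourdomain\.com$")

results = {}
for host in hosts:
    # One (loose, end_anchored, strict) tuple of booleans per host
    results[host] = (
        bool(loose.search(host)),
        bool(end_anchored.search(host)),
        bool(strict.search(host)),
    )
    print(host, results[host])
```

On these three hosts, the loose pattern accepts all of them, the end-anchored one rejects only the xackorspz.org host, and the stricter variant accepts only www.ourdomain.com.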

I would recommend taking any code which uses a regular expression, pulling the regexp bit out into a method and unit testing the sh1t out of it. Give it a list of things, positive and negative tests. Send it round the team and make it a challenge: can you break it?
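
One way that advice might look in practice, as a sketch only. The helper name is_our_domain, the pattern, and the test cases are all hypothetical. The pattern uses \Z rather than $ because in Python $ also matches just before a trailing newline.

```python
import re
import unittest

# Hypothetical pattern: start of string or a dot, then the exact domain at the very end
_OUR_DOMAIN = re.compile(r"(?:^|\.)ourdomain\.com\Z")


def is_our_domain(host: str) -> bool:
    """Return True if host is ourdomain.com or one of its subdomains."""
    return _OUR_DOMAIN.search(host) is not None


class IsOurDomainTest(unittest.TestCase):
    def test_accepts_our_hosts(self):
        for host in ("ourdomain.com", "www.ourdomain.com"):
            with self.subTest(host=host):
                self.assertTrue(is_our_domain(host))

    def test_rejects_lookalikes(self):
        lookalikes = (
            "www.yourdomain.com",
            "lets-hack.ourdomain.com.xackorspz.org",
            "ourdomainxcom",
            "ourdomain.com\n",
        )
        for host in lookalikes:
            with self.subTest(host=host):
                self.assertFalse(is_our_domain(host))
```

Saved in a module, this can be run with python -m unittest. Each new way of breaking the pattern then becomes one more entry in the lookalikes tuple.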
