Lazy regex question

Castdeath97

Okay, so I have this string "1476707729-5804c5642b2cf-52590", and if I apply the regular expression ".*?[a-f]" to it, it matches "1476707729-5804c" in pretty much all the online tools like regex101, regardless of which options and engine I set. But somehow my web tech professor thinks it matches "c". Am I missing something here, or is my professor wrong?

If you want to reply back to me or someone else USE THE QUOTE BUTTON!                                                      
Pascal laptops guide

.* is any character, 0 or more

Then we have ?, which makes it lazy, trying to match as few characters as possible (so 0).

And finally [a-f], which means the match has to end with one character in the range a to f.

 

So we have 0 characters (matching as few) and then just matching c, which falls in [a-f].
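
One caveat on the "0 characters" step, sketched in Python (the language is my choice, since the thread names no code of its own). As far as I understand it, the engine does try zero characters first, but it tries them at the start of the string, where the next character is "1" and not in [a-f], so it has to grow the lazy part one character at a time. The helper `lazy_expand` below is a hypothetical hand-rolled stand-in for that process, not how a real engine is implemented.

```python
import re

TEXT = "1476707729-5804c5642b2cf-52590"


def lazy_expand(text, start=0, wanted="abcdef"):
    """Mimic `.*?[a-f]` tried at a single start position (illustration only).

    Try skipping 0 characters, then 1, then 2, ... and stop at the first
    position whose character is in `wanted`.
    """
    for skipped in range(len(text) - start):
        ch = text[start + skipped]
        if ch in wanted:
            # The lazy part consumed `skipped` characters, then [a-f] matched.
            return text[start:start + skipped + 1]
        if ch == "\n":
            # '.' does not match a newline by default, so growing stops here.
            return None
    return None


by_hand = lazy_expand(TEXT)
by_engine = re.search(r".*?[a-f]", TEXT).group()

print(by_hand)    # 1476707729-5804c
print(by_engine)  # 1476707729-5804c
```

Both lines print the same 16-character prefix, which is what the online tools report.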

 

HAL9000: AMD Ryzen 9 3900x | Noctua NH-D15 chromax.black | 32 GB Corsair Vengeance LPX DDR4 3200 MHz | Asus X570 Prime Pro | ASUS TUF 3080 Ti | 1 TB Samsung 970 Evo Plus + 1 TB Crucial MX500 + 6 TB WD RED | Corsair HX1000 | be quiet Pure Base 500DX | LG 34UM95 34" 3440x1440

Hydrogen server: Intel i3-10100 | Cryorig M9i | 64 GB Crucial Ballistix 3200MHz DDR4 | Gigabyte B560M-DS3H | 33 TB of storage | Fractal Design Define R5 | unRAID 6.9.2

Carbon server: Fujitsu PRIMERGY RX100 S7p | Xeon E3-1230 v2 | 16 GB DDR3 ECC | 60 GB Corsair SSD & 250 GB Samsung 850 Pro | Intel i340-T4 | ESXi 6.5.1

Big Mac cluster: 2x Raspberry Pi 2 Model B | 1x Raspberry Pi 3 Model B | 2x Raspberry Pi 3 Model B+

1 minute ago, jj9987 said:

.* is any character, 0 or more

Then we have ?, which makes it lazy, trying to match as few characters as possible (so 0).

And finally [a-f], which means the match has to end with one character in the range a to f.

 

So we have 0 characters (matching as few) and then just matching c, which falls in [a-f].

 

Yeah, but shouldn't it match whatever is before it to get to the c? At least that's what most online tools seem to do.

Just now, Castdeath97 said:

Yeah, but shouldn't it match whatever is before it to get to the c? At least that's what most online tools seem to do.

It matches these too in addition to plain "c".
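
One way to read "these too" is as a global match that keeps going after the first hit. A minimal sketch of that reading, assuming Python's `re.finditer` as the stand-in for a global flag (the variable names are mine):

```python
import re

TEXT = "1476707729-5804c5642b2cf-52590"

# Collect every non-overlapping match, left to right, with its start index.
matches = [(m.start(), m.group()) for m in re.finditer(r".*?[a-f]", TEXT)]

for start, chunk in matches:
    print(start, repr(chunk))
# 0 '1476707729-5804c'
# 16 '5642b'
# 21 '2c'
# 23 'f'
```

Under this reading the only single-character match is the final "f"; a bare "c" never shows up as a match on its own.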

2 minutes ago, jj9987 said:

.* is any character, 0 or more

Then we have ?, which makes it lazy, trying to match as few characters as possible (so 0).

And finally [a-f], which means the match has to end with one character in the range a to f.

 

So we have 0 characters (matching as few) and then just matching c, which falls in [a-f].

 

Also, someone in my class seemed to have found this as well: 

http://www.rexegg.com/regex-quantifiers.html#lazy

 

Not sure about its correctness.

Just now, jj9987 said:

It matches these too in addition to plain "c".

Aha, my Professor seems to think it just matches the plain c on its own for some reason.

2 minutes ago, Castdeath97 said:

Yeah, but shouldn't it match whatever is before it to get to the c? At least that's what most online tools seem to do.

That comes down to where the engine is allowed to start the match. Many regex implementations give you two functions (search vs. match in Python, for example). match requires the regex to match from the beginning of the string, similar to adding an implicit '^' at the start of the RE, so it matches everything up to and including the first character in [a-f], as you expect. search may start anywhere, but it still tries start positions from left to right and returns the first one that works. Here the pattern already succeeds from index 0, because .*? can grow until it reaches the 'c', so search returns the same "1476707729-5804c" rather than just the 'c'. Laziness only controls how far the match extends from a given start; it does not move the start to the right.

To get only the 'c', the pattern would have to be just "[a-f]". To clear up any confusion, you could explicitly add the '^' to your regular expression (which matches the empty string only at the beginning of the string, or of each line in multiline mode), giving you "^.*?[a-f]", which certainly will match everything up to and including the first character matching [a-f].

 

Given that you mentioned he's your web tech professor, I expect you're talking about JavaScript. I can't remember its implementation details off the top of my head, but the Mozilla Developer Network (MDN) documentation should help clear up the distinction. :)
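
For the Python functions mentioned above, here is a quick check using only the standard `re` module, with the string and pattern from the first post. It is a sketch for comparison, not a statement about JavaScript; the dictionary labels are just mine. As far as I can tell, `search` and `match` agree on this input, and only dropping the lazy part gives a bare "c".

```python
import re

TEXT = "1476707729-5804c5642b2cf-52590"
LAZY = r".*?[a-f]"

results = {
    # match() only tries position 0.
    "match": re.match(LAZY, TEXT).group(),
    # search() scans left to right and stops at the first start that works.
    "search": re.search(LAZY, TEXT).group(),
    # An explicit anchor makes the intent visible in the pattern itself.
    "search with ^": re.search(r"^" + LAZY, TEXT).group(),
    # Without the lazy prefix, the first [a-f] character is the whole match.
    "class only": re.search(r"[a-f]", TEXT).group(),
}

for label, value in results.items():
    print(f"{label:15} {value!r}")
# match           '1476707729-5804c'
# search          '1476707729-5804c'
# search with ^   '1476707729-5804c'
# class only      'c'
```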
