
Python 3 help with RegEx To Split User Input

straight_stewie

So I thought that I had a good opportunity to try to learn RegEx. It's something I've wanted to do for a while, but have never had the actual willingness to try it. I have this basic code here:


This program takes the radius and height of a cylinder and tells the user the surface area and volume. 

from math import pi


# Abusing the system to validate user input. I would like to add more functionality to this.
def get_input():
    while True:
        try:
            # Read both values before building the list, so a bad second entry
            # doesn't leave a stale height behind when the loop retries
            height = float(input("Please enter the height of your cylinder: "))
            radius = float(input("Please enter the cylinder's radius: "))
            return [height, radius]

        except ValueError:
            # float() raises ValueError for anything that isn't a number
            print("You did not enter a number. Please enter only numbers containing digits 0-9 and decimal points '.'")


# Return the volume of a cylinder
def cylinder_volume(radius, height):
    return pi * (radius ** 2) * height


# Return the surface area of a cylinder
def cylinder_area(radius, height):
    return (2 * pi * radius * height) + (2 * pi * (radius ** 2))


for _ in range(3):
    my_list = get_input()  # [height, radius]
    print("Volume: {v}\tSurface Area: {s}".format(v=cylinder_volume(my_list[1], my_list[0]), s=cylinder_area(my_list[1], my_list[0])))


So that works all fine and dandy, but what if I want the user to be able to enter units? The simplest version is an Imperial-only one, where the user can enter something like "4.5 feet" and "3 feet" and the program returns the results with the units appended to the end. A more complex example of what I would one day like to do is to let the user enter the radius as "4.5 feet" and the height as "300 centimeters", do the correct unit conversions, and spit out the answer with the correct units in the result string.

I have tried to teach myself RegEx to accomplish this task (is this even the right thing to use?) but have thus far not been successful. Here are some examples of what I've tried so far:
 

radius_match = re.split(r"\d[inches|feet]", user_input)
radius_match = re.split(r"[0-9inches|feet]", user_input)
radius_match = re.split(r"[inches|feet]", user_input)
radius_match = re.split(r"[feet]", user_input)
radius_match = re.split(r"feet", user_input)
radius_match = re.split...

radius_match = re.match<insert all of the above examples and more>

radius_match = re.findall<insert many variations of patterns>

... This continues on and on. I have been working on this for two weeks.
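A likely culprit in the attempts above is the square brackets: `[inches|feet]` is a character class, which matches any *one* of the characters i, n, c, h, e, s, |, f or t, not the word "inches" or "feet". Alternation between whole words needs parentheses instead, e.g. the non-capturing group `(?:inches|feet)`. A small comparison on a made-up sample input:

```python
import re

user_input = "4.5 feet"  # hypothetical sample input

# Character class: splits on EVERY single occurrence of i, n, c, h, e, s, |, f or t
print(re.split(r"[inches|feet]", user_input))  # ['4.5 ', '', '', '', '']

# Alternation inside a group: matches the whole word "inches" or "feet"
print(re.search(r"(?:inches|feet)", user_input).group(0))  # feet

# A number, optional whitespace, then one of the unit words, each captured separately
m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(inches|feet)", user_input)
print(m.groups())  # ('4.5', 'feet')
```

So for splitting "number + unit", capturing groups with `re.fullmatch` or `re.search` tend to fit better than `re.split`.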


No matter what I do it seems that I'm unable to teach myself RegEx. For whatever reason, I can't even get simple things to work like "return true if found 'STRING' in 'abddfkjdskfjdkSTRINGdslfjldkfsd'".
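For that "return true if found" case, a minimal sketch: `re.search` returns a Match object when the pattern occurs anywhere in the string and `None` otherwise, so comparing against `None` gives a boolean. The helper name `contains` is made up for illustration; for a plain literal, Python's `in` operator does the same job without regex.

```python
import re


def contains(needle, haystack):
    # re.escape() makes characters like '.' or '+' in needle match literally
    return re.search(re.escape(needle), haystack) is not None


text = "abddfkjdskfjdkSTRINGdslfjldkfsd"
print(contains("STRING", text))                             # True
print(contains("string", text))                             # False: case-sensitive by default
print(re.search("string", text, re.IGNORECASE) is not None)  # True: ignore case
print("STRING" in text)                                     # True: no regex needed for a plain substring
```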

Is there anyone here who's willing to help me do this? I don't just want the aforementioned cylinder problem solved, I want to learn RegEx; that's just a problem that I thought might be good to use to learn with.

ENCRYPTION IS NOT A CRIME


I'm not the best at regexp, but I can do some simple ones. I figured out this:

https://regex101.com/r/W7uqPl/2

 

I used two ways of matching a number there: one is more complex and probably slower, the other is simpler and probably faster.

One is (\d+(?:\.\d+)?) and another is ([\d\.]+)
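To see how the two number patterns differ, here is a quick check with `re.fullmatch` (which only succeeds when the whole string matches) on a few made-up inputs. The stricter pattern accepts digits with an optional decimal part; the character-class version accepts any run of digits and dots, including strings like "1.2.3" or "." that aren't numbers:

```python
import re

strict = re.compile(r"\d+(?:\.\d+)?")  # digits, optionally followed by '.' and more digits
loose = re.compile(r"[\d.]+")          # any run of digits and dots

for s in ["24", "4.5", "1.2.3", ".", "5."]:
    print(s, "strict:", bool(strict.fullmatch(s)), "loose:", bool(loose.fullmatch(s)))
```

If the extracted text is later passed to `float()`, the looser pattern can let through strings such as "1.2.3" that `float()` rejects.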

 

Working example in python3:

import re

m = re.search(r'^(\d+(?:\.\d+)?)\s+((?:feet|inche?)s?)\s+([\d\.]+)\s+(centimeters?)', '1    feets       24        centimeter')
print(m.groups())

The result of the above code is ('1', 'feets', '24', 'centimeter')
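Because the full pattern is hard to read on one line, here is the same pattern rewritten with the `re.VERBOSE` flag, which lets a pattern span several lines with `#` comments (whitespace outside character classes is ignored). The comments are my own reading of each piece, not part of the original post:

```python
import re

UNITS_RE = re.compile(r"""
    ^
    (\d+(?:\.\d+)?)        # group 1: first number, e.g. 1 or 4.5
    \s+                    # one or more whitespace characters
    ((?:feet|inche?)s?)    # group 2: feet, feets, inch, inches, ...
    \s+
    ([\d.]+)               # group 3: second number (the looser pattern)
    \s+
    (centimeters?)         # group 4: centimeter or centimeters
""", re.VERBOSE)

m = UNITS_RE.search('1    feets       24        centimeter')
print(m.groups())  # ('1', 'feets', '24', 'centimeter')
```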

 

 

The site explains parts of the pattern itself. While writing this post I started explaining this regexp, but it may be too complex as a first regexp to learn from, so I abandoned that. Instead, I'd like to start with something easier, like matching a string in text.

 

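Regular expressions are often written between delimiters, for example /test/gi, where / is the delimiter, test is the pattern and gi are flags. Python doesn't use those delimiters; you write the pattern as a plain string (usually a raw string like r'test', so backslashes aren't treated as string escapes), and you can still pass flags as the third argument of e.g. re.search, such as re.IGNORECASE for i. There is no g flag; re.findall and re.finditer do that job. In Python, re.search looks for a match anywhere in the string, while re.match only matches at the beginning of the string. So search('^test') behaves like match('test'), because ^ says the match has to start at the beginning of the string. The one difference is with the re.MULTILINE flag: then ^ also matches right after every newline, while match() still only checks the very start.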
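A small sketch of those points; the sample text is made up, and `re.MULTILINE` is included only to show where `search('^test')` and `match('test')` can differ:

```python
import re

text = "intro line\ntest line"

print(re.search(r"test", text))                  # a Match: found in the middle of the string
print(re.match(r"test", text))                   # None: match() only checks the start
print(re.search(r"^test", text))                 # None: ^ means start of the string
print(re.search(r"^test", text, re.MULTILINE))   # a Match: ^ now also matches after \n
print(re.match(r"test", text, re.MULTILINE))     # still None
print(re.search(r"TEST", text, re.IGNORECASE))   # a Match: flags go in the third argument
print(re.findall(r"\w+ line", text))             # ['intro line', 'test line']
```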

 

For the examples I will match against this text:

This is 1st example text.

This regexp:

1st

will match "1st" (so search returns a Match object; the pattern "1set" would return None instead), but groups() won't return anything because there are no groups. Surrounding it with parentheses creates a group:

(1st)

And this will result in groups() returning ('1st',)
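In code, the difference between the whole match and the groups looks like this, using the example sentence from above:

```python
import re

text = "This is 1st example text."

m = re.search(r"1st", text)
print(m.group(0))   # 1st  (group 0 is always the whole match)
print(m.groups())   # ()   (no parentheses, so no groups)

m = re.search(r"(1st)", text)
print(m.groups())   # ('1st',)

print(re.search(r"1set", text))  # None: no match at all
```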

 

If we want to match any number, like 1st, 2st, 3st and so on, we can use \d, which stands for any digit 0-9 (in Python 3 it also matches other Unicode digits unless you pass re.ASCII). There is also \s, which I used in the complex regexp above; it stands for whitespace characters such as space or tab. \d and \s are only examples; there are more.

(\dst)

but there is no such thing as 2st; it's 2nd, 3rd, 4th and so on, so we can add another group that matches those suffixes too:

(\d(st|nd|rd|th))

But since we now have two groups, the result returns both, like ('1st', 'st'). To keep the grouping but leave it out of the result, you can use a non-capturing group:

(\d(?:st|nd|rd|th))
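Side by side, the capturing and non-capturing versions give these results on the example sentence:

```python
import re

text = "This is 1st example text."

print(re.search(r"(\d(st|nd|rd|th))", text).groups())    # ('1st', 'st')
print(re.search(r"(\d(?:st|nd|rd|th))", text).groups())  # ('1st',)
```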

Also, \d matches exactly one digit. To match one or more digits you write \d+: + means "one or more", and * means "zero or more".

If the text were "This is 10th example text.", the match would be "0th". (There is also ., which stands for any single character except a newline; you can see \. in the complex regexp above, where the backslash escapes the dot so it matches a literal dot instead of any character.)

(\d+(?:st|nd|rd|th))

From now on, both 1st and 10th match as a whole.
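A quick sketch of the quantifier and the escaped dot; the strings passed to findall are made up, and the last findall shows that this pattern still accepts "2st":

```python
import re

text = "This is 10th example text."

print(re.search(r"\d(?:st|nd|rd|th)", text).group(0))   # 0th  (one digit only)
print(re.search(r"\d+(?:st|nd|rd|th)", text).group(0))  # 10th (one or more digits)

# findall returns every match in the string
print(re.findall(r"\d+(?:st|nd|rd|th)", "1st, 2nd, 10th, 23rd and 2st"))
# ['1st', '2nd', '10th', '23rd', '2st']

# \. matches a literal dot, while . matches any character
print(re.findall(r"\d\.\d", "4.5 and 4x5"))  # ['4.5']
print(re.findall(r"\d.\d", "4.5 and 4x5"))   # ['4.5', '4x5']
```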

 

I could keep improving this example, but this post is getting long; if you have any questions I'm able to answer, I'll try. There is a lot to improve in this regexp: to validate data with it, you would need to tighten it so it doesn't match e.g. 2st and 10st. But if you have a set of data that is probably valid and you just want to parse through it and extract numbers, the example above is enough, since it matches every number followed by st, nd, rd or th.
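One way to make the pattern strict enough for validation, as my own sketch rather than anything from the regex101 link: spell out the English suffix rules as alternatives. Numbers ending in 11, 12 or 13 take "th"; otherwise a last digit of 1, 2 or 3 takes "st", "nd" or "rd"; everything else takes "th". `re.VERBOSE` lets each branch carry a comment, and the name `is_ordinal` is made up for illustration.

```python
import re

ORDINAL = re.compile(r"""
      \d*1[1-3]th           # 11th, 12th, 13th, 111th ... always "th"
    | (?:\d*[02-9])?1st     # 1st, 21st, 101st, but not 11st
    | (?:\d*[02-9])?2nd     # 2nd, 22nd, 102nd, but not 12nd
    | (?:\d*[02-9])?3rd     # 3rd, 23rd, 103rd, but not 13rd
    | \d*[04-9]th           # 4th to 9th, 10th, 20th, 100th ...
""", re.VERBOSE)


def is_ordinal(s):
    # fullmatch: the whole string must be one valid ordinal
    return ORDINAL.fullmatch(s) is not None


for s in ["1st", "2nd", "11th", "21st", "10th", "2st", "10st", "11st"]:
    print(s, is_ordinal(s))
```

To pull valid ordinals out of running text instead of checking a single string, the same alternatives could be wrapped in `\b(?:...)\b` and used with `re.findall`.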

 

P.S.: I only learned about non-capturing groups while writing this post, which tells you how basic my regexp skills are.


10 hours ago, straight_stewie said:

....

1. No matter what I do it seems that I'm unable to teach myself RegEx.

...

2. I don't just want the aforementioned cylinder problem solved,

 

1. I want to learn RegEx,

 

2. that's just a problem that I thought might be good to use to learn with.

 

#!/usr/bin/python3

'''
Always mark your code as your own. Not too shabby. CRSaka... :)

Abusing the system to validate user input.
I would like to add more functionality to this.
'''

'''
Used to match unit input with its conversion to centimeters.
Try using a regex to replace the list that follows each conversion number :)
'''
convert_dict = {
	0.39370: ["inches", "inch", "i"],  # divide inches by 0.39370 to get centimeters
	1: ["centimeters", "centimeter", "c"]
}

def convert(message):
	while True:
		try:
			# split() must give exactly two words and the first must be a number;
			# float() (rather than int()) also accepts decimals such as "4.5"
			measurement, unit = input(message).split()
			value = float(measurement)
		except ValueError:
			print("\nERROR invalid format!")
			print("Please provide measurement and unit of measurement")
			continue  # return to start of while loop
		for divideBy, ListMatch in convert_dict.items():
			if unit.lower() in ListMatch:
				return (value / divideBy, "centimeters")
		print("\nError unsupported unit of measurement!")
		print("Supported units:\ninch, centimeter")

def get_input():
	# h is the measurement returned, u is the unit ('centimeters' for this snippet)
	h, u = convert("Enter the height of your cylinder: ")
	return "%s %s" % (h, u)

if __name__ == '__main__':
	print(get_input())

 

I've provided a measurement converter that standardizes valid input into centimeters. (RegEx challenge: replace the dictionary lists with a regex string that can be used in their place and still support multiple input values. ;-P)
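One possible answer to that challenge, as a sketch only: each list becomes a small pattern, and one regex splits the input into number and unit, which also allows input without a space such as "4.5inches". The names `UNIT_PATTERNS`, `INPUT_RE` and `to_centimeters` are made up for this example, and feet are left out just as in the original dictionary.

```python
import re

# Hypothetical regex replacement for convert_dict: the key is still the number
# to divide by, the value is a pattern instead of a list of spellings.
UNIT_PATTERNS = {
    0.39370: r"inch(?:es)?|i",  # inch, inches, i
    1: r"centimeters?|c",       # centimeter, centimeters, c
}

# A number (integer or decimal), optional whitespace, then a word for the unit.
# The \s* at both ends allows surrounding whitespace.
INPUT_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*")


def to_centimeters(text):
    """Return the length in centimeters, or None if the input isn't understood."""
    m = INPUT_RE.fullmatch(text)
    if m is None:
        return None
    value, unit = float(m.group(1)), m.group(2)
    for divide_by, pattern in UNIT_PATTERNS.items():
        # fullmatch so that e.g. "inchworm" doesn't count as "inch"
        if re.fullmatch(pattern, unit, re.IGNORECASE):
            return value / divide_by
    return None


for sample in ["4 inches", "2.5Inch", "300 c", "3 feet"]:
    print(sample, "->", to_centimeters(sample))
```

Returning `None` for unknown input leaves the re-prompting to the caller, the way the `while` loop in `convert()` does.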

 

1. I'd suggest finding a text editor that supports regex replace (or search) and just brute-forcing it. Regex documentation is far from intuitive, and it's a real pain to learn through code. The text editor gives you live highlighting and lets you know which concepts you understand and where the gaps are.

 

2. No solutions, just a non-regex snippet to help you with your project. Repost your code if you end up getting the challenge done :)



Could you not ask what unit to use before taking the lengths? seems like a much simpler solution.

                     ¸„»°'´¸„»°'´ Vorticalbox `'°«„¸`'°«„¸
`'°«„¸¸„»°'´¸„»°'´`'°«„¸Scientia Potentia est  ¸„»°'´`'°«„¸`'°«„¸¸„»°'´


7 hours ago, vorticalbox said:

Could you not ask what unit to use before taking the lengths? seems like a much simpler solution.

The goal isn't to solve this problem; the goal is to learn RegEx.
