
Hey all, so I have this file of links:

https://domain.com/works/11017482/other tag
https://domain.com/works/1122855

And this is my current code:

file = open("Links.txt", "r")   

for i in range(2):
    line = file.readline()
    print(line)
    removedEnd = re.search('[https://domain.com/works/0-9]', '', str(line))
    print(removedEnd)

How can I change it so that it only outputs "https://domain.com/works/list_of_numbers", no matter which version of the URL is in the file?

 

The random bit of numbers after "/works/" can change in length too.

 

Thanks!

https://linustechtips.com/topic/1153176-python-string-manipulation-help/

1 minute ago, IAmAndre said:

The regular expression is wrong. It should be something like 

Sorry for the poor formatting, I'm on mobile.

So I changed that, but I get an error:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

 


10 minutes ago, IAmAndre said:

You're calling the function incorrectly: the arguments you're passing aren't the types it expects. Check the documentation to see which types each parameter takes.

What types does it need then?

 

Sorry, I'm just really new to regex and am trying to learn it.


3 hours ago, KeyboardShortcuts said:

What types does it need then?

 

Sorry, I'm just really new to regex and am trying to learn it.

So you needed to check the documentation of the search function. As you can see, this is how it should be used:

re.search(pattern, string, flags=0)

So in your case, it should be

re.search('[https://domain.com/works/0-9]', str(line))  # You don't need to set flags for that; the pattern itself still needs correcting

The same documentation shows that the function returns a Match object (or None when nothing matches) rather than a string, which isn't what you need directly. You might want to use split() instead.
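To make the two options concrete, here is a rough sketch of both. The helper names `strip_with_regex` and `strip_with_split` and the `PREFIX` constant are made up for this example, and it assumes every link starts with https://domain.com/works/ followed by digits, as in the sample file.

```python
import re

# Assumed base URL, taken from the sample links in the question.
PREFIX = "https://domain.com/works/"


def strip_with_regex(line):
    """Return the URL up to the end of the numeric ID, or None if no match."""
    m = re.search(r"https://domain\.com/works/[0-9]+", line)
    # re.search returns a Match object (or None); group(0) is the matched text.
    return m.group(0) if m else None


def strip_with_split(line):
    """Same idea without regex: cut at the first '/' after the ID."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    work_id = line[len(PREFIX):].split("/", 1)[0]
    return PREFIX + work_id if work_id.isdigit() else None


for url in ["https://domain.com/works/11017482/other tag",
            "https://domain.com/works/1122855\n"]:
    print(strip_with_regex(url), strip_with_split(url))
```

Either way you end up with a plain string; the regex version is shorter, while the split version avoids regex entirely if you'd rather not deal with it yet.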


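import re

# Base URL followed by one or more digits; anything after the digits is ignored.
pattern = re.compile(r'https://domain\.com/works/[0-9]+')

# "with" closes the file automatically; iterating handles any number of lines.
with open("Links.txt", "r") as file:
    for line in file:
        matches = pattern.match(line)
        if not matches:
            print("FAILURE to match regex.")
        else:
            # group(0) is the full text that the pattern matched
            removedEnd = matches.group(0)
            print(removedEnd)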

 

The above should work. As IAmAndre mentioned, the pattern needed to be changed, as well as the arguments to search/match. I used match instead of search, but either should work fine for your use case.
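In case the match/search difference matters later: match() only tries the pattern at the very start of the string, while search() scans forward for the first place it fits. A small sketch, using a made-up line with leading text (the `indented` sample is hypothetical and not from your file):

```python
import re

pattern = re.compile(r"https://domain\.com/works/[0-9]+")

plain = "https://domain.com/works/1122855\n"
indented = "  see https://domain.com/works/11017482/other tag\n"  # hypothetical line

# Both find the URL when the line starts with it.
print(pattern.match(plain).group(0))
print(pattern.search(plain).group(0))

# match() only tries position 0, so leading text makes it return None...
print(pattern.match(indented))
# ...while search() keeps scanning until it finds the URL.
print(pattern.search(indented).group(0))
```

So if your file might ever have spaces or other text before the link, search() is the safer pick.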

 

Feel free to ask more questions as you try to understand and modify the code.
