Python string manipulation help

KeyboardShortcuts

Hey all, so I have this file of links:

https://domain.com/works/11017482/other tag
https://domain.com/works/1122855

And this is my current code:

import re

file = open("Links.txt", "r")

for i in range(2):
    line = file.readline()
    print(line)
    removedEnd = re.search('[https://domain.com/works/0-9]', '', str(line))
    print(removedEnd)

How can I change it so that it outputs only "https://domain.com/works/<numbers>", no matter which form of the URL is in the file?

The run of digits after "/works/" can also vary in length.

Thanks!

The regular expression is wrong. It should be something like 

https://domain.com/works/[0-9]+

Sorry for the poor formatting, I'm on mobile.
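A quick sketch that checks this pattern against both sample lines from the question, assuming Python 3. The dots are escaped here (`\.`) so they match only a literal dot, and `extract_work_url` is a hypothetical helper name, not something from the thread. Because `[0-9]+` means "one or more digits", the ID can be any length.

```python
import re

# "\." matches a literal dot; "[0-9]+" matches one or more digits,
# so work IDs of any length are captured.
WORK_URL = re.compile(r"https://domain\.com/works/[0-9]+")


def extract_work_url(line):
    """Return the base work URL found in `line`, or None if there is none."""
    match = WORK_URL.search(line)
    return match.group(0) if match else None


for line in ["https://domain.com/works/11017482/other tag",
             "https://domain.com/works/1122855"]:
    print(extract_work_url(line))
# https://domain.com/works/11017482
# https://domain.com/works/1122855
```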

1 minute ago, IAmAndre said:

The regular expression is wrong. It should be something like 

Sorry for the poor formatting, I'm on mobile.

So I changed the pattern, but now I get an error:

TypeError: unsupported operand type(s) for &: 'str' and 'int'

Check the documentation of the function and see what types are expected. You can send me the link if you want.

I was talking about the search function. The way you call it is what raises the error; the expression itself is correct.
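A minimal reproduction of what happens, assuming Python 3: with three positional arguments, `re.search` reads them as `pattern, string, flags`, so the empty string becomes the text to search and the line ends up as `flags`, where `re` expects an integer flag value. The exact wording of the TypeError differs between Python versions, and `call_with_three_args` is a hypothetical helper used only for this demo.

```python
import re

PATTERN = r"https://domain\.com/works/[0-9]+"
line = "https://domain.com/works/1122855\n"


def call_with_three_args():
    """Repeat the question's call and return the exception it raises (if any)."""
    try:
        # Read as pattern=PATTERN, string="", flags=line -> line is not a flag value
        re.search(PATTERN, "", line)
    except TypeError as exc:
        return exc
    return None


print(repr(call_with_three_args()))  # message wording varies by Python version

# Pattern first, then the text to search; flags can be left out.
print(re.search(PATTERN, line).group(0))  # https://domain.com/works/1122855
```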

3 minutes ago, IAmAndre said:

I was talking about the search function. It's what's giving you the bug. The expression is correct.

Which function should I use then?

You're calling the function incorrectly: the arguments you pass aren't the types it expects. Check the documentation to see which type each parameter takes.
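Besides the online documentation, the interpreter itself can show which parameters a function takes. A small sketch using the standard `inspect` module:

```python
import inspect
import re

# Ask the interpreter for the parameters re.search accepts.
print(inspect.signature(re.search))  # (pattern, string, flags=0)

# The docstring describes what the function returns.
print(re.search.__doc__)
```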

10 minutes ago, IAmAndre said:

You are using the function wrong because you're not passing the right types of parameters. So you can just check the documentation and see what types are expected.

What types does it need then?

Sorry, I'm just really new to regex and am trying to learn it.

3 hours ago, KeyboardShortcuts said:

What types does it need then?

Sorry, I'm just really new to regex and am trying to learn it.

So you needed to check the documentation of the search function. As you can see, this is how it should be used:

re.search(pattern, string, flags=0)

So in your case, it should be

re.search('https://domain.com/works/[0-9]+', line)  # no flags are needed for this
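If flags are ever needed, passing them by keyword keeps them from being confused with the string argument. A hypothetical example: the capitalised URL below is made up and is not in the question's file.

```python
import re

PATTERN = r"https://domain\.com/works/[0-9]+"
line = "HTTPS://Domain.com/works/1122855"  # made-up, differently capitalised URL

# Default flags=0: matching is case-sensitive, so nothing is found.
print(re.search(PATTERN, line))  # None

# Pass flags by keyword so it can't be mistaken for the string argument.
match = re.search(PATTERN, line, flags=re.IGNORECASE)
print(match.group(0))  # HTTPS://Domain.com/works/1122855
```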

The same documentation shows that the function returns a Match object (or None when nothing matches), not the matched text itself, so you would still have to extract the string from it. You might want to use split() instead.
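Both routes side by side, as a sketch. `via_regex` and `via_split` are hypothetical helper names; the `split()` version assumes every line has the same shape (`https://domain.com/works/<digits>`, optionally followed by more path), because it simply keeps the first five `/`-separated pieces.

```python
import re

PATTERN = r"https://domain\.com/works/[0-9]+"


def via_regex(line):
    """Pull the matched text out of the Match object (None if no match)."""
    match = re.search(PATTERN, line)
    return match.group(0) if match else None


def via_split(line):
    """Keep the first five "/"-separated pieces: 'https:', '', domain, 'works', ID."""
    parts = line.strip().split("/")
    return "/".join(parts[:5])


for line in ["https://domain.com/works/11017482/other tag\n",
             "https://domain.com/works/1122855\n"]:
    print(via_regex(line), via_split(line))
```

One design difference: the regex version returns None for a line that doesn't fit the pattern, while the split version returns whatever the first five pieces happen to be.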

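import re

# Read the first two lines of the file; "with" closes the file automatically.
with open("Links.txt", "r") as file:
    for _ in range(2):
        line = file.readline().strip()  # drop the trailing newline
        print(line)
        # "\." matches a literal dot; "[0-9]+" matches an ID of any length
        matches = re.match(r'https://domain\.com/works/[0-9]+', line)
        if not matches:
            print("FAILURE to match regex.")
        else:
            removedEnd = matches.group(0)
            print(removedEnd)
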
The above should work. As IAmAndre mentioned, both the pattern and the arguments passed to search/match needed to change. I used match instead of search, but either works for your use case, since each URL in the file sits at the start of its line.
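Where the two functions differ: `re.match` only tries the pattern at the very start of the string, while `re.search` scans the whole string. A made-up line with text before the URL shows the difference:

```python
import re

PATTERN = r"https://domain\.com/works/[0-9]+"
line = "see https://domain.com/works/1122855"  # made-up line with leading text

# re.match only tries at position 0, so the leading "see " makes it fail.
print(re.match(PATTERN, line))  # None

# re.search scans the whole string and finds the URL further in.
print(re.search(PATTERN, line).group(0))  # https://domain.com/works/1122855
```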

 

Feel free to ask more questions as you try to understand and modify the code.
