Sed command not working

babadoctor · June 2, 2017

So I want to extract certain portions of a text file whenever a pattern is matched. The pattern would be to grab anything between https://www.twitch.tv/videos/ and ". The number of digits between those two strings would be a set amount, 9. (I don't think it will ever reach 10 digits. It probably won't, at least in the next 10 years.) I would then set that number as a variable, and use it in a separate command.

I tried running this:

cat output | sed -e 's^https://www.twitch.tv/videos/\(.*\)"^\1^'

But it didn't work! The command may look weird to you, but I needed to use a different delimiter instead of the default /, due to the actual pattern having /'s in it.

All it does is print out everything in the file, without running it through sed.

I got the command from this stackoverflow post:

https://stackoverflow.com/questions/13242469/how-to-use-sed-grep-to-extract-text-between-two-words (the answer)

As I said, I replaced the default delimiter with ^. It should work, as you can replace the delimiter with anything you want... but as I said, it didn't work.

Any ideas?

Azgoth 2 · June 3, 2017

Use the -E flag to use extended regexp syntax, and get rid of the backslashes around your parentheses. Then use the special \1 escape in place of a replacement string to print out what the capturing parentheses matched. (\1-\9 refer to matched sub-strings: \1 is the first set of capturing parentheses, \2 the second, and so on.)

sed -E 's_https://www.twitch.tv/videos/(.*)"_\1_'
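For comparison, here is roughly the same substitution in Python, as a minimal sketch: re.sub stands in for sed's s command, and the sample lines are made up rather than taken from the real file. It shows that a line is rewritten rather than filtered, so text outside the match survives, and a line with no match comes back unchanged.

```python
import re

# Same pattern as the sed -E command above; the dots are left unescaped as in the original.
PATTERN = r'https://www.twitch.tv/videos/(.*)"'

def substitute(line):
    # \1 in the replacement is the text captured by the parentheses.
    return re.sub(PATTERN, r'\1', line)

# Hypothetical sample lines, not from the real file.
print(substitute('"url": "https://www.twitch.tv/videos/123456789",'))
print(substitute('"title": "some stream",'))
```

Only the matched part of the first line is replaced by the captured ID; the second line has no match and is printed as-is.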

babadoctor · June 3, 2017

It still doesn't seem to work... Am I piping the text into the command incorrectly?

This is the command:

cat output | sed -E 's_https://www.twitch.tv/videos/(.*)"_\1_'

This is the output:

https://hastebin.com/izezororem.scala

Original file:

https://hastebin.com/cuvudamisu.rb

Azgoth 2 · June 3, 2017

Ah, I see what's happening. I was testing this on a single random twitch.tv video URL--in your text it's replacing the URL with just what comes after the /videos/ part, and leaving the rest of each line alone. Sed is really meant for manipulating text--for just matching substrings, you'll want (g)awk.

gawk 'match($0, /https:\/\/www\.twitch\.tv\/videos\/([^\"]*)\",/, arr) {print arr[1]}' output

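Regular expressions in (g)awk are surrounded by / characters, so there's a lot of ugly escaping. Quotation marks are also escaped because they normally represent literal string delimiters. The third argument to match() (the arr array) is a gawk extension, so this won't work in plain awk.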
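The same match-and-print idea can be sketched in Python, in case gawk isn't available. This is an illustration with made-up sample lines, not a drop-in replacement: first_id is a hypothetical helper, and like the gawk one-liner it only looks at the first match on each line.

```python
import re

# Same regex as the gawk command: capture everything up to the closing quote and comma.
PATTERN = re.compile(r'https://www\.twitch\.tv/videos/([^"]*)",')

def first_id(line):
    # Return the captured text for the first match on the line, or None if there is none.
    m = PATTERN.search(line)
    return m.group(1) if m else None

# Hypothetical sample lines standing in for the real file.
lines = [
    '"url": "https://www.twitch.tv/videos/123456789",',
    '"title": "some stream",',
]
for line in lines:
    vid = first_id(line)
    if vid is not None:
        print(vid)  # only matching lines produce output
```

Unlike the sed substitution, lines without a match produce no output at all.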

babadoctor · June 3, 2017

It works! Great! Now all I need to figure out is how to make this detect all patterns in the file, and not stop after only detecting one pattern...

Do you have any ideas? (I am thinking of writing each argument to a variable, possibly with xargs -n1?)

Azgoth 2 · June 3, 2017

As long as you don't have multiple matches per line to worry about, that should already work--it prints each match on its own line in stdout, based on my quick tests. Admittedly I don't use awk/gawk much, so I wasn't aware it had issues with multiple matches per line until I looked it up just now. Frankly I'm starting to lean towards just writing a script in Python or whatever language you're comfortable with to do the matching for you:

#!/usr/bin/python3
import re

# Raw string so the backslashes reach the regex engine untouched
pattern = r'https://www\.twitch\.tv/videos/(.*?)",'

with open("/path/to/file", "r") as fh:
    for i in re.findall(pattern, fh.read()):
        print(i)

# or for multiple files in a single directory
import os
import re

pattern = r'https://www\.twitch\.tv/videos/(.*?)",'
# Skip subdirectories; open() would fail on them
files = [i.path for i in os.scandir("/path/to/files") if i.is_file()]
for F in files:
    with open(F, "r") as fh:
        for i in re.findall(pattern, fh.read()):
            print(i)

That has no issue finding matches on every line of the file, or multiple matches within a line, again based on some of my quick and hacky testing.
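Since the original goal was to put each ID into a variable and hand it to another command, here is one possible sketch. It assumes the IDs are purely numeric, as described in the first post; extract_ids and the sample text are hypothetical, and the final loop just prints where a real script would call the other command.

```python
import re

# Assumption: IDs are digits only (the first post says 9 digits), so \d+ replaces .*?
ID_PATTERN = re.compile(r'https://www\.twitch\.tv/videos/(\d+)"')

def extract_ids(text):
    # findall returns every captured group, including several on one line.
    return ID_PATTERN.findall(text)

# Made-up sample text with two URLs on the same line and one on the next.
sample = (
    '"a": "https://www.twitch.tv/videos/111111111", "b": "https://www.twitch.tv/videos/222222222",\n'
    '"c": "https://www.twitch.tv/videos/333333333",\n'
)

ids = extract_ids(sample)
for video_id in ids:
    # Each ID is now an ordinary variable; pass it on however you need.
    print(video_id)
```

Matching on digits rather than .*? means a stray quote or comma elsewhere on the line can't end up inside the captured ID.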
