Need help with PowerShell, trouble with HTML

Hey guys, hopefully I can explain this well enough to get some answers.

I started learning PowerShell for work and school, but I'm still having trouble using it with web content, like parsing HTML. I'm used to DOS/Command Prompt, but today I'm trying to get some HTML text that was commented out using the old <!-- --> tags. I can see it in Chrome with Inspect Element, but I can't get PowerShell to output that comment when I use wget. If it helps, I know where it is in the HTML source: it's right before the closing </body> tag. I'll post what I've written so far. For privacy reasons I'm not including the page itself.

 

My .ps1 so far:

 

 wget https://google.com | findstr example

 

From here I just want to pull that example text (which stands in for the commented-out text) and display it in PowerShell when I run the script.

Hey mate,

 

I mostly deal with Linux systems, but I know that PowerShell has curl. Wget downloads a webpage and can fetch files over HTTP, but as far as I know it won't hand you the raw HTML.

 

You can use the curl command to pull down raw HTML. So you were on the right track, just not using the correct command. Try the following:

 

curl https://google.com | findstr example

It should work, or at least put you on the right path. 

 

Cheers,

Thank you, but no luck. I should have said I attempted it with curl too. My bad. Sadly, it gets me the same result as wget (as in what PowerShell prints in the CLI). The commented-out text is still ignored, so I could be doing something wrong on my end. I did have an acquaintance send me their version, but it doesn't accomplish what I'm trying to do.

In Windows PowerShell, wget is just an alias for Invoke-WebRequest.

 

Invoke-WebRequest returns an HtmlWebResponseObject, not the raw source of the page you requested, so piping it to findstr won't work.
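
To illustrate the point about the response object, here is a minimal sketch of searching the raw page text instead of the object itself. It assumes the raw HTML is taken from the response's Content property; the function name Find-HtmlLine and the sample HTML are made up for the example, and the sample string stands in for a real download so the snippet runs without a network connection.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the lines of a raw
# HTML string that contain the literal text in $Pattern.
function Find-HtmlLine {
    param(
        [Parameter(Mandatory)] [string] $Html,
        [Parameter(Mandatory)] [string] $Pattern
    )
    # Split on LF or CRLF so every source line is tested on its own.
    $Html -split '\r?\n' |
        Select-String -Pattern $Pattern -SimpleMatch |
        ForEach-Object { $_.Line }
}

# Stand-in for the real page. With a live URL this would be something like:
#   $html = (Invoke-WebRequest -Uri $url -UseBasicParsing).Content
$sampleHtml = @'
<html>
<body>
<p>Hello</p>
<!-- example -->
</body>
</html>
'@

$found = Find-HtmlLine -Html $sampleHtml -Pattern 'example'
$found
```

The idea is that Content is a plain string, so a text search sees the whole page source, comments included, rather than a formatted summary of the response object.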

 

This article explains how to scrape a webpage with Invoke-WebRequest:

https://4sysops.com/archives/powershell-invoke-webrequest-parse-and-scrape-a-web-page/

 

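To find a string inside an HTML comment, you can do something like this:

# AllElements only exists in Windows PowerShell 5.1; comments are listed with a TagName of "!"
$WebResponse = Invoke-WebRequest -Uri "page url here"
$WebResponse.AllElements | Where-Object { $_.TagName -eq "!" } | Select-Object -ExpandProperty outerHTML | Select-String -SimpleMatch "text to search here"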

 

Well I figured it out on my own. I used:

Invoke-RestMethod http://google.com | findstr example

This worked, although I don't understand why. I'm glad it gave me the text, but my job isn't done: now I have to store the last line in a variable, then trim and strip the output until I get only the example and not the entire line containing "example".
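
As far as I know, Invoke-RestMethod returns the response body as a plain string when the content is not JSON or XML, which would explain why findstr could see the comment here. For the trimming step, one possible sketch is to pull the comment text out with a regular expression instead of cutting the line down by hand. The function name Get-HtmlCommentText and the sample HTML are hypothetical, and a regex is only a rough tool for HTML, so treat this as a starting point rather than a robust parser.

```powershell
# Hypothetical helper (not a built-in cmdlet): returns the trimmed text of
# every <!-- ... --> comment found in a raw HTML string.
function Get-HtmlCommentText {
    param([Parameter(Mandatory)] [string] $Html)
    # (?s) lets "." match newlines so multi-line comments are captured too;
    # .*? is lazy, so each match stops at the first closing "-->".
    foreach ($m in [regex]::Matches($Html, '(?s)<!--(.*?)-->')) {
        $m.Groups[1].Value.Trim()
    }
}

# Stand-in for the real page. With a live URL this would be something like:
#   $html = Invoke-RestMethod -Uri $url
$sampleHtml = @'
<html>
<body>
<!-- first note -->
<p>Hello</p>
<!-- example -->
</body>
</html>
'@

# @() keeps the result an array even when only one comment is found.
$comments = @(Get-HtmlCommentText -Html $sampleHtml)

# The comment of interest sits right before </body>, so take the last one.
$lastComment = $comments[-1]
$lastComment
```

With the sample above, $lastComment holds just the word example, without the comment markers or surrounding whitespace.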
