Web Crawling / HTML External Source crawler

N3rot0xin · January 29, 2015

Just a quick question: if I want to see a list of the external media (such as stylesheets or images) that a website uses, is there a tool or a website for that?

For example, say you want to add yahoo.com to an exclusion rule on a firewall. If you ONLY add yahoo.com, you get a plain white page with black text. How would you go about finding the other domains the page needs, other than viewing the source?

ekv · January 29, 2015

If you're looking for something that downloads a whole website to your local PC, you can use "HTTrack Website Copier".

But if you're looking for something that takes data from a news site and puts it on your website, you can use a PHP script such as "PHP Simple HTML DOM Parser".
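For the original question (which media a page pulls in), the same DOM-parser idea can be sketched with Python's built-in `html.parser` module instead of the PHP library; Python also runs on Windows. This is a minimal sketch and not how PHP Simple HTML DOM Parser works internally. It assumes the interesting URLs sit in `src`, `href`, `data-src` or `poster` attributes, and it skips `<a href>` because that links to another page rather than loading something into this one. `ResourceCollector` and `collect_resources` are made-up names for illustration.

```python
from html.parser import HTMLParser

# Attribute names assumed to hold resource URLs (not an exhaustive list).
RESOURCE_ATTRS = {"src", "href", "data-src", "poster"}


class ResourceCollector(HTMLParser):
    """Hypothetical helper: records URL-valued attributes of resource-loading tags."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # <a href="..."> points to another page; the browser doesn't load it.
        if tag == "a":
            return
        for name, value in attrs:
            # value is None for bare attributes such as <script async>.
            if name in RESOURCE_ATTRS and value:
                self.urls.append(value)


def collect_resources(html):
    """Return resource URLs in document order (relative URLs are left as-is)."""
    parser = ResourceCollector()
    parser.feed(html)
    parser.close()
    return parser.urls


if __name__ == "__main__":
    page = """<html><head>
<link rel="stylesheet" href="https://cdn.example.net/site.css">
<script src="/js/app.js"></script>
</head><body>
<img src="https://img.example.org/logo.png" alt="logo"/>
<a href="https://www.example.com/news">News</a>
</body></html>"""
    for url in collect_resources(page):
        print(url)
```

Run as-is, it prints the stylesheet, the script and the image URL, but not the `<a>` link. It will also pick up things like `<link rel="canonical" href="...">`, so treat the output as a starting point rather than an exact list.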

Google both; they're easy to find.

keja · January 29, 2015

You can write a simple script to get all the sources:

curl -sL https://domain.tld | grep -o 'src="[^"]*"'
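The grep one-liner only matches `src="..."`, so stylesheets (usually loaded via `<link href="...">`) are missed, and the output still has to be boiled down to domain names before it's useful in a firewall rule. Below is a rough Python equivalent. It assumes the page's HTML is already in hand and that "external" means any host that is neither `allowed_domain` nor one of its subdomains. `external_hosts` and its parameters are hypothetical names.

```python
import re
from urllib.parse import urljoin, urlparse

# Like grep -o 'src="[^"]*"', but also matches href= and single-quoted values.
ATTR_RE = re.compile(r"""(?:src|href)\s*=\s*["']([^"']+)["']""", re.IGNORECASE)


def external_hosts(html, page_url, allowed_domain):
    """Return sorted hostnames referenced in html that fall outside allowed_domain.

    A host counts as allowed if it equals allowed_domain or ends with
    "." + allowed_domain (e.g. www.yahoo.com for yahoo.com).
    """
    allowed = allowed_domain.lower()
    hosts = set()
    for raw in ATTR_RE.findall(html):
        # Resolve relative ("/img/a.png") and protocol-relative ("//cdn...") URLs.
        host = urlparse(urljoin(page_url, raw)).hostname
        if not host:
            continue  # mailto:, javascript: and data: URLs have no host
        if host == allowed or host.endswith("." + allowed):
            continue
        hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    sample = (
        '<link rel="stylesheet" href="//cdn.example.net/site.css">'
        '<img src="/logo.png">'
        '<script src="https://s.yahoo.com/app.js"></script>'
        "<img src='https://img.example.org/b.png'>"
    )
    print(external_hosts(sample, "https://www.yahoo.com/", "yahoo.com"))
```

On Windows the HTML can be fetched with `urllib.request.urlopen(url).read().decode("utf-8", "replace")`. Like the grep version, this only sees URLs written into the HTML itself: anything a script loads later, and anything referenced from inside a CSS file, won't show up. And because it matches every `href`, plain `<a>` links to other sites will appear in the list too.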

N3rot0xin · January 29, 2015

Basically, all I need is something that shows me all the sources a given webpage uses. My use case was adding yahoo.com to a firewall exclusion, but I was getting a white page with plain text because the firewall was blocking all of the other source links not hosted on a *yahoo.com domain. What @keja suggested is pretty close, but it's not something I can do from a Windows computer. This is kind of for work, so I'm fairly limited, which is why I'm looking for a site that can do it.
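Since the end goal is a firewall exclusion list rather than a list of URLs, it may help to collapse the hostnames into wildcard entries in the style of the `*yahoo.com` rule mentioned above. This sketch is plain Python, which runs fine on Windows. It assumes the firewall accepts `*.domain` entries and naively treats the last two labels of a hostname as its domain; that is wrong for suffixes like `.co.uk`, where the Public Suffix List would be the proper tool. `wildcard_rules` is a hypothetical helper name.

```python
def wildcard_rules(hosts, already_allowed=()):
    """Collapse hostnames into sorted '*.domain' exclusion entries.

    Assumption: the domain is the last two labels of the hostname, which
    is wrong for multi-part suffixes such as .co.uk.
    """
    # Normalise existing rules like "*.yahoo.com" or "yahoo.com" to "yahoo.com".
    allowed = {rule.lower().lstrip("*.") for rule in already_allowed}
    rules = set()
    for host in hosts:
        labels = host.lower().rstrip(".").split(".")
        if len(labels) < 2:
            continue  # skip single-label names such as "localhost"
        domain = ".".join(labels[-2:])
        if domain not in allowed:
            rules.add("*." + domain)
    return sorted(rules)


if __name__ == "__main__":
    found = ["cdn.example.net", "img.example.org", "static.example.org", "www.yahoo.com"]
    for rule in wildcard_rules(found, already_allowed=["*.yahoo.com"]):
        print(rule)
```

The hostnames could come from the curl/grep output in this thread or from the browser's developer tools (F12, Network tab), which also list requests made by scripts. Whether `*.example.net` also covers the bare `example.net` depends on the firewall, so check its wildcard syntax before pasting the rules in.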

ekv · January 31, 2015

You can do that on Linux (not sure about Windows), or try writing a PHP script for it.
