
Web Crawling / HTML External Source crawler

Quick question: if I want to see a list of links to the external resources (such as stylesheets or images) used by a website, is there a tool or a website for that?

 

For example, say you want to add yahoo.com to an exclusion rule on a firewall. If you ONLY add yahoo.com, you get a plain black-and-white page. How would you go about finding the other links needed, other than viewing the source?

 

 

I am whatever I am. 

 

 

https://linustechtips.com/topic/300132-web-crawling-html-external-source-crawler/

If you're looking for something that downloads a whole website to your local PC, you can use "HTTrack Website Copier".

But if you're looking for something that takes data from, say, a news site and puts it on your own website, you can use a PHP script such as "PHP Simple HTML DOM Parser".

Google both; they are easy to find.


Basically, all I needed was something that shows all the sources a certain webpage uses. My use case was adding yahoo.com to a firewall exclusion, but I was getting a white page with plain text because the firewall was blocking all of the other source links not hosted in a *yahoo.com domain. What @keja said was pretty close, but it's not something I can do from a Windows computer. This is kind of for work, so I'm fairly limited, which is why I'm looking for a site that can do that.
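
For anyone landing here with the same need, here is a rough sketch of the idea in Python, using only the standard library. It parses a page's HTML and lists the hostnames of referenced resources that fall outside a domain you already allow. The names `ResourceCollector`, `external_hosts`, `RESOURCE_ATTRS`, and the example.com/.net/.org sample markup are all made up for illustration, and the tag list is an assumption, not an exhaustive one.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Tag -> attribute pairs that commonly point at sub-resources.
# This list is an assumption for the sketch, not exhaustive.
RESOURCE_ATTRS = {
    "script": "src",
    "img": "src",
    "iframe": "src",
    "source": "src",
    "link": "href",
}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs of resources referenced by the parsed HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attr_name = RESOURCE_ATTRS.get(tag)
        if attr_name is None:
            return
        for name, value in attrs:
            if name == attr_name and value:
                # Resolve relative and protocol-relative URLs against the page URL.
                self.urls.append(urljoin(self.base_url, value))


def external_hosts(html, base_url, allowed_domain):
    """Return sorted hostnames that are not allowed_domain or a subdomain of it."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    hosts = set()
    for url in collector.urls:
        host = urlparse(url).hostname
        if not host:
            continue
        if host == allowed_domain or host.endswith("." + allowed_domain):
            continue
        hosts.add(host)
    return sorted(hosts)


# Hypothetical sample page, used only to demonstrate the function.
SAMPLE_HTML = """
<html><head>
<link rel="stylesheet" href="//cdn.example.net/site.css">
<script src="https://static.example.org/app.js"></script>
</head><body>
<img src="/logo.png">
<img src="https://img.example.com/banner.png">
</body></html>
"""

if __name__ == "__main__":
    for host in external_hosts(SAMPLE_HTML, "https://www.example.com/", "example.com"):
        print(host)
```

To try it on a real page you would save the page source to a file and pass its contents as `html`. Note that this only sees what is written in the HTML itself; anything a page loads later through JavaScript or from inside a stylesheet will not show up, so the Network tab in the browser's developer tools is likely the more complete view.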


 

 

