
I am trying to develop an application where you enter a URL and it scans the site for media and downloads it. I got this idea from IDM (Internet Download Manager): when I opened a website, it would download the media instantly (I didn't even mean it to). Do you think this would be possible?

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/
Share on other sites

Link to post
Share on other sites

It depends on how the files are served. If there's a direct link to the file, it's possible :)

something like this:

<a href="site.com/download/fileiwannadownload.pdf">File :D</a>

You can just search for <a> tags that have an href attribute and look at the file extension :)
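A rough sketch of that idea in Python (not the poster's code), using only the standard library's html.parser. The names LinkCollector, find_media_links and MEDIA_EXTENSIONS are made up for illustration, and the extension list is an assumption about what counts as "media":

```python
from html.parser import HTMLParser

# Hypothetical list of extensions treated as downloadable media.
MEDIA_EXTENSIONS = (".pdf", ".mp4", ".jpg", ".png")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it is fed."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def find_media_links(html, extensions=MEDIA_EXTENSIONS):
    """Return the hrefs whose text ends with one of the given extensions."""
    parser = LinkCollector()
    parser.feed(html)
    return [href for href in parser.hrefs if href.lower().endswith(extensions)]


page = (
    '<a href="site.com/download/fileiwannadownload.pdf">File :D</a> '
    '<a href="/about.html">About</a>'
)
print(find_media_links(page))
```

This only looks at the end of the href text, so a link with a query string after the extension would be missed. It also only sees links that are present in the HTML the server sends, not ones added later by scripts.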

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7927097

27 minutes ago, mattonfire said:

I am trying to develop a application that you enter a URL and it will scan the site for media and download it. I got this idea off IDM when I would enter a website and it would download the media instantly. (I didn't even mean it to). Do you think this would be possible?

So you want a web browser without the web?


Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7927172

It's possible. I've personally worked with one that grabs the .jpg files from Instagram, and I don't see why anything else would be much different. If you want it to work with any site and any media, though, you're in for a world of pain. Site admins do actual work to prevent this kind of thing: they generate content on the fly with server-side scripts and obfuscate the source code and URLs. But if it's one site with one type of media, it can be easy as pie.

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7927191

3 hours ago, Naeaes said:

It's possible. I've personally dealt with one that gets the .jpg files from instagram but I don't see why all else would be much different. If you want it to work with any site and any media, you're in for a world of pain. The admins do actual work to prevent stuff like this. Generate stuff on the fly with serverside scripts and mess with the source code and urls. But if it's one site with one type of media, it can be easy as pie.

If I use Inspect Element, it's just a URL box, and it links straight to a URL for an MP4 file.

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7928031

21 hours ago, mattonfire said:

If I do inspect element it's just a URL box and it links straight to a url with a MP4

If the tag you are looking for never changes and the URL is plain text inside the tag, you can use either getElementsByTagName or selectNodes (Google can help you implement either option), then grab the inner text and download the file from that URL. As previously mentioned, if the URL sits in a tag attribute instead (e.g. the href of an <a> tag), read that attribute and use its value to download the file. If there are multiple <a> tags, look through the href attributes and pick the one whose extension matches the type of file you want (.mp4 in this instance). Hope this helps steer you in the right direction.
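To make the "pick the matching href, then download it" step concrete, here is a minimal sketch in Python rather than the DOM methods named above. pick_media_urls and download are hypothetical helper names, the example.com URLs are placeholders, and the list of hrefs is assumed to have been extracted already:

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urljoin, urlparse


def pick_media_urls(page_url, hrefs, extension=".mp4"):
    """Resolve each href against the page URL; keep those whose path ends with extension."""
    urls = []
    for href in hrefs:
        absolute = urljoin(page_url, href)  # turns relative links into full URLs
        # Compare against the path only, so a trailing ?query does not hide the extension.
        if urlparse(absolute).path.lower().endswith(extension):
            urls.append(absolute)
    return urls


def download(url, dest_dir="."):
    """Stream the resource at url into dest_dir and return the saved path."""
    name = Path(urlparse(url).path).name or "download.bin"
    target = Path(dest_dir) / name
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return target


hrefs = ["clip.mp4", "/about.html", "http://cdn.example.com/a/b.MP4?x=1"]
print(pick_media_urls("http://example.com/videos/page.html", hrefs))
# Each returned URL could then be passed to download(url, "some_folder").
```

Resolving against the page URL matters because many sites use relative links, which are not downloadable on their own.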

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7932601

Well..C++ is Turing complete...so I would have to say yes.

However, I would highly recommend using a different language for this task, such as Perl or Python. Just download the page into a buffer and run a regular expression on it. It should look something like this:

<a\s+[^>]*?href=\"([^\"]+\.mp4)\"[^>]*>

That's probably not entirely correct, as it is off the top of my head, but both Perl and Python are largely focused on text processing, and as a consequence have very nice, easy-to-use regular expression features. The pattern captures the part in the parentheses, which is the link you can then download the actual file from.
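For illustration, here is how a pattern along those lines might be used from Python's re module. The pattern below is my own loosened variant (any attributes before href, any characters except a double quote in the URL), not necessarily what the poster intended, and find_mp4_links is a made-up name:

```python
import re

# Assumed pattern: an <a ...> tag whose double-quoted href value ends in .mp4.
# Group 1 captures the href value itself.
MP4_LINK = re.compile(r'<a\s+[^>]*?href="([^"]+\.mp4)"[^>]*>', re.IGNORECASE)


def find_mp4_links(html):
    """Return every captured .mp4 href found in the HTML text."""
    return MP4_LINK.findall(html)


sample = (
    '<p><a class="dl" href="/media/clip-01.mp4">Clip</a> '
    '<a href="/about.html">About</a></p>'
)
print(find_mp4_links(sample))
```

A regex like this is fine for one known page layout, but it will miss single-quoted or unquoted attributes and can be fooled by odd markup. A real HTML parser is the safer choice for anything general.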

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7937064

I did something like this in Python. It's much, much easier and faster. I wrote a program that downloads every image from an image hosting site. As a test, I downloaded ~34,000 images, which covers the site's first year. By my count there are 584,000 images on that site to date. If I had written this in C++ it would have taken me at least two weeks, I guess. I made this in one day :)


Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7956128

On 6/21/2016 at 4:12 PM, Yamoto42 said:

Well..C++ is Turing complete...so I would have to say yes.

However, I would highly recommend using a different language for this task, with perl or python being recommended.  Just download the page to a buffer and run a regular expression on it.  It should look something like this.


<a[ a-zA-z0-9]*\s+href=\"([a-zA-Z0-9\\\/]+\.mp4)\"[ a-zA-z0-9]*>

That's probably not entirely correct, as it is off the top of my head, but both perl and python are largely focused around text processing, and as a consequence have very nice and easy to use regular expression features.  It will capture the part in the parenthesis, which will be the link you can then download the actual file from.

C++ supports regex too.


Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7956914

On 6/21/2016 at 8:12 AM, Yamoto42 said:

 

Well..C++ is Turing complete...so I would have to say yes.

 

That is the best answer I have ever heard. 

Here's a good first foray into the subject of webscraping. It's very basic, but it gives you a good idea of what will be going on.


Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7958211

On 6/19/2016 at 9:53 AM, mattonfire said:

I am trying to develop a application that you enter a URL and it will scan the site for media and download it. I got this idea off IDM when I would enter a website and it would download the media instantly. (I didn't even mean it to). Do you think this would be possible?

So you want to "crawl" a web page, then? There are already command-line tools that can do this. wget has options for it (-r for recursive retrieval, -A to accept only certain file extensions). As far as I know cURL has no built-in recursive mode, but I'm not entirely sure as I've never used it for that.
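If you'd rather write the crawl yourself than shell out to wget, the core loop is small. This is only a sketch under some assumptions: it stays on the starting host, treats every same-host link that isn't media as a page to visit, and stops after max_pages. The names crawl_for_media and fetch_html are hypothetical, and the fetch parameter exists so the page loader can be swapped out:

```python
import urllib.request
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class _Links(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)


def fetch_html(url):
    """Default fetcher: download a page and decode it as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl_for_media(start_url, extension=".mp4", max_pages=20, fetch=fetch_html):
    """Breadth-first crawl of same-host pages; returns the media URLs found."""
    host = urlparse(start_url).netloc
    queue, seen, media = deque([start_url]), {start_url}, []
    pages = 0
    while queue and pages < max_pages:
        page_url = queue.popleft()
        pages += 1
        parser = _Links()
        parser.feed(fetch(page_url))
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment part.
            url, _ = urldefrag(urljoin(page_url, href))
            if url in seen:
                continue
            seen.add(url)
            if urlparse(url).path.lower().endswith(extension):
                media.append(url)       # a file to download later
            elif urlparse(url).netloc == host:
                queue.append(url)       # another page on the same site
    return media
```

Calling crawl_for_media("http://example.com/") would fetch real pages, so be polite about request rates and check the site's terms before pointing it at anything.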


Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7974527

On 25/06/2016 at 7:03 AM, straight_stewie said:

That is the best answer I have ever heard. 

Here's a good first foray into the subject of webscraping. It's very basic, but it gives you a good idea of what will be going on.

Hiya, this is all in C#.

Link to comment
https://linustechtips.com/topic/613441-would-this-be-possible-c/#findComment-7986630