Need help with a program

Cryosec · March 3, 2016

I'm coding a program that will extract the text from a specific HTML tag in web pages whose URLs are loaded from a file (around a hundred links). The problem is that I have no idea how to go through the HTML source of each page, read the text from that specific tag (there is only one per page), and then delete the unwanted text (I need the numbers, but there may be unnecessary text around them).

What would be the best language to use? And how can I do it?

glitchmaster0001 · March 3, 2016

Maybe try PHP5.

madknight3 · March 3, 2016

7 minutes ago, Cryosec said:

What would be the best language to use?

What language(s) do you know already? Chances are you won't need to learn a new language just for this task.

8 minutes ago, Cryosec said:

And how can I do it?

Look into web scraping.
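As a rough illustration of the first half of that (reading the list of links from a file and downloading each page), here is a minimal Python sketch that uses only the standard library. It assumes the file holds one URL per line; the `#` comment convention and the function names `read_urls`, `fetch_html` and `fetch_all` are hypothetical, chosen just for this example.

```python
import urllib.request
from pathlib import Path


def read_urls(path):
    """Return the URLs in a text file (one per line), skipping blank and '#' lines."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            urls.append(line)
    return urls


def fetch_html(url, timeout=10):
    """Download one page and decode it with the charset the server declares (UTF-8 fallback)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def fetch_all(path):
    """Map each URL in the file to its HTML, or to None if the download failed."""
    pages = {}
    for url in read_urls(path):
        try:
            pages[url] = fetch_html(url)
        except (OSError, ValueError) as exc:  # URLError/HTTPError are OSError; bad URLs raise ValueError
            print(f"skipping {url}: {exc}")
            pages[url] = None
    return pages
```

One failed link only gets logged and stored as `None`, so a single dead page doesn't stop the run over the remaining ~100 links.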

Cryosec · March 3, 2016

Just now, madknight3 said:

What language(s) do you know already?

C++, Python and Java. I'm still learning other languages, so just these for the moment.

Cr3at1v3 · March 3, 2016

In Java you could use something like Jsoup. It can fetch the page from a URL and parse it for you. The syntax is very easy: you just give it a CSS selector to get an HTML element.

Cryosec · March 3, 2016

Just now, Cr3at1v3 said:

In Java you could use something like Jsoup. It can fetch the page from a URL and parse it for you. The syntax is very easy: you just give it a CSS selector to get an HTML element.

I'll try this out, thanks.

madknight3 · March 4, 2016

5 hours ago, Cryosec said:

C++, Python and Java. I'm still learning other languages, so just these for the moment.

Python should be pretty simple. You have plenty of options to choose from to help you out.
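For example, Python's standard library ships an HTML parser (`html.parser`) that is enough for pulling the text out of one element. A minimal sketch, assuming the target element can be identified by its tag name and `id` attribute (the `div`/`price` values in the test are made up) and that the markup inside it is reasonably well formed:

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTextExtractor(HTMLParser):
    """Collect the text inside the first <tag id="element_id"> element."""

    def __init__(self, tag, element_id):
        super().__init__()
        self.tag = tag
        self.element_id = element_id
        self.depth = 0      # > 0 while inside the target element
        self.done = False   # set once the target element has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1  # a child element nested inside the target
        elif tag == self.tag and dict(attrs).get("id") == self.element_id:
            self.depth = 1   # entered the target element

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def extract_tag_text(html, tag, element_id):
    """Return the element's text with whitespace collapsed, or '' if it isn't found."""
    parser = TagTextExtractor(tag, element_id)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())
```

Third-party parsers can do the same lookup in a line or two, but this version needs nothing beyond a stock Python install.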

Gachr · March 4, 2016

fizzlesticks · March 4, 2016

17 minutes ago, Gachr said:

Unless the website is so malformed that a real parser won't work, or you're trying to parse inline JS, using regex on HTML is a pretty bad idea.

Using an HTML/XML parser like the ones @madknight3 listed will be much easier, less prone to breaking on small changes to the website, and faster.
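Regex is still handy for the last step, once a parser has returned the element's plain text rather than raw HTML: throwing away everything except the numbers. A minimal sketch, assuming English-style formatting where `,` is a thousands separator and `.` is the decimal point (the pattern would need adjusting for other locales); `extract_numbers` is a hypothetical helper name:

```python
import re

# Optional sign, then either digits grouped with thousands commas or plain digits,
# each optionally followed by a decimal part. Assumes English number formatting.
NUMBER_RE = re.compile(r"[-+]?\d{1,3}(?:,\d{3})+(?:\.\d+)?|[-+]?\d+(?:\.\d+)?")


def extract_numbers(text):
    """Return every number found in plain text as a float, dropping the surrounding words."""
    return [float(match.replace(",", "")) for match in NUMBER_RE.findall(text)]
```

For instance, `extract_numbers("Price: 1,299 EUR incl. 19% VAT")` keeps only `1299.0` and `19.0`.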
