Learning Python Web Scraping for a Noob

Locojob · March 18, 2025

Good day all. I'm building a mobile app with Python for use here in NZ. The core function of the app requires regular scraping of a few different local websites, and I'm looking for solid recommendations on where to start.

I have some very minimal experience with JS scripting, though most of that is from almost a decade ago. I mention this more to suggest an ability to learn than to imply that I'm some sort of whiz.

I have looked into the legal side of scraping these specific websites. I've gone over the ToS and the robots.txt on all of them, and there are no hurdles there.

So...any helpful thoughts? Thanks in advance!

apoyusiken · March 18, 2025

what kind of thoughts are you looking for

Locojob · March 19, 2025

Well, my lack of familiarity makes it kind of difficult to know even where to start.

I search on YouTube and there are probably hundreds of "Best Guides" or "Only Guide You'll Ever Need" or "Learn to Scrape like a 1337 H@X0R" videos ranging from 15 minutes to hours long.

Google isn't much better, not to mention all the so-called courses that offer to teach you to be a pro for varying fees.

I could attempt to have an LLM code the various modules, then have another LLM or two check the code...but I'd like to have a bit more understanding than that. I guess I'm looking for a decent voice of experience or two to suggest a useful direction to get me started, perhaps even, dare I say, a mentor.

riklaunim · March 23, 2025

Python and mobile?

For web scraping you can use "requests" to download pages and "beautifulsoup" to extract data from the HTML. If the page has to be rendered first (JS apps), then you will need a browser automation tool such as Selenium.
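
To make the requests + BeautifulSoup suggestion concrete, here is a minimal sketch of the fetch-then-parse flow. The `div.listing` / `a.title` markup, the sample HTML, the User-Agent string, and the function names are all made up for illustration; a real site will need its own selectors.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a real page.
SAMPLE_HTML = """
<div class="listing"><a class="title" href="/item/1">First item</a></div>
<div class="listing"><a class="title" href="/item/2"> Second item </a></div>
<div class="listing"><span>No link here</span></div>
"""


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download a page and return its HTML, raising on HTTP error codes."""
    response = requests.get(
        url,
        # Placeholder identity; put your own contact details here.
        headers={"User-Agent": "my-scraper/0.1 (contact: you@example.com)"},
        timeout=timeout,
    )
    response.raise_for_status()
    return response.text


def extract_listings(html: str) -> list:
    """Return the title and link from each <div class="listing"> block."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.find_all("div", class_="listing"):
        link = card.find("a", class_="title")
        if link is None:
            continue  # skip cards that do not match the expected layout
        listings.append(
            {"title": link.get_text(strip=True), "url": link.get("href")}
        )
    return listings


if __name__ == "__main__":
    # Parsing the sample keeps this runnable offline; for a live page,
    # pass fetch_html("https://example.com/listings") instead.
    for row in extract_listings(SAMPLE_HTML):
        print(row)
```

Keeping the download and the parsing in separate functions means the parsing can be tested against saved HTML without hitting the site each time.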

AnAverageName · March 29, 2025

Built lots of stuff like this. A few thoughts:

@riklaunim gave a great overview of the Python landscape of web scraping tools. To add to that: if you’re building for a mobile app, it’s common to set up an API using Flask or FastAPI that either runs your scraper directly or exposes cached scraping results. These backends often use libraries that control headless browsers (browsers that operate without a visible UI) to fetch and parse data as needed.
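
The "exposes cached scraping results" idea can be sketched as a small time-based cache that a Flask or FastAPI route handler would read from. This is only one way to do it, using the standard library alone; `CachedScrape`, `ttl_seconds`, and `fake_scrape` are names invented for this example, and the web framework part is left out.

```python
import time
from typing import Any, Callable, Optional


class CachedScrape:
    """Wrap a scrape function and reuse its result until it is ttl_seconds old."""

    def __init__(
        self,
        scrape: Callable[[], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._scrape = scrape
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so the expiry logic can be tested
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        """Return the cached value, re-running the scraper if it has expired."""
        now = self._clock()
        stale = self._fetched_at is None or now - self._fetched_at >= self._ttl
        if stale:
            self._value = self._scrape()
            self._fetched_at = now
        return self._value


if __name__ == "__main__":
    def fake_scrape() -> dict:
        # Stand-in for a real scraper call.
        return {"scraped_at": time.time()}

    cache = CachedScrape(fake_scrape, ttl_seconds=300)
    first = cache.get()
    second = cache.get()  # served from the cache, no second scrape
    print(first == second)
```

A route handler in the API would then just return `cache.get()` as JSON, so the mobile app never waits on the target sites and they are hit at most once per TTL window.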

I'll also add that especially in some enterprise environments, it’s increasingly common to see web scraping handled through server-side code in frameworks like Next.js. There are several libraries in that ecosystem (like Puppeteer, Playwright, or Cheerio) that make it straightforward to scrape content during server-side rendering or API route execution, integrating cleanly with full-stack TypeScript workflows.

A lot of that comes down to Selenium being built more for automated browser testing than for web scraping at scale, though I'm not sure that's a concern for you yet. You can write useful code in Selenium very quickly, but the hurdles pile up once you try to scale it. But I digress!

Feel free to ping me if you have any questions. LLMs are great at writing this code if you're very specific with the prompts (e.g., go to this XPath, find_all of these elements, click on that, etc.).