If you know JavaScript/Node.js, this could be a fun project: you could set up a simple command-line program to do this. Just thinking on the fly here: most new websites have a search bar, so take note of how each one searches (for most, you just post a simple query), send your query to each of them, collect the first X articles, and dump each article into a folder. Boom, research machine!
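Here's a rough sketch of that flow in Node.js, just to make the idea concrete. Everything in it is an assumption for illustration: the `SITES` entry, the `{query}` URL template, and the function names are made up, and real sites will each need their own search URL format. The HTTP fetcher and the link extractor are passed in as parameters so they can be swapped per site.

```javascript
// Sketch only: the site entry, URL template, and function names are hypothetical.
const fs = require("node:fs/promises");
const path = require("node:path");

// Hypothetical site list; each real site has its own search URL format.
const SITES = [
  { name: "example", searchUrl: "https://example.com/search?q={query}" },
];

// Fill the {query} placeholder with the URL-encoded search terms.
function buildSearchUrl(template, query) {
  return template.replace("{query}", encodeURIComponent(query));
}

// Turn an article URL into a file name that is safe on most file systems.
function fileNameFor(url) {
  const stem = url.replace(/^https?:\/\//, "").replace(/[^a-z0-9]+/gi, "_");
  return stem.slice(0, 100) + ".html";
}

// Search one site, keep the first `limit` article links, and save each page
// into outDir. fetchText(url) must return a promise of the page's HTML, and
// extractLinks(html, baseUrl) must return an array of absolute URLs.
async function collectArticles({ site, query, limit, outDir, fetchText, extractLinks }) {
  await fs.mkdir(outDir, { recursive: true });
  const searchUrl = buildSearchUrl(site.searchUrl, query);
  const resultsHtml = await fetchText(searchUrl);
  const links = extractLinks(resultsHtml, searchUrl).slice(0, limit);

  const saved = [];
  for (const link of links) {
    const html = await fetchText(link);
    const file = path.join(outDir, fileNameFor(link));
    await fs.writeFile(file, html, "utf8");
    saved.push(file);
  }
  return saved;
}
```

To cover more sites you would loop over `SITES` and call `collectArticles` once per entry, ideally with a pause between requests so you aren't hammering anyone's server.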
You really wouldn't need to use any external libraries: most languages have some form of DOM parser/manipulator (the thing used to read and manipulate web pages) and an HTTP library for making requests.
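One caveat for Node.js specifically: it ships a built-in `fetch` (Node 18 and later) and the `URL` class, but no HTML DOM parser. So the sketch below uses a regular expression as a stand-in for one. That is fine for a quick experiment but fragile on messy markup; if you relax the no-libraries rule, a parser package such as jsdom or cheerio would be the sturdier choice. The function names `fetchText` and `extractLinks` are made up for illustration.

```javascript
// Sketch only: function names are hypothetical, and the regex is a stand-in
// for a real DOM parser.

// Download a page as text using Node's built-in fetch (Node 18+).
async function fetchText(url) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);
  return res.text();
}

// Pull the href of every <a> tag out of an HTML string, resolve relative
// links against baseUrl, drop non-http(s) links and #fragments, and
// de-duplicate while keeping first-seen order.
function extractLinks(html, baseUrl) {
  const links = new Set();
  const hrefPattern = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)')/gi;
  for (const match of html.matchAll(hrefPattern)) {
    const raw = match[1] ?? match[2];
    try {
      const url = new URL(raw, baseUrl);
      if (url.protocol === "http:" || url.protocol === "https:") {
        url.hash = "";
        links.add(url.href);
      }
    } catch {
      // Not a parseable URL; skip it.
    }
  }
  return [...links];
}
```

In practice you would still need to filter the result down to actual article links, for example by keeping only URLs whose path matches the pattern that site uses for articles.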
It will take some work and some fine-tuning for sure, but ultimately you could customize it to cover thousands of sites and make your own academic search engine.