It recently hit me that I often repeat the same tasks when dealing with websites in my projects. Visiting a website, with or without a proxy, is essentially the same task every time, and I repeatedly end up writing the same classes.
Therefore I decided to write a little framework, or module, to make this easier. I have only just started, but it is coming along quite quickly. I will keep you updated on the project, which I have named "JScrape - The Java WebScraping Framework".
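To give an idea of the boilerplate I mean, here is a minimal plain-Java sketch of fetching a page either directly or through a proxy. The class and method names are my own, purely for illustration; this is not the JScrape API.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;

// Illustrative only: the kind of repeated code that fetching a page
// with or without a proxy involves in plain Java.
public class FetchExample {

    // Downloads the page source, optionally routing through an HTTP proxy.
    static String fetch(String url, String proxyHost, int proxyPort) throws Exception {
        URL target = new URL(url);
        HttpURLConnection conn;
        if (proxyHost != null) {
            Proxy proxy = new Proxy(Proxy.Type.HTTP,
                    new InetSocketAddress(proxyHost, proxyPort));
            conn = (HttpURLConnection) target.openConnection(proxy);
        } else {
            conn = (HttpURLConnection) target.openConnection();
        }
        StringBuilder page = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                page.append(line).append('\n');
            }
        }
        return page.toString();
    }

    public static void main(String[] args) throws Exception {
        // Direct connection; pass a host/port to go through a proxy instead.
        System.out.println(fetch("http://example.com", null, 0));
    }
}
```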
The scraper comes with a few built-in parsers. At the moment it can extract (a rough sketch of this kind of extraction follows the list):
- images on a webpage
- links on a webpage
- email addresses on a webpage
- proxies in the form of IP:PORT on a webpage
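Below is a minimal sketch of the kind of regex-based extraction these parsers perform, shown here for email addresses and IP:PORT proxies. Again, the class and method names are hypothetical and not the actual JScrape classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: standalone sketch of the extraction the built-in
// parsers do; not the real JScrape API.
public class ParserSketch {

    // Rough patterns for email addresses and IP:PORT proxy entries.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");
    private static final Pattern PROXY =
            Pattern.compile("\\b(?:\\d{1,3}\\.){3}\\d{1,3}:\\d{2,5}\\b");

    // Collects all matches of a pattern in the page source.
    static List<String> extract(Pattern pattern, String html) {
        List<String> results = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            results.add(m.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String html = "<p>Contact: admin@example.com</p>"
                + "<td>127.0.0.1:8080</td>";
        System.out.println("Emails:  " + extract(EMAIL, html));
        System.out.println("Proxies: " + extract(PROXY, html));
    }
}
```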
Class diagram:
