Web Scraping unstructured data

Complexity of JavaScript websites

  • Sparkledriver

Tools

  • Enlive
  • Scoopi - a tool to extract and transform data from web pages
  • demeter - fast, concurrent web scraper with headless JavaScript execution
  • web-scraper - library with fairly good JavaScript support
  • Web Page Summarizer - GUI for fetching web pages and summarizing them. Demonstrates Enlive and Compojure
  • Scraper - JavaFX web engine and WebKit
  • Abrade - scrapes websites, even ones that rely heavily on JavaScript. Uses the Java HtmlUnit library under the hood
  • Etaoin - Clojure implementation of the WebDriver protocol

Example projects

References

Hint: Be respectful of data sources

Avoid sending a high number of requests to websites that serve unstructured data, as they are unlikely to have much capacity to spare. Consider downloading the content once and working from the local copy to minimise requests to the website.
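
One way to follow this hint is to keep a local cache of every page fetched, so that repeated runs of a scraper read from disk instead of the website. The following is a minimal sketch in Python using only the standard library. It is not taken from any of the tools listed above; the names `cached_fetch`, `fetch_over_http`, `cache_dir` and `delay_seconds` are hypothetical, and the one-second default delay is an assumption, not a recommended value for any particular site.

```python
import hashlib
import time
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_over_http(url: str) -> bytes:
    """Download a page body using the standard library (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def cached_fetch(
    url: str,
    cache_dir: Path,
    fetch: Callable[[str], bytes] = fetch_over_http,
    delay_seconds: float = 1.0,  # assumed politeness delay, tune per site
) -> bytes:
    """Return the page body, contacting the website only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the URL to get a filesystem-safe cache file name.
    file_name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html"
    cache_file = cache_dir / file_name
    if cache_file.exists():
        # Cache hit: no request is sent to the website.
        return cache_file.read_bytes()
    body = fetch(url)
    cache_file.write_bytes(body)
    # Pause after a real request so consecutive cache misses stay spaced out.
    time.sleep(delay_seconds)
    return body
```

Calling `cached_fetch("https://example.com/", Path("cache"))` twice sends at most one request; the second call is served from the `cache` directory. The `fetch` parameter is there so the download step can be swapped for whichever scraping tool is in use, or for a stub when testing the parsing code offline.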