Web Scraping unstructured data

Complexity of JavaScript websites

Sparkledriver

Tools

Enlive
Scoopi - a tool to extract and transform data from web pages
demeter - fast, concurrent web scraper with headless JavaScript execution
web-scraper - library with fairly good JavaScript support
Web Page Summarizer - GUI for fetching web pages and summarizing them; demonstrates Enlive and Compojure
Scraper - JavaFX web engine and WebKit
Abrade - scrapes websites, even ones that rely heavily on JavaScript; uses the Java HtmlUnit library under the hood
Etaoin - Clojure implementation of the WebDriver protocol

Example projects

parkrun-app - uses Enlive
clj-scraper - uses Enlive, http-kit and core.async
ldnpyvideo - Scraper (from pyvideo.org) and web site for London PyCon video meetup
nba-scraper - scraping NBA boxscore data from ESPN
Clojure web scraping with Enlive

References

Hint::Be respectful of data sources

Avoid sending a high number of requests to websites with unstructured data, as they are unlikely to have much capacity to serve them. Consider downloading the content locally to minimise the requests made to the website.
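
One way to keep a local copy is a small file cache in front of the download step. The following is a minimal sketch using only Clojure core and `clojure.java.io`. The names `cache-file` and `fetch-cached`, the cache directory, and the URL-encoded file naming scheme are all assumptions made for illustration and do not come from any of the libraries listed above.

```clojure
(require '[clojure.java.io :as io])

(defn cache-file
  "Maps a URL to a file inside cache-dir.
  Hypothetical naming scheme: the URL-encoded form of the URL."
  [cache-dir url]
  (io/file cache-dir (java.net.URLEncoder/encode url "UTF-8")))

(defn fetch-cached
  "Returns the page body for url as a string.
  Calls fetch-fn (slurp by default) only when no local copy exists,
  then saves the result so later calls read from disk instead."
  ([cache-dir url]
   (fetch-cached cache-dir url slurp))
  ([cache-dir url fetch-fn]
   (let [f (cache-file cache-dir url)]
     (if (.exists f)
       (slurp f)                      ; local copy found, no request sent
       (let [body (fetch-fn url)]     ; first visit, one request to the site
         (io/make-parents f)
         (spit f body)
         body)))))
```

Calling `(fetch-cached "cache" "https://example.com/")` twice should send at most one request to the site, with the second call served from the `cache` directory. The optional `fetch-fn` argument lets a different HTTP client be substituted for `slurp`. Very long URLs may exceed file name limits with this naming scheme, so a hash of the URL may be a better key in practice.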