Example: Hitchhiker's Guide
This is an example of using the threading macros and a REPL to give fast feedback as you are developing code.
NOTE
Write functions that return a list of the most frequently used words in a book, excluding common English words such as "the", "and", "it" and "I". Join those functions together with a threading macro.
We suggest you use the (assumed perfectly legal) copy of the Hitchhiker's Guide book text, reading it in with the slurp function.
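A threading macro is only a rewrite of nested function calls. As a quick check in the REPL, macroexpand shows the form that ->> produces. The symbols book, pattern and lower-case below are placeholders that are never evaluated, so the sketch runs without any definitions.

```clojure
;; ->> inserts each result as the *last* argument of the next form.
;; The quoted symbols are placeholders and are not evaluated.
(macroexpand '(->> book
                   (re-seq pattern)
                   (map lower-case)
                   frequencies))
;; => (frequencies (map lower-case (re-seq pattern book)))
```

Reading the nested version from the inside out gives the same order of steps as reading the ->> version from top to bottom.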
Approximate algorithm
- Use a regular expression to create a collection of individual words, e.g. #"[a-zA-Z0-9|']+"
- Convert all the words to lower case with clojure.string/lower-case, so they match the common words source
- Remove the common English words used in the book, leaving the more context-specific words
- Calculate the frequencies of the remaining words, returning a map of word & word count pairs
- Sort the word & count pairs by their count values with sort-by, which returns a sequence rather than a map
- Reverse the sequence so the most commonly used word is the first element
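The steps above can be tried on a short string before downloading a whole book. This is a minimal sketch: sample-text, sample-common-words and top-words are hypothetical names, and the tiny word set stands in for the downloaded list of common English words.

```clojure
(require '[clojure.string :as string])

;; Hypothetical stand-ins for the book text and the common English words.
(def sample-text "The Answer is 42. The answer, it seems, is forty-two and I like it.")

(def sample-common-words #{"the" "and" "it" "i" "is"})

(defn top-words
  "Returns [word count] pairs for `text`, most frequent first,
  ignoring any word found in the `common-words` set."
  [text common-words]
  (->> text
       (re-seq #"[a-zA-Z0-9|']+")   ; split into words
       (map string/lower-case)      ; lower case every word
       (remove common-words)        ; drop the common English words
       frequencies                  ; map of word -> count
       (sort-by val)                ; ascending by count
       reverse))                    ; most frequent first

(top-words sample-text sample-common-words)
```

The first pair returned is ["answer" 2]. The regular expression splits "forty-two" into "forty" and "two", because the hyphen is not in the character class.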
(require 'clojure.string)

(def book (slurp "http://clearwhitelight.org/hitch/hhgttg.txt"))

(def common-english-words
  (-> (slurp "https://www.textfixer.com/tutorials/common-english-words.txt")
      (clojure.string/split #",")
      set))
;; using a function to pull in any book
(defn get-book [book-url]
  (slurp book-url))
(defn -main [book-url]
  (->> (get-book book-url)
       (re-seq #"[a-zA-Z0-9|']+")
       (map #(clojure.string/lower-case %))
       (remove common-english-words)
       frequencies
       (sort-by val)
       reverse))
;; Call the program
(-main "http://clearwhitelight.org/hitch/hhgttg.txt")
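The (remove common-english-words) step works because a Clojure set can be called as a function: it returns the element when the set contains it and nil otherwise, and remove drops every item for which the predicate returns a truthy value. The sketch below uses a small hypothetical stop-words set built the same way as common-english-words. The string/trim call is an addition, not part of the original code, in case the downloaded file ends with a newline (not verified here).

```clojure
(require '[clojure.string :as string])

;; Build a set from a comma-separated string, as common-english-words does.
;; string/trim (an addition, not in the original) removes a possible trailing
;; newline so the last word is not stored as "i\n".
(def stop-words
  (-> "the,and,it,i\n"
      string/trim
      (string/split #",")
      set))

(stop-words "the")    ;; => "the" (truthy, so remove drops it)
(stop-words "towel")  ;; => nil   (falsey, so remove keeps it)

(remove stop-words ["the" "towel" "and" "it" "dolphins"])
;; => ("towel" "dolphins")
```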
Deconstructing the code in the REPL
To understand what each of the functions in -main does, comment out one or more expressions by placing #_ in front of them:
(defn -main [book-url]
  (->> (get-book book-url)
       #_(re-seq #"[a-zA-Z0-9|']+")
       #_(map #(clojure.string/lower-case %))
       #_(remove common-english-words)
       #_frequencies
       #_(sort-by val)
       #_reverse))
Now -main only returns the result of (get-book book-url). To see what each of the other lines does, remove the #_ from the front of an expression and re-evaluate -main in the REPL.
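#_ is a reader macro: the reader discards the next form completely, so it is never evaluated. A ->> form with every step discarded therefore just returns its first argument. A small sketch, using a literal vector in place of the book text:

```clojure
;; With both steps discarded, ->> receives only its first argument.
(->> [3 1 2]
     #_sort
     #_reverse)
;; => [3 1 2]

;; Removing #_ from the first step brings it back into the pipeline.
(->> [3 1 2]
     sort
     #_reverse)
;; => (1 2 3)

;; read-string shows that the discarded forms never reach the evaluator.
(read-string "(->> (get-book book-url) #_frequencies #_reverse)")
;; => (->> (get-book book-url))
```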
Hint: In Spacemacs / Emacs, the keybinding C-c C-p shows the output in a separate buffer, which is very useful when the function returns a large result set.
Offline sources of the Hitchhiker's Guide book and common English words
(def book (slurp "./hhgttg.txt"))
(def common-english-words
  (-> (slurp "common-english-words.txt")
      (clojure.string/split #",")
      set))
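To create the offline copies in the first place, one option is to download each file once and save it with spit. cached-slurp below is a hypothetical helper, not part of the original example: it reads the local file when it already exists, and otherwise fetches the source (a URL or a file path, both of which slurp accepts) and saves it first.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: fetch `source` once, keep a copy at `local-path`,
;; and read from that copy on every later call.
(defn cached-slurp
  [source local-path]
  (let [^java.io.File local-file (io/file local-path)]
    (when-not (.exists local-file)
      (spit local-file (slurp source)))
    (slurp local-file)))
```

For example, (def book (cached-slurp "http://clearwhitelight.org/hitch/hhgttg.txt" "./hhgttg.txt")) downloads the book on the first call only.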
Original concept from Misophistful: Understanding thread macros in clojure
Hint: The slurp function holds the contents of the whole file in memory, so it may not be appropriate for very large files. If you are dealing with a large file, consider reading it lazily using Java IO (e.g. java.io.BufferedReader, java.io.FileReader). See the Clojure I/O cookbook and The Ins & Outs of Clojure for examples.
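One way to follow this hint is to read the file a line at a time with line-seq over the java.io.BufferedReader returned by clojure.java.io/reader. In this sketch, word-frequencies-from-file is a hypothetical name. Only the frequency map is kept in memory, and the sketch assumes words do not continue across line breaks.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as string])

;; with-open closes the reader when the body finishes. line-seq yields lines
;; lazily, and frequencies is eager, so all counting completes before the
;; reader is closed.
(defn word-frequencies-from-file
  [path common-words]
  (with-open [reader (io/reader path)]
    (->> (line-seq reader)
         (mapcat #(re-seq #"[a-zA-Z0-9|']+" %))
         (map string/lower-case)
         (remove common-words)
         frequencies)))
```

The resulting map can then be sorted with (sort-by val) and reversed, as in -main.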