Regular Expressions - regex
Regular expressions are a powerful and compact way to find specific patterns in text strings. Clojure provides a simple syntax for Java regex patterns.
#"pattern" is the literal representation of a regular expression in Clojure, where
pattern is the regular expression.
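As a minimal sketch of the literal syntax, the following passes a regex literal to `re-find` and `re-matches` from `clojure.core`. The name `digits` and the sample strings are made up for illustration.

```clojure
;; A regex literal is written as #"pattern"
(def digits #"\d+")

;; re-find returns the first match found anywhere in the string
(re-find digits "order 66 shipped in 3 days")
;; => "66"

;; re-matches only succeeds when the whole string matches the pattern
(re-matches digits "2024")
;; => "2024"

(re-matches digits "year 2024")
;; => nil
```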
Create regular expression pattern
(re-pattern pattern) returns a regex pattern compiled from the given string, which the REPL prints using the #"pattern" literal representation.
A string can become a regular expression pattern, e.g.
":" becomes the regex pattern #":".
The regular expression syntax cheatsheet by Mozilla is an excellent reference for regular expression patterns.
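A short sketch of `re-pattern` in use, assuming the pattern text is only known at runtime. The name `separator` and the sample path string are hypothetical.

```clojure
(require '[clojure.string :as string])

;; Build a pattern from a string at runtime
(def separator (re-pattern ":"))

separator
;; => #":"

;; The resulting pattern works anywhere a regex literal does
(string/split "usr:local:bin" separator)
;; => ["usr" "local" "bin"]
```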
Regular expressions overview
Regular expressions in Clojure
Find the most common word in a book using regular expressions
Double escaping not required
The Clojure regex literal syntax means you do not need to double escape special characters (e.g. writing \\d where \d is meant), which keeps the patterns clean and simple to read. In other languages, such as Java, a regex is written inside a string literal, so backslashes intended for consumption by the regex compiler must be doubled.
The rules for embedding unusual literal characters or predefined character classes are listed in the Javadoc for Pattern.
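To illustrate the difference, this sketch writes the same pattern once as a regex literal and once as an ordinary string passed to `re-pattern`. The version string is invented for the example.

```clojure
;; Regex literal: one backslash is enough
(re-find #"\d+\.\d+" "version 1.12 released")
;; => "1.12"

;; The same pattern built from a string needs each backslash doubled,
;; because the string reader consumes one level of escaping first
(re-find (re-pattern "\\d+\\.\\d+") "version 1.12 released")
;; => "1.12"

;; Both forms produce the same pattern text
(= (str #"\d+\.\d+") (str (re-pattern "\\d+\\.\\d+")))
;; => true
```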
Host platform support
Clojure runs on the Java Virtual Machine and uses Java regular expressions.
A regular expression in Clojure is an instance of the java.util.regex.Pattern type.
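The host type can be checked at the REPL. This is a small sketch using only `class`, `instance?` and the `pattern` method of `java.util.regex.Pattern`.

```clojure
;; A regex literal evaluates to a compiled Java Pattern
(class #"\d+")
;; => java.util.regex.Pattern

;; re-pattern produces the same type
(instance? java.util.regex.Pattern (re-pattern "\\d+"))
;; => true

;; Java interop: Pattern methods can be called directly.
;; .pattern returns the pattern text, printed here as a Clojure string
(.pattern #"\d+")
;; => "\\d+"
```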
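Regular expression option flags can make a pattern case-insensitive or enable multiline mode. Clojure's regex literals starting with (? followed by one or more flag letters and a closing ) set the mode for the rest of the pattern. For example,
#"(?i)yo" matches the strings "yo", "yO", "Yo" and "YO".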
The following flags can be used in Clojure regular expression patterns, listed with their long name and a description of what they do. See Java's documentation for the java.util.regex.Pattern class for more details.
|Flag|Flag name|Description|
|---|---|---|
|d|UNIX_LINES|., ^, and $ match only the Unix line terminator '\n'.|
|i|CASE_INSENSITIVE|ASCII characters are matched without regard to uppercase or lower-case.|
|x|COMMENTS|Whitespace and comments in the pattern are ignored.|
|m|MULTILINE|^ and $ match near line terminators instead of only at the beginning or end of the entire input string.|
|s|DOTALL|. matches any character including the line terminator.|
|u|UNICODE_CASE|Causes the i flag to use Unicode case insensitivity instead of ASCII.|
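The sketch below tries a few of these flags at the REPL. The name `text` and all sample strings are made up for illustration; the flags themselves are the embedded Java flags listed in the table.

```clojure
;; i (CASE_INSENSITIVE): the whole pattern ignores ASCII case
(map #(re-matches #"(?i)yo" %) ["yo" "yO" "Yo" "YO" "oy"])
;; => ("yo" "yO" "Yo" "YO" nil)

(def text "one\ntwo\nthree")

;; Without flags, ^ only matches at the start of the whole input
(re-seq #"^\w+" text)
;; => ("one")

;; m (MULTILINE): ^ also matches after each line terminator
(re-seq #"(?m)^\w+" text)
;; => ("one" "two" "three")

;; s (DOTALL): . also matches the line terminator
(re-find #"one.two" text)
;; => nil
(re-find #"(?s)one.two" text)
;; => "one\ntwo"

;; x (COMMENTS): whitespace and # comments inside the pattern are ignored
(re-find #"(?x) \d+ \s* kg  # a number followed by a unit" "weighs 12 kg")
;; => "12 kg"

;; Flag letters can be combined in one group
(re-find #"(?is)ONE.TWO" text)
;; => "one\ntwo"
```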
The re-seq function is Clojure's regex workhorse. It returns a lazy seq of all matches in a string, so matches are only computed as they are consumed. This means it can be used to efficiently test whether a string matches or to find all matches in a string or a mapped file.
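A minimal sketch of `re-seq` with a pattern that has no capturing groups. The input strings are invented for illustration.

```clojure
;; Every run of word characters in the string, returned as a lazy seq
(re-seq #"\w+" "one-two/three")
;; => ("one" "two" "three")

;; Calling seq on a result with no matches gives nil,
;; so the result also works as a truthy test
(seq (re-seq #"\d+" "no digits here"))
;; => nil

;; Taking only the first item avoids searching for the remaining matches
(first (re-seq #"\w+" "one-two/three"))
;; => "one"
```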
When a regular expression has no capturing groups, each match in the returned seq is a string. A capturing group (a subsegment that is accessible via the returned match object) in the regex causes each returned item to be a vector, holding the whole match followed by the text of each group.
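The effect of capturing groups can be sketched as follows. The key=value input is a made-up example, and the destructuring names `k` and `v` are only illustrative.

```clojure
;; One capturing group: each item is [whole-match group-1]
(re-seq #"\w*(\w)" "one-two/three")
;; => (["one" "e"] ["two" "o"] ["three" "e"])

;; Two groups: destructuring pulls the captured parts out of each vector
(for [[_ k v] (re-seq #"(\w+)=(\d+)" "a=1, b=22")]
  [k v])
;; => (["a" "1"] ["b" "22"])
```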