Nucleotide Countλ︎
Clojure Track: Nucleotide Count
Given a string representing a DNA sequence, count how many times each nucleotide occurs.
If the string contains characters other than A, C, G, or T then an error should be thrown.
Represent a DNA sequence as an ordered collection of nucleotides, e.g. a string of characters such as "ATTACG".
DNA Nucleotide names
A is Adenine, C is Cytosine, G is Guanine and T is Thymine.
Code for this solution on GitHub
practicalli/exercism-clojure-guides contains the design journal and solution to this exercise and many others.
Create the projectλ︎
Download the Nucleotide Count exercise using the exercism CLI tool
Use the REPL workflow to explore solutions locally
Open the project in a Clojure aware editor and start a REPL, using a rich comment form to experiment with code to solve the challenge.
Starting pointλ︎
Unit test code calls functions from the src tree, which must exist with the correct argument signature for the unit test code to compile successfully.
Reviewing each assertion in the unit test code identifies the function definitions required.
Exercism Unit Tests
(ns nucleotide-count-test
  (:require [clojure.test :refer [deftest is]]
            nucleotide-count))

(deftest empty-dna-strand-has-no-adenosine
  (is (= 0 (nucleotide-count/count-of-nucleotide-in-strand \A, ""))))

(deftest empty-dna-strand-has-no-nucleotides
  (is (= {\A 0, \T 0, \C 0, \G 0}
         (nucleotide-count/nucleotide-counts ""))))

(deftest repetitive-cytidine-gets-counted
  (is (= 5 (nucleotide-count/count-of-nucleotide-in-strand \C "CCCCC"))))

(deftest repetitive-sequence-has-only-guanosine
  (is (= {\A 0, \T 0, \C 0, \G 8}
         (nucleotide-count/nucleotide-counts "GGGGGGGG"))))

(deftest counts-only-thymidine
  (is (= 1 (nucleotide-count/count-of-nucleotide-in-strand \T "GGGGGTAACCCGG"))))

(deftest validates-nucleotides
  (is (thrown? Throwable (nucleotide-count/count-of-nucleotide-in-strand \X "GACT"))))

(deftest counts-all-nucleotides
  (let [s "AGCTTTTCATTCTGACTGCAACGGGCAATATGTCTCTGTGTGGATTAAAAAAAGAGTGTCTGATAGCAGC"]
    (is (= {\A 20, \T 21, \G 17, \C 12}
           (nucleotide-count/nucleotide-counts s)))))
Function definitions required to compile unit test code
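The assertions call two functions, so a minimal sketch of the definitions needed for the test code to compile might look like the following. The argument names `nucleotide` and `strand` are taken from the argument order used in the tests, and the empty bodies are placeholders: the tests compile but are expected to fail until the functions are implemented.

```clojure
(ns nucleotide-count)

(defn count-of-nucleotide-in-strand
  "Placeholder: count how often `nucleotide` appears in `strand`."
  [nucleotide strand])

(defn nucleotide-counts
  "Placeholder: count every nucleotide in `strand`."
  [strand])
```

A `defn` with an empty body returns `nil`, which is enough for the test namespace to load.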
Making the tests passλ︎
Select one assertion from the unit tests and write code to make the test pass.
Experiment with solutions in the comment form and add the chosen approach to the respective function definition.
Counting nucleotidesλ︎
Use test data from the unit test code, e.g. "GGGGGTAACCCGG"
How often does a nucleotide appear
Add the result to get the total count
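One possible sketch of these two steps, assuming the nucleotide `\G` and the test strand above: mark each element of the strand with 1 when it matches and 0 otherwise, then add the marks together with `reduce`.

```clojure
;; Mark each element of the strand: 1 for a match, 0 otherwise
(map #(if (= \G %) 1 0) "GGGGGTAACCCGG")
;; => (1 1 1 1 1 0 0 0 0 0 0 1 1)

;; Add the marks together to get the total count
(reduce + (map #(if (= \G %) 1 0) "GGGGGTAACCCGG"))
;; => 7
```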
Is there a more elegant way?
When the strand is reduced to only the matching nucleotide, all the remaining elements can simply be counted.
filter the DNA strand with a predicate function (one that returns true or false) so that only the matching nucleotide is returned.
;; Count the elements in the returned sequence for the total
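The filter-then-count approach can be sketched in a rich comment form as follows, assuming the nucleotide `\T` and the test strand from the unit tests.

```clojure
;; Keep only the elements of the strand that equal the nucleotide
(filter #(= \T %) "GGGGGTAACCCGG")
;; => (\T)

;; Count what is left to get the total
(count (filter #(= \T %) "GGGGGTAACCCGG"))
;; => 1
```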
Add this code into the starting function
Run unit testsλ︎
Run the unit tests to see if they pass. x should pass, x should fail.
Nucleotide occurrencesλ︎
Count the occurrences
"GGGGGTAACCCGG"
Define the data
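A minimal sketch of the data definition, using the same name and values as the example solution in this guide, followed by a quick membership check with `some`. The nucleotides `\T` and `\X` are only illustrative inputs.

```clojure
(def valid-nucleotides
  "Characters representing valid nucleotides"
  [\A \C \G \T])

;; `some` returns the first truthy result of the predicate, or nil if none
(some #(= \T %) valid-nucleotides)
;; => true

(some #(= \X %) valid-nucleotides)
;; => nil
```

Because `nil` is falsey, the result of `some` can be used directly as the condition of an `if`.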
Exception handling is required when the nucleotide is not valid.
Or use a predicate with some (is some element in the sequence equal to the nucleotide?)
(defn count-of-nucleotide-in-strand
  [nucleotide strand]
  (if (some #(= nucleotide %) valid-nucleotides)
    (count
     (filter #(= nucleotide %)
             strand))
    (throw (Throwable.))))

(count-of-nucleotide-in-strand \T "GGGGGTAACCCGG")
Design the second function
How often does a nucleotide appear
NOTE: zero must be returned when there are no appearances
Return value always in the form of a map with a count for each nucleotide, e.g. {\A 0, \T 0, \C 0, \G 8}
Hammock time...λ︎
- How often does something appear,
- how frequent is it?
- Is there a Clojure standard library function for that (approx 700 functions), review https://clojure-docs.org/
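If a nucleotide is missing from the strand then frequencies returns no entry for it, so the result is incomplete.
What if there is a starting point, a map with a zero count for every nucleotide?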
;; Then merge the result of frequencies
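The idea can be sketched with the strand "GGGGGGGG" from the unit tests: `frequencies` on its own omits the absent nucleotides, while merging its result over a map of zero counts gives an entry for every nucleotide.

```clojure
;; frequencies only reports the nucleotides present in the strand
(frequencies "GGGGGGGG")
;; => {\G 8}

;; Start from zero counts, then merge the result of frequencies over them
(merge {\A 0 \C 0 \G 0 \T 0}
       (frequencies "GGGGGGGG"))
;; => {\A 0, \C 0, \G 8, \T 0}
```

`merge` takes values from the right-most map when keys clash, so the counted values replace the zeros.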
Update the function definition and run tests
Solutionsλ︎
There are many ways to solve a challenge and there is value trying different approaches to help learn more about the Clojure language.
The following solution includes the filter and frequencies functions, which are commonly used functions from the Clojure standard library.
Example Solution
(ns nucleotide-count)

(def valid-nucleotides
  "Characters representing valid nucleotides"
  [\A \C \G \T])

(defn count-of-nucleotide-in-strand
  "Count how often a nucleotide appears in a strand,
  throwing if the nucleotide is not valid"
  [nucleotide strand]
  (if (some #(= nucleotide %) valid-nucleotides)
    (count
     (filter #(= nucleotide %)
             strand))
    (throw (Throwable.))))

(defn nucleotide-counts
  "Count all nucleotides in a strand"
  [strand]
  ;; Zero counts are the starting point, so absent nucleotides are still reported
  (merge {\A 0 \C 0 \G 0 \T 0}
         (frequencies strand)))