|
| 1 | +# Counting Point Mutations |
| 2 | + |
| 3 | +🤔 [Problem link](https://rosalind.info/problems/hamm/) |
| 4 | + |
| 5 | +!!! warning "The Problem" |
| 6 | + |
| 7 | + Given two strings s and t of equal length, the Hamming distance between s and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t. |
| 8 | + |
| 9 | + |
| 10 | + Given: Two DNA strings s and t of equal length (not exceeding 1 kbp). |
| 11 | + |
| 12 | + Return: The Hamming distance dH(s,t). |
| 13 | + |
| 14 | + ***Sample Dataset*** |
| 15 | + |
| 16 | + ``` |
| 17 | + GAGCCTACTAACGGGAT |
| 18 | + CATCGTAATGACGGCCT |
| 19 | + ``` |
| 20 | + |
| 21 | + ***Sample Output*** |
| 22 | + |
| 23 | + ``` |
| 24 | + 7 |
| 25 | + ``` |
| 26 | + |
| 27 | + |
| 28 | +The Hamming distance is only defined for two strings (here, DNA sequences) of the same length. |
| 29 | + |
| 30 | +We can calculate it by looping over the characters of one string and checking whether the character at the same index in the other string matches. |
| 31 | + |
| 32 | +Each mismatch adds 1 to a `mismatches` counter. At the end of the loop, we return the counter's total. |
| 33 | + |
| 34 | +Let's give this a try! |
| 35 | + |
| 36 | +```julia |
| 37 | +ex_seq_a = "GAGCCTACTAACGGGAT" |
| 38 | +ex_seq_b = "CATCGTAATGACGGCCT" |
| 39 | + |
| 40 | +function hamming(seq_a, seq_b) |
| 41 | + |
| 42 | + |
| 43 | +    # reject empty input (the length check below covers seq_b) |
| 44 | +    if isempty(seq_a) |
| 45 | +        throw(ErrorException("empty sequences")) |
| 46 | +    end |
| 47 | + |
| 48 | +    # check if the strings are different lengths |
| 49 | +    if length(seq_a) != length(seq_b) |
| 50 | +        throw(ErrorException("sequences have different lengths")) |
| 51 | +    end |
| 52 | + |
| 53 | +    mismatches = 0 |
| 54 | +    for i in 1:length(seq_a)  # integer indexing assumes ASCII input such as DNA |
| 55 | +        if seq_a[i] != seq_b[i] |
| 56 | +            mismatches += 1 |
| 57 | +        end |
| 58 | +    end |
| 59 | +    return mismatches |
| 60 | +end |
| 61 | + |
| 62 | +hamming(ex_seq_a, ex_seq_b) |
| 63 | + |
| 64 | +``` |
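
The same counting loop can also be written more compactly. The sketch below is one possible variant, not part of the original walkthrough: the name `hamming_zip` is hypothetical, and it assumes that returning 0 for two empty strings is acceptable (the `hamming` function above throws an error in that case instead).

```julia
# Hypothetical helper for illustration; not part of BioAlignments.jl.
function hamming_zip(seq_a::AbstractString, seq_b::AbstractString)
    # Compare lengths in characters so the check also holds for non-ASCII input.
    length(seq_a) == length(seq_b) ||
        throw(ArgumentError("sequences have different lengths"))
    # zip pairs up corresponding characters; count tallies the pairs that differ.
    return count(a != b for (a, b) in zip(seq_a, seq_b))
end

hamming_zip("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")  # 7 for the sample dataset
```

Iterating with `zip` avoids integer indexing into the strings, so this version does not depend on the input being ASCII.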
| 65 | + |
| 66 | + |
| 67 | + |
| 68 | +## BioAlignments method |
| 69 | + |
| 70 | +Instead of writing your own function, you can use the Hamming distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) available in the `BioAlignments.jl` package. |
| 71 | + |
| 72 | +```julia |
| 73 | +using BioAlignments |
| 74 | + |
| 75 | +ex_seq_a = "GAGCCTACTAACGGGAT" |
| 76 | +ex_seq_b = "CATCGTAATGACGGCCT" |
| 77 | + |
| 78 | +bio_hamming = BioAlignments.hamming_distance(Int64, ex_seq_a, ex_seq_b) |
| 79 | + |
| 80 | +bio_hamming[1] |
| 81 | + |
| 82 | +``` |
| 83 | + |
| 84 | +```julia |
| 85 | +# Double check that we got the same value from both outputs |
| 86 | +@assert hamming(ex_seq_a, ex_seq_b) == bio_hamming[1] |
| 87 | +``` |
| 88 | + |
| 89 | + |
| 90 | +The BioAlignments `hamming_distance` function takes three arguments. The first controls the type of the returned Hamming distance value. |
| 91 | + |
| 92 | +In the above example, `Int64` is provided as the first argument, but `Float64` or `Int8` are also acceptable. The remaining two arguments are the sequences being compared. |
| 93 | + |
| 94 | +The function returns two values: the Hamming distance itself and a vector of alignment anchors, which is why the example reads the distance with `bio_hamming[1]`. In the output shown for this example, consecutive positions with the same operation are collapsed into a single anchor that marks the end of the run, so the vector has 13 entries for two 17-character inputs rather than one entry per position. |
| 95 | + |
| 96 | +Each element of the vector is an `AlignmentAnchor` with three fields: sequence position, reference position, and an operation code ('0' for start, '=' for match, 'X' for mismatch). |
| 97 | + |
| 98 | +The anchor vector for the above example is: |
| 99 | + ``` |
| 100 | + AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')] |
| 101 | + |
| 102 | + ``` |
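
To make the shape of that anchor output easier to read, here is a minimal, self-contained sketch that rebuilds the same positions and operation codes by collapsing runs of matches and mismatches. It is an illustration only: `RunAnchor`, `run_anchors`, and `mismatches_from_anchors` are hypothetical names, the struct is a simplified stand-in rather than the BioAlignments `AlignmentAnchor` type, and nothing here claims to mirror how the library builds its anchors internally.

```julia
# Hypothetical stand-in for illustration only; this is NOT the BioAlignments type.
struct RunAnchor
    pos::Int   # last position covered by the run
    op::Char   # '0' start, '=' match, 'X' mismatch
end

function run_anchors(seq_a::AbstractString, seq_b::AbstractString)
    length(seq_a) == length(seq_b) ||
        throw(ArgumentError("sequences have different lengths"))
    anchors = [RunAnchor(0, '0')]  # start marker, as in the output above
    for (i, (a, b)) in enumerate(zip(seq_a, seq_b))
        op = a == b ? '=' : 'X'
        if anchors[end].op == op
            # Same operation as the previous position: extend the current run.
            anchors[end] = RunAnchor(i, op)
        else
            push!(anchors, RunAnchor(i, op))
        end
    end
    return anchors
end

# Each 'X' run contributes its length, i.e. the gap to the previous anchor.
function mismatches_from_anchors(anchors)
    total = 0
    for k in 2:length(anchors)
        if anchors[k].op == 'X'
            total += anchors[k].pos - anchors[k - 1].pos
        end
    end
    return total
end

anchors = run_anchors("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")
mismatches_from_anchors(anchors)  # 7, matching the Hamming distance
```

For the sample dataset this yields 13 anchors ending at positions 0, 1, 2, 3, 4, 5, 7, 8, 9, 10, 14, 16, and 17, the same positions that appear in the output above.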
| 103 | + |
| 104 | + |