Commit d62948c

outline of hamming distance tutorial
1 parent f925655 commit d62948c

1 file changed

Lines changed: 92 additions & 0 deletions

docs/src/rosalind/06-hamming.md

@@ -0,0 +1,92 @@
# Counting Point Mutations

!!! warning "The Problem"
    Problem

    Given two strings s and t of equal length, the Hamming distance between s
    and t, denoted dH(s,t), is the number of corresponding symbols that differ in s and t.

    Given: Two DNA strings s and t of equal length (not exceeding 1 kbp).

    Return: The Hamming distance dH(s,t).

    ***Sample Dataset***

    ```
    GAGCCTACTAACGGGAT
    CATCGTAATGACGGCCT
    ```

    ***Sample Output***
    ```
    7
    ```

The Hamming distance is only defined for two strings of the same length, so the two DNA sequences must be equally long. We can then loop over the positions of one string and compare each character with the character at the same position in the other string. Each mismatch adds 1 to a `mismatches` counter, and the value of the counter after the loop is the Hamming distance.

Let's give this a try!

```julia
SampleSeqA = "GAGCCTACTAACGGGAT"
SampleSeqB = "CATCGTAATGACGGCCT"

function calcHamming(SeqA, SeqB)
    SeqLength = length(SeqA)

    # the Hamming distance is only defined for strings of equal length
    if SeqLength != length(SeqB)
        throw(ArgumentError("sequences must have the same length"))
    end

    # two empty strings have no mismatches
    if SeqLength == 0
        return 0
    end

    # count the positions at which the two sequences differ
    # (indexing by position assumes ASCII input such as DNA strings)
    mismatches = 0
    for i in 1:SeqLength
        if SeqA[i] != SeqB[i]
            mismatches += 1
        end
    end
    return mismatches
end

calcHamming(SampleSeqA, SampleSeqB)
```
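
The same count can be written more compactly. The sketch below is one possible alternative rather than the tutorial's own method: it pairs up the characters with `zip` and counts the unequal pairs with `count`. The function name `hammingZip` is hypothetical. Because `zip` stops silently at the end of the shorter input, the length check stays explicit.

```julia
# Hypothetical compact variant of the loop-based function (name is illustrative)
function hammingZip(SeqA, SeqB)
    # zip would silently truncate to the shorter input, so check lengths first
    if length(SeqA) != length(SeqB)
        throw(ArgumentError("sequences must have the same length"))
    end
    # count the character pairs that differ
    return count(a != b for (a, b) in zip(SeqA, SeqB))
end

hammingZip("GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")
```

Iterating with `zip` walks the strings character by character, so this version does not rely on indexing into the strings by position.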

## BioAlignments method

Instead of writing your own function, you can use the Hamming distance [function](https://github.com/BioJulia/BioAlignments.jl/blob/0f3cc5e1ac8b34fdde23cb3dca7afb9eb480322f/src/pairwise/algorithms/hamming_distance.jl#L4) that ships with the `BioAlignments.jl` package.

```julia
using BioAlignments

seqA = "GAGCCTACTAACGGGAT"
seqB = "CATCGTAATGACGGCCT"

# returns a tuple: (distance, vector of alignment anchors)
BioAlignmentsHamming = BioAlignments.hamming_distance(Int64, seqA, seqB)

# the first element of the tuple is the Hamming distance
BioAlignmentsHamming[1]
```

```julia
# Double check that we got the same values from both outputs
@assert calcHamming(SampleSeqA, SampleSeqB) == BioAlignmentsHamming[1]
```

The BioAlignments `hamming_distance` function takes three arguments. The first sets the type of the returned Hamming distance value. The example above passes `Int64`, but other numeric types such as `Float64` or `UInt8` are also accepted.

The second and third arguments are the two sequences being compared.
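
To see what a type argument like this does, here is a minimal sketch of a function that takes the return type as its first argument. It is not the BioAlignments implementation and it does not build alignment anchors. The name `typedHamming` is hypothetical, and the sketch simply assumes that the distance is accumulated in the requested type `T`.

```julia
# Hypothetical sketch: the caller chooses the numeric type of the result
function typedHamming(::Type{T}, SeqA, SeqB) where {T<:Number}
    if length(SeqA) != length(SeqB)
        throw(ArgumentError("sequences must have the same length"))
    end
    distance = zero(T)          # counter starts at 0 in the requested type
    for (a, b) in zip(SeqA, SeqB)
        if a != b
            distance += one(T)  # add 1 in the requested type
        end
    end
    return distance
end

typedHamming(Int64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")    # 7
typedHamming(Float64, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")  # 7.0
typedHamming(UInt8, "GAGCCTACTAACGGGAT", "CATCGTAATGACGGCCT")    # 0x07
```

In this sketch a narrow type such as `UInt8` wraps around once the count passes 255, which matters for sequences of up to 1 kbp, so `Int64` is the safer default.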

The function returns two values: the Hamming distance itself and a vector of alignment anchors. The vector is a one-dimensional array of `AlignmentAnchor` values, each with three fields: sequence position, reference position, and an operation code (`'0'` for start, `'='` for match, `'X'` for mismatch). The vector does not hold one anchor per character. As the output below shows, it begins with a start anchor at position 0 and then records one anchor at the last position of each run of consecutive matches or mismatches, so the 17-character sample sequences produce 13 anchors.

The alignment anchors for the above example are

```
AlignmentAnchor[AlignmentAnchor(0, 0, '0'), AlignmentAnchor(1, 1, 'X'), AlignmentAnchor(2, 2, '='), AlignmentAnchor(3, 3, 'X'), AlignmentAnchor(4, 4, '='), AlignmentAnchor(5, 5, 'X'), AlignmentAnchor(7, 7, '='), AlignmentAnchor(8, 8, 'X'), AlignmentAnchor(9, 9, '='), AlignmentAnchor(10, 10, 'X'), AlignmentAnchor(14, 14, '='), AlignmentAnchor(16, 16, 'X'), AlignmentAnchor(17, 17, '=')]
```

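
In the output above, each anchor after the first marks the last position of a run of matches or mismatches. The sketch below expands those runs back into individual mismatch positions. So that it runs without the package, the anchors are written as plain tuples copied from the output above. These tuples and the helper name `mismatchPositions` are stand-ins for illustration, not part of the BioAlignments API.

```julia
# Anchors from the output above, written as (seqpos, refpos, op) tuples.
# These tuples are illustrative stand-ins for AlignmentAnchor values.
anchors = [(0, 0, '0'), (1, 1, 'X'), (2, 2, '='), (3, 3, 'X'), (4, 4, '='),
           (5, 5, 'X'), (7, 7, '='), (8, 8, 'X'), (9, 9, '='), (10, 10, 'X'),
           (14, 14, '='), (16, 16, 'X'), (17, 17, '=')]

# Hypothetical helper: expand every mismatch run into its positions
function mismatchPositions(anchors)
    positions = Int[]
    for k in 2:length(anchors)
        runStart = anchors[k - 1][1] + 1   # run begins after the previous anchor
        runEnd, _, op = anchors[k]         # run ends at this anchor's position
        if op == 'X'
            append!(positions, runStart:runEnd)
        end
    end
    return positions
end

mismatchPositions(anchors)          # [1, 3, 5, 8, 10, 15, 16]
length(mismatchPositions(anchors))  # 7, the Hamming distance
```

The seven recovered positions agree with the sample output of 7.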
