Commit 485bb3a

adding thought process to solve problem
1 parent 9bbaaf5 commit 485bb3a

1 file changed

Lines changed: 91 additions & 1 deletion

docs/src/rosalind/08-prot.md

@@ -20,4 +20,94 @@
Sample Output
```
MAMAPRTEINSTRING
```

### DIY solution
Let's first tackle this problem by writing our own solution.

First, we'll check that this is a coding region by verifying that the string starts with the start codon (`AUG`). If it doesn't, we can still translate the string, but we'll warn the user: there may be a frame shift, in which case the returned translation will be incorrect.
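
As a minimal sketch of this check, assuming the input is held in a plain `String`, the snippet below uses Julia's built-in `startswith` and reports the problem with `@warn` instead of stopping. The helper name `has_start_codon` is made up for illustration.

```julia
# Hypothetical helper: report whether an RNA string begins with the start codon.
has_start_codon(rna::AbstractString) = startswith(rna, "AUG")

rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

if !has_start_codon(rna)
    # Warn rather than stop, so translation can still be attempted.
    @warn "Sequence does not begin with AUG; the reading frame may be shifted."
end
```

Using a warning here is one possible design choice: it keeps the function usable on partial sequences while still flagging that the output may be unreliable.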

We'll also check that the length of the string is divisible by three. If it is not, there was likely an insertion or deletion somewhere in the sequence. Again, we can still translate as much of the string as possible. However, we should alert the user that the result may be incorrect!
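
One way this length check might look is sketched below, assuming an ASCII input string. The helper name `leftover_bases` is hypothetical.

```julia
# Hypothetical helper: number of trailing bases that do not fill a codon.
leftover_bases(rna::AbstractString) = length(rna) % 3

rna = "AUGGCCAUGGC"   # 11 bases: three full codons plus two extra

extra = leftover_bases(rna)
if extra != 0
    # Alert the user, but allow translation of the complete codons to proceed.
    @warn "Length is not a multiple of three; the last $extra base(s) cannot be translated."
end
```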

We need to convert this RNA string into a protein string (a string of amino acids) using the RNA codon table. We can store the RNA codon table as a dictionary that maps each codon to its amino acid.

Then, we'll break the string into codons by slicing it every three characters. Each codon can be looked up in the RNA codon table to get the corresponding amino acid, which we'll append to the output string.
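
The slicing step could be sketched as follows. This assumes the input contains only ASCII characters (so integer indexing into the string is safe) and simply drops any incomplete codon at the end. The function name `split_codons` is hypothetical.

```julia
# Hypothetical helper: split an RNA string into complete, non-overlapping codons.
# Assumes ASCII input, so each character occupies exactly one index.
function split_codons(rna::AbstractString)
    # Step through the string three characters at a time;
    # stopping at length - 2 skips any incomplete trailing codon.
    return [rna[i:i+2] for i in 1:3:length(rna)-2]
end

codons = split_codons("AUGGCCUGAAU")   # the trailing "AU" is dropped
```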

We'll also need to deal with any three-character strings that don't match a codon. This likely means that there was a mutation in the input string! If we get a codon that doesn't match, we can return "X" for that amino acid and continue translating the rest of the string. However, if we get a string of X's, that signals a larger problem with the input, such as a frame shift.
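
A small sketch of this fallback, using Julia's built-in `get` with a default value. The three-entry `mini_table` is only an excerpt for illustration, and the helper name `lookup` is hypothetical.

```julia
# A tiny excerpt of the codon table, just enough for this illustration.
mini_table = Dict("AUG" => 'M', "GCC" => 'A', "UGA" => '*')

# Hypothetical helper: look up a codon, returning 'X' if it is not in the table.
lookup(codon, table) = get(table, codon, 'X')

codons = ["AUG", "GCC", "ANG", "UGA"]   # "ANG" is not a valid codon

# Translate each codon and join the resulting characters into one string.
protein = join(lookup(c, mini_table) for c in codons)
```

Here `protein` keeps the same length as the list of codons, so the position of each `X` points back to the codon that failed to match.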

Now that we have established an approach, let's turn this into code!

```julia

rna = "AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

# note: this can be created by hand
# or it can be accessed using
codon_table = rna_codon_table = Dict(
    # Phenylalanine (F)
    "UUU" => 'F', "UUC" => 'F',
    # Leucine (L)
    "UUA" => 'L', "UUG" => 'L', "CUU" => 'L', "CUC" => 'L', "CUA" => 'L', "CUG" => 'L',
    # Isoleucine (I)
    "AUU" => 'I', "AUC" => 'I', "AUA" => 'I',
    # Methionine (M) - Start Codon
    "AUG" => 'M',
    # Valine (V)
    "GUU" => 'V', "GUC" => 'V', "GUA" => 'V', "GUG" => 'V',
    # Serine (S)
    "UCU" => 'S', "UCC" => 'S', "UCA" => 'S', "UCG" => 'S', "AGU" => 'S', "AGC" => 'S',
    # Proline (P)
    "CCU" => 'P', "CCC" => 'P', "CCA" => 'P', "CCG" => 'P',
    # Threonine (T)
    "ACU" => 'T', "ACC" => 'T', "ACA" => 'T', "ACG" => 'T',
    # Alanine (A)
    "GCU" => 'A', "GCC" => 'A', "GCA" => 'A', "GCG" => 'A',
    # Tyrosine (Y)
    "UAU" => 'Y', "UAC" => 'Y',
    # Stop Codons (*)
    "UAA" => '*', "UAG" => '*', "UGA" => '*',
    # Histidine (H)
    "CAU" => 'H', "CAC" => 'H',
    # Glutamine (Q)
    "CAA" => 'Q', "CAG" => 'Q',
    # Asparagine (N)
    "AAU" => 'N', "AAC" => 'N',
    # Lysine (K)
    "AAA" => 'K', "AAG" => 'K',
    # Aspartic Acid (D)
    "GAU" => 'D', "GAC" => 'D',
    # Glutamic Acid (E)
    "GAA" => 'E', "GAG" => 'E',
    # Cysteine (C)
    "UGU" => 'C', "UGC" => 'C',
    # Tryptophan (W)
    "UGG" => 'W',
    # Arginine (R)
    "CGU" => 'R', "CGC" => 'R', "CGA" => 'R', "CGG" => 'R', "AGA" => 'R', "AGG" => 'R',
    # Glycine (G)
    "GGU" => 'G', "GGC" => 'G', "GGA" => 'G', "GGG" => 'G'
)


# check if starts with start codon

# check if string is divisible by three

# separate string into codons, map over with codon table

# dealing with codons not in codon_table

# return amino acid string

```

### BioJulia Solution

An alternative way to approach this problem is to use an established, already-written function from BioJulia.

```julia

```
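
As one possible way to fill this in, the sketch below assumes the third-party BioSequences.jl package from the BioJulia ecosystem is installed. It uses that package's `rna"..."` string literal and its `translate` function with the default (standard) genetic code. The variable names are chosen for illustration.

```julia
# Third-party package; install with `import Pkg; Pkg.add("BioSequences")`.
using BioSequences

# The rna"..." literal builds an RNA sequence object rather than a plain String.
rna_seq = rna"AUGGCCAUGGCGCCCAGAACUGAGAUCAAUAGUACCCGUAUUAACGGGUGA"

# translate uses the standard genetic code by default and
# represents stop codons as '*' in the resulting amino acid sequence.
protein = translate(rna_seq)

println(protein)
```

Because the stop codon is kept as `*`, the result would need its final character removed to match the Rosalind sample output exactly.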
