Commit 65200a8

first commit
1 parent bfc070e commit 65200a8

1 file changed

+118
-0
lines changed

docs/src/rosalind/09-subs.md

Lines changed: 118 additions & 0 deletions
@@ -0,0 +1,118 @@
1+
# Finding a Motif in DNA
2+
3+
🤔 [Problem link](https://rosalind.info/problems/subs/)
4+
5+
!!! warning "The Problem"
6+
7+
Given two strings s and t,
8+
t is a substring of s if t is contained as a contiguous collection of symbols in s
9+
(as a result, t must be no longer than s).
10+
11+
The position of a symbol in a string is the total number of symbols found to its left, including itself
12+
(e.g., the positions of all occurrences of 'U' in "AUGCUUCAGAAAGGUCUUACG" are 2, 5, 6, 15, 17, and 18).
13+
The symbol at position i of s is denoted by s[i].
14+
15+
A substring of s can be represented as s[j:k],
16+
where j and k represent the starting and ending positions of the substring in s;
17+
for example, if s = "AUGCUUCAGAAAGGUCUUACG",
18+
then s[2:5] = "UGCU".
19+
20+
The location of a substring s[j:k] is its beginning position j;
21+
note that t will have multiple locations in s
22+
if it occurs more than once as a substring of s
23+
(see the Sample below).
24+
25+
Given:
26+
Two DNA strings s and t
27+
(each of length at most 1 kbp).
28+
29+
Return:
30+
All locations of t as a substring of s.
31+
32+
Sample Dataset
33+
`GATATATGCATATACTT`
`ATAT`
34+
35+
Sample Output
36+
`2 4 10`
37+
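The position and slicing conventions in the problem statement line up with Julia's 1-based, inclusive indexing. The following is a minimal sketch of that correspondence, assuming plain ASCII `String`s; the variable names `u_positions` and `sub` are illustrative and not part of the original problem.

```julia
s = "AUGCUUCAGAAAGGUCUUACG"

# positions are 1-based, so collecting every index that holds 'U'
# reproduces the list given in the problem statement
u_positions = [i for i in eachindex(s) if s[i] == 'U']

# s[j:k] includes both endpoints, matching the problem's s[j:k] notation
sub = s[2:5]

println(u_positions)  # prints [2, 5, 6, 15, 17, 18]
println(sub)          # prints UGCU
```

Because the conventions match, the solutions can report Julia indices directly without any offset correction.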
38+
### Handwritten solution
39+
The clunkiest solution uses a for-loop.
40+
We can loop over every character within the input string and
41+
check if we can find the substring in the subsequent characters.
42+
43+
44+
```julia
dataset = "GATATATGCATATACTT"
search_string = "ATAT"

function haystack(substring, string)
    # reject empty inputs
    if isempty(substring) || isempty(string)
        throw(ErrorException("empty sequences"))
    end

    # return early if the substring never occurs in the string
    if !occursin(substring, string)
        return Int[]
    end

    output = Int[]
    n = length(substring)

    for i in eachindex(string)
        # check if the first letter of the substring matches the character at this index
        if string[i] == substring[1]
            # check if the full substring matches, without running past the end of the string
            if i + n - 1 <= length(string) && string[i:i+n-1] == substring
                push!(output, i)
            end
        end
    end
    return output
end

haystack(search_string, dataset)
```
76+
We can also use the [`findnext`](https://docs.julialang.org/en/v1/base/strings/#Base.findnext) function in Julia so that we don't have to loop through every character in the string.
77+
78+
```julia
function haystack_findnext(substring, string)
    # reject empty inputs
    if isempty(substring) || isempty(string)
        throw(ErrorException("empty sequences"))
    end

    # return early if the substring never occurs in the string
    if !occursin(substring, string)
        return Int[]
    end

    output = Int[]
    i = 1
    # keep searching while the start index is still inside the string
    while i <= lastindex(string)
        result = findnext(substring, string, i)
        # findnext returns nothing once there are no more matches
        if result === nothing
            break
        end

        push!(output, first(result))
        # resume one position after this match so overlapping matches are found
        i = first(result) + 1
    end
    return output
end

haystack_findnext(search_string, dataset)
```
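Julia's Base also offers `findall` for string patterns, which accepts an `overlap` keyword. The sketch below is one possible shorter variant, assuming Julia 1.3 or later and ASCII input; `motif_locations` is a hypothetical helper name introduced here for illustration, not part of the original solutions.

```julia
# hypothetical helper: start positions of every (possibly overlapping) match
function motif_locations(substring, string)
    # findall returns one index range per match;
    # overlap=true keeps matches that share characters with an earlier match
    ranges = findall(substring, string; overlap=true)
    return first.(ranges)
end

locations = motif_locations("ATAT", "GATATATGCATATACTT")
println(join(locations, " "))  # prints 2 4 10
```

With the default `overlap=false`, the match starting at position 4 would be skipped because it shares characters with the match at position 2, so the keyword matters for this problem.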
110+
111+
### BioJulia solution
112+
113+
Lastly, we can leverage some functions in the Kmers.jl BioJulia package to help us!
114+
115+
```julia
```
