Commit 52cc16c

add regex solution
1 parent 65200a8 commit 52cc16c

1 file changed: +38 -6 lines changed

docs/src/rosalind/09-subs.md

Lines changed: 38 additions & 6 deletions
@@ -36,13 +36,14 @@
3636
`2 4 10`
3737

3838
### Handwritten solution
39-
The clunkiest solution uses a for-loop.
39+
Let's start off with the most verbose solution.
4040
We can loop over every character within the input string and
4141
check if we can find the substring in the subsequent characters.
4242

43+
In the first solution,
44+
we will check each index for an exact match to the substring we are searching for.
4345

4446
```julia
45-
4647
dataset = "GATATATGCATATACTTATAT"
4748
search_string = "ATAT"
4849

@@ -58,11 +59,11 @@ function haystack(substring, string)
5859
end
5960

6061
output = []
61-
6262
for i in eachindex(string)
6363
# check if first letter of string matches character at the index
6464
if string[i] == substring[1]
65-
# check if full
65+
# check if full substring matches at index
66+
# make sure the slice does not run past the end of the string
6667
if i + length(substring) - 1 <= length(string) && string[i:i+length(substring)-1] == substring
6768
push!(output, i)
6869
end
@@ -73,7 +74,17 @@ end
7374

7475
haystack(search_string, dataset)
7576
```
76-
We can also use the [`findnext`](https://docs.julialang.org/en/v1/base/strings/#Base.findnext) function in Julia so that we don't have to loop through every character in the string.
77+
We can also use the [`findnext`](https://docs.julialang.org/en/v1/base/strings/#Base.findnext) function in Julia.
78+
79+
There are similar `findfirst` and `findlast` functions,
80+
but since we want to find all matches,
81+
we will use `findnext`.
82+
83+
Julia 1.3 and later do provide a string method `findall(pattern, string; overlap=true)`, but here we write the loop ourselves.
84+
We'll still loop over every character in the string,
85+
as there could be overlapping substrings.
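
As a point of comparison, on Julia 1.3 or later the string method `findall(pattern, string; overlap=true)` returns every match, overlapping ones included, as index ranges. A minimal sketch, assuming that Julia version and reusing the `dataset` and `search_string` values from the first solution:

```julia
# Assumes Julia 1.3 or later, where `findall` accepts a pattern and a string.
dataset = "GATATATGCATATACTTATAT"
search_string = "ATAT"

# Each match comes back as a range of indices;
# `overlap=true` keeps matches that share characters.
ranges = findall(search_string, dataset; overlap=true)

# Keep only the starting index of each match.
starts = first.(ranges)  # [2, 4, 10, 18]
```

The explicit `findnext` loop is still worth writing out, since it shows how the search position advances between matches.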
86+
87+
7788

7889
```julia
7990
function haystack_findnext(substring, string)
@@ -105,6 +116,25 @@ function haystack_findnext(substring, string)
105116
end
106117

107118

119+
haystack_findnext(search_string, dataset)
120+
```
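
Since the hunk above elides the body of `haystack_findnext`, here is a minimal sketch of one way a `findnext`-based search could look. The name `findnext_starts` is hypothetical and this is not necessarily the author's implementation: rather than visiting every character, it jumps to each match and restarts one character past the match start, so overlapping hits are still found.

```julia
# Hypothetical helper, not the function from this commit.
function findnext_starts(substring, string)
    starts = Int[]
    i = firstindex(string)
    while i <= lastindex(string)
        # `findnext` returns the index range of the next match at or after `i`,
        # or `nothing` when there are no more matches.
        r = findnext(substring, string, i)
        r === nothing && break
        push!(starts, first(r))
        # Resume one character after the match start to allow overlaps.
        i = nextind(string, first(r))
    end
    return starts
end

result = findnext_starts("ATAT", "GATATATGCATATACTTATAT")  # [2, 4, 10, 18]
```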
121+
122+
We can also use a regular expression with `eachmatch`,
123+
which produces quite an elegant solution!
124+
125+
126+
```julia
127+
function haystack_regex(substring, string)
128+
if isempty(substring) || isempty(string)
129+
throw(ErrorException("empty sequences"))
130+
end
131+
if !occursin(substring, string)
132+
return []
133+
end
134+
135+
return [m.offset for m in eachmatch(Regex(substring), string; overlap=true)]
136+
end
137+
108138
haystack_regex(search_string, dataset)
109139
```
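
To see why the `overlap=true` keyword matters in the regex approach, the sketch below runs `eachmatch` with and without it on the same input. The variable names are illustrative only.

```julia
dataset = "GATATATGCATATACTTATAT"
pattern = Regex("ATAT")

# Default behaviour: matches may not share characters,
# so the hit starting at index 4 is skipped.
non_overlapping = [m.offset for m in eachmatch(pattern, dataset)]

# With overlap=true, matches are allowed to share characters.
overlapping = [m.offset for m in eachmatch(pattern, dataset; overlap=true)]
```

Here `non_overlapping` is `[2, 10, 18]` while `overlapping` is `[2, 4, 10, 18]`. Note that `Regex(substring)` treats the substring as a pattern, which is fine for plain DNA letters but would need escaping for inputs containing regex metacharacters.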
110140

@@ -115,4 +145,6 @@ Lastly, we can leverage some functions in the Kmers Biojulia package to help us!
115145
```julia
116146

117147

118-
```
148+
```
149+
150+
