Commit a0857a2

add findall function
1 parent 52cc16c commit a0857a2

1 file changed

+17
-35
lines changed

docs/src/rosalind/09-subs.md

Lines changed: 17 additions & 35 deletions
@@ -40,7 +40,7 @@ Let's start off with the most verbose solution.
4040
We can loop over every character within the input string and
4141
check if we can find the substring in the subsequent characters.
4242

43-
In the first solution,
43+
In other words,
4444
we will check each index for an exact match to the substring we are searching for.
4545

4646
```julia
@@ -74,20 +74,20 @@ end
7474

7575
haystack(search_string, dataset)
7676
```
77-
We can also use the [`findnext`](https://docs.julialang.org/en/v1/base/strings/#Base.findnext) function in Julia.
7877

79-
There are similar `findfirst` and `findlast` functions,
80-
but since we want to find all matches,
81-
we will use `findnext`.
78+
### BioJulia solution
79+
80+
The BioSequences package has a helpful function [`findall`](https://github.com/BioJulia/BioSequences.jl/blob/b626dbcaad76217b248449e6aa2cc1650e95660c/src/BioSequences.jl#L261-L316),
81+
which returns the indices of all exact string matches.
8282

83-
Currently, there isn't a `findall` function that allows us to avoid a loop.
84-
We'll still also loop over every character in the string,
85-
as there could be overlapping substrings.
83+
It isn't included in the documentation about exact string search [here](https://biojulia.dev/BioSequences.jl/v2.0/sequence_search/#Exact-search-1),
84+
but the function exists!
8685

86+
BioSequences has other helpful exact string search functions like `findfirst`, `findnext`, and `findlast`.
8787

8888

8989
```julia
90-
function haystack_findnext(substring, string)
90+
function haystack_findall(substring, string)
9191
# check if the strings are empty
9292
if isempty(substring) || isempty(string)
9393
throw(ErrorException("empty sequences"))
@@ -98,29 +98,19 @@ function haystack_findnext(substring, string)
9898
return []
9999
end
100100

101-
output = []
102-
i = 1
103-
# while index is less than the length of string
104-
while i < length(string)
105-
result = findnext(substring, string, i)
106-
if result == nothing
107-
break
108-
end
109-
110-
if result != nothing
111-
push!(output, first(result))
112-
i = first(result) + 1
113-
end
114-
end
115-
return output
101+
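# dna"..." literals do not interpolate, so build the sequences explicitly (assumes BioSequences v3's LongDNA{4})
matches = findall(ExactSearchQuery(LongDNA{4}(substring)), LongDNA{4}(string))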
102+
return first.(matches)
116103
end
117104

118105

119-
haystack_findnext(search_string, dataset)
106+
haystack_findall(search_string, dataset)
120107
```
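
For plain `String` inputs, Julia's Base `findall` also accepts an `overlap` keyword (Julia 1.3 and later), so a similar one-liner is possible without BioSequences. The sketch below is only an illustration of that alternative, not the tutorial's own solution; the helper name `haystack_base_findall` and the example strings are assumptions chosen for the demonstration.

```julia
# Hypothetical helper (not from the tutorial): start positions of all
# matches of `substring` in `string`, including overlapping ones.
function haystack_base_findall(substring::AbstractString, string::AbstractString)
    # overlap=true allows a match to begin inside the previous match
    ranges = findall(substring, string; overlap=true)
    # each match is a UnitRange of indices; keep only the first index
    return first.(ranges)
end

positions = haystack_base_findall("ATAT", "GATATATGCATATACTT")
println(positions)
```

With the default `overlap=false`, the match starting at index 4 would be skipped, because it begins inside the match that starts at index 2.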
108+
### Regex solution
121109

122-
Lastly, we can also use Regex's search function,
123-
which produces quite the elegant solution!
110+
Lastly, we can also use Regex's search function.
111+
Here the "pattern" we are searching for is the exact string.
112+
This is a great solution if we want to search for more complicated patterns,
113+
but it works for exact matches as well!
124114

125115

126116
```julia
@@ -138,13 +128,5 @@ end
138128
haystack_findnext(search_string, dataset)
139129
```
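
As a rough illustration of why a regex helps with more complicated patterns, the sketch below uses Base's `eachmatch` with `overlap=true` and collects each match's `offset`. The helper name `haystack_regex_sketch` and the example strings are hypothetical, and the code assumes the search string contains no regex metacharacters unless a pattern is intended.

```julia
# Hypothetical helper (not from the tutorial): start positions of all
# regex matches, including overlapping ones.
function haystack_regex_sketch(pattern::AbstractString, string::AbstractString)
    # Regex(pattern) treats the input as a regular expression, so plain
    # DNA letters match literally while "." or "[AC]" act as wildcards
    regex = Regex(pattern)
    # m.offset is the index in `string` where the match starts
    return [m.offset for m in eachmatch(regex, string; overlap=true)]
end

# exact match
println(haystack_regex_sketch("ATAT", "GATATATGCATATACTT"))
# pattern match: "ATA" followed by any single character
println(haystack_regex_sketch("ATA.", "GATATATGCATATACTT"))
```

The same function handles both cases because an exact string is simply a pattern without wildcards.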
140130

141-
### Biojulia solution
142-
143-
Lastly, we can leverage some functions in the Kmers Biojulia package to help us!
144-
145-
```julia
146-
147-
148-
```
149131

150132
