cookbook/03-blast.md
+17 -12 (17 additions & 12 deletions)
@@ -6,12 +6,16 @@ rss_descr = "Using NCBIBlast.jl to run BLAST searches"
6
6
+++
7
7
8
8
# Introduction to BLAST
9
-
A BLAST search allows you to query a sequence (either nucleotide or protein) against an entire database of sequences.
9
+
A BLAST search allows you to query a sequence (either nucleotide or protein) against an entire database of sequences.
10
+
It can be helpful for quickly comparing unknown sequences to databases of established reference sequences for purposes such as species identification or assigning gene function.
10
11
11
12
More information about how to use BLAST can be found in its [manual](https://www.ncbi.nlm.nih.gov/books/NBK569856/).
12
13
13
-
BLAST's can be run from the BLAST web page [here](https://blast.ncbi.nlm.nih.gov/Blast.cgi).
14
-
A user can simply copy in a nucleotide sequence and search for a best match against all of the databases in NCBI!
14
+
BLAST searches can be run from the command line interface (CLI) or through the BLAST web page [here](https://blast.ncbi.nlm.nih.gov/Blast.cgi).
15
+
A user can simply copy in a nucleotide sequence and search for the best match in NCBI!
16
+
While searching from the website is fast and straightforward,
17
+
it only searches against the NCBI databases.
18
+
The CLI allows users to query against both NCBI databases and custom databases.
15
19
16
20
`NCBIBlast.jl` is a thin wrapper around the BLAST command line tool,
17
21
allowing users to run the tool within Julia.
@@ -34,7 +38,7 @@ Note: [BioTools BLAST](https://biojulia.dev/BioTools.jl/stable/blast/) is a depr
34
38
35
39
The keywords used in the tool are sent to the shell for running BLAST.
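
To make the keyword-to-shell idea concrete, here is a minimal sketch of how keyword arguments could be turned into command line flags. The helper `kwargs_to_flags` is hypothetical and written only for illustration; it is an assumption about the general approach, not the code `NCBIBlast.jl` actually uses.

```julia
# Hypothetical helper (NOT part of NCBIBlast.jl): turn each keyword
# argument `key = value` into the pair of shell arguments `-key value`.
function kwargs_to_flags(; kwargs...)
    flags = String[]
    for (key, value) in kwargs
        push!(flags, "-$(key)")
        push!(flags, string(value))
    end
    return flags
end

# The file and database names below are placeholders.
flags = kwargs_to_flags(; query = "a_file.txt", db = "mydb", out = "results.txt")

# Interpolating a vector into a Cmd splices each element in as its own argument.
# The command is only built and printed here, never run.
cmd = `blastn $flags`
println(cmd)
```

Because the command is only constructed and not executed, this sketch runs without BLAST installed.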
36
40
37
-
As stated on the Github[docs](https://github.com/BioJulia/NCBIBlast.jl), the julia call
41
+
As stated in the GitHub [docs](https://github.com/BioJulia/NCBIBlast.jl), the Julia call
@@ -58,9 +62,9 @@ More directions on building a BLAST database locally can be found [here](https:/
58
62
59
63
## Example: Building a local BLAST database and running the BLAST search
60
64
61
-
For our first example, we will replicate the example on the NCBIBlast.jl Github.
65
+
For our first example, we will replicate the example on the `NCBIBlast.jl` GitHub.
62
66
63
-
First, we will build a local database using a fasta file found in the NCBIBlast github repository ([link here](https://github.com/BioJulia/NCBIBlast.jl/blob/main/test/example_files/dna2.fasta)).
67
+
First, we will build a local database using a FASTA file found in the `NCBIBlast.jl` GitHub repository ([link here](https://github.com/BioJulia/NCBIBlast.jl/blob/main/test/example_files/dna2.fasta)).
The command `seek(io,0)` moves the cursor of to the start of the captured object (index 0) so it can be read into a dataframe.
101
+
The command `seek(io,0)` moves the cursor to the start of the captured object (index 0) so it can be read into a dataframe.
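
As a small standalone illustration of why the rewind is needed, the sketch below uses only Julia's built-in `IOBuffer`. The text written into the buffer is a made-up stand-in for captured BLAST output, not real results.

```julia
# Stand-in for captured output: write some tab-separated text into a buffer.
io = IOBuffer()
write(io, "Query_1\tTest1\n")

# After writing, the cursor sits at the end of the buffer,
# so reading immediately finds nothing left to read.
first_read = read(io, String)

# Rewind to the start (position 0) and read again.
seek(io, 0)
second_read = read(io, String)

println(repr(first_read))   # empty string: the cursor was at the end
println(repr(second_read))  # the full text that was written
```

Any reader that consumes the buffer from its current position is affected in the same way, which is why the rewind comes before parsing.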
98
102
99
103
100
104
```
@@ -113,8 +117,8 @@ This output tells us that the query sequence (`Query_1` is the default name sinc
113
117
There is 100% identity on a region that is 38 nucleotides long.
114
118
There are 0 mismatches or gap openings.
115
119
The match starts at index 1 on the query sequence, and ends at index 38.
116
-
This region matches a region on the `Test1`that spans from index 82 to 119.
117
-
The E-value is `5.64e-18`, meaning that it is extremely unlikely that this match occured simply due to chance.
120
+
This region matches a region in the `Test1` sequence spanning from index 82 to 119.
121
+
The E-value is `5.64e-18`, meaning that it is extremely unlikely that this match occurred simply due to chance.
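
The numbers in this hit can be cross-checked with a little arithmetic. The sketch below copies the values quoted above into a `NamedTuple` (the field names are chosen here for illustration) and derives the span lengths. The `1e-5` cutoff is an arbitrary example threshold, not a recommendation from the BLAST documentation.

```julia
# Values quoted in the text above; field names are illustrative.
hit = (pident = 100.0, length = 38, mismatch = 0, gapopen = 0,
       qstart = 1, sstart = 82, send = 119, evalue = 5.64e-18)

# Subject coordinates are inclusive, so the span is end - start + 1.
subject_span = hit.send - hit.sstart + 1      # 38

# With no gaps, the query span equals the alignment length,
# so the query end follows from the start and the length.
implied_qend = hit.qstart + hit.length - 1    # 38

# Hypothetical significance cutoff, for illustration only.
is_significant = hit.evalue < 1e-5

println("subject span: ", subject_span)
println("implied query end: ", implied_qend)
println("below example cutoff: ", is_significant)
```

The subject span and the alignment length agree, which is what a gap-free, mismatch-free alignment should give.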
118
122
119
123
Here is a description of the E-value from the NCBI [website](https://blast.ncbi.nlm.nih.gov/doc/blast-help/FAQ.html):
120
124
> The Expect value (E) is a parameter that describes the number of
@@ -151,7 +155,8 @@ We should see that the query fasta is a direct hit to the _mecA_ gene
151
155
For this BLAST search, I will search against the `core_nt` database,
152
156
which is a faster, smaller, and more focused subset of the traditional `nt` (nucleotide) database.
153
157
This newer database is the default as of August 2024.
154
-
It seeks to reduce redundancy and reduce storage when downloading the database. More information about it can be found [here](https://ncbiinsights.ncbi.nlm.nih.gov/2024/07/18/new-blast-core-nucleotide-database/).
158
+
It seeks to reduce redundancy and storage requirements when downloading the database.
159
+
More information about it can be found [here](https://ncbiinsights.ncbi.nlm.nih.gov/2024/07/18/new-blast-core-nucleotide-database/).
155
160
156
161
General information about the different kinds of BLAST databases is also available [here](https://www.nlm.nih.gov/ncbi/workshops/2023-08_BLAST_evol/databases.html).
157
162
@@ -204,10 +209,10 @@ Because of this, the first row in the results is not necessarily a better match
204
209
even though it appears first.
205
210
206
211
To verify the first hit, we can look up the GenBank ID of the first hit: `CP026646.1`.
207
-
The NCBI [page](https://www.ncbi.nlm.nih.gov/nuccore/CP026646.1/)listing this sample confirms that this sample was phenotyped as _S. aureus_.
212
+
The NCBI [page](https://www.ncbi.nlm.nih.gov/nuccore/CP026646.1/) for this sample confirms that it was phenotyped as _S. aureus_.
208
213
Our query matches from indices 46719 to 46580.
209
214
When we use the Graphics feature to visualize gene annotations, we see that there is a clear match to _mecA_.
210
215
211
216

212
217
213
-
Overall, this confirms that our BLAST worked correctly!
218
+
Overall, this confirms that our BLAST worked as expected!