Commit faf2890

add initial draft
1 parent c0207c3 commit faf2890

3 files changed

Lines changed: 126 additions & 0 deletions

cookbook/assets/mecA.fasta

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
>NC_007795.1:907598-908317 SAOUHSC_00935 [organism=Staphylococcus aureus subsp. aureus NCTC 8325] [GeneID=3920764] [chromosome=]
2+
ATGAGAATAGAACGAGTAGATGATACAACTGTAAAATTGTTTATAACATATAGCGATATCGAGGCCCGTG
3+
GATTTAGTCGTGAAGATTTATGGACAAATCGCAAACGTGGCGAAGAATTCTTTTGGTCAATGATGGATGA
4+
AATTAACGAAGAAGAAGATTTTGTTGTAGAAGGTCCATTATGGATTCAAGTACATGCCTTTGAAAAAGGT
5+
GTCGAAGTCACAATTTCTAAATCTAAAAATGAAGATATGATGAATATGTCTGATGATGATGCAACTGATC
6+
AATTTGATGAACAAGTTCAAGAATTGTTAGCTCAAACATTAGAAGGTGAAGATCAATTAGAAGAATTATT
7+
CGAGCAACGAACAAAAGAAAAAGAAGCTCAAGGTTCTAAACGTCAAAAGTCTTCAGCACGTAAAAATACA
8+
AGAACAATCATTGTGAAATTTAACGATTTAGAAGATGTTATTAATTATGCATATCATAGCAATCCAATAA
9+
CTACAGAGTTTGAAGATTTGTTATATATGGTTGATGGTACTTATTATTATGCTGTATATTTTGATAGTCA
10+
TGTTGATCAAGAAGTCATTAATGATAGTTACAGTCAATTGCTTGAATTTGCTTATCCAACAGACAGAACA
11+
GAAGTTTATTTAAATGACTATGCTAAAATAATTATGAGTCATAACGTAACAGCTCAAGTTCGACGTTATT
12+
TTCCAGAGACAACTGAATAA

cookbook/index.md

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
+++
2+
using Dates
3+
date = Date("2026-03-04")
4+
title = "BioJulia Cookbook"
5+
rss_descr = "Recipes for performing basic bioinformatics in Julia"
6+
+++
7+
8+
# BioJulia Cookbook
9+
10+
This cookbook provides a series of "recipes" to help you get started quickly with BioJulia so you can start doing some bioinformatics!
11+
12+
We have tutorials for reading in files, performing alignments, and using tools such as BLAST,
13+
as well as links to more documentation about specific BioJulia packages.
14+
15+
{{list_dir cookbook}}

cookbook/sequences.md

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
1+
+++
2+
title = "Sequence Input/Output"
3+
rss_descr = "Reading in FASTA files using FASTX.jl"
4+
+++
5+
6+
# Sequence Input/Output
7+
8+
In this chapter, we'll talk about how to read in sequence files using the `FASTX.jl` package.
9+
More information about the `FASTX.jl` package can be found at https://biojulia.dev/FASTX.jl/stable/
10+
and in the built-in documentation.
11+
12+
```julia
13+
julia> using FASTX
14+
julia> ?FASTX
15+
```
16+
17+
If FASTX is not already in your environment,
18+
it can be added from Julia's General registry.
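
A minimal sketch of adding the package, assuming the default General registry is configured and network access is available:

```julia
using Pkg

# Add FASTX.jl to the active environment (downloads from the General registry).
Pkg.add("FASTX")

# Load the package once it is installed.
using FASTX
```

The same can be done interactively by pressing `]` at the `julia>` prompt to enter package mode and typing `add FASTX`.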
19+
20+
To demonstrate how to use this package,
21+
let's try to read in some real-world data!
22+
23+
The `FASTX` package can read three file types: FASTA, FASTQ, and FAI.
24+
25+
### FASTA files
26+
FASTA files are text files containing biological sequence data.
27+
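Each record has a header line, which holds the description, followed by the sequence; the identifier (name) is the part of the description before the first whitespace.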
28+
29+
The template of a sequence record is:
30+
```
31+
>{description}
32+
{sequence}
33+
```
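
To make the relationship between identifier and description concrete, here is a dependency-free sketch. `split_header` is a hypothetical helper written for illustration and is not part of FASTX.jl; the header string is made up.

```julia
# Split a FASTA header line into its identifier and full description.
# `split_header` is a hypothetical helper for illustration, not a FASTX.jl function.
function split_header(header::AbstractString)
    startswith(header, '>') || throw(ArgumentError("a FASTA header must start with '>'"))
    # Drop the leading '>' and any surrounding whitespace.
    desc = strip(header[2:end])
    # The identifier is everything before the first run of whitespace.
    parts = split(desc; limit = 2)
    ident = isempty(parts) ? "" : String(first(parts))
    return (identifier = ident, description = String(desc))
end

header_parts = split_header(">seq1 toy record for illustration")
println(header_parts.identifier)
println(header_parts.description)
```

Note that the description here includes the identifier, which matches the output of `identifier` and `description` shown later in this chapter.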
34+
35+
### FASTQ files
36+
FASTQ files are also text-based files that contain sequences, along with a name and description. However, they also store sequence quality information (the Q is for quality!).
37+
38+
The template of a sequence record is:
39+
```
40+
@{description}
41+
{sequence}
42+
+{description}?
43+
{qualities}
44+
```
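
Quality strings store one character per base. As a rough sketch, assuming the common Phred+33 encoding (an assumption; some older files use a different offset), each character can be converted to a numeric score and an estimated error probability. The function names and the quality string below are made up for illustration.

```julia
# Decode a FASTQ quality string into Phred scores.
# Assumes Phred+33 encoding by default; pass a different `offset` if needed.
phred_scores(qualities::AbstractString; offset::Integer = 33) =
    [Int(c) - offset for c in qualities]

# Convert a Phred score Q into the estimated probability that the base call is wrong.
error_probability(q::Real) = 10.0^(-q / 10)

scores = phred_scores("II5!")   # hypothetical quality string
println(scores)
println(error_probability.(scores))
```

A higher score means a more reliable base call: a score of 20 corresponds to an estimated 1 in 100 chance of error.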
45+
46+
### FAI files
47+
48+
FAI (FASTA index) files are used in conjunction with FASTA/FASTQ files.
49+
They are text files with TAB-delimited columns.
50+
Each line describes one sequence in the FASTA/FASTQ file.
51+
More information about fai index files can be found [here](https://www.htslib.org/doc/faidx.html).
52+
53+
```
54+
NAME        Name of this reference sequence
55+
LENGTH      Total length of this reference sequence, in bases
56+
OFFSET      Offset in the FASTA/FASTQ file of this sequence's first base
57+
LINEBASES   The number of bases on each line
58+
LINEWIDTH   The number of bytes in each line, including the newline
59+
QUALOFFSET  Offset of sequence's first quality within the FASTQ file
60+
61+
```
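
The columns above are what make random access possible. The following dependency-free sketch parses one index line and computes the byte position of a given base. `parse_fai_line` and `base_offset` are hypothetical helpers, and the index line is made up (a 720-base sequence wrapped at 70 bases per line, after a 6-byte header line).

```julia
# Parse one tab-delimited FAI line into a named tuple.
# `parse_fai_line` is a hypothetical helper for illustration.
function parse_fai_line(line::AbstractString)
    fields = split(chomp(line), '\t')
    n = length(fields)
    n in (5, 6) || throw(ArgumentError("expected 5 or 6 tab-separated fields, got $n"))
    return (
        name       = String(fields[1]),
        length     = parse(Int, fields[2]),
        offset     = parse(Int, fields[3]),
        linebases  = parse(Int, fields[4]),
        linewidth  = parse(Int, fields[5]),
        # QUALOFFSET is only present for FASTQ indexes.
        qualoffset = n == 6 ? parse(Int, fields[6]) : nothing,
    )
end

# Zero-based byte position in the file of base `i` (1-based) of the sequence.
function base_offset(entry, i::Integer)
    1 <= i <= entry.length || throw(BoundsError(1:entry.length, i))
    full_lines, within_line = divrem(i - 1, entry.linebases)
    return entry.offset + full_lines * entry.linewidth + within_line
end

entry = parse_fai_line("seq1\t720\t6\t70\t71")   # made-up index line
println(entry)
println(base_offset(entry, 71))   # first base on the second sequence line
```

Because every full line occupies `LINEWIDTH` bytes but holds only `LINEBASES` bases, the calculation skips the newline characters without reading the file.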
62+
63+
64+
65+
66+
67+
We will read in a FASTA file containing the _mecA_ gene.
68+
This gene was taken from NCBI [here](https://www.ncbi.nlm.nih.gov/gene?Db=gene&Cmd=DetailsSearch&Term=3920764#).
69+
70+
First we'll open the file,
71+
then we'll iterate over every record in the file and
72+
print out the sequence identifier, the sequence description and then the corresponding sequence.
73+
74+
```julia
75+
julia> FASTAReader(open("assets/mecA.fasta")) do reader
76+
    for record in reader
77+
        println(identifier(record))
78+
        println(description(record))
79+
        println(sequence(record))
80+
    end
81+
end
82+
83+
NC_007795.1:907598-908317
84+
NC_007795.1:907598-908317 SAOUHSC_00935 [organism=Staphylococcus aureus subsp. aureus NCTC 8325] [GeneID=3920764] [chromosome=]
85+
ATGAGAATAGAACGAGTAGATGATACAACTGTAAAATTGTTTATAACATATAGCGATATCGAGGCCCGTGGATTTAGTCGTGAAGATTTATGGACAAATCGCAAACGTGGCGAAGAATTCTTTTGGTCAATGATGGATGAAATTAACGAAGAAGAAGATTTTGTTGTAGAAGGTCCATTATGGATTCAAGTACATGCCTTTGAAAAAGGTGTCGAAGTCACAATTTCTAAATCTAAAAATGAAGATATGATGAATATGTCTGATGATGATGCAACTGATCAATTTGATGAACAAGTTCAAGAATTGTTAGCTCAAACATTAGAAGGTGAAGATCAATTAGAAGAATTATTCGAGCAACGAACAAAAGAAAAAGAAGCTCAAGGTTCTAAACGTCAAAAGTCTTCAGCACGTAAAAATACAAGAACAATCATTGTGAAATTTAACGATTTAGAAGATGTTATTAATTATGCATATCATAGCAATCCAATAACTACAGAGTTTGAAGATTTGTTATATATGGTTGATGGTACTTATTATTATGCTGTATATTTTGATAGTCATGTTGATCAAGAAGTCATTAATGATAGTTACAGTCAATTGCTTGAATTTGCTTATCCAACAGACAGAACAGAAGTTTATTTAAATGACTATGCTAAAATAATTATGAGTCATAACGTAACAGCTCAAGTTCGACGTTATTTTCCAGAGACAACTGAATAA
86+
```
87+
88+
In this case, there is only one sequence.
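
The same loop works unchanged when a file holds several records. Here is a small sketch using two made-up records held in memory, assuming FASTX.jl version 2 is installed; the `IOBuffer` stands in for an opened file, and `record_lengths` is a name chosen for illustration.

```julia
using FASTX

# Two made-up records; the second sequence is wrapped over two lines.
data = """
>seq1 first toy record
ACGT
>seq2 second toy record
GGCC
AATT
"""

# Collect (identifier, sequence length) for every record.
record_lengths = Tuple{String, Int}[]
FASTAReader(IOBuffer(data)) do reader
    for record in reader
        push!(record_lengths, (String(identifier(record)), length(sequence(record))))
    end
end

println(record_lengths)
```

Wrapped sequence lines are joined into a single sequence per record, which is why the _mecA_ sequence above prints as one long line.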
89+
90+
Let's try reading in a larger FASTQ file.
91+
92+
```julia
93+
94+
95+
96+
```
97+
98+
99+
