@@ -6,18 +6,20 @@ rss_descr = "Align a gene against a reference genome using BioAlignments.jl"
 As mentioned in the previous [tutorial]("../01-sequences.md"), in this chapter, we will learn about alignments.
 We will explore pairwise alignment as a tool to compare two copies of the _mecA_ gene found on NCBI.
 
-# Pairwise Alignment
+# `BioAlignments` Implements Only Pairwise Alignment
 
 On the most basic level, aligners use algorithms to "line up" sequences
 and look for regions of similarity.
 
-BioAlignments implements only pairwise alignment.
+`BioAlignments` implements pairwise alignment.
 Pairwise alignment differs from multiple sequence alignment (MSA) because
 it only aligns two sequences, while MSAs align any number of sequences.
+There is currently no MSA package in Julia.
 
 Pairwise alignment also assumes that the two sequences are roughly homologous.
 For example, you may use it to align two versions of the same gene.
 It is not used to map reads to a genome -- mapping would be a better solution for that.
+If mapping is your goal, you can use a mapper like `minimap2` and parse the result with `PairwiseMappingFormat.jl`.
 
 # Running the Alignment
 There are two main parameters for determining how we want to perform our alignment:
@@ -26,7 +28,7 @@ alignment type and score/cost model.
 The alignment type specifies the alignment range (local vs global alignment)
 and the score/cost model explains how to score matches/mismatches in the sequences that are being compared.
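
To make the two parameters concrete, here is a minimal sketch that aligns the same pair of toy sequences under two alignment types while holding the score model fixed. The sequences and the score values are made up for illustration and are not taken from the tutorial's _mecA_ data.

```julia
using BioAlignments
using BioSequences

# Toy sequences (hypothetical, not the mecA data): `query` is an exact
# substring of `reference`.
query = dna"ACGTACGT"
reference = dna"TTACGTACGTTT"

# Score model: reward matches, penalize mismatches and gaps.
# These particular values are illustrative assumptions.
scoremodel = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-1)

# Alignment type: a global alignment forces both sequences to align end-to-end...
global_res = pairalign(GlobalAlignment(), query, reference, scoremodel)
# ...while a local alignment reports only the best-scoring matching region.
local_res = pairalign(LocalAlignment(), query, reference, scoremodel)

println("global score: ", score(global_res))
println("local score:  ", score(local_res))
```

Because the global alignment has to account for the unmatched flanks of `reference`, its score cannot exceed the local score under the same model.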
 
-### Alignment Types
+## Alignment Types
 Currently, four types of alignments are supported:
 - `GlobalAlignment`: global-to-global alignment
   - Aligns sequences end-to-end
@@ -65,7 +67,7 @@ The alignment type should be selected based on what is already known about the s
 - Are we looking at two sequences from wildly divergent organisms?
 
 
-### Cost Model
+## Cost Model
 
 The cost model provides a way to calculate penalties for differences between the two sequences,
 and then finds the alignment that minimizes the total penalty.
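
As a rough illustration of penalty minimization, the sketch below uses `CostModel` with `EditDistance` on two plain strings. The unit costs and the example words are assumptions chosen for clarity; the tutorial itself may use a different model.

```julia
using BioAlignments

# Unit costs (an illustrative assumption): every substitution,
# insertion, or deletion costs 1, and a match costs nothing.
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)

# EditDistance searches for the alignment with the lowest total cost.
res = pairalign(EditDistance(), "kitten", "sitting", costmodel)

# `distance` returns the total penalty of that optimal alignment.
println(distance(res))
```

Raising one of the costs, for example `insertion=3`, makes the aligner prefer substitutions over gaps wherever the two options would otherwise tie.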
@@ -107,7 +109,7 @@ Due to the similarity in the genes we are comparing, it makes the most sense to
 
 In this first example, we'll align two strings that contain the genes.
 
-## Running Alignment on BioSequences Object
+## Aligning BioSequences Objects
 
 ```julia
 using BioAlignments
@@ -122,7 +124,7 @@ res = pairalign(GlobalAlignment(), mecA, mecA1, scoremodel)
 ```
 
 
-## Running Alignment on FASTX files
+## Aligning FASTX files
 In this next example, we'll repeat the same alignment,
 but read in the files directly from the FASTA files containing the gene.
 Running the alignment on strings is straightforward with short sequences,
@@ -149,7 +151,7 @@ res_fasta = pairalign(GlobalAlignment(), mecA_fasta, mecA1_fasta, scoremodel)
 ```
 
 
-### Understanding How Alignments Are Represented
+# Understanding How Alignments Are Represented
 The output of an alignment is a series of `AlignmentAnchor` objects.
 This data structure gives information on the position of the start of the alignment,
 sections where nucleotides match, as well as where there may be deletions or insertions.
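
One way to inspect this representation is sketched below on a pair of toy sequences (hypothetical, not the _mecA_ data). The counting functions are exported by `BioAlignments`; the field path `aln.a.aln.anchors` is an assumption about the package's internal layout rather than a documented API, so it may differ between versions.

```julia
using BioAlignments
using BioSequences

# Toy sequences (hypothetical): the query carries one extra T
# relative to the reference.
query = dna"ACGTTACGT"
reference = dna"ACGTACGT"

# Illustrative score values, not taken from the tutorial.
scoremodel = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-1)
res = pairalign(GlobalAlignment(), query, reference, scoremodel)
aln = alignment(res)

# Summary counts computed from the alignment operations.
println("matches:    ", count_matches(aln))
println("insertions: ", count_insertions(aln))
println("deletions:  ", count_deletions(aln))

# Assumption: the anchors are stored in the internal field path
# `aln.a.aln.anchors`. Each anchor marks where a run of one
# operation (start, match, insertion, deletion) ends.
for anchor in aln.a.aln.anchors
    println(anchor)
end
```

Here the extra base in the query shows up as a single insertion relative to the reference, flanked by runs of matches.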