cookbook/02-alignments.md
49 additions & 36 deletions
@@ -4,24 +4,29 @@ rss_descr = "Align a gene against a reference genome using BioAlignments.jl"
4
4
+++
5
5
6
6
As mentioned in the previous [tutorial](../01-sequences.md), in this chapter, we will learn about alignments.
7
-
We will explore pair-wise alignment as a tool to compare two copies of the _mecA_ gene found on NCBI.
7
+
We will explore pairwise alignment as a tool to compare two copies of the _mecA_ gene found on NCBI.
8
8
9
9
# Pairwise Alignment
10
10
11
-
On the most basic level, aligners take two sequences and use algorithms to try to "line them up"
11
+
On the most basic level, aligners use algorithms to "line up" sequences
12
12
and look for regions of similarity.
13
13
14
+
BioAlignments implements only pairwise alignment.
14
15
Pairwise alignment differs from multiple sequence alignment (MSA) because
15
-
it only aligns two sequences, while MSAs align three or more.
16
+
it only aligns two sequences, while MSAs align any number of sequences.
16
17
17
-
### Running the Alignment
18
-
There are two main parameters for determining how we want to perform our alignment:
19
-
the alignment type and score/cost model.
18
+
Pairwise alignment also assumes that the two sequences are roughly homologous.
19
+
For example, you may use it to align two versions of the same gene.
20
+
It is not suited to mapping reads to a genome; a dedicated read mapper is a better tool for that.
21
+
22
+
# Running the Alignment
23
+
There are two main parameters for determining how we want to perform our alignment:
24
+
alignment type and score/cost model.
20
25
21
26
The alignment type specifies the alignment range (local vs global alignment)
22
-
and the score/cost model explains how to score insertions and deletions.
27
+
and the score/cost model specifies how to score matches, mismatches, and gaps between the sequences being compared.
23
28
24
-
####Alignment Types
29
+
### Alignment Types
25
30
Currently, four types of alignments are supported:
26
31
- `GlobalAlignment`: global-to-global alignment
27
32
- Aligns sequences end-to-end
@@ -66,7 +71,7 @@ The cost model provides a way to calculate penalties for differences between the
66
71
and then finds the alignment that minimizes the total penalty.
67
72
`AffineGapScoreModel` is the scoring model currently supported by `BioAlignments.jl`.
68
73
It imposes an affine gap penalty for insertions and deletions,
69
-
which means that it penalizes the opening of a gap more than a gap extending.
74
+
which means that it penalizes the opening of a gap more than a gap extension.
70
75
Deletions are rare mutations, but if there's a deletion, the length of the deletion is variable.
71
76
Longer deletions are less likely than short ones only because they change the structure of the encoded protein more.
72
77
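To make the affine gap penalty described above concrete, here is a minimal sketch in plain Julia (no packages). The helper `affine_gap_score` is hypothetical and not part of `BioAlignments.jl`. It assumes the convention that a gap of length `k` scores `gap_open + k * gap_extend`; some tools instead charge the extension only from the second gap position, so check the documentation of the aligner you use.

```julia
# Hypothetical helper: score of one contiguous gap of length k
# under an affine model.
# Assumed convention: total = gap_open + k * gap_extend,
# with both parameters non-positive.
affine_gap_score(k::Integer; gap_open::Integer = -5, gap_extend::Integer = -1) =
    k == 0 ? 0 : gap_open + k * gap_extend

# The same six gapped bases, arranged in two different ways.
one_long_gap = affine_gap_score(6)       # a single 6-base deletion
three_short  = 3 * affine_gap_score(2)   # three separate 2-base deletions

println("one gap of 6:    ", one_long_gap)
println("three gaps of 2: ", three_short)
```

Under these assumed parameters, a single long gap is penalized less than several short gaps covering the same number of bases, because the opening penalty is paid only once.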
@@ -80,29 +85,29 @@ These distance metrics are currently supported:
80
85
- `LevenshteinDistance`
81
86
- `HammingDistance`
82
87
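As a rough illustration of what the Hamming and Levenshtein distances measure, the sketch below implements both in plain Julia on ordinary strings. The functions `hamming` and `levenshtein` are hypothetical stand-ins written for this example. They are not the `BioAlignments.jl` implementations, which are selected through the distance types listed above.

```julia
# Hamming distance: number of positions at which two
# equal-length sequences differ (substitutions only).
function hamming(a::AbstractString, b::AbstractString)
    length(a) == length(b) ||
        throw(ArgumentError("sequences must have equal length"))
    return count(((x, y),) -> x != y, zip(a, b))
end

# Levenshtein distance: minimum number of single-character
# substitutions, insertions, and deletions turning `a` into `b`.
function levenshtein(a::AbstractString, b::AbstractString)
    s, t = collect(a), collect(b)
    prev = collect(0:length(t))     # distances from the empty prefix of s
    for i in 1:length(s)
        curr = similar(prev)
        curr[1] = i                 # delete the first i characters of s
        for j in 1:length(t)
            cost = s[i] == t[j] ? 0 : 1
            curr[j + 1] = min(prev[j + 1] + 1,   # deletion
                              curr[j] + 1,       # insertion
                              prev[j] + cost)    # match or substitution
        end
        prev = curr
    end
    return prev[end]
end

println(hamming("ACGT", "ACCT"))      # one substituted base
println(levenshtein("ACGT", "AGT"))   # one deleted base
```

Hamming distance is only defined when the two sequences have the same length, while Levenshtein distance also accounts for indels.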
83
-
This is a complicated topic, and more information can be found in the BioAlignments documentation about the cost model [here](https://biojulia.dev/BioAlignments.jl/stable/pairalign/).
88
+
More information can be found in the BioAlignments documentation about the cost model [here](https://biojulia.dev/BioAlignments.jl/stable/pairalign/).
84
89
85
90
Just like alignment type, the cost model should be selected based on what the user is optimizing for
86
91
and what is known about the two sequences.
87
92
88
93
89
-
###Calling BioAlignments to Run the Alignment
94
+
## Calling BioAlignments to Run the Alignment
90
95
91
96
Now that we have a good understanding of how `pairalign` works, let's run an example!
92
97
93
-
In this example, we'll compare two similar genes: mecA found in _S. aureus_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/nuccore/NG_047945.1)), and a homologue, mecA1, found on a_S. sciuri_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/gene/?term=PBP2a+family+beta-lactam-resistant+peptidoglycan+transpeptidase+MecA1)).
94
-
The two Staphs are closely related species in the same _Staphylococcaceae_ family.
98
+
In this example, we'll compare two similar genes: _mecA_ found in _S. aureus_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/nuccore/NG_047945.1)), and a homologue, _mecA1_, found in _S. sciuri_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/gene/?term=PBP2a+family+beta-lactam-resistant+peptidoglycan+transpeptidase+MecA1)).
99
+
These genes are found in two Staph species that are closely related and in the same family (_Staphylococcaceae_).
95
100
96
101
Because we are comparing homologous genes from two closely related species, we wouldn't expect too many differences.
97
-
Although mecA1 doesn't confer resistance to beta-lactams in _S. sciuri_ like _mecA_ does to _S. aureus_,
102
+
Although _mecA1_ doesn't confer resistance to beta-lactams in _S. sciuri_ like _mecA_ does to _S. aureus_,
98
103
the gene should be mostly conserved.
99
-
In fact, mecA1 is considered a pre-cursor to mecA.
100
-
Research indicates that there is 80% nucleotide identity between the two genes.[1].
104
+
In fact, _mecA1_ is considered a precursor to _mecA_[1].
105
+
Research indicates that there is 80% nucleotide identity between the two genes[1].
101
106
Due to the similarity in the genes we are comparing, it makes the most sense to run a global alignment.
102
107
103
108
In this first example, we'll align two strings that contain the genes.
The score returned is entirely dependent on the scoring scheme
323
-
(how we penalized gaps, gap extensions and rewarded matches).
332
+
The alignment score is entirely dependent on the scoring scheme
333
+
(how we penalized gap openings and gap extensions, and how we rewarded matches).
324
334
It is not an absolute number that we can compare from alignment to alignment.
325
-
In our example, our score was influenced by -5 for the start of a gap, and -1 for a gap extension.
326
-
If these values were to change, we would get a different score.
327
-
However, generally, longer alignments produce larger scores (as there are more bases being compared).
335
+
In our example, our score was influenced by a -5 penalty for the start of a gap, and a -1 penalty for a gap extension.
336
+
If these values were to change in our cost model,
337
+
this would affect the final score, even if the sequences were the same.
338
+
However, longer alignments generally produce larger scores
339
+
(as there are more bases being compared).
328
340
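The dependence of the score on the scoring scheme can be sketched in plain Julia by scoring one fixed, already-gapped alignment under two different parameter sets. The helper `alignment_score` is hypothetical, and the match/mismatch values (5 and -4) are illustrative assumptions rather than values taken from this tutorial. The gap convention assumed here is `gap_open + k * gap_extend` for a gap of length `k`.

```julia
# Hypothetical helper: score two aligned rows of equal length,
# where '-' marks a gap. Parameter defaults are illustrative only.
function alignment_score(a::AbstractString, b::AbstractString;
                         match = 5, mismatch = -4,
                         gap_open = -5, gap_extend = -1)
    length(a) == length(b) ||
        throw(ArgumentError("aligned rows must have equal length"))
    total = 0
    prev_gap = :none    # which row (if any) had a gap in the previous column
    for (x, y) in zip(a, b)
        gap = x == '-' ? :top : (y == '-' ? :bottom : :none)
        if gap == :none
            total += x == y ? match : mismatch
        else
            # Pay the opening penalty only when a new gap starts.
            gap != prev_gap && (total += gap_open)
            total += gap_extend
        end
        prev_gap = gap
    end
    return total
end

top    = "ACGTTTGA"
bottom = "ACG---GA"

default_score = alignment_score(top, bottom)
harsher_score = alignment_score(top, bottom; gap_open = -10, gap_extend = -2)

println("gap_open = -5,  gap_extend = -1: ", default_score)
println("gap_open = -10, gap_extend = -2: ", harsher_score)
```

The alignment itself is identical in both calls; only the penalties differ, which is why scores from different scoring schemes should not be compared directly.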
329
341
Overall, the two sequences are homologous over most of their length.
330
-
There are many matches, but there are frequent small indels and substitutions.
331
-
The biggest mismatch is in a section toward the end,
332
-
where there are large stretches that are missing in the reference sequence (mecA1).
342
+
There are many matches, but also frequent small indels and substitutions.
343
+
The biggest mismatch is in a section toward the end
344
+
(from base 1980 in the reference onwards),
345
+
where there are large stretches that are missing in the reference sequence (_mecA1_).
333
346
334
347
### Citations
335
348
336
-
[1]: Rolo J, Worning P, Boye Nielsen J, Sobral R, Bowden R, Bouchami O, Damborg P, Guardabassi L, Perreten V, Westh H, Tomasz A, de Lencastre H, Miragaia M. Evidence for the evolutionary steps leading to mecA-mediated β-lactam resistance in staphylococci. PLoS Genet. 2017 Apr 10;13(4):e1006674. doi: 10.1371/journal.pgen.1006674. PMID: 28394942; PMCID: PMC5402963. [link to the source](10.1371/journal.pgen.1006674=).
349
+
[1]: Rolo J, Worning P, Boye Nielsen J, Sobral R, Bowden R, Bouchami O, Damborg P, Guardabassi L, Perreten V, Westh H, Tomasz A, de Lencastre H, Miragaia M. Evidence for the evolutionary steps leading to mecA-mediated β-lactam resistance in staphylococci. PLoS Genet. 2017 Apr 10;13(4):e1006674. doi: 10.1371/journal.pgen.1006674. PMID: 28394942; PMCID: PMC5402963. [link to the source](https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1006674).