title = "Pairwise alignment"
rss_descr = "Align a gene against a reference genome using BioAlignments.jl"
+++

As mentioned in the [previous tutorial](../01-sequences.md), this chapter covers alignments.
We will explore pairwise alignment as a tool to compare the _mecA_ gene with its homologue _mecA1_, both available on NCBI.

# Pairwise Alignment

On the most basic level, aligners take two sequences and use algorithms to try to "line them up"

Now that we have a good understanding of how `pairalign` works, let's run an example!

In this example, we'll compare two similar genes: _mecA_, found in _S. aureus_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/nuccore/NG_047945.1)), and its homologue _mecA1_, found in _S. sciuri_ ([link to gene on NCBI here](https://www.ncbi.nlm.nih.gov/gene/?term=PBP2a+family+beta-lactam-resistant+peptidoglycan+transpeptidase+MecA1)).
The two staphylococci are closely related species in the same family, _Staphylococcaceae_.

Because we are comparing homologous genes from two closely related species, we wouldn't expect too many differences.
Although _mecA1_ doesn't confer beta-lactam resistance in _S. sciuri_ the way _mecA_ does in _S. aureus_,
the gene should be mostly conserved.
In fact, _mecA1_ is considered a precursor of _mecA_.
Research indicates that the two genes share 80% nucleotide identity [1].
Because the genes are similar along their entire length, a global alignment makes the most sense here: it lines up both sequences end to end, from their first base to their last.
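A global alignment has to account for every base of both sequences. Each base is either paired with a base in the other sequence or placed opposite a gap. As a rough illustration of that idea, here is a minimal Needleman–Wunsch scorer in plain Julia. The name `global_score` and its default scores are assumptions made only for this sketch: +5 for a match, -4 for a mismatch, and a *linear* -2 per gap position. It is not how BioAlignments.jl implements `pairalign`, and this tutorial's `pairalign` calls use affine gap penalties instead.

```julia
# Minimal Needleman–Wunsch global alignment score (illustrative sketch only).
# Assumption: a LINEAR gap penalty, unlike the affine model used with pairalign.
function global_score(a::AbstractString, b::AbstractString;
                      match_score::Int = 5, mismatch_score::Int = -4, gap_score::Int = -2)
    x, y = collect(a), collect(b)
    m, n = length(x), length(y)
    # F[i+1, j+1] is the best score for aligning x[1:i] with y[1:j]
    F = zeros(Int, m + 1, n + 1)
    for i in 1:m
        F[i+1, 1] = i * gap_score        # x[1:i] aligned against gaps only
    end
    for j in 1:n
        F[1, j+1] = j * gap_score        # y[1:j] aligned against gaps only
    end
    for i in 1:m, j in 1:n
        s = x[i] == y[j] ? match_score : mismatch_score
        F[i+1, j+1] = max(F[i, j] + s,            # pair x[i] with y[j]
                          F[i, j+1] + gap_score,  # x[i] opposite a gap
                          F[i+1, j] + gap_score)  # y[j] opposite a gap
    end
    # Global alignment: the result covers both sequences in full
    return F[m+1, n+1]
end

global_score("ACGT", "AGT")   # 13: three matches (+15) and one gap (-2)
```

Because the final cell `F[m+1, n+1]` is returned, neither sequence can be trimmed at its ends. A local alignment would do exactly that kind of trimming.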

In this first example, we'll align two strings that contain the gene sequences.

#### Running Alignment on Strings

```julia
using BioAlignments

# mecA (S. aureus) and its homologue mecA1 (S. sciuri), stored as plain strings
mecA = "ATGAAAAAGATAAAAATTGTTCCACTTATTTTAATAGTTGTAGTTGTCGGGTTTGGTATATATTTTTATGCTTCAAAAGATAAAGAAATTAATAATACTATTGATGCAATTGAAGATAAAAATTTCAAACAAGTTTATAAAGATAGCAGTTATATTTCTAAAAGCGATAATGGTGAAGTAGAAATGACTGAACGTCCGATAAAAATATATAATAGTTTAGGCGTTAAAGATATAAACATTCAGGATCGTAAAATAAAAAAAGTATCTAAAAATAAAAAACGAGTAGATGCTCAATATAAAATTAAAACAAACTACGGTAACATTGATCGCAACGTTCAATTTAATTTTGTTAAAGAAGATGGTATGTGGAAGTTAGATTGGGATCATAGCGTCATTATTCCAGGAATGCAGAAAGACCAAAGCATACATATTGAAAATTTAAAATCAGAACGTGGTAAAATTTTAGACCGAAACAATGTGGAATTGGCCAATACAGGAACAGCATATGAGATAGGCATCGTTCCAAAGAATGTATCTAAAAAAGATTATAAAGCAATCGCTAAAGAACTAAGTATTTCTGAAGACTATATCAAACAACAAATGGATCAAAATTGGGTACAAGATGATACCTTCGTTCCACTTAAAACCGTTAAAAAAATGGATGAATATTTAAGTGATTTCGCAAAAAAATTTCATCTTACAACTAATGAAACAAAAAGTCGTAACTATCCTCTAGAAAAAGCGACTTCACATCTATTAGGTTATGTTGGTCCCATTAACTCTGAAGAATTAAAACAAAAAGAATATAAAGGCTATAAAGATGATGCAGTTATTGGTAAAAAGGGACTCGAAAAACTTTACGATAAAAAGCTCCAACATGAAGATGGCTATCGTGTCACAATCGTTGACGATAATAGCAATACAATCGCACATACATTAATAGAGAAAAAGAAAAAAGATGGCAAAGATATTCAACTAACTATTGATGCTAAAGTTCAAAAGAGTATTTATAACAACATGAAAAATGATTATGGCTCAGGTACTGCTATCCACCCTCAAACAGGTGAATTATTAGCACTTGTAAGCACACCTTCATATGACGTCTATCCATTTATGTATGGCATGAGTAACGAAGAATATAATAAATTAACCGAAGATAAAAAAGAACCTCTGCTCAACAAGTTCCAGATTACAACTTCACCAGGTTCAACTCAAAAAATATTAACAGCAATGATTGGGTTAAATAACAAAACATTAGACGATAAAACAAGTTATAAAATCGATGGTAAAGGTTGGCAAAAAGATAAATCTTGGGGTGGTTACAACGTTACAAGATATGAAGTGGTAAATGGTAATATCGACTTAAAACAAGCAATAGAATCATCAGATAACATTTTCTTTGCTAGAGTAGCACTCGAATTAGGCAGTAAGAAATTTGAAAAAGGCATGAAAAAACTAGGTGTTGGTGAAGATATACCAAGTGATTATCCATTTTATAATGCTCAAATTTCAAACAAAAATTTAGATAATGAAATATTATTAGCTGATTCAGGTTACGGACAAGGTGAAATACTGATTAACCCAGTACAGATCCTTTCAATCTATAGCGCATTAGAAAATAATGGCAATATTAACGCACCTCACTTATTAAAAGACACGAAAAACAAAGTTTGGAAGAAAAATATTATTTCCAAAGAAAATATCAATCTATTAACTGATGGTATGCAACAAGTCGTAAATAAAACACATAAAGAAGATATTTATAGATCTTATGCAAACTTAATTGGCAAATCCGGTACTGCAGAACTCAAAATGAAACAAGGAGAAACTGGCAGACAAATTGGGTGGTTTATATCATATGATAAAGATAATCCAAACATGATGATGGCTATTAATGTTAAAGATGTACAAGATAAAGGAATGGCTAGCTACAATGCCAAAATCTCAGGTAAAGTGTATGATGAGCTATATGAGAACGGTAATAAAAAATACGATATAGATGAATAACAAAACAGTGAAGCAATCCGTAACGATGGTTGCTTCACTGTTTTATTATGAATTATTAATAAGTGCTGTTACTTCTCCCTTAAATACAATTTCTTCATTT"
mecA1 = "ATGAAAAAATTAATCATCGCCATCGTGATTGTAATCATCGCTGTTGGTTCAGGCGTATTCTTTTATGCATCTAAAGATAAGAAAATAAACGAAACAATTGATGCCATTGAAGATAAAAACGTTAAGCAAGTCTTTAAAAATAGTACTTACCAATCTAAAAACGATAATGGTGAAGTAGAAATGACAGACCGCCCTATTAAGATTTATGACAGTCTCGGCGTCAAAGATATCAACATTAAAGATCGTGATATCAAAAAGGTTTCGAAAAACAAAAAACAAGTCACAGCAAAGTATGAACTTCAAACGAATTACGGCAAAATTAATCGTGACGTTAAATTAAACTTTATTAAAGAAGATAAAGATTGGAAATTGGATTGGAATCAAAATGCCATTATTCCAGGCATGAAGAAAAATCAATCCATCAATATTGAACCATTGAAATCAGAACGAGGTAAGATTTTAGACAGGAACAATGTAGAGTTAGCCACTACAGGAACAACACATGAAGTTGGTATTGTTCCTAATAATGTTTCCACAAGTGATTACAAAGCAATCGCTGAAAAGTTAGACCTTTCAGAATCGTATATTAAACAGCAAACAGAACAGGATTGGGTTAAAGATGATACATTCGTCCCTCTCAAGACTGTTCAAGATATGAATCAAGATTTAAAGAATTTTGTTGAAAAGTATCATCTCACATCACAGGAAACAGAAAGTCGACAGTATCCGCTTGAAGAAGCAACAACGCACTTACTTGGATATGTTGGCCCTATTAATTCAGAAGAATTGAAGCAAAAAGCATTTAAAGGTTATAAAAAGGATGCCATCGTTGGTAAAAAAGGTATCGAAAAACTATACGATAAAGACCTTCAAAATAAAGACGGATACCGTGTCACAATAATTGATGATAATAATAAAGTTATTGATACATTAATAGAGAAAAAGAAAATAGACGGCAAAGATATTAAATTAACCATTGATGCTAGAGTCCAAAAAAGTATTTATAACAACATGAAAGATGACTACGGTTCGGGGACTGCTATTCATCCACAAACTGGTGAACTCTTAGCACTTGTCAGCACGCCATCTTATGATGTTTATCCATTTATGAATGGAATGAGCGATGAAGATTATAAGAAATTAACTGAAGATGATAAAGAGCCACTCCTTAATAAGTTCCAAATTACGACATCACCAGGTTCGACTCAAAAAATATTAACAGCCATGATTGGCTTAAACAATAAGACATTAGACGGCAAAACAAGTTATAAAATTAATGGAAAAGGTTGGCAAAAAGATAAATCTTGGGGTGACTACAACGTTACAAGATACGAAGTTGTGAATGCCGATATCGACTTAAAACAAGCTATTGAATCATCAGATAATATCTTCTTTGCGAGAGTTGCACTTGAATTAGGCAGCAAAAAATTCGAAGAAGGAATGAAACGCCTTGGTGTTGGTGAAGATATCCCGAGTGATTATCCATTCTACAATGCACAAATTTCAAATAAGAACTTAGATAATGAAATATTGTTAGCTGACTCAGGTTATGGCCAAGGTGAAATATTAATCAATCCTGTTCAAATTCTTTCAATATACAGCGCATTAGAGAACAAAGGTAATGTGAATGCACCACATGTACTCAAAGATACGAAAAATAAAGTCTGGAAGAAGAACATCATTTCCCAGGAAAATATTAAATTGTTAACAGACGGTATGCAACAAGTCGTGAACAAAACACATAGAGAAGATATTTATAGATCATATGCCAACTTAGTTGGTAAATCAGGTACAGCTGAACTCAAGATGAAACAAGGTGAGACAGGACAACAAATAGGTTGGTTCATTTCATATGATAAAGATAATCCAAATATAATGATGGCTATTAATGTGAAAGATGTACAAGATAAAGGCATGGCAAGTTACAATGCCAAAATATCTGGAAAAGTGTATGACGATTTATATGATAACGGTAAGAAAACGTATCGTATTGATAAATAA"

# score matches and mismatches with EDNAFULL; gap opening -5, gap extension -1
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1);

# run a global pairwise alignment
res = pairalign(GlobalAlignment(), mecA, mecA1, scoremodel)
```
#### Running Alignment on FASTA Files
In this next example, we'll repeat the same alignment,
but read the sequences directly from the FASTA files containing the genes.
Pasting sequences into strings works for short sequences, but for entire genes, reading them from a file is easier.
```julia
using BioAlignments
using BioSequences
using FASTX

# Return the sequence of the first (and only) record in a FASTA file as a LongDNA{4}
function fasta_sequence(fasta_path)
    record = open(FASTA.Reader, fasta_path) do reader
        first(reader)
    end
    return LongDNA{4}(String(FASTX.sequence(record)))
end

mecA_fasta = fasta_sequence("assets/mecA.fasta")
mecA1_fasta = fasta_sequence("assets/mecA1.fasta")

# run the same global pairwise alignment, reusing `scoremodel` from the previous example
res_fasta = pairalign(GlobalAlignment(), mecA_fasta, mecA1_fasta, scoremodel)
```
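`fasta_sequence` hands the parsing to FASTX.jl. To show what that parsing has to handle: a FASTA record is a header line starting with `>`, followed by sequence lines that are usually wrapped. The sketch below is a hypothetical, minimal parser for text holding a single FASTA record, written in plain Julia. `parse_single_fasta` is not part of FASTX.jl, and for real files FASTX.jl remains the more robust choice.

```julia
# Hypothetical minimal parser for a FASTA text holding exactly one record.
function parse_single_fasta(io::IO)
    header = nothing
    seq = IOBuffer()
    for raw in eachline(io)
        line = strip(raw)
        isempty(line) && continue                 # ignore blank lines
        if startswith(line, ">")
            header === nothing || error("expected exactly one FASTA record")
            header = String(line[2:end])          # description after '>'
        else
            header === nothing && error("sequence found before the '>' header")
            print(seq, uppercase(line))           # join wrapped sequence lines
        end
    end
    header === nothing && error("no '>' header line found")
    return header, String(take!(seq))
end

parse_single_fasta(IOBuffer(">demo\nATG\nAAA\n"))   # ("demo", "ATGAAA")
```

The same function can read a file directly, e.g. `open(parse_single_fasta, "assets/mecA.fasta")`.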

### Understanding how Alignments are Represented
Internally, an alignment is represented as a series of `AlignmentAnchor` objects.
This data structure records where the alignment starts,
the sections where nucleotides match, and where there are deletions or insertions.
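To make the anchor idea concrete, here is a plain-Julia sketch that mimics an anchor list with tuples and converts it back into runs of operations. It assumes what we understand to be BioAlignments.jl's convention: the first anchor is a start anchor at position 0, and each later anchor records the query and reference positions at which its operation *ends*. The tuple layout and the `anchor_runs` function are hypothetical illustrations, not BioAlignments.jl types. The example anchors describe 12 matching bases followed by 8 reference bases that are deleted in the query.

```julia
# Hypothetical, simplified stand-in for a list of alignment anchors:
# (query position, reference position, operation). Position 0 is the start
# anchor; each later anchor marks the positions where its operation ENDS.
anchors = [
    (0, 0, :start),
    (12, 12, :match),    # 12 bases aligned in both sequences
    (12, 20, :delete),   # 8 reference bases with no counterpart in the query
]

# Turn consecutive anchors into (operation, run length) pairs.
function anchor_runs(anchors)
    runs = Tuple{Symbol,Int}[]
    for k in 2:length(anchors)
        qprev, rprev, _ = anchors[k-1]
        q, r, op = anchors[k]
        # deletions advance only the reference; matches and insertions advance the query
        len = op == :delete ? r - rprev : q - qprev
        push!(runs, (op, len))
    end
    return runs
end

anchor_runs(anchors)   # [(:match, 12), (:delete, 8)]
```

Storing only the positions where the operation changes keeps the representation compact. A long run of matches costs a single anchor instead of one entry per base.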
The next nucleotides are a match in the query and reference sequences.
The last 8 nucleotides in the alignment are deleted in the query sequence.

To learn more about the output of alignments created using BioAlignments.jl, see the [BioAlignments.jl documentation on alignments](https://biojulia.dev/BioAlignments.jl/stable/alignments/).

#### Interpreting the Example Output
```
julia> res
PairwiseAlignmentResult{Int64, String, String}:
  score: 6375
  seq:    1 ATGAAAAAGATAAAA-ATTGTTCCA-CTT-ATTTTAAT-A-----GTTGTAGTTGTCGGG 51
            |||||||| || ||  || |  ||| | | ||| |||| |     ||||  |||  | ||
  ref:    1 ATGAAAAA-ATTAATCATCG--CCATCGTGATTGTAATCATCGCTGTTG--GTT--CAGG 53

  seq:   52 TTTGGTATATAT-TTTTATGCTTCAAAAGATAAAGAAATTAAT--AATACTATTGATGCA 108
                ||||   | |||||||| || |||||||| |||| |||   || || ||||||||
  ref:   54 C---GTAT---TCTTTTATGCATCTAAAGATAA-GAAAATAAACGAA-ACAATTGATGCC 105

  seq:  109 ATTGAAGATAAAAA--TTTCAAACAAGT-TTATAAAGATAGCAGTTAT--ATTTCTAAAA 163
            ||||||||||||||  ||  || ||||| || |||| |||| | |||   ||  ||||||
  ref:  106 ATTGAAGATAAAAACGTT--AAGCAAGTCTT-TAAAAATAGTACTTACCAAT--CTAAAA 160

  seq:  164 GCGATAATGGTGAAGTAGAAATGACTGAACGTCCGATAAAAATATATAATAGTTTAGGCG 223
             |||||||||||||||||||||||| || || || || || || ||| | ||| | ||||
  ref:  161 ACGATAATGGTGAAGTAGAAATGACAGACCGCCCTATTAAGATTTATGACAGTCTCGGCG 220

  seq:  224 TTAAAGATATAAACATTCAGGATCGTAAAATAAAAAAAGTATCTAAAAATAAAAAACGAG 283
            | |||||||| |||||| | |||||| | || ||||| || || ||||| ||||||| ||
  ref:  221 TCAAAGATATCAACATTAAAGATCGTGATATCAAAAAGGTTTCGAAAAACAAAAAACAAG 280

  seq:  284 T-AGATGCTCAA--TATAAAATTAAAACAAACTACGGTAACATTGATCGCAACGTTCAAT 340
            | | | ||  ||  ||| || || |||| || ||||| || ||| ||||  ||||| |||
  ref:  281 TCACA-GC--AAAGTATGAACTTCAAACGAATTACGGCAAAATTAATCGTGACGTTAAAT 337

  seq:  341 TTAATTTTGTTAAAGAAGAT---GGTATGTGGAAGTTAGATTGGGATCATA--GCGTCAT 395
            | || ||| |||||||||||   |  || ||||| || |||||| |||| |  ||  |||
  ref:  338 TAAACTTTATTAAAGAAGATAAAG--AT-TGGAAATTGGATTGGAATCAAAATGC--CAT 392

  seq:  396 TATTCCAGGAATGCAGAAAGACCAAAGCATACA-TATTGAA--AATTTAAAATCAGAACG 452
            ||||||||| ||| ||||| | |||  ||| || |||||||  |  || |||||||||||
  ref:  393 TATTCCAGGCATGAAGAAAAATCAATCCAT-CAATATTGAACCA--TTGAAATCAGAACG 449

  seq:  453 TGGTAAAATTTTAGACCGAAACAATGTGGAATTGGCCAATACAGGAACAGCATATGA-GA 511
             ||||| ||||||||| | |||||||| || || |||| |||||||||| || |||| |
  ref:  450 AGGTAAGATTTTAGACAGGAACAATGTAGAGTTAGCCACTACAGGAACAACACATGAAGT 509

  seq:  512 TAGGCATCGTTCCAAAGAATGTATCTAAAAAAGATTATAAAGCAATCGCTAAAGAACTAA 571
            | || || ||||| || ||||| || | ||  ||||| ||||||||||||   |||  ||
  ref:  510 T-GGTATTGTTCCTAATAATGTTTCCACAAGTGATTACAAAGCAATCGCT---GAA--AA 563

  seq:  572 GT-A----TTTCTGAA--GACTATATCAAACAACAAATGGATCAAAATTGGGT-ACAAGA 623
            || |    |||| |||  |  ||||| ||||| ||||  || ||  ||||||| | ||||
  ref:  564 GTTAGACCTTTCAGAATCG--TATATTAAACAGCAAACAGAACAGGATTGGGTTA-AAGA 620

  seq:  624 TGATACCTTCGTTCCACTTAAAACCGTTAAAAAAATGGATGAATATTTAAGTGA-TTTCG 682
            |||||| ||||| || || || || ||| || | ||| || || ||||||  || ||| |
  ref:  621 TGATACATTCGTCCCTCTCAAGACTGTTCAAGATATGAATCAAGATTTAAA-GAATTTTG 679

  seq:  683 CAAAAAAATTTCATCTTACA--ACTAATGAAACAAAAAGTCGTAAC--TATCCTCTAGAA 738
               |||| | |||||| |||  ||  | |||||| |||||||  ||  ||||| || |||
  ref:  680 TTGAAAAGTATCATCTCACATCAC--AGGAAACAGAAAGTCG--ACAGTATCCGCTTGAA 735

  seq:  739 AAAGCGACTTCACATCT-A-TTAGGTTATGTTGGTCCC-ATTAACTCTGAAGAATTAAAA 795
             |||| ||  | || || | || || |||||||| ||| ||||| || |||||||| ||
  ref:  736 GAAGCAACAACGCA-CTTACTT-GGATATGTTGG-CCCTATTAATTCAGAAGAATTGAAG 792

  seq:  796 CAAAAAGAATATAAAGGCTATAAAGATGATGC-AGTTATTGGTAAAAAGGG-ACTCGAAA 853
            ||||||| || |||||| |||||| | ||||| | |  |||||||||| || | ||||||
  ref:  793 CAAAAAGCATTTAAAGGTTATAAAAAGGATGCCA-TCGTTGGTAAAAAAGGTA-TCGAAA 850

  seq:  854 AACTTTACGATAAAAAGCTCCAACATGAAGATGGCTATCGTGTCACAATCGTTGACGATA 913
            |||| ||||||||| | || ||| || |||| || || |||||||||||  |||| ||||
  ref:  851 AACTATACGATAAAGACCTTCAAAATAAAGACGGATACCGTGTCACAATAATTGATGATA 910

  seq:  914 ATAGCAATACA---ATCGCACATACATTAATAGAGAAAAAGAAAAAAGATGGCAAAGATA 970
            |||   ||| |   || |   |||||||||||||||||||||||| ||| ||||||||||
  ref:  911 ATA---ATAAAGTTATTG---ATACATTAATAGAGAAAAAGAAAATAGACGGCAAAGATA 964

  seq:  971 TTCAACTAACTATTGATGCTAAAGTTCAAAAGAGTATTTATAACAACATGAAAAATGATT 1030
            || || |||| |||||||||| ||| ||||| ||||||||||||||||||||| |||| |
  ref:  965 TTAAATTAACCATTGATGCTAGAGTCCAAAAAAGTATTTATAACAACATGAAAGATGACT 1024

  seq: 1031 ATGGCTCAGGTACTGCTATCCACCCTCAAACAGGTGAATTATTAGCACTTGTAAGCACAC 1090
            | || || || |||||||| || || ||||| |||||| | ||||||||||| ||||| |
  ref: 1025 ACGGTTCGGGGACTGCTATTCATCCACAAACTGGTGAACTCTTAGCACTTGTCAGCACGC 1084

  seq: 1091 CTTCATATGACGTCTATCCATTTATGTATGGCATGAGTAACGAAGAATATAATAAATTAA 1150
            | || ||||| || |||||||||||| |||| |||||  | ||||| ||||| |||||||
  ref: 1085 CATCTTATGATGTTTATCCATTTATGAATGGAATGAGCGATGAAGATTATAAGAAATTAA 1144

  seq: 1151 CCGAAGATAAAAAAGAACCTCTGCTCAACAAGTTCCAGATTACAACTTCACCAGGTTCAA 1210
            | |||||| | ||||| || || || || |||||||| ||||| || ||||||||||| |
  ref: 1145 CTGAAGATGATAAAGAGCCACTCCTTAATAAGTTCCAAATTACGACATCACCAGGTTCGA 1204

  seq: 1211 CTCAAAAAATATTAACAGCAATGATTGGGTTAAATAACAAAACATTAGACGATAAAACAA 1270
            ||||||||||||||||||| |||||||| ||||| || || ||||||||||  |||||||
  ref: 1205 CTCAAAAAATATTAACAGCCATGATTGGCTTAAACAATAAGACATTAGACGGCAAAACAA 1264

  seq: 1271 GTTATAAAATCGATGGTAAAGGTTGGCAAAAAGATAAATCTTGGGGTGGTTACAACGTTA 1330
            ||||||||||  |||| |||||||||||||||||||||||||||||||  ||||||||||
  ref: 1265 GTTATAAAATTAATGGAAAAGGTTGGCAAAAAGATAAATCTTGGGGTGACTACAACGTTA 1324

  seq: 1331 CAAGATATGAAGTGGTAAATG--GTAATATCGACTTAAAACAAGCAATAGAATCATCAGA 1388
            ||||||| ||||| || ||||  |  ||||||||||||||||||| || |||||||||||
  ref: 1325 CAAGATACGAAGTTGTGAATGCCG--ATATCGACTTAAAACAAGCTATTGAATCATCAGA 1382

  seq: 1389 TAACATTTTCTTTGCTAGAGTAGCACTCGAATTAGGCAGTAAGAAATTTGAAAAAGGCAT 1448
            ||| || |||||||| ||||| ||||| ||||||||||| || ||||| ||| |||| ||
  ref: 1383 TAATATCTTCTTTGCGAGAGTTGCACTTGAATTAGGCAGCAAAAAATTCGAAGAAGGAAT 1442

  seq: 1449 GAAAAAACTAGGTGTTGGTGAAGATATACCAAGTGATTATCCATTTTATAATGCTCAAAT 1508
            ||||   || ||||||||||||||||| || |||||||||||||| || ||||| |||||
  ref: 1443 GAAACGCCTTGGTGTTGGTGAAGATATCCCGAGTGATTATCCATTCTACAATGCACAAAT 1502

  seq: 1509 TTCAAACAAAAATTTAGATAATGAAATATTATTAGCTGATTCAGGTTACGGACAAGGTGA 1568
            |||||| || || ||||||||||||||||| |||||||| |||||||| || ||||||||
  ref: 1503 TTCAAATAAGAACTTAGATAATGAAATATTGTTAGCTGACTCAGGTTATGGCCAAGGTGA 1562

  seq: 1569 AATACTGATTAACCCAGTACAGATCCTTTCAATCTATAGCGCATTAGAAAATAATGGCAA 1628
            |||| | || || || || || || |||||||| || ||||||||||| || || || ||
  ref: 1563 AATATTAATCAATCCTGTTCAAATTCTTTCAATATACAGCGCATTAGAGAACAAAGGTAA 1622

  seq: 1629 TATTAACGCACCTCACT-TATTAAAAGACACGAAAAACAAAGTTTGGAAGAAAAATATTA 1687
            | | || ||||| || | || | ||||| |||||||| ||||| |||||||| || || |
  ref: 1623 TGTGAATGCACCACA-TGTACTCAAAGATACGAAAAATAAAGTCTGGAAGAAGAACATCA 1681

  seq: 1688 TTTCCAAAGAAAATATCAA-TCTATTAACTGATGGTATGCAACAAGTCGTAAATAAAACA 1746
            ||||| | |||||||| || | | ||||| || ||||||||||||||||| || ||||||
  ref: 1682 TTTCCCAGGAAAATATTAAAT-TGTTAACAGACGGTATGCAACAAGTCGTGAACAAAACA 1740

  seq: 1747 CATAAAGAAGATATTTATAGATCTTATGCAAACTTAATTGGCAAATCCGGTACTGCAGAA 1806
            |||| |||||||||||||||||| ||||| |||||| |||| ||||| ||||| || |||
  ref: 1741 CATAGAGAAGATATTTATAGATCATATGCCAACTTAGTTGGTAAATCAGGTACAGCTGAA 1800

  seq: 1807 CTCAAAATGAAACAAGGAGAAACTGG-CAGACAAATTGGGTGGTTTATATCATATGATAA 1865
            ||||| ||||||||||| || || || || |||||| || ||||| || |||||||||||
  ref: 1801 CTCAAGATGAAACAAGGTGAGACAGGACA-ACAAATAGGTTGGTTCATTTCATATGATAA 1859

  seq: 1866 AGATAATCCAAACATGATGATGGCTATTAATGTTAAAGATGTACAAGATAAAGGAATGGC 1925
            |||||||||||| || ||||||||||||||||| |||||||||||||||||||| |||||
  ref: 1860 AGATAATCCAAATATAATGATGGCTATTAATGTGAAAGATGTACAAGATAAAGGCATGGC 1919

  seq: 1926 TAGCTACAATGCCAAAATCTCAGGTAAAGTGTATGATGAGCTATATGAGAACGGTAATAA 1985
             || |||||||||||||| || || ||||||||||| ||  ||||||| |||||||| ||
  ref: 1920 AAGTTACAATGCCAAAATATCTGGAAAAGTGTATGACGATTTATATGATAACGGTAAGAA 1979

  seq: 1986 AAAATACGATATAGATGAATAACAAAACAGTGAAGCAATCCGTAACGATGGTTGCTTCAC 2045
            ||                                      |||| ||
  ref: 1980 AA--------------------------------------CGTATCG------------- 1988

  seq: 2046 TGTTTTATTATGAATTATTAATAAGTGCTGTTACTTCTCCCTTAAATACAATTTCTTCAT 2105
                 ||||  ||  ||  |||||
  ref: 1988 -----TATT--GA--TA--AATAA------------------------------------ 2001

  seq: 2106 TT 2107

  ref: 2001 -- 2001
```

The score returned depends entirely on the scoring scheme,
that is, on how we rewarded matches, penalized mismatches, and penalized opening and extending gaps.
It is not an absolute number: scores can only be compared between alignments computed with the same scoring scheme.
In our example, matches and mismatches were scored with the EDNAFULL substitution matrix, opening a gap contributed -5, and each gap extension contributed -1.
If any of these values changed, we would get a different score.
However, for similar sequences, longer alignments generally produce larger scores, because more matching bases contribute to the total.
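The sketch below shows how a single score is built up by re-scoring a pair of already-aligned rows by hand. It makes two assumptions. The first is EDNAFULL's values for the four standard bases: +5 for identical bases and -4 for a mismatch. The second is that a gap of length k scores `gap_open + k * gap_extend`, using the -5 and -1 from our score model. `alignment_score` is a hypothetical helper, not a BioAlignments.jl function, and it does not handle ambiguous bases such as `N`.

```julia
# Hypothetical helper: re-score two aligned rows, with '-' marking a gap.
# Assumptions: +5 for identical bases, -4 for a mismatch (EDNAFULL's values for
# A/C/G/T), and a gap of length k scores gap_open + k * gap_extend.
function alignment_score(row1::AbstractString, row2::AbstractString;
                         match_score = 5, mismatch_score = -4,
                         gap_open = -5, gap_extend = -1)
    length(row1) == length(row2) || throw(ArgumentError("rows must have equal length"))
    score = 0
    gap_in_1 = gap_in_2 = false        # are we currently inside a gap in row1 / row2?
    for (a, b) in zip(row1, row2)
        if a == '-'
            score += (gap_in_1 ? 0 : gap_open) + gap_extend   # open once, extend per base
            gap_in_1, gap_in_2 = true, false
        elseif b == '-'
            score += (gap_in_2 ? 0 : gap_open) + gap_extend
            gap_in_1, gap_in_2 = false, true
        else
            score += a == b ? match_score : mismatch_score
            gap_in_1 = gap_in_2 = false
        end
    end
    return score
end

alignment_score("ACGTAC", "AC--AC")   # 4 matches (+20) and one 2-base gap (-5 - 2) = 13
```

Under these assumptions, every match adds +5, and each gap costs -5 once plus -1 per base. For example, one 10-base gap costs -15, whereas ten separate 1-base gaps cost -60 in total. This is why the alignment above favors fewer, longer gaps.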

Overall, the two sequences are homologous over most of their length.
There are many matches, but also frequent small indels and substitutions.
The biggest difference is toward the end of the alignment,
where long stretches of the query sequence (_mecA_) have no counterpart in the reference sequence (_mecA1_).
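One way to relate this alignment to the reported 80% nucleotide identity [1] is to compute an identity value from the aligned rows. Percent identity has several competing definitions. The hypothetical helper below uses one simple choice: identical (non-gap) columns divided by all alignment columns, computed from two aligned rows written with `-` for gaps. Gaps count against identity under this definition. The unmatched tail of _mecA_ would therefore lower the value compared with definitions that ignore end gaps, so the result should not be expected to reproduce the published figure exactly.

```julia
# Hypothetical helper: percent identity of two aligned rows ('-' marks a gap).
# Definition used here: identical non-gap columns / all alignment columns.
function percent_identity(row1::AbstractString, row2::AbstractString)
    length(row1) == length(row2) || throw(ArgumentError("rows must have equal length"))
    ncols = length(row1)
    ncols == 0 && return 0.0
    identical = count(p -> p[1] == p[2] && p[1] != '-', zip(row1, row2))
    return 100 * identical / ncols
end

percent_identity("ACGT", "AC-A")   # 50.0
```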

### Citations

[1]: Rolo J, Worning P, Boye Nielsen J, Sobral R, Bowden R, Bouchami O, Damborg P, Guardabassi L, Perreten V, Westh H, Tomasz A, de Lencastre H, Miragaia M. Evidence for the evolutionary steps leading to mecA-mediated β-lactam resistance in staphylococci. PLoS Genet. 2017 Apr 10;13(4):e1006674. doi: 10.1371/journal.pgen.1006674. PMID: 28394942; PMCID: PMC5402963. [Link to the source](https://doi.org/10.1371/journal.pgen.1006674).
