Commit 3e3d08b

fix formatting
1 parent de6cb4b commit 3e3d08b

rosalind/10-cons.md

Lines changed: 85 additions & 69 deletions
@@ -1,76 +1,92 @@
++++
+using Dates
+date = Date("2026-03-02")
+title = "Problem 10: Consensus and Profile"
+rss_descr = "Solving Rosalind problem CONS — finding a consensus string from a collection of DNA strings — using base Julia, DataFrames, and matrix operations"
++++
+
 # Consensus and Profile
 
 🤔 [Problem link](https://rosalind.info/problems/cons/)
 
-!!! warning "The Problem".
-
-A matrix is a rectangular table of values divided into rows and columns.
-An m×n matrix has m rows and n columns.
-Given a matrix A, we write Ai,j.
-to indicate the value found at the intersection of row i and column j.
-
-Say that we have a collection of DNA strings,
-all having the same length n.
-Their profile matrix is a 4×n matrix P in which P1,
-j represents the number of times that 'A' occurs in the jth position of one of the strings,
-P2,j represents the number of times that C occurs in the jth position,
-and so on (see below).
-
-A consensus string c is a string of length n
-formed from our collection by taking the most common symbol at each position;
-the jth symbol of c therefore corresponds to the symbol having the maximum value
-in the j-th column of the profile matrix.
-Of course, there may be more than one most common symbol,
-leading to multiple possible consensus strings.
-
-### DNA Strings
-A T C C A G C T
-G G G C A A C T
-A T G G A T C T
-A A G C A A C C
-T T G G A A C T
-A T G C C A T T
-A T G G C A C T
-
-### Profile
-
-A 5 1 0 0 5 5 0 0
-C 0 0 1 4 2 0 6 1
-G 1 1 6 3 0 1 0 0
-T 1 5 0 0 0 1 1 6
-
-Consensus A T G C A A C T
-
-Given:
-A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
-
-Return:
-A consensus string and profile matrix for the collection.
-(If several possible consensus strings exist,
-then you may return any one of them.)
-
-Sample Dataset
->Rosalind_1
-ATCCAGCT
->Rosalind_2
-GGGCAACT
->Rosalind_3
-ATGGATCT
->Rosalind_4
-AAGCAACC
->Rosalind_5
-TTGGAACT
->Rosalind_6
-ATGCCATT
->Rosalind_7
-ATGGCACT
-
-Sample Output
-ATGCAACT
-A: 5 1 0 0 5 5 0 0
-C: 0 0 1 4 2 0 6 1
-G: 1 1 6 3 0 1 0 0
-T: 1 5 0 0 0 1 1 6
+> **The Problem**
+>
+> A matrix is a rectangular table of values divided into rows and columns.
+> An m×n matrix has m rows and n columns.
+> Given a matrix A, we write Ai,j.
+> to indicate the value found at the intersection of row i and column j.
+
+> Say that we have a collection of DNA strings,
+> all having the same length n.
+> Their profile matrix is a 4×n matrix P in which P1,
+> j represents the number of times that 'A' occurs in the jth position of one of the strings,
+> P2,j represents the number of times that C occurs in the jth position,
+> and so on (see below).
+
+> A consensus string c is a string of length n
+> formed from our collection by taking the most common symbol at each position;
+> the jth symbol of c therefore corresponds to the symbol having the maximum value
+> in the j-th column of the profile matrix.
+> Of course, there may be more than one most common symbol,
+> leading to multiple possible consensus strings.
+>
+> ### DNA Strings
+> ```
+> A T C C A G C T
+> G G G C A A C T
+> A T G G A T C T
+> A A G C A A C C
+> T T G G A A C T
+> A T G C C A T T
+> A T G G C A C T
+> ```
+>
+> ### Profile
+> ```
+> A 5 1 0 0 5 5 0 0
+> C 0 0 1 4 2 0 6 1
+> G 1 1 6 3 0 1 0 0
+> T 1 5 0 0 0 1 1 6
+> ```
+>
+> ### Consensus
+> ```A T G C A A C T```
+>
+> **Given:**
+> A collection of at most 10 DNA strings of equal length (at most 1 kbp) in FASTA format.
+>
+> **Return:**
+> A consensus string and profile matrix for the collection.
+> (If several possible consensus strings exist,
+> then you may return any one of them.)
+>
+> **Sample Dataset***
+>
+> ```
+> >Rosalind_1
+> ATCCAGCT
+> >Rosalind_2
+> GGGCAACT
+> >Rosalind_3
+> ATGGATCT
+> >Rosalind_4
+> AAGCAACC
+> >Rosalind_5
+> TTGGAACT
+> >Rosalind_6
+> ATGCCATT
+> >Rosalind_7
+> ATGGCACT
+> ```
+>
+> **Sample Output**
+> ```
+> ATGCAACT
+> A: 5 1 0 0 5 5 0 0
+> C: 0 0 1 4 2 0 6 1
+> G: 1 1 6 3 0 1 0 0
+> T: 1 5 0 0 0 1 1 6
+> ```
 
 
 The first thing we will need to do is read in the input fasta.
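
The last context line of the diff names the next step, reading the input FASTA, and the quoted problem statement defines the profile matrix and consensus string. The sketch below shows one way those steps might look in base Julia. It is not taken from the post: the names `read_fasta`, `profile_matrix`, `consensus`, and `BASES` are hypothetical, and it assumes the input contains only the symbols A, C, G, and T in sequences of equal length.

```julia
# Hypothetical helper: parse FASTA text into (header, sequence) pairs.
function read_fasta(io::IO)
    records = Tuple{String,String}[]
    header = nothing
    seq = IOBuffer()
    for raw in eachline(io)
        line = strip(raw)
        isempty(line) && continue
        if startswith(line, ">")
            # Flush the previous record before starting a new one.
            header === nothing || push!(records, (header, String(take!(seq))))
            header = String(line[2:end])
        else
            print(seq, line)  # sequences may span several lines
        end
    end
    header === nothing || push!(records, (header, String(take!(seq))))
    return records
end

# Row order of the profile matrix, as in the problem statement.
const BASES = ['A', 'C', 'G', 'T']

# Build the 4×n profile: P[i, j] counts BASES[i] at position j.
function profile_matrix(seqs)
    isempty(seqs) && throw(ArgumentError("no sequences given"))
    n = length(first(seqs))
    all(s -> length(s) == n, seqs) ||
        throw(ArgumentError("sequences must have equal length"))
    P = zeros(Int, length(BASES), n)
    for s in seqs, (j, base) in enumerate(s)
        i = findfirst(==(base), BASES)
        i === nothing && throw(ArgumentError("unexpected symbol $base"))
        P[i, j] += 1
    end
    return P
end

# argmax returns the first maximal row, so ties resolve in A, C, G, T order.
consensus(P) = String([BASES[argmax(col)] for col in eachcol(P)])

sample = """
>Rosalind_1
ATCCAGCT
>Rosalind_2
GGGCAACT
>Rosalind_3
ATGGATCT
>Rosalind_4
AAGCAACC
>Rosalind_5
TTGGAACT
>Rosalind_6
ATGCCATT
>Rosalind_7
ATGGCACT
"""

records = read_fasta(IOBuffer(sample))
P = profile_matrix(last.(records))

println(consensus(P))
for (i, base) in enumerate(BASES)
    println(base, ": ", join(P[i, :], " "))
end
```

Because the problem allows any consensus string when several exist, taking the first maximum in each column is enough. The post's description also mentions DataFrames, which this sketch does not use.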