
Commit 4158491

fix typos and grammar issues
1 parent fd8f16d commit 4158491


1 file changed, +13 -10 lines changed


docs/src/rosalind/10-cons.md

Lines changed: 13 additions & 10 deletions
````diff
@@ -122,14 +122,14 @@ end
 records = parse_fasta(fake_file)
 ```
 
-Once the fasta is read in, we can iterate over each read and store its nucleotide sequence in a data matrix.
+Once the fasta is read in, we can iterate over each sequence/record and store its nucleotide sequence in a data matrix.
 
 From there, we can generate the profile matrix.
-We'll need to sum the number of times each nucleotide appears at a particular row of the data matrix.
+We'll need to sum the number of times each nucleotide appears at a particular column of the data matrix.
 
 Then, we can identify the most common nucleotide at each column of the data matrix,
-which represents each index of the consensus string.
-After we have done this for all columns of the data matrix,
+which represent each index of the consensus string.
+After doing this for all columns of the data matrix,
 we can generate the consensus string.
 
 
@@ -141,10 +141,10 @@ function consensus(fasta_string)
 # extract strings from fasta
 records = parse_fasta(fasta_string)
 
-# make a vector of just strings
+# make a vector of sequence strings
 data_vector = last.(records)
 
-# convert data_vector to matrix where each column is a char and each row is a string
+# convert data_vector to matrix where each column is a character position and each row is a string
 data_matrix = reduce(vcat, permutedims.(collect.(data_vector)))
 
 # make profile matrix
@@ -160,7 +160,7 @@ function consensus(fasta_string)
 consensus_df = DataFrame(consensus_matrix, ["A", "C", "G", "T"])
 
 
-# make column with nucleotide with max value
+# make column with nucleotide with the max value
 # argmax returns the index or key of the first one encountered
 nuc_max_df = transform(consensus_df, AsTable(:) => ByRow(argmax) => :MaxColName)
 
@@ -178,14 +178,17 @@ as some nucleotides may appear the same number of times
 in each column of the data matrix.
 
 If this is the case,
-the function we are using (`argmax`) returns the nucleotide with the most occurences that it first encounters.
+the function we are using (`argmax`) returns the nucleotide with the most occurrences that it first encounters.
 
 The way our function is written,
 we first scan for 'A', 'C', then 'G' and 'T',
 so the final consensus string will be biased towards more A's, then C's, G's and T's.
-This simply based on which nucleotide counts it will encounter first in the profile matrix.
+This is simply based on which nucleotide counts it will encounter first in the profile matrix.
 
-In the example below, there are equal number of reads indicating that the consensus string could be either `AAAAAAAA` or `GGGGGGGG`. However, because our solution scans for `A` first, the consensus string returned will be `AAAAAAAA`.
+In the example below, there are equal number of sequences that are all `A`'s and `G`'s,
+so the consensus string could be either `AAAAAAAA` or `GGGGGGGG`.
+However, because our solution scans for `A` first,
+the consensus string returned will be `AAAAAAAA`.
 
 ```julia
 fake_file2 = IOBuffer("""
````

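The revised tutorial text describes three steps: build a data matrix (one row per sequence, one column per character position), count each nucleotide per column into a profile matrix, and pick the first most frequent nucleotide in each column. The following is a minimal Base-Julia sketch of those steps, not the tutorial's DataFrames-based `consensus` function. The names `profile_and_consensus` and `NUCLEOTIDES` are hypothetical, and the sketch assumes the sequences have already been extracted from the FASTA (for example with `last.(records)` as in the diff), all have the same length, and contain only `A`, `C`, `G`, `T`. It relies on `argmax` returning the first maximal index, so ties resolve in `A`, `C`, `G`, `T` order, which mirrors the tie-breaking behavior the changed text describes.

```julia
# Hypothetical helper: a Base-only sketch, not the tutorial's DataFrames implementation.
# Assumes equal-length sequences made only of A, C, G, T.
const NUCLEOTIDES = ['A', 'C', 'G', 'T']

function profile_and_consensus(seqs::AbstractVector{<:AbstractString})
    # Data matrix: each row is a sequence, each column is a character position
    data_matrix = reduce(vcat, permutedims.(collect.(seqs)))

    # Profile matrix: profile[i, j] counts NUCLEOTIDES[i] in column j of the data matrix
    profile = [count(==(nuc), col) for nuc in NUCLEOTIDES, col in eachcol(data_matrix)]

    # argmax returns the first maximal index, so ties resolve in A, C, G, T order
    consensus_string = join(NUCLEOTIDES[argmax(col)] for col in eachcol(profile))

    return profile, consensus_string
end

# Tie example: one all-A sequence and one all-G sequence
profile, cons_string = profile_and_consensus(["AAAAAAAA", "GGGGGGGG"])
println(cons_string)  # AAAAAAAA
```

Swapping the order of `NUCLEOTIDES` would change which nucleotide wins a tie, in the same way that the column order `["A", "C", "G", "T"]` does for the `DataFrame` in the diff.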