Commit 3fd96b9

add rough draft alignments
1 parent c0207c3 commit 3fd96b9

1 file changed

Lines changed: 75 additions & 0 deletions

cookbook/02-alignments.md

Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
1+
+++
2+
title = "Pairwise alignment"
3+
rss_descr = "Align a gene against a reference genome using BioAlignments.jl"
4+
+++
5+
6+
# Pairwise Alignment
7+
8+
At the most basic level, aligners take two sequences and try to "line them up"
9+
and look for regions of similarity.
10+
11+
Pairwise alignment differs from multiple sequence alignment (MSA) because
12+
it aligns only two sequences, whereas an MSA aligns three or more.
13+
In a pairwise alignment, there is one reference sequence and one query sequence,
14+
though this may not always be specified by the user.
15+
16+
17+
### Running the Alignment
18+
There are two main parameters for determining how we wish to perform our alignment:
19+
the alignment type and score/cost model.
20+
21+
The alignment type specifies the alignment range (is the alignment local or global?)
22+
and the score/cost model specifies how matches, mismatches, insertions, and deletions are scored.
23+
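As a rough sketch of the second parameter, a score model can be built with `AffineGapScoreModel`. The `EDNAFULL` substitution matrix and the gap penalties below are illustrative choices, not values recommended by this tutorial.

```julia
using BioAlignments

# Affine gap score model: substitution scores come from EDNAFULL, a DNA
# substitution matrix shipped with BioAlignments.jl. The gap penalties are
# illustrative values only.
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

# A simpler model that scores every match and every mismatch uniformly
# (again, illustrative numbers).
simplemodel = AffineGapScoreModel(match=5, mismatch=-4, gap_open=-5, gap_extend=-3)
```

Either model can be passed as the last argument to `pairalign`.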
24+
#### Alignment Types
25+
Currently, four types of alignments are supported:
26+
- GlobalAlignment: global-to-global alignment
27+
    - Aligns sequences end-to-end
28+
    - Best for sequences that are already very similar
29+
- SemiGlobalAlignment: local-to-global alignment
30+
    - A modification of global alignment in which gaps at the beginning and/or at the end of one of the sequences are penalty-free ([more information](https://www.cs.cmu.edu/~durand/03-711/2023/Lectures/20231001_semi-global.pdf))
31+
- LocalAlignment: local-to-local alignment
32+
    - Identifies high-similarity, conserved sub-regions within divergent sequences
33+
    - The aligned region can start and end anywhere in the alignment matrix
34+
- OverlapAlignment: end-free alignment
35+
    - A modification of global alignment in which gaps at the beginning or end of the sequences are not penalized
36+
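To make the four types above concrete, the sketch below runs one pair of sequences through each of them and prints the resulting score. The sequences and scoring parameters are made-up example values.

```julia
using BioAlignments, BioSequences

# Hypothetical example sequences and illustrative scoring parameters.
query = dna"CCTAGGAGGG"
reference = dna"ACCTGGTATGATAGCG"
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

# Run the same pair through each supported alignment type.
results = Dict(
    name => pairalign(alntype, query, reference, scoremodel)
    for (name, alntype) in [
        ("global", GlobalAlignment()),
        ("semi-global", SemiGlobalAlignment()),
        ("local", LocalAlignment()),
        ("overlap", OverlapAlignment()),
    ]
)

for (name, res) in results
    println(rpad(name, 12), " score = ", score(res))
end
```

Comparing the printed scores and alignments side by side is a quick way to see how much the choice of alignment range matters for a given pair.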
37+
The alignment type can also be a distance between two sequences:
38+
- EditDistance
39+
- LevenshteinDistance
40+
- HammingDistance
41+
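A minimal sketch of the three distance types follows. The inputs are plain strings to keep the example short, and the unit costs passed to `CostModel` are illustrative.

```julia
using BioAlignments

# Hypothetical example inputs.
a = "ACGT"
b = "ACCTA"

# EditDistance takes an explicit cost model (illustrative unit costs here).
costmodel = CostModel(match=0, mismatch=1, insertion=1, deletion=1)
edit = pairalign(EditDistance(), a, b, costmodel)

# LevenshteinDistance uses fixed unit costs, so no model is passed.
lev = pairalign(LevenshteinDistance(), a, b)

# HammingDistance only counts mismatches, so both inputs must have equal length.
ham = pairalign(HammingDistance(), "ACGT", "ACCT")

println("edit = ", distance(edit),
        ", Levenshtein = ", distance(lev),
        ", Hamming = ", distance(ham))
```

With unit costs, `EditDistance` and `LevenshteinDistance` give the same distance; `EditDistance` becomes useful when substitutions, insertions, and deletions should be weighted differently.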
42+
The alignment type should be selected based on what is already known about the sequences being compared
43+
(Are they very similar and we're looking for a couple of small differences?
44+
Are we expecting the query to be a nearly exact match within the reference?)
45+
and on what we are optimizing for
46+
(Speed for a quick and dirty analysis?
47+
Or do we want to use more resources to do a fine-grained comparison?).
48+
49+
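Now that we have a good understanding of how `pairalign` works, we can run an alignment on two sequences `s1` and `s2` with a score model `scoremodel` (all assumed to be defined already):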
50+
51+
```julia
52+
res = pairalign(GlobalAlignment(), s1, s2, scoremodel) # run pairwise alignment
53+
54+
```
55+
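The call above leaves `s1`, `s2`, and `scoremodel` undefined. A self-contained version might look like the following sketch, which also prints a few summary statistics from the result. The sequences and gap penalties are made-up example values.

```julia
using BioAlignments, BioSequences

s1 = dna"CCTAGGAGGG"           # query (hypothetical example data)
s2 = dna"ACCTGGTATGATAGCG"     # reference (hypothetical example data)
scoremodel = AffineGapScoreModel(EDNAFULL, gap_open=-5, gap_extend=-1)

res = pairalign(GlobalAlignment(), s1, s2, scoremodel)

println("score: ", score(res))

aln = alignment(res)           # the pairwise alignment held by the result
println("matches:    ", count_matches(aln))
println("mismatches: ", count_mismatches(aln))
println("insertions: ", count_insertions(aln))
println("deletions:  ", count_deletions(aln))
```

Printing `res` itself at the REPL shows the two sequences lined up with match markers between them.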
56+
57+
### Understanding how alignments are represented
58+
The output of an alignment is a series of `AlignmentAnchor` objects.
59+
This data structure gives information on the position of the start of the alignment,
60+
sections where nucleotides match, as well as where there may be deletions or insertions.
61+
62+
Below is an example Alignment:
63+
```julia
64+
julia> Alignment([
65+
AlignmentAnchor(0, 4, 0, OP_START),
66+
AlignmentAnchor(4, 8, 4, OP_MATCH),
67+
AlignmentAnchor(4, 12, 8, OP_DELETE)
68+
])
69+
```
70+
In this example, the `OP_START` anchor records the positions just before the alignment begins: 0 in the query sequence and 4 in the reference sequence, so the first aligned pair is query position 1 and reference position 5.
71+
The next four positions (query 1 to 4, reference 5 to 8) are aligned as matches.
72+
The last four positions of the alignment (reference 9 to 12) are missing/deleted in the query sequence.
73+
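The anchors in this example can also be inspected programmatically. The sketch below assumes the `anchors` field of `Alignment` and the `op` and `alnpos` fields of `AlignmentAnchor`. It derives the length of each operation from consecutive anchors, then maps a query position to the reference with `seq2ref`.

```julia
using BioAlignments

aln = Alignment([
    AlignmentAnchor(0,  4, 0, OP_START),
    AlignmentAnchor(4,  8, 4, OP_MATCH),
    AlignmentAnchor(4, 12, 8, OP_DELETE),
])

# Each anchor stores the last position covered by its operation, so the
# length of a run is the difference in alignment position between
# consecutive anchors (field names are assumed, see above).
anchors = aln.anchors
runs = [(anchors[i].op, anchors[i].alnpos - anchors[i-1].alnpos)
        for i in 2:length(anchors)]

for (op, len) in runs
    println(op, ": ", len)
end

# Map query position 1 to its reference position and operation.
println(seq2ref(aln, 1))
```

Both runs come out with length 4, in line with the anchor positions in the example.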
74+
For more detail on how BioAlignments.jl represents alignment output,
75+
see the [BioAlignments.jl documentation](https://biojulia.dev/BioAlignments.jl/stable/alignments/).
