“open” button – opens files with the extensions .txt and .fasta to fill the table of sequences to be aligned, and files with the extensions .aln and .blalignment to display aligned sequences.
“save” button – saves the aligned sequences without names to a file with the extension .blalignment, .pdf or .eps.
“close” button – closes the file and clears the window of aligned sequences.
“print” button – prints the aligned sequences with names, or saves them to a PDF or EPS file.
The blalignment file format is a text format used to open and save aligned DNA or protein sequences.
- Table of DNA or protein sequences to align
- Multiple alignment
- Phylogenetic tree
Table of DNA and protein sequences to align
To open a sequence file (.txt, .fasta) in the table, use the “open” button. The file name appears in the “Name” column and the sequence in the “Sequence” column.
Use the “add” button to add an empty row, then paste a name and a sequence into the corresponding columns. To delete a row, select it and press the “delete” button. Multiple rows can also be selected and deleted together.
Multiple alignment can be performed for DNA or protein sequences using one of two custom algorithms (BioLabDonkey1, BioLabDonkey2), the classical Needleman-Wunsch algorithm or the “Translated DNA” algorithm.
In contrast to the Needleman-Wunsch algorithm, the custom algorithms provide a correct alignment regardless of the size of the indels in the sequences. The custom algorithms are also faster than the Needleman-Wunsch algorithm for long sequences with high homology.
For the Needleman-Wunsch algorithm there are two options: the basic score system or similarity matrices for proteins (default).
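For readers unfamiliar with the method, the following is a minimal textbook sketch of pairwise Needleman-Wunsch alignment with a basic score system. It is not the program's own code. The function name and the score values (match = 1, mismatch = -1, gap = -1) are assumptions chosen for illustration; the manual does not state the values the program uses.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b.

    Returns (aligned_a, aligned_b, score). The score values are
    illustrative assumptions, not the program's actual settings.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


print(needleman_wunsch("ACGT", "AGT"))
```

With a similarity matrix, the `sub` helper would look up the score for the residue pair in the matrix instead of using fixed match and mismatch values.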
ORF sequences can be aligned using the “Translated DNA” algorithm. To do this, set the start of the open reading frame in the corresponding popup menu (frame 1, 2 or 3). The algorithm translates the ORF sequences, aligns the resulting protein sequences and then converts the aligned protein sequences back to DNA sequences.
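The translate, align and convert-back steps can be sketched as follows. This is an illustration under stated assumptions, not the program's implementation: the function names are hypothetical, the standard genetic code is assumed, input is assumed to be upper-case DNA, and the protein alignment itself is supplied by hand here where the program would run its aligner.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=1):
    """Translate upper-case DNA starting at reading frame 1, 2 or 3.

    Returns the protein string and the list of codons that produced it.
    Trailing bases that do not fill a codon are ignored.
    """
    start = frame - 1
    codons = [dna[i:i + 3] for i in range(start, len(dna) - 2, 3)]
    return "".join(CODON_TABLE[c] for c in codons), codons


def back_map(aligned_protein, codons):
    """Replace each residue by its original codon and each gap by '---'."""
    remaining = iter(codons)
    return "".join("---" if aa == "-" else next(remaining)
                   for aa in aligned_protein)


protein_1, codons_1 = translate("ATGGCCAAA")  # "MAK"
protein_2, codons_2 = translate("ATGAAA")     # "MK"

# Stand-in for the protein alignment step (done by hand for this example).
aligned_1, aligned_2 = "MAK", "M-K"

print(back_map(aligned_1, codons_1))  # ATGGCCAAA
print(back_map(aligned_2, codons_2))  # ATG---AAA
```

Because gaps are inserted as whole codons, the reading frame of every sequence is preserved in the resulting DNA alignment.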
To align protein sequences using the BioLabDonkey1 or BioLabDonkey2 algorithm, set the “AA substitutions” checkbox and choose the minimal identity in the corresponding popup menu. For the Needleman-Wunsch algorithm, choose the basic score system or similarity matrices. Then press the “align” button.
To change the color scheme of the aligned sequences, use the corresponding protein or DNA color scheme popup menu.
A part of the aligned sequences in a block can be realigned using a different algorithm. Set the algorithm for the realignment in the corresponding popup menu, then select a part of any line in the block and press the “realign” button. The realignment is not performed if the selection is not part of a line within a single block.
Sequence similarity can be represented as a phylogenetic tree. Press the “Phylogenetic tree” button to generate and display the tree. The tree type can be set to UPGMA or to the custom “Similarity tree”. The tree can be shifted to the left or right using the “tree shift” slider, and its scale can be changed using the “tree fold” slider.
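As background on the UPGMA tree type, the sketch below shows the textbook clustering procedure: repeatedly merge the two closest clusters and average their distances to the remaining clusters, weighted by cluster size. It is a simplified illustration, not the program's code. The function name, the input format (a dict of pairwise distances keyed by name pairs) and the nested-tuple output are assumptions, and branch lengths are left out for brevity. Sequence names are assumed to be unique.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a UPGMA tree topology as nested tuples.

    names: list of unique sequence names.
    dist:  dict mapping (name_a, name_b) pairs to a distance.
    """
    clusters = list(names)              # current clusters (names or tuples)
    size = {name: 1 for name in names}  # number of sequences per cluster
    d = {frozenset(pair): value for pair, value in dist.items()}

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size[merged] = size[a] + size[b]
        rest = [c for c in clusters if c not in (a, b)]
        # Distance to the merged cluster is the size-weighted average.
        for c in rest:
            d[frozenset((merged, c))] = (
                size[a] * d[frozenset((a, c))] + size[b] * d[frozenset((b, c))]
            ) / size[merged]
        clusters = rest + [merged]
    return clusters[0]


distances = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
print(upgma(["A", "B", "C"], distances))  # ('C', ('A', 'B'))
```

In the example, A and B are joined first because they have the smallest distance, and C is attached to that pair afterwards.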
The pairwise identity distances between sequence pairs are calculated as follows:
N = ((Ni / Nt1) + (Ni / Nt2)) / 2
Ni – number of identical aa/bp between two sequences
Nt1 – total number of aa/bp in sequence 1
Nt2 – total number of aa/bp in sequence 2
Distance = 1 / N
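The calculation above can be sketched in a few lines. The function name is hypothetical, and several details are assumptions that the formula does not specify: identical positions are counted column by column on two already aligned sequences, gap characters (“-”) are excluded from the totals, and a pair with no identical positions is given an infinite distance.

```python
def identity_distance(aligned_1, aligned_2):
    """Pairwise identity distance for two aligned sequences of equal length."""
    if len(aligned_1) != len(aligned_2):
        raise ValueError("aligned sequences must have the same length")

    # Ni: columns where both sequences carry the same residue (gaps excluded).
    ni = sum(1 for x, y in zip(aligned_1, aligned_2) if x == y and x != "-")
    # Nt1, Nt2: residues in each sequence, not counting gaps.
    nt1 = sum(1 for x in aligned_1 if x != "-")
    nt2 = sum(1 for y in aligned_2 if y != "-")

    n = (ni / nt1 + ni / nt2) / 2
    return float("inf") if n == 0 else 1 / n


# Ni = 3, Nt1 = 4, Nt2 = 3, so N = (3/4 + 3/3) / 2 = 0.875
print(identity_distance("ACGT", "A-GT"))
```

For the example pair the distance is 1 / 0.875, roughly 1.14, and two identical sequences give the minimum distance of 1.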
The pairwise identity distances can be recalculated for a different minimal identity (set in the “minimal identity” popup menu), and can also take amino acid substitutions into account (set the “AA substitutions” checkbox).
The length of a branch between two points (two connected orange circles) represents the identity distance between two sequences. For each connected pair of sequences in the tree, the distance between the two sequences is the smallest distance (highest identity) compared with the distances from these sequences to all other sequences.