Main functions of BioLabDonkey

The main functions (part3) of DNA analysis of BioLabDonkey version 5.19 (BioLabDNA version 1.19).

The main functions (part2) of DNA analysis of BioLabDonkey version 5.19 (BioLabDNA version 1.19).

The main functions (part1) of DNA analysis of BioLabDonkey version 5.19 (BioLabDNA version 1.19).

The automatic and manual DNA cloning in BioLabDonkey version 5.19.

Assembly

Important. The assembly can be performed efficiently for up to several hundred reads, but not more.

“open” button – opens files with the extensions .ab1, .abi, .txt.
“save” button – saves the selected contig consensus to a file with the .txt extension.
“delete” button – deletes the selected read files from the files table.
“clear” button – deletes all read files from the files table.
“print” button – prints the assembly graphic map or saves it to a PDF or EPS file.

  • Reads files table
  • Assembly – graphic map, assembled reads, consensus and coverage

Reads files table
Press the “open” button to open one or multiple read files (.ab1, .abi, .txt). The file names will appear in the “File Name” column. To delete files, select their rows and press the “delete” button. To clear the files table, press the “clear” button.

Assembly – graphic map, assembled reads, consensus and coverage
To start the assembly, select the minimal read overlap value, then press the “assemble” button. Adjust this value until the result is appropriate. To cancel the assembly, press the “cancel” button.
The assembly result will appear as a set of contigs in the contigs table. Select a contig in the table to see the assembly map. In the map each arrow represents a sequence file. The green arrow corresponds to an original file sequence, and the red arrow represents a reverse complement sequence. Click on an arrow to see the corresponding file in the files table.
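The role of the minimal overlap value and of the read orientation can be illustrated with a small sketch. This is not BioLabDonkey's assembly algorithm, which is not described here. It only assumes exact suffix-prefix matching between two reads, and the function names are hypothetical.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def suffix_prefix_overlap(left: str, right: str, min_overlap: int) -> int:
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when the longest exact overlap is shorter than `min_overlap`.
    """
    longest = min(len(left), len(right))
    for length in range(longest, max(min_overlap, 1) - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def best_orientation(left: str, right: str, min_overlap: int):
    """Try `right` as given and as its reverse complement; keep the longer overlap."""
    forward = suffix_prefix_overlap(left, right, min_overlap)
    reverse = suffix_prefix_overlap(left, reverse_complement(right), min_overlap)
    if reverse > forward:
        return ("reverse complement", reverse)
    return ("original", forward)
```

In this sketch a larger minimal overlap rejects short, possibly accidental matches, while a smaller one joins more reads at the risk of false joins. That trade-off is why the value may need adjusting.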

For .ab1 and .abi files, a double click on the selected file row opens a new window with the ABI chromatogram.
If the contig length is less than 3000 bp, the read sequences and the consensus are displayed. For longer contigs, click and drag on the assembly map to see the read sequences and the consensus in the corresponding region. To shift the view frame to the left or right, use the “<” and “>” buttons. To return to the whole contig view, use the “return” button.

Below the assembly map, each vertical red or green bar represents the assembly coverage for one nucleotide. Red bars mark nucleotide variations. On the assembly map these variations are displayed as dashed red vertical lines. Place the mouse pointer over a bar to see the coverage number and the nucleotide variation. This functionality is available when the contig frame is less than 3000 bp.
Press the “save” button to save a contig consensus sequence to a .txt file. To open a consensus sequence in the “DNA1”, “DNA2” or “Alignment” tab, press the corresponding button.
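The per-nucleotide coverage and variation shown by the bars can be sketched as a column-wise count over already aligned reads. The input format (equal-length strings padded with '-') and the majority-vote consensus are assumptions made for illustration; how BioLabDonkey derives its consensus is not stated here.

```python
from collections import Counter


def column_summary(aligned_reads):
    """Summarise each column of a set of aligned reads.

    `aligned_reads` is a list of equal-length strings; '-' marks positions
    that a read does not cover. Returns one tuple per column:
    (coverage, consensus base, True if the reads disagree).
    """
    summary = []
    for column in zip(*aligned_reads):
        bases = [base for base in column if base != "-"]
        counts = Counter(bases)
        # Majority vote; 'N' when no read covers the column.
        consensus = counts.most_common(1)[0][0] if counts else "N"
        summary.append((len(bases), consensus, len(counts) > 1))
    return summary
```

For example, `column_summary(["ACGT--", "-CTTA-", "--GTAG"])` reports a coverage of 3 and a variation in the third column, where two reads carry G and one carries T.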

Prediction of protein secondary structure in BioLabDonkey

Prediction of protein secondary structures as sequence pattern search
The sequences of secondary structures can be extracted from PDB files, and the patterns generated from these sequences can be used to search for matches in a target sequence.

Problem with the prediction using patterns
A database of protein patterns in the standard amino acid code is not feasible to generate or to use because there are too many pattern variants. For an alpha helix pattern 8 aa long (about two turns), the maximal number of pattern variants is 20 to the power of 8 (about 2.6 × 10^10).

Solution to the problem of high number of patterns
Amino acids can be grouped according to their physicochemical properties, such as hydrophobicity and negative or positive charge. The 20 amino acids can be split into 8 groups, and for an 8 aa alpha helix pattern in the new code the maximal number of variants is 16777216 (8 to the power of 8). The real number of alpha helix patterns should be smaller than that. The following grouping was used in BioLabDonkey Version 1.0:

Standard code – property – New code
1. V, I, L, W, F, C – very hydrophobic – W
2. A, M – less hydrophobic – V
3. N, Q, S, T, Y – polar neutral – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S

With this grouping the prediction gave more false positive results. A better outcome is seen with the following grouping, implemented in the BioLabDonkey update, Version 1.1 (from 23.10.2019):

Standard code – property – New code
1. V, I, L, F, C, M – hydrophobic – W
2. A – V
3. S, T – polar, hydroxylic – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S
9. W, Y – polar, aromatic – A
10. N, Q – polar, acidic – Q
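As an illustration, the Version 1.1 grouping above can be written as a lookup table that translates a standard one-letter protein sequence into the new code. The names `GROUPS_V1_1`, `TO_NEW_CODE` and `encode`, and the use of "X" for unrecognised letters, are assumptions of this sketch and not part of BioLabDonkey.

```python
# Version 1.1 grouping: new-code letter -> standard one-letter residues.
GROUPS_V1_1 = {
    "W": "VILFCM",  # hydrophobic
    "V": "A",
    "O": "ST",      # polar, hydroxylic
    "N": "DE",      # negatively charged
    "P": "KR",      # positively charged
    "B": "H",
    "F": "G",
    "S": "P",
    "A": "WY",      # polar, aromatic
    "Q": "NQ",      # polar, acidic (label as given in the grouping)
}

# Invert the table into a residue -> new-code lookup.
TO_NEW_CODE = {aa: code for code, residues in GROUPS_V1_1.items() for aa in residues}


def encode(sequence: str) -> str:
    """Translate a protein sequence into the reduced code.

    Letters outside the 20 standard residues become 'X' (an assumption).
    """
    return "".join(TO_NEW_CODE.get(aa, "X") for aa in sequence.upper())


# Upper bound on distinct 8-residue patterns in this 10-letter code.
MAX_OCTAMERS_V1_1 = len(GROUPS_V1_1) ** 8
```

Here `encode("MKTAYIAKQR")` gives `"WPOVAWVPQP"`. With 10 groups the upper bound on distinct octamer patterns is 10 to the power of 8.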

Generation of database for alpha helices, beta strands and turns from PDB
Alpha helix. The minimal pattern size for an alpha helix was set to 8 aa, about two turns. An octamer is considered the minimum for a stable alpha helix. The octamer patterns were extracted from alpha helix regions in PDB.
Beta strand. The beta strand sequences were taken as they are present in PDB.
Turn. The turn patterns were set as sequences 4 aa long that include glycine or proline.
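One possible reading of the extraction step is a sliding window over annotated regions. The sketch below assumes that helix and turn regions are already known as (start, end) index pairs and that every window inside a region counts as a pattern. The text does not confirm either assumption, and the function names are hypothetical. Translation into the new code is left out.

```python
def helix_octamers(sequence, helix_regions, size=8):
    """Collect every `size`-residue window lying fully inside a helix region.

    `helix_regions` holds (start, end) pairs, 0-based, with `end` exclusive.
    Regions shorter than `size` contribute nothing.
    """
    patterns = set()
    for start, end in helix_regions:
        for i in range(start, end - size + 1):
            patterns.add(sequence[i:i + size])
    return patterns


def turn_tetramers(sequence, turn_regions, size=4):
    """Collect 4-residue windows from turn regions that contain G or P."""
    patterns = set()
    for start, end in turn_regions:
        for i in range(start, end - size + 1):
            window = sequence[i:i + size]
            if "G" in window or "P" in window:
                patterns.add(window)
    return patterns
```

A helix annotated over 10 residues yields three overlapping octamers in this scheme, so one long helix contributes several database entries.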

The database was generated from the following organisms:
Saccharomyces cerevisiae,
Helicobacter pylori,
Klebsiella pneumoniae,
Escherichia coli,
Mycobacterium tuberculosis,
Pseudomonas aeruginosa,
Salmonella typhimurium,
Staphylococcus aureus,
Streptococcus pneumoniae,
Vibrio cholerae,
Bacillus subtilis,
Homo sapiens

The number of generated patterns in the BioLabDonkey database, Version 1.1 (from 23.10.2019): alpha helix – 200155, beta strand – 26054

Following the mechanism of cotranslational folding, in which alpha helix formation can happen inside the ribosomal exit tunnel before beta sheets form, the alpha helices were searched first. The sequence regions not occupied by alpha helices were then searched for beta strands. Finally, the sequence regions free of alpha helices and beta strands were tested for turn patterns.
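The search order described above can be sketched as three passes over the sequence, each pass skipping residues claimed by an earlier one. Exact string matching, the H/E/T labels and the function names are assumptions of this sketch; the sequence and the patterns are expected to be in the same code (standard or new).

```python
def find_matches(sequence, patterns, occupied):
    """Return (start, end) spans where a pattern matches unoccupied residues."""
    spans = []
    for size in sorted({len(p) for p in patterns}, reverse=True):
        for start in range(len(sequence) - size + 1):
            end = start + size
            if sequence[start:end] in patterns and not any(occupied[start:end]):
                spans.append((start, end))
    return spans


def predict(sequence, helix_patterns, strand_patterns, turn_patterns):
    """Label each residue H (helix), E (strand), T (turn) or '-' (none).

    Helices are searched first, then strands, then turns.
    """
    labels = ["-"] * len(sequence)
    occupied = [False] * len(sequence)
    ordered = (("H", helix_patterns), ("E", strand_patterns), ("T", turn_patterns))
    for label, patterns in ordered:
        for start, end in find_matches(sequence, patterns, occupied):
            for i in range(start, end):
                labels[i] = label
        # Freeze everything assigned so far before searching the next class.
        occupied = [lab != "-" for lab in labels]
    return "".join(labels)
```

Within one class, overlapping matches are allowed to merge, so two octamers shifted by one residue produce a single 9-residue helix. A strand pattern that would reach into that helix is rejected.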

Evaluation of the prediction accuracy (random examples)

1. Comparison of the prediction with the secondary structure from a PDB file when the patterns database does not include the patterns from this PDB file. A good prediction is expected to have as few false negative and false positive secondary structures as possible.

D-ornithine/D-lysine decarboxylase from Salmonella typhimurium

Xenopus laevis MHC I complex

2. Comparison of the prediction with the secondary structure from a PDB file when the patterns database includes the patterns from this PDB file. A good prediction is expected to have as few false positive secondary structures as possible.

F41 fragment of flagellin of Salmonella typhimurium

3. Comparison of the prediction with the results of other algorithms based on machine learning.

Comparison with Jpred4 (no similarity to sequences with known PDB) for “NgrC protein” (from Providencia stuartii plasmid pTC2)

Comparison with Jpred4 (no similarity to sequences with known PDB) for “hypothetical protein” (from Providencia stuartii plasmid pTC2)
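The comparisons above are visual. One way to put numbers on false positives and false negatives is a per-residue count between a predicted and a reference structure string. This counting scheme and the function name are assumptions for illustration, not the evaluation actually used for BioLabDonkey.

```python
def compare_structures(predicted: str, reference: str):
    """Count per-residue agreement between two equal-length structure strings.

    Both strings use one character per residue, with '-' for "no structure".
    """
    if len(predicted) != len(reference):
        raise ValueError("structure strings must have the same length")
    counts = {"match": 0, "false_positive": 0, "false_negative": 0, "mismatch": 0}
    for pred, ref in zip(predicted, reference):
        if pred == ref:
            counts["match"] += 1
        elif ref == "-":
            counts["false_positive"] += 1  # predicted where the reference has none
        elif pred == "-":
            counts["false_negative"] += 1  # reference structure was missed
        else:
            counts["mismatch"] += 1        # both assign a structure, but not the same
    return counts
```

For example, comparing the prediction `"HHHH--EE-T"` with the reference `"HHH---EEEE"` gives 7 matches, 1 false positive, 1 false negative and 1 mismatch.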

Privacy Policy

Collection and Use of Personal Information
No personal information (data) that can be used to identify or contact a single person is collected.

Collection and Use of Non-Personal Information
No non-personal information (data) is collected. The user can generate database .txt files for DNA feature annotation and protein secondary structure prediction. These files are stored inside the program and can be exported or imported by the user.

Cookies and Other Technologies
No cookies or other tracking technologies are used.

Disclosure to Third Parties and Service Providers
There is no disclosure to third parties or service providers.

The Existence of Automated Profiling
The program does not perform any user profiling.

Third‑Party Sites and Services
The program contains links to the following public websites related to the DNA, RNA and protein analysis: https://www.ncbi.nlm.nih.gov/Taxonomy/Utils/wprintgc.cgi , http://www.insdc.org/documents/feature-table , https://blast.ncbi.nlm.nih.gov/Blast.cgi


Ⓒ 2019 BioLabDonkey, Valeriy Tarasov