Prediction of protein secondary structure in BioLabDonkey

Prediction of protein secondary structures as sequence pattern search

The sequences of secondary structure elements can be extracted from PDB files, and the patterns generated from these sequences can be used to search for matches in a target sequence.

Problem with the prediction using patterns

A database of raw protein patterns is not feasible to generate or to use because there are too many pattern variants. For an alpha helix pattern 8 aa long (about two turns), the maximal number of pattern variants is 20 to the power of 8 (about 2.6 × 10^10).
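The count follows from the fact that each position of a pattern can hold any letter of the alphabet independently. A minimal sketch of the arithmetic (the function name `max_pattern_variants` is only illustrative and is not part of BioLabDonkey):

```python
def max_pattern_variants(alphabet_size: int, pattern_length: int) -> int:
    """Upper bound on distinct patterns: one independent choice per position."""
    return alphabet_size ** pattern_length

# 20 standard amino acids, pattern of 8 aa
print(max_pattern_variants(20, 8))  # 25600000000
```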

Solution to the problem of the high number of patterns

Amino acids can be grouped according to their physicochemical properties: hydrophobicity, negative or positive charge, and so on. If the 20 amino acids are split into 8 groups, each written with one letter of a new code, the maximal number of variants for an 8 aa alpha helix pattern is 16777216 (8 to the power of 8). The real number of alpha helix patterns should be smaller than that. The following grouping was used in BioLabDonkey Version 1.0:

Standard code – property – New code
1. V, I, L, W, F, C – very hydrophobic – W
2. A, M – less hydrophobic – V
3. N, Q, S, T, Y – polar neutral – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S

With this grouping the prediction gave more false positives than with the grouping below. The better outcome is seen for the following grouping, implemented in the BioLabDonkey update, Version 1.1 (from 23.10.2019):

Standard code – property – New code
1. V, I, L, F, C, M – hydrophobic – W
2. A – V
3. S, T – polar, hydroxylic – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S
9. W, Y – polar, aromatic – A
10. N, Q – polar, amide – Q
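The two groupings can be written as lookup tables that translate a standard one-letter sequence into the reduced code. This is a minimal sketch, not the BioLabDonkey implementation: the names `GROUPS_V1_0`, `GROUPS_V1_1`, `build_table` and `encode` are hypothetical, and mapping unknown residues to `X` is an assumption.

```python
# New-code letter -> standard amino acids it replaces (taken from the two tables above).
GROUPS_V1_0 = {
    "W": "VILWFC", "V": "AM", "O": "NQSTY", "N": "DE",
    "P": "KR", "B": "H", "F": "G", "S": "P",
}
GROUPS_V1_1 = {
    "W": "VILFCM", "V": "A", "O": "ST", "N": "DE", "P": "KR",
    "B": "H", "F": "G", "S": "P", "A": "WY", "Q": "NQ",
}

def build_table(groups):
    """Invert a grouping into a dict: standard letter -> new-code letter."""
    return {aa: code for code, members in groups.items() for aa in members}

def encode(sequence, groups, unknown="X"):
    """Translate a standard one-letter sequence into the reduced code.

    Residues outside the 20 standard amino acids become `unknown` (an assumption).
    """
    table = build_table(groups)
    return "".join(table.get(aa, unknown) for aa in sequence.upper())

print(encode("MAEELLKK", GROUPS_V1_0))  # VVNNWWPP
print(encode("MAEELLKK", GROUPS_V1_1))  # WVNNWWPP
```

With ten groups the upper bound for an 8 aa pattern becomes 10 to the power of 8, which is still far below the bound for the 20-letter code.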

Generation of database for alpha helices, beta strands and turns from PDB

Alpha helix. The minimal pattern size for an alpha helix was set to 8 aa, about two turns. An octamer is considered the minimum for a stable alpha helix. The octamer patterns were extracted from alpha helix regions in PDB.

Beta strand. The beta strand sequences were taken as they are present in PDB.

Turn. The turn patterns were defined as sequences 4 aa long that include glycine or proline.
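One way to picture the extraction step is a sliding window over each annotated helix. The sketch below assumes that helix regions are already available as 0-based (start, end) index pairs with an exclusive end, and that every overlapping octamer inside a helix is kept. The text does not say how the windows were chosen, so both points are assumptions, as are all function names.

```python
def windows(segment, size):
    """Yield every contiguous substring of `segment` with length `size`."""
    for i in range(len(segment) - size + 1):
        yield segment[i:i + size]

def helix_octamers(sequence, helix_ranges, size=8):
    """Collect the distinct `size`-mers lying entirely inside annotated helices.

    helix_ranges: iterable of (start, end) pairs, 0-based, end exclusive (assumed format).
    Helices shorter than `size` contribute nothing.
    """
    patterns = set()
    for start, end in helix_ranges:
        patterns.update(windows(sequence[start:end], size))
    return patterns

def is_turn_pattern(fragment):
    """A turn pattern here is a 4 aa fragment containing glycine (G) or proline (P)."""
    return len(fragment) == 4 and ("G" in fragment or "P" in fragment)

# Hypothetical example: residues 1..9 of this toy sequence are annotated as a helix.
print(sorted(helix_octamers("MAEELLKKAGSPTVV", [(1, 10)])))
# ['AEELLKKA', 'EELLKKAG']
```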

The database was generated from the following organisms:
Saccharomyces cerevisiae,
Helicobacter pylori,
Klebsiella pneumoniae,
Escherichia coli,
Mycobacterium tuberculosis,
Pseudomonas aeruginosa,
Salmonella typhimurium,
Staphylococcus aureus,
Streptococcus pneumoniae,
Vibrio cholerae,
Bacillus subtilis,
Homo sapiens

The number of generated patterns in the BioLabDonkey database, Version 1.1 (from 23.10.2019): alpha helix – 200155, beta strand – 26054.

The search order follows the mechanism of cotranslational folding: alpha helix formation can happen inside the ribosomal exit tunnel, before beta sheets form, so the alpha helices were searched for first. The sequence regions not occupied by alpha helices were then searched for beta strands. Finally, the regions free of both alpha helices and beta strands were tested for turn patterns.
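The search order described above can be sketched as three passes over the target sequence, each pass being allowed to claim only the positions that earlier passes left free. This is an illustration under assumptions rather than the BioLabDonkey code: it assumes exact string matching of patterns against a sequence already written in the reduced code, and the labels `H`, `E`, `T` and `-` as well as the function names are hypothetical.

```python
def mark_matches(encoded, patterns, label, annotation):
    """Label every pattern occurrence that lies on positions free before this pass."""
    # Snapshot, so overlapping matches of the same class can extend each other.
    free = [a == "-" for a in annotation]
    for pattern in patterns:
        start = encoded.find(pattern)
        while start != -1:
            end = start + len(pattern)
            if all(free[start:end]):
                annotation[start:end] = [label] * len(pattern)
            start = encoded.find(pattern, start + 1)

def predict(encoded, helix_patterns, strand_patterns, turn_patterns):
    """Search helices first, then strands, then turns, in that order."""
    annotation = ["-"] * len(encoded)
    mark_matches(encoded, helix_patterns, "H", annotation)
    mark_matches(encoded, strand_patterns, "E", annotation)
    mark_matches(encoded, turn_patterns, "T", annotation)
    return "".join(annotation)

# Toy patterns in the reduced code; "NPOF" overlaps the helix and is therefore skipped.
target = "WWNPWWNPOFSOWOWOW"
print(predict(target, {"WWNPWWNP"}, {"WOWOW", "NPOF"}, {"OFSO"}))
# HHHHHHHHTTTTEEEEE
```

Taking a snapshot of the free positions at the start of each pass lets two overlapping helix octamers merge into one longer helix while still blocking strands and turns from regions already assigned.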

Evaluation of the prediction accuracy (random examples)

1. Comparison of the prediction with the secondary structure from PDB when the pattern database does not include the patterns from this PDB entry. A good prediction is expected to have as few false negative and false positive secondary structures as possible.

D-ornithine/D-lysine decarboxylase from Salmonella typhimurium

Xenopus laevis MHC I complex

2. Comparison of the prediction with the secondary structure from PDB when the pattern database includes the patterns from this PDB entry. A good prediction is expected to have as few false positive secondary structures as possible.

F41 fragment of flagellin of Salmonella typhimurium

3. Comparison of the prediction with the results of other algorithms (machine learning-based techniques).

Comparison with Jpred4 (no similarity to sequences with a known PDB structure) for “NgrC protein” (from Providencia stuartii plasmid pTC2)

Comparison with Jpred4 (no similarity to sequences with a known PDB structure) for “hypothetical protein” (from Providencia stuartii plasmid pTC2)
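The false positives and false negatives mentioned in the comparisons above can be counted residue by residue when the prediction and the reference are written as strings of equal length. The sketch below is one possible way to do this, not the procedure used for the examples on this page: the per-residue definition, the labels `H`, `E` and `-`, and the function name `confusion_counts` are all assumptions.

```python
def confusion_counts(predicted, reference, label):
    """Count true positives, false positives and false negatives for one label."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    tp = fp = fn = 0
    for p, r in zip(predicted, reference):
        if p == label and r == label:
            tp += 1      # predicted and present in the reference
        elif p == label:
            fp += 1      # predicted but absent from the reference
        elif r == label:
            fn += 1      # present in the reference but missed
    return {"tp": tp, "fp": fp, "fn": fn}

predicted = "HHHH--EE-"
reference = "HHH---EEE"
print(confusion_counts(predicted, reference, "H"))  # {'tp': 3, 'fp': 1, 'fn': 0}
print(confusion_counts(predicted, reference, "E"))  # {'tp': 2, 'fp': 0, 'fn': 1}
```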
