Prediction of protein secondary structures as sequence pattern search
Secondary structure sequences can be extracted from PDB files, and the patterns generated from these sequences can be used to search for matches in a target sequence.
Problem with the prediction using patterns
A database of protein patterns is not feasible to generate or use because there are too many pattern variants. For an alpha helix pattern 8 aa long (about two turns), the maximal number of pattern variants is 20 to the power of 8 (about 2.6 × 10^10).
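The size of the pattern space can be checked with a few lines of Python. This is only an illustrative sketch: the function name `pattern_space` is hypothetical and not part of BioLabDonkey. It counts the sequences of a given length over an alphabet of a given size, so an 8-residue pattern over the 20 standard amino acids gives 20 multiplied by itself 8 times.

```python
def pattern_space(alphabet_size: int, length: int) -> int:
    """Maximal number of distinct patterns of `length` letters over an alphabet."""
    return alphabet_size ** length


# 20 standard amino acids, 8-residue alpha helix pattern
print(pattern_space(20, 8))  # 25600000000

# the same pattern length over a reduced alphabet of 8 letters
print(pattern_space(8, 8))   # 16777216
```

Reducing the alphabet from 20 to 8 letters shrinks the upper bound by a factor of about 1500, which is the motivation for grouping amino acids.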
Solution to the problem of the high number of patterns
Amino acids can be grouped according to their physicochemical properties: hydrophobicity, negative or positive charge, and so on. The 20 amino acids can be split into 8 groups, and for an 8 aa alpha helix pattern in the new code the maximal number of variants is 16777216 (8 to the power of 8). The real number of alpha helix patterns should be smaller than that. The following grouping was used in BioLabDonkey Version 1.0:
Standard code – property – New code
1. V, I, L, W, F, C – very hydrophobic – W
2. A, M – less hydrophobic – V
3. N, Q, S, T, Y – polar neutral – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S
With this grouping the prediction gave more false positive results. A better outcome is seen with the following grouping, implemented in the BioLabDonkey update, Version 1.1 (from 23.10.2019):
Standard code – property – New code
1. V, I, L, F, C, M – hydrophobic – W
2. A – V
3. S, T – polar, hydroxylic – O
4. D, E – negatively charged – N
5. K, R – positively charged – P
6. H – B
7. G – F
8. P – S
9. W, Y – polar, aromatic – A
10. N, Q – polar, amide – Q
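As an illustration of how a sequence is rewritten in the new code, the two groupings listed above can be stored as dictionaries and applied with `str.translate`. This is a minimal sketch, not the BioLabDonkey implementation: the names `GROUPS_V1_0`, `GROUPS_V1_1`, `make_table` and `encode` are hypothetical, and the first Version 1.0 group is read here as V, I, L, W, F, C so that all 20 amino acids are covered.

```python
# New-code letter -> standard one-letter amino acid codes in that group.
GROUPS_V1_0 = {
    "W": "VILWFC", "V": "AM", "O": "NQSTY", "N": "DE",
    "P": "KR", "B": "H", "F": "G", "S": "P",
}
GROUPS_V1_1 = {
    "W": "VILFCM", "V": "A", "O": "ST", "N": "DE", "P": "KR",
    "B": "H", "F": "G", "S": "P", "A": "WY", "Q": "NQ",
}


def make_table(groups):
    """Build a str.translate table mapping each amino acid to its group letter."""
    table = {}
    for code, members in groups.items():
        for aa in members:
            table[ord(aa)] = code
    return table


def encode(sequence, groups=GROUPS_V1_1):
    """Rewrite a one-letter protein sequence in the reduced code.

    Letters outside the 20 standard amino acids are left unchanged.
    """
    return sequence.upper().translate(make_table(groups))


print(encode("MKVLAAGE"))               # WPWWVVFN
print(encode("MKVLAAGE", GROUPS_V1_0))  # VPWWVVFN
```

`str.translate` converts every character in a single pass, so a letter such as A (which becomes V) is never converted a second time even though V is itself a standard amino acid code.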
Generation of database for alpha helices, beta strands and turns from PDB
Alpha helix. The minimal pattern size for an alpha helix was set to 8 aa, about two turns. The octamer is considered the minimum for a stable alpha helix. The octamer patterns were extracted from alpha helix regions in PDB.
Beta strand. The beta strand sequences were taken as they are present in PDB.
Turn. The turn patterns were defined as sequences 4 aa long that include glycine or proline.
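The pattern extraction described above could look roughly like the following sketch. It rests on assumptions that the text does not confirm: octamers are taken as all overlapping 8-residue windows of a helix segment, turn candidates are all 4-residue windows containing G or P, and sequences are still in the standard one-letter code. The function names are hypothetical.

```python
def helix_octamers(helix_seq, size=8):
    """All overlapping windows of `size` residues from one helix segment."""
    return {helix_seq[i:i + size] for i in range(len(helix_seq) - size + 1)}


def turn_patterns(seq, size=4):
    """All `size`-residue windows that contain glycine (G) or proline (P)."""
    windows = (seq[i:i + size] for i in range(len(seq) - size + 1))
    return {w for w in windows if "G" in w or "P" in w}


# A 9-residue helix segment yields two overlapping octamers.
print(sorted(helix_octamers("AEELLKKAL")))  # ['AEELLKKA', 'EELLKKAL']

# Only windows containing G or P are kept as turn patterns.
print(sorted(turn_patterns("AAAAGA")))      # ['AAAG', 'AAGA']
```

Segments shorter than the window size produce no patterns, which matches the idea that an octamer is the minimum for a stable helix.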
The database was generated from the following organisms: Saccharomyces cerevisiae, Helicobacter pylori, Klebsiella pneumoniae, E. coli, Mycobacterium tuberculosis, Pseudomonas aeruginosa, Salmonella typhimurium, Staphylococcus aureus, Streptococcus pneumoniae, Vibrio cholerae, Bacillus subtilis, Homo sapiens.
The number of generated patterns in the BioLabDonkey database, Version 1.1 (from 23.10.2019): alpha helix – 200155, beta strand – 26054.
Following the mechanism of cotranslational folding, in which alpha helix formation can happen inside the ribosomal exit tunnel, before beta sheets form, the alpha helices were searched for first. The sequence regions not occupied by alpha helices were then searched for beta strands. Turn patterns were tested only in the sequence regions free of both alpha helices and beta strands.
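The search order can be sketched as three passes over the target sequence, each restricted to positions left free by the earlier passes. This is an assumed, simplified reading of the procedure, not the program's actual code: `assign` and `predict` are hypothetical names, pattern sets are plain Python sets of strings in the new code, and matching is exact.

```python
def assign(seq, patterns, label, state):
    """Label every window of `seq` that matches a pattern and was free before this pass."""
    free = [s == "-" for s in state]  # snapshot: positions not taken by earlier passes
    for size in {len(p) for p in patterns}:
        for i in range(len(seq) - size + 1):
            if all(free[i:i + size]) and seq[i:i + size] in patterns:
                state[i:i + size] = label * size


def predict(seq, helix, strand, turn):
    """Search helices first, then strands, then turns, in the remaining free regions."""
    state = ["-"] * len(seq)
    assign(seq, helix, "H", state)   # alpha helices have priority
    assign(seq, strand, "E", state)  # beta strands only outside helices
    assign(seq, turn, "T", state)    # turns only outside helices and strands
    return "".join(state)


seq = "WWPNWWPN" + "VFSV" + "OWOWO"
result = predict(
    seq,
    helix={"WWPNWWPN"},
    strand={"OWOWO", "WPNWW"},  # "WPNWW" lies inside the helix and is ignored
    turn={"VFSV"},
)
print(result)  # HHHHHHHHTTTTEEEEE
```

Freeness is checked against a snapshot taken at the start of each pass, so overlapping matches of the same structure type merge into one longer element, while a later pass can never overwrite an earlier one.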
Evaluation of the prediction accuracy (random examples)
1. Comparison of the prediction with the secondary structure from the PDB when the pattern database does not include the patterns from this PDB entry. A good prediction is expected to have as few false negative and false positive secondary structures as possible.
D-ornithine/D-lysine decarboxylase from Salmonella typhimurium
Xenopus laevis MHC I complex
2. Comparison of the prediction with the secondary structure from the PDB when the pattern database includes the patterns from this PDB entry. A good prediction is expected to have as few false positive secondary structures as possible.
F41 fragment of flagellin of Salmonella typhimurium
3. Comparison of the prediction with the results of other algorithms – machine learning–based techniques.
Comparison with Jpred4 (no similarity to sequences with known PDB) for “NgrC protein” (from Providencia stuartii plasmid pTC2)
Comparison with Jpred4 (no similarity to sequences with known PDB) for “hypothetical protein” (from Providencia stuartii plasmid pTC2)
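The false positives and false negatives mentioned in comparisons 1 and 2 can be made concrete by counting them per residue. The sketch below is one possible way to do this and is an assumption, since the comparisons above are described in terms of secondary structure elements. It takes two equally long strings in which H, E and T mark helix, strand and turn and any other character marks an unassigned residue; `count_errors` is a hypothetical name.

```python
def count_errors(predicted, observed, label):
    """Per-residue false positives and false negatives for one structure label."""
    if len(predicted) != len(observed):
        raise ValueError("strings must have the same length")
    pairs = list(zip(predicted, observed))
    false_pos = sum(p == label and o != label for p, o in pairs)
    false_neg = sum(p != label and o == label for p, o in pairs)
    return false_pos, false_neg


predicted = "HHHH--EE"
observed = "HH--EEEE"
print(count_errors(predicted, observed, "H"))  # (2, 0)
print(count_errors(predicted, observed, "E"))  # (0, 2)
```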
Collection and Use of Personal Information
No personal information (data) that can be used to identify or contact a single person is collected.
Collection and Use of Non-Personal Information
No non-personal information (data) is collected. The user can generate database txt files for DNA feature annotation and protein secondary structure prediction. These files are stored inside the program and can be exported or imported by the user.
Cookies and Other Technologies
No cookies or other tracking technologies are used.
Disclosure to Third Parties and Service Providers
There is no disclosure to third parties or service providers.
The Existence of Automated Profiling
The program does not contain any user profiling functionality.