Conceptual Framework for Invariant Protein Fragment Library
Sapna V. M1, Roshan Makam2, Keshava M3, Sudhanva Narayna4

1Sapna V. M, Research Scholar, Assistant Professor, Department of Biotechnology, PES University, 100 Feet Ring Road, BSK 3rd Stage, Bangalore (Karnataka), India.

2Roshan Makam, Department of Biotechnology, PES University, 100 Feet Ring Road, BSK 3rd Stage, Bangalore (Karnataka), India.

3Keshava M, Department of Biotechnology, PES University, 100 Feet Ring Road, BSK 3rd Stage, Bangalore (Karnataka), India.

4Sudhanva Narayna, Department of Biotechnology, PES University, 100 Feet Ring Road, BSK 3rd Stage, Bangalore (Karnataka), India. 

Manuscript received on 04 December 2019 | Revised Manuscript received on 12 December 2019 | Manuscript Published on 31 December 2019 | PP: 240-247 | Volume-9 Issue-2S December 2019 | Retrieval Number: B10341292S19/2019©BEIESP | DOI: 10.35940/ijitee.B1034.1292S19

© The Authors. Blue Eyes Intelligence Engineering and Sciences Publication (BEIESP). This is an open-access article under the CC-BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)

Abstract: Proteins are essential to all life forms, and determining their structures experimentally is cumbersome, laborious and time consuming. Hence, over the past three to four decades, researchers have used computational techniques, both template-based and template-free, to predict protein structure from sequence. This research focuses on developing a conceptual basis for establishing an invariant fragment library that can be used for protein structure prediction. Based on the 20 amino acids, fragments can be classified by length, from 3 to 41 residues. Further, they can be classified by the number of identical amino acids present in the fragment. This classification gives the theoretical number of fragments that can exist and in no way represents the fragments that actually occur in nature. Invariant fragments are those whose three-dimensional structure is rigid and does not change. A formula was derived to determine all possible permutations that can exist for lengths 3 to 41 based on the 20 amino acids. 100 proteins were downloaded from the Protein Data Bank and broken into fragments of length 3 to 41 using Asynchronous Distributed Processing, giving a total of 6,102,102 fragments. Fragments that were identical in sequence were then superimposed and Root Mean Square Deviation (RMSD) values were obtained, resulting in roughly 3.2% of the original fragments. t-scores and z-scores were obtained, from which skewness, kurtosis and excess kurtosis were determined. For invariance, the skewness cutoff was set at ±0.1. Using the excess kurtosis, fragments whose distributions were either leptokurtic or platykurtic and lay within ±1 standard deviation of the mean value were considered invariant, i.e., there were no outliers in the distribution and most of the t-score or z-score values were centered around the average value. Using these cutoff values, fragments were classified and deposited into an invariant fragment library.
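The invariance criterion above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the statistics are computed over the RMSD values of one group of identical-sequence fragments, uses population z-scores only (no t-scores), and introduces a hypothetical parameter `min_fraction_within` to stand in for "most values centered around the average", since the abstract gives no numeric threshold for that. The function names are likewise illustrative.

```python
import math


def z_scores(values):
    """Population z-scores; all zeros when every value is identical."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sd == 0.0:
        return [0.0] * n
    return [(v - mean) / sd for v in values]


def skewness(values):
    """Third standardized moment."""
    z = z_scores(values)
    return sum(x ** 3 for x in z) / len(z)


def excess_kurtosis(values):
    """Fourth standardized moment minus 3 (0 for a normal distribution)."""
    z = z_scores(values)
    return sum(x ** 4 for x in z) / len(z) - 3.0


def kurtosis_label(values):
    """Leptokurtic if excess kurtosis > 0, platykurtic if < 0."""
    k = excess_kurtosis(values)
    if k > 0:
        return "leptokurtic"
    if k < 0:
        return "platykurtic"
    return "mesokurtic"


def is_invariant(rmsds, skew_cutoff=0.1, sd_window=1.0, min_fraction_within=0.5):
    """Assumed reading of the criterion: |skewness| <= cutoff and at least
    min_fraction_within (hypothetical) of z-scores within +/- sd_window."""
    if len(rmsds) < 2:
        return False
    z = z_scores(rmsds)
    fraction_within = sum(abs(x) <= sd_window for x in z) / len(z)
    return abs(skewness(rmsds)) <= skew_cutoff and fraction_within >= min_fraction_within
```

For example, `is_invariant([1.0, 2.0, 2.0, 2.0, 3.0])` accepts a symmetric, tightly centered set of RMSD values, while a set with one large value such as `[0.1, 0.1, 0.1, 0.1, 5.0]` is rejected on skewness alone.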
Roughly 381,799 invariant fragments were obtained, which is about 6.3% of the initial fragments. This is far fewer than the number of fragments needed in either homology or de novo modelling, thereby reducing the design space. Further work is underway to set up the entire invariant fragment library, which can then be used to predict protein structure by a template-based approach.
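The fragmentation step described in the abstract can be sketched as a sliding window over each sequence. The sketch below is illustrative only and runs serially rather than with distributed processing. The abstract does not state the authors' formula, so `theoretical_count` uses the simplest reading (20 choices at each of L positions, i.e. 20^L), which is an assumption. The helper names and the input format (a dict of chain identifier to one-letter sequence) are hypothetical.

```python
from collections import defaultdict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes
MIN_LEN, MAX_LEN = 3, 41              # fragment length range from the abstract


def theoretical_count(length, alphabet_size=len(AMINO_ACIDS)):
    """Number of distinct sequences of a given length (assumed formula: 20**L)."""
    return alphabet_size ** length


def fragments(sequence, min_len=MIN_LEN, max_len=MAX_LEN):
    """Yield (start, fragment) for every window of length min_len..max_len."""
    n = len(sequence)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            yield start, sequence[start:start + length]


def group_identical(sequences):
    """Map each fragment sequence seen more than once to its (chain, start) sites.

    These groups are the candidates for superposition and RMSD calculation.
    """
    groups = defaultdict(list)
    for chain_id, seq in sequences.items():
        for start, frag in fragments(seq):
            groups[frag].append((chain_id, start))
    return {frag: sites for frag, sites in groups.items() if len(sites) > 1}
```

A chain of n residues contributes n - L + 1 fragments for each length L up to min(41, n), which is why even 100 proteins produce millions of fragments, yet only a small share of them recur with identical sequence.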

Keywords: Proteins, Fragments, Invariant, Library.
Scope of the Article: Patterns and Frameworks