Title: Machine Learning Based Prediction and Computational Investigations on Protein Methyltransferases involved in Human Malignancies
Authors: Yadav, Arvind Kumar
Singh, Tiratha Raj [Guided by]
Gupta, Pradeep Kumar [Guided by]
Keywords: Bioinformatics
Computational biology
Machine learning
Issue Date: 2022
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: This study focuses on the prediction of protein methyltransferases (PMTs) and their association with malignancies. PMTs are a group of enzymes that catalyze the transfer of a methyl group from the universal methyl donor S-adenosyl-L-methionine (SAM) to their substrates. These enzymes play a significant role in the epigenetic regulation of gene expression through the methylation of various substrates. PMTs target lysine or arginine residues when methylating their protein substrates. Based on their methylation activity, PMTs are divided into two major classes: protein lysine methyltransferases (PKMTs) and protein arginine methyltransferases (PRMTs). Over the years, protein methylation has emerged as an important post-translational modification (PTM) event involved in various cellular processes. Dysregulation of methyltransferases is involved in different types of human cancers. Given the well-recognized significance of PMTs, reliable and fast methods for identifying these proteins are crucial. In this thesis, a machine-learning-based method was developed for the identification of PMTs. Various sequence-based features were calculated, and models were trained using several machine-learning algorithms. Ten-fold cross-validation was applied during model training. The proposed SVM-based CKSAAP model was identified as the best model for the prediction of PMTs. It achieved the highest accuracy, 87.94%, with balanced sensitivity (88.8%) and specificity (87.11%), a Matthews correlation coefficient (MCC) of 0.759, and an area under the ROC curve (AUROC) of 0.945. The best model was implemented as the standalone software PMTPred to facilitate the prediction of PMTs. In the recent decade, protein lysine methylation events have received increasing attention from researchers globally. SMYD2 is a protein of the SMYD (SET and MYND domain) family with lysine methyltransferase activity that methylates both histone and non-histone proteins.
Numerous non-histone tumor suppressor proteins, such as p53, RB1, ERα, and PTEN, are methylated by SMYD2, which can contribute to cancer formation. Emerging evidence supports the association of SMYD2 with cancer progression, but its role remains largely unknown. Therefore, we computationally analyzed the potential association of SMYD2 with multiple tumors using TCGA data. The results showed that SMYD2 expression was higher in tumor tissues than in normal tissues in most cancers. A significant association was observed between SMYD2 gene expression and the survival of cancer patients. The prognostic analysis showed a strong association of SMYD2 with cancers. We detected 15 missense mutations, 4 truncating mutations, and 5 other mutations. Gene ontological properties and pathways were found to be significantly linked to the development of cancer. These data-driven results provide relatively comprehensive insight into the association of SMYD2 with cancer and its correlation with prognosis. There are many mutations in SMYD2, and some of them are thought to have a significant impact on its methyltransferase activity. Missense mutations are single amino acid point mutations that can have a variety of effects depending on the mutation site and the resulting amino acid substitution. We therefore assessed the non-synonymous SNPs (nsSNPs) in SMYD2 and investigated their structural and functional consequences using a rigorous computational method. Out of the 264 nsSNPs, three (H207D, C209W, and C209R) have the most deleterious impact. According to a molecular dynamics simulation (MDS) study, these mutations have a greater effect on the structure and function of the SMYD2 protein. Furthermore, SMYD2-specific inhibitors were identified to support the design of targeted therapy strategies. A total of 98,071 small natural compounds were virtually screened against SMYD2 as the target protein.
Based on a binding energy cut-off of -11.7 kcal/mol, a total of 391 potential compounds were selected for ADMET analysis. Based on ADMET parameters, nine compounds that fit the drug-likeness standard were selected for docking analysis. Three compounds (ZINC03844862, ZINC08490711, and ZINC08764231) were then selected as probable inhibitors based on docking score and analysis of the protein-ligand interactions. A 100 ns MDS study was then performed, and the results revealed that the selected compounds bind stably to the SMYD2 structure. These three compounds were therefore identified as potential lead compounds. With in vitro and in vivo validation, they could be viable leads for developing cancer treatments. Thus, we conclude that the developed prediction method and the identified SMYD2-specific nsSNPs and ligands would be useful for improving understanding and aiding the development of better cancer therapeutics.
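Note on the CKSAAP features named in the abstract: CKSAAP (composition of k-spaced amino acid pairs) encodes a protein sequence by counting, for each gap k, how often each ordered pair of residues occurs with exactly k residues between them. The sketch below is a minimal illustration of that encoding and is not the PMTPred implementation. The function name `cksaap`, the default `k_max=3`, and the handling of non-standard residues are assumptions, since the abstract does not state them.

```python
from itertools import product

# The 20 standard amino acids and all 400 ordered residue pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAIRS = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def cksaap(sequence, k_max=3):
    """Return CKSAAP features for gaps k = 0..k_max (400 values per gap).

    k_max=3 is an illustrative assumption, not a value from the thesis.
    Pairs containing a non-standard residue are skipped (also an assumption),
    and each gap's counts are normalized by the number of pairs counted.
    """
    seq = sequence.upper()
    features = []
    for k in range(k_max + 1):
        counts = dict.fromkeys(PAIRS, 0)
        counted = 0
        # Residues at positions i and i + k + 1 have k residues between them.
        for i in range(len(seq) - k - 1):
            pair = seq[i] + seq[i + k + 1]
            if pair in counts:
                counts[pair] += 1
                counted += 1
        features.extend(counts[p] / counted if counted else 0.0 for p in PAIRS)
    return features


if __name__ == "__main__":
    vector = cksaap("MEPLKVEKFATANRGNGLRAVTPLRPGELLFRSD")  # toy sequence
    print(len(vector))  # 400 features for each of the 4 gaps
```

A fixed-length vector of this kind could then be passed to a support vector machine classifier; the kernel, hyperparameters, and training data used in the thesis are not given in this record.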
Description: PHD0257, Enrollment No. 176502
Appears in Collections: Ph.D. Theses

Files in This Item:
File: PHD0257_ARVIND KUMAR YADAV_176502_BI_2022.pdf
Access: Restricted Access
Size: 11.63 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.