Please use this identifier to cite or link to this item: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/9868
Title: Database Construction and Machine Learning Approach to Interogate the Microbiome for Different Diseases
Authors: Nadia
Gandotra, Ekta [Guided by]
Kumar, Narendra [Guided by]
Keywords: Microbiome
Metagenomics
Cancer
Machine learning
Issue Date: 2023
Publisher: Jaypee University of Information Technology, Solan, H.P.
Abstract: The microbiome influences many physiological functions, including homeostasis, inflammation, and other biochemical processes. Dysbiosis (an imbalance between beneficial and pathogenic bacteria) of the microbiome therefore affects several pathways and may contribute to cancer. The human body continually hosts transient and resident microbial cells and their by-products, including potentially harmful metabolites. According to recent studies, the microbiome may play a role in many diseases. Every person has a unique microbiome, which is shaped by living conditions, dietary preferences, and environmental factors. It is therefore vital that microbiome datasets for these diseases be extended. Intestinal tissue healing and innate immunity depend on the nucleotide-binding domain and leucine-rich repeat-containing proteins (NLR proteins). This family was recently added to the group of innate immunity effector molecules and is the largest protein family that helps regulate the intestinal microbiota. It is crucial to the health of the gut microbiota and has recently been linked to the emergence of colitis-associated cancer (CAC) and ulcerative colitis (UC). Although these proteins play a key role in several cellular processes, the NLR family is not well characterized, and very few of its members have been identified through experimental validation. To address these research gaps, this thesis defines three objectives, covered in Chapters 2, 3, and 4. In the first objective, we developed a comprehensive microbiome dataset, the Human OncoBiome Database (HOBD), which holds data on several malignancies (liver cancer, oral cancer, colorectal cancer, and breast cancer). HOBD contains the bacteria involved in these malignancies, with their taxonomic classification and related information.
The database provides an easy-to-use graphical user interface (GUI) through which users can download data for a species of interest together with the associated disease information. The database is freely available (http://www.juit.ac.in/hcmd/home), so users worldwide can download the data and use it for research. In the second objective, we developed a skin disease database, the Human Skin Microbiome Database (HSMD). It has comprehensive information on several diseases, such as atopic dermatitis, acne, leprosy, eczema, rosacea, and psoriasis, and provides information on disease-related bacteria with their taxonomic and genomic classification and phylogenetic analysis. The data are manually curated and served through an interactive GUI where users can enter a query and browse the results. Users can also download all the disease-related data in a single file. The database is freely available at www.bioinfodermatome.com. NLR proteins form the largest protein family that helps regulate the intestinal microbiota, yet only 22 are known in humans and 34 in mice so far. Hence, in the third objective, we developed a machine learning-based method to classify NLR versus non-NLR proteins. We retrieved NLR and non-NLR proteins from the NCBI and UniProt databases, giving a set that includes 390 NLR proteins. We then calculated sequence-based features using ProtR and evolutionary features using the POSSUM package. Several ML methods, namely LIBSVM, SMO, and Random Forest (RF), were trained with 5-fold cross-validation. After training and testing, RF was selected as the best classifier for the amino acid composition (AAC) and PSE PSSM-based models, with accuracies of 90.91% and 93.94%, respectively. Finally, this model is offered to the scientific community for classifying NLR proteins in further research.
In this research work, we focused on microbiome analysis and provided two comprehensive databases related to cancer and skin diseases. An ML-based method was also developed that can distinguish NLR from non-NLR proteins and identify potential NLR proteins from their amino acid sequences.
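As an illustration of the amino acid composition (AAC) feature named in the abstract, the following is a minimal sketch, not the thesis's implementation (which is stated to use ProtR). AAC is taken here as the fraction of each of the 20 standard amino acids in a sequence. The function name, the toy sequence, and the choice to ignore non-standard residue codes are assumptions made for this example only.

```python
from collections import Counter

# The 20 standard amino acids, in alphabetical one-letter order.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list of residue fractions for a protein sequence.

    Assumption for this sketch: characters outside the 20 standard
    amino acids (for example X or *) are ignored.
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not taken from the thesis dataset.
example = "MAAGL"
features = amino_acid_composition(example)
```

A vector like `features`, computed for every protein, could then serve as the input to a classifier such as Random Forest.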
Description: Enrolment No. 176501 [PDH0267]
URI: http://ir.juit.ac.in:8080/jspui/jspui/handle/123456789/9868
Appears in Collections: Ph.D. Theses

Files in This Item:
File: PHDT_NADIA_176501_BI_2023.pdf (Restricted Access)
Size: 9.65 MB
Format: Adobe PDF

