Project 1. Development of a flagship, large artificial intelligence- and machine learning-ready health research dataset characteristic of Tulane's strengths and Louisiana's local populations.
We propose to collect and prepare a comprehensive artificial intelligence- and machine learning-ready dataset containing detailed imaging, clinical, epidemiological, social, and behavioral information as well as various omic profiles (genome, epigenome, transcriptome, microbiome, metabolome, single-cell transcriptome) in a large cohort (~17,000 participants) of both sexes and various US ethnicities, particularly those underrepresented in biomedical and behavioral research, under various health, physiological, and pathophysiological conditions. This project will be anchored in our experience and in the extensive data and samples accumulated from our past P50 Specialized Center of Research grant (1P50AR055081), various R01 and U19 projects (U19AG055373), and the large Louisiana Osteoporosis Study cohort.
Project 2. Integration of multi-omics and non-omics factors for complex disease studies, with a training and mentoring component for future generations of interdisciplinary data scientists.
Integration of multi-omics and non-omics factors is promising but challenging. Results from our project R01GM109068 demonstrate that integrative analysis of these factors improves risk gene detection and disease diagnosis. We will therefore build on the significant progress of that project and substantially expand it. We will address the difficult challenges in integrating multi-omics and non-omics factors and validate our approach with one of the most comprehensive databases, from our Louisiana Osteoporosis Study and other center projects. Our ultimate goal is to facilitate a paradigm shift in multi-omics research (from linear to nonlinear, from correlation to causal inference, and from whole-cell to cell-specific analyses) by incorporating the interactome into integration analyses of comprehensive omics and non-omics factors.
Project 3. Development of innovative proteomics, phosphoproteomics, and network analyses for identification of biomarkers for disease mechanisms, diagnosis, prevention, and treatment.
High-throughput mass spectrometry (MS)-based proteomics enables the comprehensive study of proteoforms (protein forms) and protein post-translational modifications. MS plays an essential role in elucidating proteoform functions, identifying proteomic biomarkers, and discovering drug targets for complex diseases. However, the complexity of MS data poses many computational challenges for its application in disease studies. Building on funded R01s (R01GM118470, R01CA247863, R01GM141123, R01GM124018), we propose to develop new algorithms and machine learning models for MS-based proteoform characterization and cell clustering, and to apply MS to study protein phosphorylation in complex diseases.
Project 4. Development of a cutting-edge explainable multi-omics integration and biomarker identification artificial intelligence system via state-of-the-art reinforcement learning.
We aim to develop an explainable AI framework that first integrates multiple omics data types with limited labels by capturing their intrinsic structures and cross-omics correlations, and then identifies important biomarkers from the different omics data types through multi-agent reinforcement learning. We propose a semi-supervised multi-omics integration model that provides conceptual interpretations, so that explanations remain consistent at the inference stage for disease prediction, using novel state-of-the-art interpretability methods. We will further develop a deep reinforcement learning (RL) framework to identify the best combination of biomarkers (as well as non-omics factors) associated with a clinical outcome of interest. By combining sequential decision-making with the representational power of deep neural networks, deep RL can effectively explore correlations among features, thus significantly reducing search time and improving predictive accuracy compared with traditional feature selection methods.
Project 5. Development of an artificial intelligence-ready longitudinal dataset of the microbiome and metabolome in relation to health and related clinical outcomes.
Over 700 million adults worldwide have chronic kidney disease (CKD) and are at substantially increased risk of end-stage kidney disease and cardiovascular disease (CVD), with gut microbiota and altered microbial metabolite byproducts playing a pivotal role. We will leverage existing data and samples from the NIH-funded Sodium Lowering and Urinary ProtEin Reduction trial, a recent longitudinal randomized clinical trial, and use metagenomic sequencing and mass spectrometry to comprehensively investigate gut microbiota and derived metabolites altered by diet (such as sodium intake). We will examine the role of diet-responsive microbiota and metabolites in adverse CKD outcomes using the rich resources available to us (including ~1,800 participants with metabolomic data and up to 18 years of follow-up for clinical outcomes). Our data demonstrate differences in patterns of microbial diversity, specific metabolites, and genera with changes in kidney function in response to diet (such as salt levels). The findings will be validated in African American and other cohorts.