Search | VHL Regional Portal

1.

Comparative transcriptomic and phenotypic analysis of induced pluripotent stem cell hepatocyte-like cells and primary human hepatocytes.

Gandhi, Neeti; Wills, Lauren; Akers, Kyle; Su, Yiqi; Niccum, Parker; Murali, T M; Rajagopalan, Padmavathy.

Cell Tissue Res ; 396(1): 119-139, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38369646

ABSTRACT

Primary human hepatocytes (PHHs) are used extensively for in vitro liver cultures to study hepatic functions. However, limited availability and invasive retrieval prevent their widespread use. Induced pluripotent stem cells exhibit significant potential since they can be obtained non-invasively and differentiated into hepatic lineages, such as hepatocyte-like cells (iHLCs). However, there are concerns about their fetal phenotypic characteristics and their hepatic functions compared to PHHs in culture. Therefore, we performed an RNA-sequencing (RNA-seq) analysis to understand pathways that are either up- or downregulated in each cell type. Analysis of the RNA-seq data showed an upregulation in the bile secretion pathway, where genes such as AQP9 and UGT1A1 were expressed 455- and 15-fold higher, respectively, in PHHs than in iHLCs. Upon immunostaining, bile canaliculi were shown to be present in PHHs. The TCA cycle in PHHs was upregulated compared to iHLCs. Cellular analysis showed a 2-2.5-fold increase in normalized urea production in PHHs compared to iHLCs. In addition, drug metabolism pathways, including cytochrome P450 (CYP450) and UDP-glucuronosyltransferase enzymes, were upregulated in PHHs compared to iHLCs. Of note, CYP2E1 gene expression was significantly higher (21,810-fold) in PHHs. Acetaminophen and ethanol were administered to PHH and iHLC cultures to investigate differences in biotransformation. CYP450 activity of baseline and toxicant-treated samples was significantly higher in PHHs compared to iHLCs. Our analysis revealed that iHLCs have substantial differences from PHHs in critical hepatic functions. These results have highlighted the differences in gene expression and hepatic functions between PHHs and iHLCs to motivate future investigation.

Subject(s)

Induced Pluripotent Stem Cells , Humans , Induced Pluripotent Stem Cells/metabolism , Hepatocytes , Liver , Cell Differentiation , Gene Expression Profiling

2.

Predictive models of long COVID.

Antony, Blessy; Blau, Hannah; Casiraghi, Elena; Loomba, Johanna J; Callahan, Tiffany J; Laraway, Bryan J; Wilkins, Kenneth J; Antonescu, Corneliu C; Valentini, Giorgio; Williams, Andrew E; Robinson, Peter N; Reese, Justin T; Murali, T M.

EBioMedicine ; 96: 104777, 2023 Oct.

Article in English | MEDLINE | ID: mdl-37672869

ABSTRACT

BACKGROUND: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future. METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741). FINDINGS: LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75. INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology. FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.

Subject(s)

COVID-19 , Post-Acute COVID-19 Syndrome , Humans , COVID-19 Drug Treatment , Machine Learning , Obesity

3.

Computational Construction of Toxicant Signaling Networks.

Law, Jeffrey N; Orbach, Sophia M; Weston, Bronson R; Steele, Peter A; Rajagopalan, Padmavathy; Murali, T M.

Chem Res Toxicol ; 36(8): 1267-1277, 2023 08 21.

Article in English | MEDLINE | ID: mdl-37471124

ABSTRACT

Humans and animals are regularly exposed to compounds that may have adverse effects on health. The Toxicity Forecaster (ToxCast) program was developed to use high throughput screening assays to quickly screen chemicals by measuring their effects on many biological end points. Many of these assays test for effects on cellular receptors and transcription factors (TFs), under the assumption that a toxicant may perturb normal signaling pathways in the cell. We hypothesized that we could reconstruct the intermediate proteins in these pathways that may be directly or indirectly affected by the toxicant, potentially revealing important physiological processes not yet tested for many chemicals. We integrate data from ToxCast with a human protein interactome to build toxicant signaling networks that contain physical and signaling protein interactions that may be affected as a result of toxicant exposure. To build these networks, we developed the EdgeLinker algorithm, which efficiently finds short paths in the interactome that connect the receptors to TFs for each toxicant. We performed multiple evaluations and found evidence suggesting that these signaling networks capture biologically relevant effects of toxicants. To aid in dissemination and interpretation, interactive visualizations of these networks are available at http://graphspace.org.

Subject(s)

Drug-Related Side Effects and Adverse Reactions , High-Throughput Screening Assays , Animals , Humans , Algorithms , Signal Transduction

4.

Generalisable long COVID subtypes: findings from the NIH N3C and RECOVER programmes.

Reese, Justin T; Blau, Hannah; Casiraghi, Elena; Bergquist, Timothy; Loomba, Johanna J; Callahan, Tiffany J; Laraway, Bryan; Antonescu, Corneliu; Coleman, Ben; Gargano, Michael; Wilkins, Kenneth J; Cappelletti, Luca; Fontana, Tommaso; Ammar, Nariman; Antony, Blessy; Murali, T M; Caufield, J Harry; Karlebach, Guy; McMurry, Julie A; Williams, Andrew; Moffitt, Richard; Banerjee, Jineta; Solomonides, Anthony E; Davis, Hannah; Kostka, Kristin; Valentini, Giorgio; Sahner, David; Chute, Christopher G; Madlock-Brown, Charisse; Haendel, Melissa A; Robinson, Peter N.

EBioMedicine ; 87: 104413, 2023 Jan.

Article in English | MEDLINE | ID: mdl-36563487

ABSTRACT

BACKGROUND: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.

Subject(s)

COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Disease Progression , SARS-CoV-2

5.

Integrating multimodal data through interpretable heterogeneous ensembles.

Li, Yan Chak; Wang, Linhua; Law, Jeffrey N; Murali, T M; Pandey, Gaurav.

Bioinform Adv ; 2(1): vbac065, 2022.

Article in English | MEDLINE | ID: mdl-36158455

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability and implementation: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

6.

Integrating multimodal data through interpretable heterogeneous ensembles.

Li, Yan Chak; Wang, Linhua; Law, Jeffrey N; Murali, T M; Pandey, Gaurav.

bioRxiv ; 2022 Jul 25.

Article in English | MEDLINE | ID: mdl-35923321

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration . Contact: gaurav.pandey@mssm.edu.

7.

Generalizable Long COVID Subtypes: Findings from the NIH N3C and RECOVER Programs.

Reese, Justin T; Blau, Hannah; Bergquist, Timothy; Loomba, Johanna J; Callahan, Tiffany; Laraway, Bryan; Antonescu, Corneliu; Casiraghi, Elena; Coleman, Ben; Gargano, Michael; Wilkins, Kenneth J; Cappelletti, Luca; Fontana, Tommaso; Ammar, Nariman; Antony, Blessy; Murali, T M; Karlebach, Guy; McMurry, Julie A; Williams, Andrew; Moffitt, Richard; Banerjee, Jineta; Solomonides, Anthony E; Davis, Hannah; Kostka, Kristin; Valentini, Giorgio; Sahner, David; Chute, Christopher G; Madlock-Brown, Charisse; Haendel, Melissa A; Robinson, Peter N.

medRxiv ; 2022 Jul 20.

Article in English | MEDLINE | ID: mdl-35665012

ABSTRACT

Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.

8.

Interpretable network propagation with application to expanding the repertoire of human proteins that interact with SARS-CoV-2.

Law, Jeffrey N; Akers, Kyle; Tasnina, Nure; Santina, Catherine M Della; Deutsch, Shay; Kshirsagar, Meghana; Klein-Seetharaman, Judith; Crovella, Mark; Rajagopalan, Padmavathy; Kasif, Simon; Murali, T M.

Gigascience ; 10(12), 2021 12 29.

Article in English | MEDLINE | ID: mdl-34966926

ABSTRACT

BACKGROUND: Network propagation has been widely used for nearly 20 years to predict gene functions and phenotypes. Despite the popularity of this approach, little attention has been paid to the question of provenance tracing in this context, e.g., determining how much any experimental observation in the input contributes to the score of every prediction. RESULTS: We design a network propagation framework with 2 novel components and apply it to predict human proteins that directly or indirectly interact with SARS-CoV-2 proteins. First, we trace the provenance of each prediction to its experimentally validated sources, which in our case are human proteins experimentally determined to interact with viral proteins. Second, we design a technique that helps to reduce the manual adjustment of parameters by users. We find that for every top-ranking prediction, the highest contribution to its score arises from a direct neighbor in a human protein-protein interaction network. We further analyze these results to develop functional insights on SARS-CoV-2 that expand on known biology such as the connection between endoplasmic reticulum stress, HSPA5, and anti-clotting agents. CONCLUSIONS: We examine how our provenance-tracing method can be generalized to a broad class of network-based algorithms. We provide a useful resource for the SARS-CoV-2 community that implicates many previously undocumented proteins with putative functional relationships to viral infection. This resource includes potential drugs that can be opportunistically repositioned to target these proteins. We also discuss how our overall framework can be extended to other, newly emerging viruses.

Subject(s)

COVID-19 , SARS-CoV-2 , Algorithms , Humans , Protein Interaction Maps , Proteins/metabolism
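
The provenance tracing described in entry 8 can be illustrated with a small sketch. It assumes a random-walk-with-restart style of propagation (an assumption; the abstract does not name the propagation scheme). Because such scores are linear in the restart vector, the score of any node splits exactly into one contribution per experimentally confirmed source. The toy network, node names, and the `alpha` value are hypothetical, and this is not the authors' implementation.

```python
def propagate(adj, restart, alpha=0.5, iters=200):
    """Iterate s = (1 - alpha) * restart + alpha * P s on an undirected graph.

    adj maps each node to the set of its neighbors. P spreads the score of
    each node equally over its neighbors. Every node needs >= 1 neighbor.
    """
    scores = dict(restart)
    for _ in range(iters):
        nxt = {u: (1 - alpha) * restart[u] for u in adj}
        for v, nbrs in adj.items():
            share = alpha * scores[v] / len(nbrs)
            for u in nbrs:
                nxt[u] += share
        scores = nxt
    return scores


# Hypothetical toy network: A and B stand for experimentally confirmed sources.
edges = [("A", "X"), ("B", "X"), ("B", "Y"), ("X", "Z"), ("Y", "Z")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

sources = ["A", "B"]


def restart_vector(active):
    """Restart mass 1/len(sources) on each active source, 0 elsewhere."""
    return {u: (1.0 / len(sources) if u in active else 0.0) for u in adj}


# Score of every node when propagating from all sources at once.
total = propagate(adj, restart_vector(sources))

# Linearity: propagating from one source at a time yields its contribution.
contrib = {s: propagate(adj, restart_vector([s])) for s in sources}

for node in ["X", "Y", "Z"]:
    parts = {s: round(contrib[s][node], 4) for s in sources}
    print(node, round(total[node], 4), parts)
```

The per-source scores add up to the combined score, so ranking the entries of `contrib` for a predicted node shows which confirmed source drives that prediction.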

9.

Modeling and analysis of the macronutrient signaling network in budding yeast.

Jalihal, Amogh P; Kraikivski, Pavel; Murali, T M; Tyson, John J.

Mol Biol Cell ; 32(21): ar20, 2021 11 01.

Article in English | MEDLINE | ID: mdl-34495680

ABSTRACT

Adaptive modulation of the global cellular growth state of unicellular organisms is crucial for their survival in fluctuating nutrient environments. Because these organisms must be able to respond reliably to ever varying and unpredictable nutritional conditions, their nutrient signaling networks must have a certain inbuilt robustness. In eukaryotes, such as the budding yeast Saccharomyces cerevisiae, distinct nutrient signals are relayed by specific plasma membrane receptors to signal transduction pathways that are interconnected in complex information-processing networks, which have been well characterized. However, the complexity of the signaling network confounds the interpretation of the overall regulatory "logic" of the control system. Here, we propose a literature-curated molecular mechanism of the integrated nutrient signaling network in budding yeast, focusing on early temporal responses to carbon and nitrogen signaling. We build a computational model of this network to reconcile literature-curated quantitative experimental data with our proposed molecular mechanism. We evaluate the robustness of our estimates of the model's kinetic parameter values. We test the model by comparing predictions made in mutant strains with qualitative experimental observations made in the same strains. Finally, we use the model to predict nutrient-responsive transcription factor activities in a number of mutant strains undergoing complex nutrient shifts.

Subject(s)

Eating/physiology , Nutrients/metabolism , Saccharomyces cerevisiae/metabolism , Carrier Proteins/metabolism , Cell Cycle/physiology , Computational Biology/methods , Gene Expression/genetics , Gene Expression Regulation, Fungal/genetics , Models, Theoretical , Nitrogen/metabolism , Saccharomyces cerevisiae Proteins/metabolism , Signal Transduction/physiology , Transcription Factors/metabolism , Transcriptome/genetics

10.

Protein sequence models for prediction and comparative analysis of the SARS-CoV-2-human interactome.

Kshirsagar, Meghana; Tasnina, Nure; Ward, Michael D; Law, Jeffrey N; Murali, T M; Lavista Ferres, Juan M; Bowman, Gregory R; Klein-Seetharaman, Judith.

Pac Symp Biocomput ; 26: 154-165, 2021.

Article in English | MEDLINE | ID: mdl-33691013

ABSTRACT

Viruses such as the novel coronavirus, SARS-CoV-2, that is wreaking havoc on the world, depend on interactions of their own proteins with those of the human host cells. Relatively small changes in sequence such as between SARS-CoV and SARS-CoV-2 can dramatically change clinical phenotypes of the virus, including transmission rates and severity of the disease. On the other hand, highly dissimilar virus families such as Coronaviridae, Ebola, and HIV have overlap in functions. In this work we aim to analyze the role of protein sequence in the binding of SARS-CoV-2 virus proteins to human proteins and compare it to that of the other viruses above. We build supervised machine learning models, using Generalized Additive Models to predict interactions based on sequence features and find that our models perform well with an AUC-PR of 0.65 in a class-skew of 1:10. Analysis of the novel predictions using an independent dataset showed statistically significant enrichment. We further map the importance of specific amino-acid sequence features in predicting binding and summarize what combinations of sequences from the virus and the host are correlated with an interaction. By analyzing the sequence-based embeddings of the interactomes from different viruses and clustering them together we find some functionally similar proteins from different viruses. For example, vif protein from HIV-1, vp24 from Ebola and orf3b from SARS-CoV all function as interferon antagonists. Furthermore, we can differentiate the functions of similar viruses, for example orf3a's interactions are more diverged than orf7b interactions when comparing SARS-CoV and SARS-CoV-2.

Subject(s)

COVID-19 , SARS-CoV-2 , Amino Acid Sequence , Computational Biology , Humans , Proteins

11.

Special Issue: 9th International Computational Advances in Bio and Medical Sciences (ICCABS 2019).

Mandoiu, Ion; Murali, T M; Narasimhan, Giri; Rajasekaran, Sanguthevar; Skums, Pavel; Zelikovsky, Alexander.

J Comput Biol ; 28(2): 115-116, 2021 02.

Article in English | MEDLINE | ID: mdl-33539275

Subject(s)

Autism Spectrum Disorder/diagnosis , Computational Biology/methods , Florida , High-Throughput Nucleotide Sequencing , Humans , Machine Learning

12.

Accurate and efficient gene function prediction using a multi-bacterial network.

Law, Jeffrey N; Kale, Shiv D; Murali, T M.

Bioinformatics ; 37(6): 800-806, 2021 05 05.

Article in English | MEDLINE | ID: mdl-33063084

ABSTRACT

MOTIVATION: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. RESULTS: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. AVAILABILITY AND IMPLEMENTATION: An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Bacteria , Genome , Algorithms , Bacteria/genetics , Base Sequence , Phenotype
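
Entry 12 describes label propagation that stops after a bounded number of iterations. The following is a minimal sketch of that general idea, not the authors' FastSinkSource implementation: annotated genes keep fixed scores (1 for positive, 0 for negative examples), and each unannotated gene repeatedly takes a damped weighted average of its neighbors' scores. The network, edge weights, `alpha`, and the iteration count are hypothetical.

```python
def label_propagation(weights, positives, negatives, alpha=0.9, iters=20):
    """Propagate labels for a fixed number of iterations.

    weights: dict mapping node -> dict of neighbor -> edge weight (symmetric).
    Every unannotated node is assumed to have at least one neighbor.
    """
    scores = {u: 0.0 for u in weights}
    for u in positives:
        scores[u] = 1.0
    unknown = [u for u in weights if u not in positives and u not in negatives]
    for _ in range(iters):
        new = dict(scores)  # annotated nodes keep their fixed scores
        for u in unknown:
            degree = sum(weights[u].values())
            neighbor_sum = sum(w * scores[v] for v, w in weights[u].items())
            new[u] = alpha * neighbor_sum / degree
        scores = new
    return scores


# Hypothetical path network: g1 (positive) - g2 - g3 - g4 (negative).
weights = {
    "g1": {"g2": 1.0},
    "g2": {"g1": 1.0, "g3": 1.0},
    "g3": {"g2": 1.0, "g4": 1.0},
    "g4": {"g3": 1.0},
}
scores = label_propagation(weights, positives={"g1"}, negatives={"g4"})
for gene in sorted(scores):
    print(gene, round(scores[gene], 3))
```

In this toy run, the gene adjacent to the positive example ends with a higher score than the gene adjacent to the negative one. Capping `iters` trades a small amount of precision in the scores for a predictable running time.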

13.

Genetic interactions derived from high-throughput phenotyping of 6589 yeast cell cycle mutants.

Gallegos, Jenna E; Adames, Neil R; Rogers, Mark F; Kraikivski, Pavel; Ibele, Aubrey; Nurzynski-Loth, Kevin; Kudlow, Eric; Murali, T M; Tyson, John J; Peccoud, Jean.

NPJ Syst Biol Appl ; 6(1): 11, 2020 05 06.

Article in English | MEDLINE | ID: mdl-32376972

ABSTRACT

Over the last 30 years, computational biologists have developed increasingly realistic mathematical models of the regulatory networks controlling the division of eukaryotic cells. These models capture data resulting from two complementary experimental approaches: low-throughput experiments aimed at extensively characterizing the functions of small numbers of genes, and large-scale genetic interaction screens that provide a systems-level perspective on the cell division process. The former is insufficient to capture the interconnectivity of the genetic control network, while the latter is fraught with irreproducibility issues. Here, we describe a hybrid approach in which the 630 genetic interactions between 36 cell-cycle genes are quantitatively estimated by high-throughput phenotyping with an unprecedented number of biological replicates. Using this approach, we identify a subset of high-confidence genetic interactions, which we use to refine a previously published mathematical model of the cell cycle. We also present a quantitative dataset of the growth rate of these mutants under six different media conditions in order to inform future cell cycle models.

Subject(s)

Cell Cycle/genetics , Saccharomyces cerevisiae/genetics , Cell Division/genetics , Computational Biology/methods , Epistasis, Genetic/genetics , Gene Expression Regulation, Fungal/genetics , Gene Regulatory Networks/genetics , High-Throughput Screening Assays/methods , Models, Theoretical , Saccharomyces cerevisiae Proteins/genetics

14.

Benchmarking algorithms for gene regulatory network inference from single-cell transcriptomic data.

Pratapa, Aditya; Jalihal, Amogh P; Law, Jeffrey N; Bharadwaj, Aditya; Murali, T M.

Nat Methods ; 17(2): 147-154, 2020 02.

Article in English | MEDLINE | ID: mdl-31907445

ABSTRACT

We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.

Subject(s)

Algorithms , Gene Regulatory Networks , Single-Cell Analysis/methods , Transcriptome , Datasets as Topic , Sequence Analysis, RNA/methods
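
The two accuracy measures named in entry 14 can be sketched for a ranked list of predicted regulatory edges. The sketch assumes that early precision is the fraction of true edges among the top k predictions, with k equal to the number of edges in the ground-truth network, and it approximates the area under the precision-recall curve by average precision. The exact definitions used in BEELINE may differ, and the edge scores below are made up.

```python
def early_precision(ranked_edges, true_edges):
    """Fraction of true edges among the top-k predictions, k = len(true_edges)."""
    k = len(true_edges)
    top_k = ranked_edges[:k]
    return sum(1 for edge in top_k if edge in true_edges) / k


def average_precision(ranked_edges, true_edges):
    """Mean of the precision values at the rank of each recovered true edge."""
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank
    return total / len(true_edges)


# Hypothetical ground-truth network and predicted edge confidences.
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g1")}
predicted = {
    ("g1", "g2"): 0.9,
    ("g1", "g3"): 0.8,
    ("g2", "g3"): 0.7,
    ("g3", "g1"): 0.4,
    ("g3", "g2"): 0.1,
}
ranked = sorted(predicted, key=predicted.get, reverse=True)

ep = early_precision(ranked, true_edges)
ap = average_precision(ranked, true_edges)
print(round(ep, 3), round(ap, 3))
```

Here two of the top three predictions are true edges, so the early precision is 2/3, while average precision also rewards the third true edge that is recovered at rank four.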

15.

Hypergraph-based connectivity measures for signaling pathway topologies.

Franzese, Nicholas; Groce, Adam; Murali, T M; Ritz, Anna.

PLoS Comput Biol ; 15(10): e1007384, 2019 10.

Article in English | MEDLINE | ID: mdl-31652258

ABSTRACT

Characterizing cellular responses to different extrinsic signals is an active area of research, and curated pathway databases describe these complex signaling reactions. Here, we revisit a fundamental question in signaling pathway analysis: are two molecules "connected" in a network? This question is the first step towards understanding the potential influence of molecules in a pathway, and the answer depends on the choice of modeling framework. We examined the connectivity of Reactome signaling pathways using four different pathway representations. We find that Reactome is very well connected as a graph, moderately well connected as a compound graph or bipartite graph, and poorly connected as a hypergraph (which captures many-to-many relationships in reaction networks). We present a novel relaxation of hypergraph connectivity that iteratively increases connectivity from a node while preserving the hypergraph topology. This measure, B-relaxation distance, provides a parameterized transition between hypergraph connectivity and graph connectivity. B-relaxation distance is sensitive to the presence of small molecules that participate in many functionally unrelated reactions in the network. We also define a score that quantifies one pathway's downstream influence on another, which can be calculated as B-relaxation distance gradually relaxes the connectivity constraint in hypergraphs. Computing this score across all pairs of 34 Reactome pathways reveals pairs of pathways with statistically significant influence. We present two such case studies, and we describe the specific reactions that contribute to the large influence score. Finally, we investigate the ability for connectivity measures to capture functional relationships among proteins, and use the evidence channels in the STRING database as a benchmark dataset. STRING interactions whose proteins are B-connected in Reactome have statistically significantly higher scores than interactions connected in the bipartite graph representation. Our method lays the groundwork for other generalizations of graph-theoretic concepts to hypergraphs in order to facilitate signaling pathway analysis.

Subject(s)

Signal Transduction/physiology , Algorithms , Computer Simulation , Databases, Factual/statistics & numerical data , Models, Statistical , Proteins
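
The gap between graph connectivity and hypergraph connectivity reported in entry 15 can be shown on a toy example. The sketch assumes the standard notion of B-connectivity in directed hypergraphs, in which a hyperedge can be traversed only after every node in its tail has been reached. It does not implement B-relaxation distance, and the reactions and molecule names are hypothetical.

```python
def b_connected(hyperedges, sources):
    """Nodes reachable when a hyperedge fires only if its whole tail is reached."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if tail <= reached and not head <= reached:
                reached |= head
                changed = True
    return reached


def graph_connected(hyperedges, sources):
    """Nodes reachable when any single reached tail node is enough.

    This corresponds to flattening each hyperedge into ordinary
    tail-to-head edges.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if tail & reached and not head <= reached:
                reached |= head
                changed = True
    return reached


# Hypothetical reactions written as (tail, head) pairs of node sets.
reactions = [
    (frozenset({"ligand", "receptor"}), frozenset({"complex"})),
    (frozenset({"complex", "ATP"}), frozenset({"kinase_P"})),
    (frozenset({"kinase_P"}), frozenset({"TF_active"})),
]
src = {"ligand", "receptor"}

b_reach = b_connected(reactions, src)
g_reach = graph_connected(reactions, src)
print(sorted(b_reach))
print(sorted(g_reach))
```

Because `ATP` is never reached from the sources, the second reaction is blocked under B-connectivity, while the flattened graph view still reaches the transcription factor. This is the kind of difference that makes a pathway look well connected as a graph and poorly connected as a hypergraph.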

16.

Reconstructing signaling pathways using regular language constrained paths.

Wagner, Mitchell J; Pratapa, Aditya; Murali, T M.

Bioinformatics ; 35(14): i624-i633, 2019 07 15.

Article in English | MEDLINE | ID: mdl-31510694

ABSTRACT

MOTIVATION: High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway. RESULTS: We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry. AVAILABILITY AND IMPLEMENTATION: https://github.com/Murali-group/RegLinker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Language , Signal Transduction , Algorithms , Publications

17.

Large-scale protein function prediction using heterogeneous ensembles.

Wang, Linhua; Law, Jeffrey; Kale, Shiv D; Murali, T M; Pandey, Gaurav.

F1000Res ; 7, 2018.

Article in English | MEDLINE | ID: mdl-30450194

ABSTRACT

Heterogeneous ensembles are an effective approach in scenarios where the ideal data type and/or individual predictor are unclear for a given problem. These ensembles have shown promise for protein function prediction (PFP), but their ability to improve PFP at a large scale is unclear. The overall goal of this study is to critically assess this ability of a variety of heterogeneous ensemble methods across a multitude of functional terms, proteins and organisms. Our results show that these methods, especially Stacking using Logistic Regression, indeed produce more accurate predictions for a variety of Gene Ontology terms differing in size and specificity. To enable the application of these methods to other related problems, we have publicly shared the HPC-enabled code underlying this work as LargeGOPred ( https://github.com/GauravPandeyLab/LargeGOPred).

Subject(s)

Bacterial Proteins/genetics , Gene Ontology , Logistic Models , Machine Learning

18.

Automating the PathLinker app for Cytoscape.

Huang, Li Jun; Law, Jeffrey N; Murali, T M.

F1000Res ; 7: 727, 2018.

Article in English | MEDLINE | ID: mdl-30057757

ABSTRACT

PathLinker is a graph-theoretic algorithm originally developed to reconstruct the interactions in a signaling pathway of interest. It efficiently computes multiple short paths within a background protein interaction network from the receptors to transcription factors (TFs) in a pathway. Since December 2015, PathLinker has been available as an app for Cytoscape. This paper describes how we automated the app to use the CyRest infrastructure and how users can incorporate PathLinker into their software pipelines.

19.

Transcriptomic Analysis of Hepatic Cells in Multicellular Organotypic Liver Models.

Tegge, Allison N; Rodrigues, Richard R; Larkin, Adam L; Vu, Lucas; Murali, T M; Rajagopalan, Padmavathy.

Sci Rep ; 8(1): 11306, 2018 07 27.

Article in English | MEDLINE | ID: mdl-30054499

ABSTRACT

Liver homeostasis requires the presence of both parenchymal and non-parenchymal cells (NPCs). However, systems biology studies of the liver have primarily focused on hepatocytes. Using an organotypic three-dimensional (3D) hepatic culture, we report the first transcriptomic study of liver sinusoidal endothelial cells (LSECs) and Kupffer cells (KCs) cultured with hepatocytes. Through computational pathway and interaction network analyses, we demonstrate that hepatocytes, LSECs and KCs have distinct expression profiles and functional characteristics. Our results show that LSECs in the presence of KCs exhibit decreased expression of focal adhesion kinase (FAK) signaling, a pathway linked to LSEC dedifferentiation. We report the novel result that peroxisome proliferator-activated receptor alpha (PPARα) is transcribed in LSECs. The expression of downstream processes corroborates active PPARα signaling in LSECs. We uncover transcriptional evidence in LSECs for a feedback mechanism between PPARα and farnesoid X-activated receptor (FXR) that maintains bile acid homeostasis; previously, this feedback was known to occur only in HepG2 cells. We demonstrate that KCs in 3D liver models display expression patterns consistent with an anti-inflammatory phenotype when compared to monocultures. These results highlight the distinct roles of LSECs and KCs in maintaining liver function and emphasize the need for additional mechanistic studies of NPCs in addition to hepatocytes in liver-mimetic microenvironments.

Subject(s)

Hepatocytes/metabolism , Liver/metabolism , PPAR alpha/genetics , Receptors, Cytoplasmic and Nuclear/genetics , Transcriptome/genetics , Bile Acids and Salts/metabolism , Endothelial Cells/cytology , Endothelial Cells/metabolism , Gene Expression Profiling , Hep G2 Cells , Hepatocytes/cytology , Homeostasis/genetics , Humans , Kupffer Cells/cytology , Kupffer Cells/metabolism , Liver/cytology

20.

CrossPlan: systematic planning of genetic crosses to validate mathematical models.

Pratapa, Aditya; Adames, Neil; Kraikivski, Pavel; Franzese, Nicholas; Tyson, John J; Peccoud, Jean; Murali, T M.

Bioinformatics ; 34(13): 2237-2244, 2018 07 01.

Article in English | MEDLINE | ID: mdl-29432533

ABSTRACT

Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test. Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well. Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Computational Biology/methods , Crosses, Genetic , Models, Theoretical , Mutation , Programming, Linear , Software , Algorithms , Cell Division/genetics , Gene Regulatory Networks , Models, Biological , Saccharomycetales/genetics
