1.
Cell Tissue Res ; 396(1): 119-139, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38369646

ABSTRACT

Primary human hepatocytes (PHHs) are used extensively for in vitro liver cultures to study hepatic functions. However, limited availability and invasive retrieval prevent their widespread use. Induced pluripotent stem cells exhibit significant potential since they can be obtained non-invasively and differentiated into hepatic lineages, such as hepatocyte-like cells (iHLCs). However, there are concerns about their fetal phenotypic characteristics and their hepatic functions compared to PHHs in culture. Therefore, we performed an RNA-sequencing (RNA-seq) analysis to understand pathways that are either up- or downregulated in each cell type. Analysis of the RNA-seq data showed upregulation of the bile secretion pathway, in which genes such as AQP9 and UGT1A1 were expressed 455- and 15-fold higher, respectively, in PHHs than in iHLCs. Upon immunostaining, bile canaliculi were shown to be present in PHHs. The TCA cycle in PHHs was upregulated compared to iHLCs. Cellular analysis showed a 2-2.5-fold increase in normalized urea production in PHHs compared to iHLCs. In addition, drug metabolism pathways, including cytochrome P450 (CYP450) and UDP-glucuronosyltransferase enzymes, were upregulated in PHHs compared to iHLCs. Of note, CYP2E1 gene expression was significantly higher (21,810-fold) in PHHs. Acetaminophen and ethanol were administered to PHH and iHLC cultures to investigate differences in biotransformation. CYP450 activity of baseline and toxicant-treated samples was significantly higher in PHHs compared to iHLCs. Our analysis revealed that iHLCs have substantial differences from PHHs in critical hepatic functions. These results highlight the differences in gene expression and hepatic functions between PHHs and iHLCs and motivate future investigation.


Subject(s)
Induced Pluripotent Stem Cells , Humans , Induced Pluripotent Stem Cells/metabolism , Hepatocytes , Liver , Cell Differentiation , Gene Expression Profiling
2.
Nat Methods ; 17(2): 147-154, 2020 02.
Article in English | MEDLINE | ID: mdl-31907445

ABSTRACT

We present a systematic evaluation of state-of-the-art algorithms for inferring gene regulatory networks from single-cell transcriptional data. As the ground truth for assessing accuracy, we use synthetic networks with predictable trajectories, literature-curated Boolean models and diverse transcriptional regulatory networks. We develop a strategy to simulate single-cell transcriptional data from synthetic and Boolean networks that avoids pitfalls of previously used methods. Furthermore, we collect networks from multiple experimental single-cell RNA-seq datasets. We develop an evaluation framework called BEELINE. We find that the area under the precision-recall curve and early precision of the algorithms are moderate. The methods are better in recovering interactions in synthetic networks than Boolean models. The algorithms with the best early precision values for Boolean models also perform well on experimental datasets. Techniques that do not require pseudotime-ordered cells are generally more accurate. Based on these results, we present recommendations to end users. BEELINE will aid the development of gene regulatory network inference algorithms.
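The two accuracy measures named in this abstract can be illustrated with a small sketch. It assumes that early precision means the precision among the top-k ranked edges, with k equal to the number of edges in the ground-truth network, and it approximates the area under the precision-recall curve by average precision. The function names and the toy network are hypothetical and are not taken from BEELINE.

```python
def early_precision(ranked_edges, true_edges):
    """Precision among the top-k predictions, where k = number of true edges.

    Assumption: this is the working definition of "early precision" here.
    """
    k = len(true_edges)
    if k == 0:
        return 0.0
    hits = sum(1 for edge in ranked_edges[:k] if edge in true_edges)
    return hits / k


def average_precision(ranked_edges, true_edges):
    """Step-wise approximation of the area under the precision-recall curve."""
    if not true_edges:
        return 0.0
    hits = 0
    total = 0.0
    for rank, edge in enumerate(ranked_edges, start=1):
        if edge in true_edges:
            hits += 1
            total += hits / rank  # precision at each recovered true edge
    return total / len(true_edges)


# Hypothetical toy example: edges are (regulator, target) pairs,
# ranked from most to least confident.
ranked = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a")]
truth = {("a", "b"), ("b", "c")}
ep = early_precision(ranked, truth)
ap = average_precision(ranked, truth)
```

Both functions take only a ranked edge list and a ground-truth edge set, so the same sketch applies to synthetic, Boolean, or experimental reference networks.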


Subject(s)
Algorithms , Gene Regulatory Networks , Single-Cell Analysis/methods , Transcriptome , Datasets as Topic , Sequence Analysis, RNA/methods
3.
Chem Res Toxicol ; 36(8): 1267-1277, 2023 08 21.
Article in English | MEDLINE | ID: mdl-37471124

ABSTRACT

Humans and animals are regularly exposed to compounds that may have adverse effects on health. The Toxicity Forecaster (ToxCast) program was developed to use high throughput screening assays to quickly screen chemicals by measuring their effects on many biological end points. Many of these assays test for effects on cellular receptors and transcription factors (TFs), under the assumption that a toxicant may perturb normal signaling pathways in the cell. We hypothesized that we could reconstruct the intermediate proteins in these pathways that may be directly or indirectly affected by the toxicant, potentially revealing important physiological processes not yet tested for many chemicals. We integrate data from ToxCast with a human protein interactome to build toxicant signaling networks that contain physical and signaling protein interactions that may be affected as a result of toxicant exposure. To build these networks, we developed the EdgeLinker algorithm, which efficiently finds short paths in the interactome that connect the receptors to TFs for each toxicant. We performed multiple evaluations and found evidence suggesting that these signaling networks capture biologically relevant effects of toxicants. To aid in dissemination and interpretation, interactive visualizations of these networks are available at http://graphspace.org.
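The idea of connecting receptors to TFs through short paths in an interactome can be sketched with an ordinary multi-source Dijkstra search. This is a stand-in under stated assumptions, not the EdgeLinker algorithm itself: the edge costs, node names, and function name are hypothetical, and only the single cheapest path to each TF is returned.

```python
import heapq


def shortest_receptor_to_tf_paths(edges, receptors, tfs):
    """Cheapest path from any receptor to each TF.

    edges: dict mapping (u, v) -> non-negative cost (hypothetical weights).
    Returns a dict tf -> (cost, path); unreachable TFs are omitted.
    """
    adjacency = {}
    for (u, v), cost in edges.items():
        adjacency.setdefault(u, []).append((v, cost))

    # Every receptor starts at distance zero (multi-source search).
    dist = {r: 0.0 for r in receptors}
    prev = {}
    heap = [(0.0, r) for r in sorted(receptors)]
    heapq.heapify(heap)

    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, cost in adjacency.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    paths = {}
    for tf in tfs:
        if tf not in dist:
            continue
        path = [tf]
        while path[-1] in prev:
            path.append(prev[path[-1]])
        paths[tf] = (dist[tf], path[::-1])
    return paths


# Hypothetical toy interactome.
toy_edges = {
    ("R1", "A"): 1.0,
    ("A", "TF1"): 1.0,
    ("R2", "TF1"): 3.0,
    ("R2", "B"): 1.0,
    ("B", "TF2"): 0.5,
}
toy_paths = shortest_receptor_to_tf_paths(toy_edges, {"R1", "R2"}, ["TF1", "TF2", "TF3"])
```

The intermediate nodes on each returned path play the role of the candidate proteins between a receptor and a TF that the abstract describes.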


Subject(s)
Drug-Related Side Effects and Adverse Reactions , High-Throughput Screening Assays , Animals , Humans , Algorithms , Signal Transduction
4.
Bioinformatics ; 37(6): 800-806, 2021 05 05.
Article in English | MEDLINE | ID: mdl-33063084

ABSTRACT

MOTIVATION: Nearly 40% of the genes in sequenced genomes have no experimentally or computationally derived functional annotations. To fill this gap, we seek to develop methods for network-based gene function prediction that can integrate heterogeneous data for multiple species with experimentally based functional annotations and systematically transfer them to newly sequenced organisms on a genome-wide scale. However, the large sizes of such networks pose a challenge for the scalability of current methods. RESULTS: We develop a label propagation algorithm called FastSinkSource. By formally bounding its rate of progress, we decrease the running time by a factor of 100 without sacrificing accuracy. We systematically evaluate many approaches to construct multi-species bacterial networks and apply FastSinkSource and other state-of-the-art methods to these networks. We find that the most accurate and efficient approach is to pre-compute annotation scores for species with experimental annotations, and then to transfer them to other organisms. In this manner, FastSinkSource runs in under 3 min for 200 bacterial species. AVAILABILITY AND IMPLEMENTATION: An implementation of our framework and all data used in this research are available at https://github.com/Murali-group/multi-species-GOA-prediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
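Label propagation for function prediction can be sketched as follows. This is a generic illustration, assuming that positively annotated genes are clamped at score 1, negatively annotated genes at 0, and every other gene repeatedly takes the weighted average of its neighbours' scores. The bound on the rate of progress described in the abstract is not reproduced; a plain tolerance check stands in for it, and all names are hypothetical.

```python
def propagate_labels(weights, positives, negatives, tol=1e-9, max_iters=1000):
    """Iterative label propagation on a weighted undirected network.

    weights: dict node -> dict neighbour -> edge weight; every node must
             appear as a key (assumption of this sketch).
    """
    scores = {node: 0.0 for node in weights}
    for node in positives:
        scores[node] = 1.0  # clamped positive examples
    unlabeled = [n for n in weights if n not in positives and n not in negatives]

    for _ in range(max_iters):
        max_change = 0.0
        new_scores = dict(scores)
        for node in unlabeled:
            total = sum(weights[node].values())
            if total == 0:
                continue  # isolated node keeps score 0
            value = sum(w * scores[nbr] for nbr, w in weights[node].items()) / total
            max_change = max(max_change, abs(value - scores[node]))
            new_scores[node] = value
        scores = new_scores
        if max_change < tol:
            break  # simple convergence check, not the paper's bound
    return scores


# Hypothetical chain: P - a - b - N with unit weights.
chain = {
    "P": {"a": 1.0},
    "a": {"P": 1.0, "b": 1.0},
    "b": {"a": 1.0, "N": 1.0},
    "N": {"b": 1.0},
}
chain_scores = propagate_labels(chain, positives={"P"}, negatives={"N"})
```

On this chain the scores settle near 2/3 for `a` and 1/3 for `b`, so genes closer to a positive example rank higher.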


Subject(s)
Bacteria , Genome , Algorithms , Bacteria/genetics , Base Sequence , Phenotype
5.
Bioinformatics ; 35(14): i624-i633, 2019 07 15.
Article in English | MEDLINE | ID: mdl-31510694

ABSTRACT

MOTIVATION: High-quality curation of the proteins and interactions in signaling pathways is slow and painstaking. As a result, many experimentally detected interactions are not annotated to any pathways. A natural question that arises is whether or not it is possible to automatically leverage existing pathway annotations to identify new interactions for inclusion in a given pathway. RESULTS: We present RegLinker, an algorithm that achieves this purpose by computing multiple short paths from pathway receptors to transcription factors within a background interaction network. The key idea underlying RegLinker is the use of regular language constraints to control the number of non-pathway interactions that are present in the computed paths. We systematically evaluate RegLinker and five alternative approaches against a comprehensive set of 15 signaling pathways and demonstrate that RegLinker recovers withheld pathway proteins and interactions with the best precision and recall. We used RegLinker to propose new extensions to the pathways. We discuss the literature that supports the inclusion of these proteins in the pathways. These results show the broad potential of automated analysis to attenuate difficulties of traditional manual inquiry. AVAILABILITY AND IMPLEMENTATION: https://github.com/Murali-group/RegLinker. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
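One way to picture a regular-language constraint on paths is to search a product of the network with a small automaton. The sketch below assumes the simplest such constraint: each edge is labelled `p` (already in the pathway) or `x` (not in the pathway), and a path may use at most `max_new` edges labelled `x`. The labels, function name, and toy network are hypothetical, and the code returns one shortest path rather than the multiple paths RegLinker computes.

```python
import heapq


def constrained_shortest_path(edges, source, target, max_new):
    """Shortest source-target path using at most max_new 'x' edges.

    edges: list of (u, v, cost, label) with non-negative cost and
           label 'p' (pathway edge) or 'x' (non-pathway edge).
    Returns (cost, path) or None if no accepted path exists.
    """
    adjacency = {}
    for u, v, cost, label in edges:
        adjacency.setdefault(u, []).append((v, cost, label))

    # Search states are (node, number of 'x' edges used so far).
    dist = {(source, 0): 0.0}
    prev = {}
    heap = [(0.0, source, 0)]
    while heap:
        d, u, used = heapq.heappop(heap)
        if d > dist.get((u, used), float("inf")):
            continue  # stale queue entry
        if u == target:
            path, state = [u], (u, used)
            while state in prev:
                state = prev[state]
                path.append(state[0])
            return d, path[::-1]
        for v, cost, label in adjacency.get(u, []):
            next_used = used + (1 if label == "x" else 0)
            if next_used > max_new:
                continue  # the automaton rejects this extension
            nd = d + cost
            if nd < dist.get((v, next_used), float("inf")):
                dist[(v, next_used)] = nd
                prev[(v, next_used)] = (u, used)
                heapq.heappush(heap, (nd, v, next_used))
    return None


# Hypothetical toy network: a direct non-pathway shortcut from R to T.
toy = [("R", "A", 1, "p"), ("A", "T", 1, "p"), ("R", "T", 1, "x")]
strict = constrained_shortest_path(toy, "R", "T", max_new=0)
relaxed = constrained_shortest_path(toy, "R", "T", max_new=1)
```

With `max_new=0` the search is confined to known pathway edges, while `max_new=1` admits the shortcut, which is the kind of candidate interaction proposed for inclusion in a pathway.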


Subject(s)
Language , Signal Transduction , Algorithms , Publications
6.
PLoS Comput Biol ; 15(10): e1007384, 2019 10.
Article in English | MEDLINE | ID: mdl-31652258

ABSTRACT

Characterizing cellular responses to different extrinsic signals is an active area of research, and curated pathway databases describe these complex signaling reactions. Here, we revisit a fundamental question in signaling pathway analysis: are two molecules "connected" in a network? This question is the first step towards understanding the potential influence of molecules in a pathway, and the answer depends on the choice of modeling framework. We examined the connectivity of Reactome signaling pathways using four different pathway representations. We find that Reactome is very well connected as a graph, moderately well connected as a compound graph or bipartite graph, and poorly connected as a hypergraph (which captures many-to-many relationships in reaction networks). We present a novel relaxation of hypergraph connectivity that iteratively increases connectivity from a node while preserving the hypergraph topology. This measure, B-relaxation distance, provides a parameterized transition between hypergraph connectivity and graph connectivity. B-relaxation distance is sensitive to the presence of small molecules that participate in many functionally unrelated reactions in the network. We also define a score that quantifies one pathway's downstream influence on another, which can be calculated as B-relaxation distance gradually relaxes the connectivity constraint in hypergraphs. Computing this score across all pairs of 34 Reactome pathways reveals pairs of pathways with statistically significant influence. We present two such case studies, and we describe the specific reactions that contribute to the large influence score. Finally, we investigate the ability for connectivity measures to capture functional relationships among proteins, and use the evidence channels in the STRING database as a benchmark dataset. 
STRING interactions whose proteins are B-connected in Reactome have statistically significantly higher scores than interactions connected in the bipartite graph representation. Our method lays the groundwork for other generalizations of graph-theoretic concepts to hypergraphs in order to facilitate signaling pathway analysis.
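The contrast between hypergraph and graph connectivity can be made concrete with a short sketch. It assumes the standard notion of B-connectivity in directed hypergraphs, where the head of a hyperedge is reached only after every node in its tail has been reached, and compares it with a relaxed traversal in which any single tail node suffices. The B-relaxation distance itself is not implemented, and the function names and toy reactions are hypothetical.

```python
def b_connected(hyperedges, sources):
    """Nodes B-connected to the sources.

    hyperedges: list of (tail, head) pairs of node collections.
    A hyperedge fires only when ALL of its tail nodes are reached.
    """
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if set(tail) <= reached and not set(head) <= reached:
                reached |= set(head)
                changed = True
    return reached


def graph_connected(hyperedges, sources):
    """Relaxed traversal: a hyperedge fires when ANY tail node is reached."""
    reached = set(sources)
    changed = True
    while changed:
        changed = False
        for tail, head in hyperedges:
            if set(tail) & reached and not set(head) <= reached:
                reached |= set(head)
                changed = True
    return reached


# Hypothetical reactions: the second one needs both b and c as inputs.
reactions = [({"a"}, {"b"}), ({"b", "c"}, {"d"}), ({"d"}, {"e"})]
strict_reach = b_connected(reactions, {"a"})
loose_reach = graph_connected(reactions, {"a"})
```

Starting from `a` alone, the strict traversal stops at `b` because `c` is never produced, whereas the relaxed traversal continues to `d` and `e`. This gap is what makes a pathway look poorly connected as a hypergraph and well connected as a graph.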


Subject(s)
Signal Transduction/physiology , Algorithms , Computer Simulation , Databases, Factual/statistics & numerical data , Models, Statistical , Proteins
7.
Bioinformatics ; 34(13): 2237-2244, 2018 07 01.
Article in English | MEDLINE | ID: mdl-29432533

ABSTRACT

Motivation: Mathematical models of cellular processes can systematically predict the phenotypes of novel combinations of multi-gene mutations. Searching for informative predictions and prioritizing them for experimental validation is challenging since the number of possible combinations grows exponentially in the number of mutations. Moreover, keeping track of the crosses needed to make new mutants and planning sequences of experiments is unmanageable when the experimenter is deluged by hundreds of potentially informative predictions to test. Results: We present CrossPlan, a novel methodology for systematically planning genetic crosses to make a set of target mutants from a set of source mutants. We base our approach on a generic experimental workflow used in performing genetic crosses in budding yeast. We prove that the CrossPlan problem is NP-complete. We develop an integer-linear-program (ILP) to maximize the number of target mutants that we can make under certain experimental constraints. We apply our method to a comprehensive mathematical model of the protein regulatory network controlling cell division in budding yeast. We also extend our solution to incorporate other experimental conditions such as a delay factor that decides the availability of a mutant and genetic markers to confirm gene deletions. The experimental flow that underlies our work is quite generic and our ILP-based algorithm is easy to modify. Hence, our framework should be relevant in plant and animal systems as well. Availability and implementation: CrossPlan code is freely available under GNU General Public Licence v3.0 at https://github.com/Murali-group/crossplan. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology/methods , Crosses, Genetic , Models, Theoretical , Mutation , Programming, Linear , Software , Algorithms , Cell Division/genetics , Gene Regulatory Networks , Models, Biological , Saccharomycetales/genetics
8.
Nucleic Acids Res ; 45(D1): D432-D439, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899583

ABSTRACT

Analysis of signaling pathways and their crosstalk is a cornerstone of systems biology. Thousands of papers have been published on these topics. Surprisingly, there is no database that carefully and explicitly documents crosstalk between specific pairs of signaling pathways. We have developed XTalkDB (http://www.xtalkdb.org) to fill this very important gap. XTalkDB contains curated information for 650 pairs of pathways from over 1600 publications. In addition, the database reports the molecular components (e.g. proteins, hormones, microRNAs) that mediate crosstalk between a pair of pathways and the species and tissue in which the crosstalk was observed. The XTalkDB website provides an easy-to-use interface for scientists to browse crosstalk information by querying one or more pathways or molecules of interest.


Subject(s)
Computational Biology/methods , Databases, Genetic , Signal Transduction , Software , Gene Expression Regulation , Ligands , Protein Binding , Proteins , Systems Biology/methods , Web Browser
9.
Bioinformatics ; 33(19): 3134-3136, 2017 Oct 01.
Article in English | MEDLINE | ID: mdl-28957495

ABSTRACT

SUMMARY: Networks have become ubiquitous in systems biology. Visualization is a crucial component in their analysis. However, collaborations within research teams in network biology are hampered by software systems that are either specific to a computational algorithm, create visualizations that are not biologically meaningful, or have limited features for sharing networks and visualizations. We present GraphSpace, a web-based platform that fosters team science by allowing collaborating research groups to easily store, interact with, layout and share networks. AVAILABILITY AND IMPLEMENTATION: Anyone can upload and share networks at http://graphspace.org. In addition, the GraphSpace code is available at http://github.com/Murali-group/graphspace if a user wants to run his or her own server. CONTACT: murali@cs.vt.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Systems Biology/methods , Algorithms , Computational Biology , Interdisciplinary Communication
10.
Bioinformatics ; 32(2): 242-51, 2016 Jan 15.
Article in English | MEDLINE | ID: mdl-26400040

ABSTRACT

MOTIVATION: Cells communicate with their environment via signal transduction pathways. On occasion, the activation of one pathway can produce an effect downstream of another pathway, a phenomenon known as crosstalk. Existing computational methods to discover such pathway pairs rely on simple overlap statistics. RESULTS: We present Xtalk, a path-based approach for identifying pairs of pathways that may crosstalk. Xtalk computes the statistical significance of the average length of multiple short paths that connect receptors in one pathway to the transcription factors in another. By design, Xtalk reports the precise interactions and mechanisms that support the identified crosstalk. We applied Xtalk to signaling pathways in the KEGG and NCI-PID databases. We manually curated a gold standard set of 132 crosstalking pathway pairs and a set of 140 pairs that did not crosstalk, for which Xtalk achieved an area under the receiver operating characteristic curve of 0.65, a 12% improvement over the closest competing approach. The area under the receiver operating characteristic curve varied with the pathway, suggesting that crosstalk should be evaluated on a pathway-by-pathway level. We also analyzed an extended set of 658 pathway pairs in KEGG and a set of more than 7000 pathway pairs in NCI-PID. For the top-ranking pairs, we found substantial support in the literature (81% for KEGG and 78% for NCI-PID). We provide examples of networks computed by Xtalk that accurately recovered known mechanisms of crosstalk. AVAILABILITY AND IMPLEMENTATION: The XTALK software is available at http://bioinformatics.cs.vt.edu/~murali/software. Crosstalk networks are available at http://graphspace.org/graphs?tags=2015-bioinformatics-xtalk. CONTACT: ategge@vt.edu, murali@cs.vt.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
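The evaluation against a gold standard of crosstalking and non-crosstalking pathway pairs relies on the area under the ROC curve, which can be read as the probability that a randomly chosen positive pair is scored above a randomly chosen negative pair. A minimal sketch of that reading follows; the scores are hypothetical and do not come from Xtalk.

```python
def auroc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive item has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical scores for pathway pairs (higher = more likely to crosstalk).
crosstalk_scores = [0.9, 0.6, 0.4]
no_crosstalk_scores = [0.5, 0.3]
area = auroc(crosstalk_scores, no_crosstalk_scores)
```

A value of 0.5 corresponds to random ranking, so a score such as the 0.65 reported above indicates a modest but real ability to rank true crosstalk pairs higher.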


Subject(s)
Computational Biology/methods , Databases, Factual , Models, Biological , Signal Transduction , Software , Transcription Factors/metabolism , Humans
11.
Bioinformatics ; 29(5): 622-9, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314326

ABSTRACT

MOTIVATION: Many techniques have been developed to compute the response network of a cell. A recent trend in this area is to compute response networks of small size, with the rationale that only part of a pathway is often changed by disease and that interpreting small subnetworks is easier than interpreting larger ones. However, these methods may not uncover the spectrum of pathways perturbed in a particular experiment or disease. RESULTS: To avoid these difficulties, we propose to use algorithms that reconcile case-control DNA microarray data with a molecular interaction network by modifying per-gene differential expression P-values such that two genes connected by an interaction show similar changes in their gene expression values. We provide a novel evaluation of four methods from this class of algorithms. We enumerate three desirable properties that this class of algorithms should address. These properties seek to maintain that the returned gene rankings are specific to the condition being studied. Moreover, to ease interpretation, highly ranked genes should participate in coherent network structures and should be functionally enriched with relevant biological pathways. We comprehensively evaluate the extent to which each algorithm addresses these properties on a compendium of gene expression data for 54 diverse human diseases. We show that the reconciled gene rankings can identify novel disease-related functions that are missed by analyzing expression data alone. AVAILABILITY: C++ software implementing our algorithms is available in the NetworkReconciliation package as part of the Biorithm software suite under the GNU General Public License: http://bioinformatics.cs.vt.edu/~murali/software/biorithm-docs.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Protein Interaction Mapping , Alzheimer Disease/genetics , Alzheimer Disease/metabolism , Biological Transport , Brain/metabolism , Gene Regulatory Networks , Glucose/metabolism , Humans , Huntington Disease/genetics , Huntington Disease/metabolism , Insulin/physiology , Oligonucleotide Array Sequence Analysis , Software
12.
Annu Rev Biomed Eng ; 15: 55-70, 2013.
Article in English | MEDLINE | ID: mdl-23862675

ABSTRACT

Tissue engineering and molecular systems biology are inherently interdisciplinary fields that have been developed independently so far. In this review, we first provide a brief introduction to tissue engineering and to molecular systems biology. Next, we highlight some prominent applications of systems biology techniques in tissue engineering. Finally, we outline research directions that can successfully blend these two fields. Through these examples, we propose that experimental and computational advances in molecular systems biology can lead to predictive models of bioengineered tissues that enhance our understanding of bioengineered systems. In turn, the unique challenges posed by tissue engineering will usher in new experimental techniques and computational advances in systems biology.


Subject(s)
Systems Biology/methods , Tissue Engineering/methods , Algorithms , Animals , Bioengineering/methods , Computational Biology/methods , Humans , Microfluidics/methods , Protein Interaction Mapping/methods
13.
BMC Microbiol ; 13: 224, 2013 Oct 07.
Article in English | MEDLINE | ID: mdl-24099000

ABSTRACT

BACKGROUND: Fungi are the second most abundant type of human pathogens. Invasive fungal pathogens are leading causes of life-threatening infections in clinical settings. Toxicity to the host and drug-resistance are two major deleterious issues associated with existing antifungal agents. Increasing a host's tolerance and/or immunity to fungal pathogens has potential to alleviate these problems. A host's tolerance may be improved by modulating the immune system such that it responds more rapidly and robustly in all facets, ranging from the recognition of pathogens to their clearance from the host. An understanding of biological processes and genes that are perturbed during attempted fungal exposure, colonization, and/or invasion will help guide the identification of endogenous immunomodulators and/or small molecules that activate host-immune responses such as specialized adjuvants. RESULTS: In this study, we present computational techniques and approaches using publicly available transcriptional data sets, to predict immunomodulators that may act against multiple fungal pathogens. Our study analyzed data sets derived from host cells exposed to five fungal pathogens, namely, Alternaria alternata, Aspergillus fumigatus, Candida albicans, Pneumocystis jirovecii, and Stachybotrys chartarum. We observed statistically significant associations between host responses to A. fumigatus and C. albicans. Our analysis identified biological processes that were consistently perturbed by these two pathogens. These processes contained both immune response-inducing genes such as MALT1, SERPINE1, ICAM1, and IL8, and immune response-repressing genes such as DUSP8, DUSP6, and SPRED2. We hypothesize that these genes belong to a pool of common immunomodulators that can potentially be activated or suppressed (agonized or antagonized) in order to render the host more tolerant to infections caused by A. fumigatus and C. albicans. 
CONCLUSIONS: Our computational approaches and methodologies described here can now be applied to newly generated or expanded data sets for further elucidation of additional drug targets. Moreover, identified immunomodulators may be used to generate experimentally testable hypotheses that could help in the discovery of broad-spectrum immunotherapeutic interventions. All of our results are available at the following supplementary website: http://bioinformatics.cs.vt.edu/~murali/supplements/2013-kidane-bmc.


Subject(s)
Drug Discovery/methods , Fungi/immunology , Gene Expression Profiling , Host-Pathogen Interactions , Immunologic Factors/isolation & purification , Computational Biology , Humans
14.
EBioMedicine ; 96: 104777, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37672869

ABSTRACT

BACKGROUND: The cause and symptoms of long COVID are poorly understood. It is challenging to predict whether a given COVID-19 patient will develop long COVID in the future. METHODS: We used electronic health record (EHR) data from the National COVID Cohort Collaborative to predict the incidence of long COVID. We trained two machine learning (ML) models - logistic regression (LR) and random forest (RF). Features used to train predictors included symptoms and drugs ordered during acute infection, measures of COVID-19 treatment, pre-COVID comorbidities, and demographic information. We assigned the 'long COVID' label to patients diagnosed with the U09.9 ICD10-CM code. The cohorts included patients with (a) EHRs reported from data partners using U09.9 ICD10-CM code and (b) at least one EHR in each feature category. We analysed three cohorts: all patients (n = 2,190,579; diagnosed with long COVID = 17,036), inpatients (149,319; 3,295), and outpatients (2,041,260; 13,741). FINDINGS: LR and RF models yielded median AUROC of 0.76 and 0.75, respectively. Ablation study revealed that drugs had the highest influence on the prediction task. The SHAP method identified age, gender, cough, fatigue, albuterol, obesity, diabetes, and chronic lung disease as explanatory features. Models trained on data from one N3C partner and tested on data from the other partners had average AUROC of 0.75. INTERPRETATION: ML-based classification using EHR information from the acute infection period is effective in predicting long COVID. SHAP methods identified important features for prediction. Cross-site analysis demonstrated the generalizability of the proposed methodology. FUNDING: NCATS U24 TR002306, NCATS UL1 TR003015, Axle Informatics Subcontract: NCATS-P00438-B, NIH/NIDDK/OD, PSR2015-1720GVALE_01, G43C22001320007, and Director, Office of Science, Office of Basic Energy Sciences of the U.S. Department of Energy Contract No. DE-AC02-05CH11231.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , COVID-19 Drug Treatment , Machine Learning , Obesity
15.
EBioMedicine ; 87: 104413, 2023 Jan.
Article in English | MEDLINE | ID: mdl-36563487

ABSTRACT

BACKGROUND: Stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, long COVID is incompletely understood and characterised by a wide range of manifestations that are difficult to analyse computationally. Additionally, the generalisability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. METHODS: We present a method for computationally modelling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning. FINDINGS: We found six clusters of PASC patients, each with distinct profiles of phenotypic abnormalities, including clusters with distinct pulmonary, neuropsychiatric, and cardiovascular abnormalities, and a cluster associated with broad, severe manifestations and increased mortality. There was significant association of cluster membership with a range of pre-existing conditions and measures of severity during acute COVID-19. We assigned new patients from other healthcare centres to clusters by maximum semantic similarity to the original patients, and showed that the clusters were generalisable across different hospital systems. The increased mortality rate originally identified in one cluster was consistently observed in patients assigned to that cluster in other hospital systems. INTERPRETATION: Semantic phenotypic clustering provides a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC. FUNDING: NIH (TR002306/OT2HL161847-01/OD011883/HG010860), U.S.D.O.E. (DE-AC02-05CH11231), Donald A. Roux Family Fund at Jackson Laboratory, Marsico Family at CU Anschutz.


Subject(s)
COVID-19 , Post-Acute COVID-19 Syndrome , Humans , Disease Progression , SARS-CoV-2
16.
BMC Bioinformatics ; 13 Suppl 3: S9, 2012 Mar 21.
Article in English | MEDLINE | ID: mdl-22536907

ABSTRACT

BACKGROUND: The normal functioning of a living cell is characterized by complex interaction networks involving many different types of molecules. Associations detected between diseases and perturbations in well-defined pathways within such interaction networks have the potential to illuminate the molecular mechanisms underlying disease progression and response to treatment. RESULTS: In this paper, we present a computational method that compares expression profiles of genes in cancer samples to samples from normal tissues in order to detect perturbations of pre-defined pathways in the cancer. In contrast to many previous methods, our scoring function approach explicitly takes into account the interactions between the gene products in a pathway. Moreover, we compute the sub-pathway that has the highest score, as opposed to merely computing the score for the entire pathway. We use a permutation test to assess the statistical significance of the most perturbed sub-pathway. We apply our method to 20 pathways in the Netpath database and to the Global Cancer Map of gene expression in 18 cancers. We demonstrate that our method yields more sensitive results than alternatives that do not consider interactions or measure the perturbation of a pathway as a whole. We perform a sensitivity analysis to show that our approach is robust to modest changes in the input data. Our method confirms numerous well-known connections between pathways and cancers. CONCLUSIONS: Our results indicate that integrating differential gene expression with the interaction structure in a pathway is a powerful approach for detecting links between a cancer and the pathways perturbed in it. Our results also suggest that even well-studied pathways may be perturbed only partially in any given cancer. Further analysis of cancer-specific sub-pathways may shed new light on the similarities and differences between cancers.
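The permutation test mentioned in this abstract can be illustrated in its simplest form. The sketch below does not implement the sub-pathway scoring function; it assumes a much simpler statistic, the absolute difference in mean expression between cancer and normal samples, and estimates significance by shuffling the sample labels. The function names, the fixed seed, and the toy values are all hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(case, control, n_permutations=2000, seed=0):
    """Empirical p-value for the absolute difference in group means.

    Sample labels are shuffled n_permutations times; the p-value is the
    share of shuffles whose statistic is at least as large as the observed
    one, with a +1 correction so the estimate is never exactly zero.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility (assumption)
    observed = abs(mean(case) - mean(control))
    pooled = list(case) + list(control)
    n_case = len(case)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        permuted = abs(mean(pooled[:n_case]) - mean(pooled[n_case:]))
        if permuted >= observed:
            at_least_as_extreme += 1
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical expression values for one gene set in two groups.
tumor = [5.1, 5.3, 4.9, 5.2, 5.0]
normal = [1.0, 1.2, 0.9, 1.1, 1.0]
p_separated = permutation_p_value(tumor, normal)
```

In a pathway setting the same shuffling scheme would be applied to whatever score is assigned to the most perturbed sub-pathway, recomputing that score for every permutation.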


Subject(s)
Algorithms , Neoplasms/genetics , Neoplasms/metabolism , Protein Interaction Maps , Signal Transduction , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans
17.
PLoS Comput Biol ; 7(9): e1002164, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21966263

ABSTRACT

HIV Dependency Factors (HDFs) are a class of human proteins that are essential for HIV replication, but are not lethal to the host cell when silenced. Three previous genome-wide RNAi experiments identified HDF sets with little overlap. We combine data from these three studies with a human protein interaction network to predict new HDFs, using an intuitive algorithm called SinkSource and four other algorithms published in the literature. Our algorithm achieves high precision and recall upon cross validation, as do the other methods. A number of HDFs that we predict are known to interact with HIV proteins. They belong to multiple protein complexes and biological processes that are known to be manipulated by HIV. We also demonstrate that many predicted HDF genes show significantly different programs of expression in early response to SIV infection in two non-human primate species that differ in AIDS progression. Our results suggest that many HDFs are yet to be discovered and that they have potential value as prognostic markers to determine pathological outcome and the likelihood of AIDS development. More generally, if multiple genome-wide gene-level studies have been performed at independent labs to study the same biological system or phenomenon, our methodology is applicable to interpret these studies simultaneously in the context of molecular interaction networks and to ask if they reinforce or contradict each other.


Subject(s)
Databases, Protein , HIV Infections/metabolism , Host-Pathogen Interactions/physiology , Models, Statistical , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Animals , Chlorocebus aethiops , Cluster Analysis , Disease Progression , HIV/physiology , Human Immunodeficiency Virus Proteins/chemistry , Human Immunodeficiency Virus Proteins/metabolism , Humans , Macaca nemestrina , Proteins/metabolism , Proteomics , RNA Interference , Reproducibility of Results , Simian Acquired Immunodeficiency Syndrome/metabolism , Simian Immunodeficiency Virus/physiology , Virus Replication
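
The abstract above names SinkSource but does not describe it. The Python sketch below shows one common way such network-propagation predictors work, under the assumption that known positives are held at score 1, known negatives at score 0, and every other node repeatedly takes the weighted average of its neighbors' scores. The function name, its parameters, and the toy network are hypothetical and not taken from the paper.

```python
def sink_source(edges, positives, negatives, n_iter=100):
    """Propagate scores over an undirected weighted network.

    edges: iterable of (node_u, node_v, weight) with positive weights.
    positives / negatives: nodes fixed at 1.0 and 0.0 respectively.
    Returns a dict mapping each node to a score in [0, 1].
    """
    neighbors = {}
    for u, v, w in edges:
        neighbors.setdefault(u, []).append((v, w))
        neighbors.setdefault(v, []).append((u, w))

    scores = {node: 0.0 for node in neighbors}
    for node in positives:
        scores[node] = 1.0
    for node in negatives:
        scores[node] = 0.0
    fixed = set(positives) | set(negatives)

    for _ in range(n_iter):
        updated = dict(scores)
        for node, nbrs in neighbors.items():
            if node in fixed:
                continue  # labeled nodes never change
            total_weight = sum(w for _, w in nbrs)
            updated[node] = sum(w * scores[v] for v, w in nbrs) / total_weight
        scores = updated
    return scores


# Toy chain (invented): known_hdf - x - y - known_non_hdf, unit weights.
toy_edges = [("known_hdf", "x", 1.0), ("x", "y", 1.0), ("y", "known_non_hdf", 1.0)]
toy_scores = sink_source(toy_edges, ["known_hdf"], ["known_non_hdf"])
```

In this toy chain the unlabeled node closer to the positive ends up with the higher score, which is the intuition behind ranking candidate proteins by proximity to known ones.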
18.
bioRxiv ; 2022 Jul 25.
Article in English | MEDLINE | ID: mdl-35923321

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities, but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms, and uses effective heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data, and mortality due to COVID-19 from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen (BUN) and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration . Contact: gaurav.pandey@mssm.edu.

19.
Bioinform Adv ; 2(1): vbac065, 2022.
Article in English | MEDLINE | ID: mdl-36158455

ABSTRACT

Motivation: Integrating multimodal data represents an effective approach to predicting biomedical characteristics, such as protein functions and disease outcomes. However, existing data integration approaches do not sufficiently address the heterogeneous semantics of multimodal data. In particular, early and intermediate approaches that rely on a uniform integrated representation reinforce the consensus among the modalities but may lose exclusive local information. The alternative late integration approach that can address this challenge has not been systematically studied for biomedical problems. Results: We propose Ensemble Integration (EI) as a novel systematic implementation of the late integration approach. EI infers local predictive models from the individual data modalities using appropriate algorithms and uses heterogeneous ensemble algorithms to integrate these local models into a global predictive model. We also propose a novel interpretation method for EI models. We tested EI on the problems of predicting protein function from multimodal STRING data and mortality due to coronavirus disease 2019 (COVID-19) from multimodal data in electronic health records. We found that EI accomplished its goal of producing significantly more accurate predictions than each individual modality. It also performed better than several established early integration methods for each of these problems. The interpretation of a representative EI model for COVID-19 mortality prediction identified several disease-relevant features, such as laboratory test (blood urea nitrogen and calcium) and vital sign measurements (minimum oxygen saturation) and demographics (age). These results demonstrated the effectiveness of the EI framework for biomedical data integration and predictive modeling. Availability and implementation: Code and data are available at https://github.com/GauravPandeyLab/ensemble_integration. 
Supplementary information: Supplementary data are available at Bioinformatics Advances online.
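
The late integration idea in the abstract above (one local model per modality, then an ensemble over the local predictions) can be sketched in a few lines of Python. This is not the EI code from the linked repository: the nearest-centroid local model and the simple mean aggregation are assumptions chosen to keep the example short, and all names and data are hypothetical.

```python
import math


def fit_centroid_model(X, y):
    """Local model for one modality: score by relative distance to class centroids."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]

    pos = centroid([x for x, label in zip(X, y) if label == 1])
    neg = centroid([x for x, label in zip(X, y) if label == 0])

    def predict(x):
        d_pos = math.dist(x, pos)
        d_neg = math.dist(x, neg)
        if d_pos + d_neg == 0:
            return 0.5  # centroids coincide with x; no information
        return d_neg / (d_pos + d_neg)  # close to 1.0 when x is near the positives

    return predict


def late_integration(modalities_train, y, modalities_test):
    """Train one local model per modality, then average their scores."""
    local_models = [fit_centroid_model(X, y) for X in modalities_train]
    n_test = len(modalities_test[0])
    return [
        sum(model(X[i]) for model, X in zip(local_models, modalities_test))
        / len(local_models)
        for i in range(n_test)
    ]


# Toy data (invented): two modalities with different feature spaces.
train_a = [[0.0], [0.1], [1.0], [0.9]]
train_b = [[5.0, 5.0], [5.2, 4.8], [1.0, 1.0], [0.8, 1.2]]
toy_y = [0, 0, 1, 1]
test_a = [[0.95], [0.05]]
test_b = [[1.0, 1.1], [5.0, 5.1]]
ensemble_scores = late_integration([train_a, train_b], toy_y, [test_a, test_b])
```

Because each modality keeps its own model, the feature spaces never need to be merged into one representation, which is the property the abstract contrasts with early and intermediate integration.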

20.
medRxiv ; 2022 Jul 20.
Article in English | MEDLINE | ID: mdl-35665012

ABSTRACT

Accurate stratification of patients with post-acute sequelae of SARS-CoV-2 infection (PASC, or long COVID) would allow precision clinical management strategies. However, the natural history of long COVID is incompletely understood and characterized by an extremely wide range of manifestations that are difficult to analyze computationally. In addition, the generalizability of machine learning classification of COVID-19 clinical outcomes has rarely been tested. We present a method for computationally modeling PASC phenotype data based on electronic healthcare records (EHRs) and for assessing pairwise phenotypic similarity between patients using semantic similarity. Our approach defines a nonlinear similarity function that maps from a feature space of phenotypic abnormalities to a matrix of pairwise patient similarity that can be clustered using unsupervised machine learning procedures. Using k-means clustering of this similarity matrix, we found six distinct clusters of PASC patients, each with distinct profiles of phenotypic abnormalities. There was a significant association of cluster membership with a range of pre-existing conditions and with measures of severity during acute COVID-19. Two of the clusters were associated with severe manifestations and displayed increased mortality. We assigned new patients from other healthcare centers to one of the six clusters on the basis of maximum semantic similarity to the original patients. We show that the identified clusters were generalizable across different hospital systems and that the increased mortality rate was consistently observed in two of the clusters. Semantic phenotypic clustering can provide a foundation for assigning patients to stratified subgroups for natural history or therapy studies on PASC.
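
The step of assigning new patients to existing clusters by similarity can be sketched as follows. This is a rough illustration only: Jaccard overlap of phenotype term sets stands in for the semantic similarity measure the abstract refers to, "maximum similarity" is interpreted here as the highest mean similarity to a cluster's members, and the cluster names and phenotype terms are invented.

```python
def jaccard(a, b):
    """Stand-in similarity: shared terms divided by all terms (NOT semantic similarity)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def assign_to_cluster(new_patient, clusters, similarity=jaccard):
    """Return the cluster whose members are, on average, most similar to the patient.

    clusters: dict mapping cluster name to a list of phenotype term sets.
    """
    best_name, best_score = None, -1.0
    for name, patients in clusters.items():
        score = sum(similarity(new_patient, p) for p in patients) / len(patients)
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# Toy clusters (invented terms and names).
toy_clusters = {
    "respiratory": [{"dyspnea", "cough"}, {"dyspnea", "chest pain"}],
    "neurologic": [{"fatigue", "brain fog"}, {"headache", "brain fog"}],
}
assigned = assign_to_cluster({"brain fog", "headache", "fatigue"}, toy_clusters)
```

Any pairwise similarity function can be passed through the `similarity` argument, so a proper ontology-based measure could replace `jaccard` without changing the assignment logic.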
