ABSTRACT
The analysis of omic data depends on machine-readable information about protein interactions, modifications, and activities as found in protein interaction networks, databases of post-translational modifications, and curated models of gene and protein function. These resources typically depend heavily on human curation. Natural language processing systems that read the primary literature have the potential to substantially extend knowledge resources while reducing the burden on human curators. However, machine-reading systems are limited by high error rates and commonly generate fragmentary and redundant information. Here, we describe an approach to precisely assemble molecular mechanisms at scale using multiple natural language processing systems and the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA identifies full and partial overlaps in information extracted from published papers and pathway databases, uses predictive models to improve the reliability of machine reading, and thereby assembles individual pieces of information into non-redundant and broadly usable mechanistic knowledge. Using INDRA to create high-quality corpora of causal knowledge we show it is possible to extend protein-protein interaction databases and explain co-dependencies in the Cancer Dependency Map.
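The assembly idea described above, merging redundant extractions into one statement with pooled evidence, can be sketched in a few lines. This is a minimal illustration only: it does not use the INDRA API, it handles exact duplicates but not the partial overlaps the abstract mentions, and the tuple layout and the source names (`reader_a`, `reader_b`, `database`) are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical raw extractions: (statement type, subject, object, source).
RAW_EXTRACTIONS = [
    ("Phosphorylation", "MAP2K1", "MAPK1", "reader_a"),
    ("Phosphorylation", "MAP2K1", "MAPK1", "reader_b"),
    ("Phosphorylation", "MAP2K1", "MAPK1", "database"),
    ("Activation", "BRAF", "MAP2K1", "reader_a"),
]


def assemble(extractions):
    """Group identical statements and pool the sources that support them."""
    evidence = defaultdict(list)
    for stmt_type, subj, obj, source in extractions:
        evidence[(stmt_type, subj, obj)].append(source)
    return dict(evidence)


assembled = assemble(RAW_EXTRACTIONS)
for statement, sources in assembled.items():
    print(statement, "supported by", len(sources), "source(s)")
```

Pooling the evidence per statement is what would let a downstream reliability model weigh a statement by how many independent sources report it.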
Subjects
Data Mining , Natural Language Processing , Humans , Reproducibility of Results , Databases, Factual
ABSTRACT
Apoptosis is a highly regulated form of cell death that controls normal homeostasis as well as the antitumor activity of many chemotherapeutic agents. Commitment to death via the mitochondrial apoptotic pathway requires activation of the mitochondrial pore-forming proteins BAK or BAX. Activation can be effected by the activator BH3-only proteins BID or BIM, which have been considered to be functionally redundant in this role. Herein, we show that significant activation preferences exist between these proteins: BID preferentially activates BAK while BIM preferentially activates BAX. Furthermore, we find that cells lacking BAK are relatively resistant to agents that require BID activation for maximal induction of apoptosis, including topoisomerase inhibitors and TRAIL. Consequently, patients with tumors that harbor a loss of BAK1 exhibit an inferior response to topoisomerase inhibitor treatment in the clinic. Therefore, BID and BIM have nonoverlapping roles in the induction of apoptosis via BAK and BAX, affecting chemotherapy response.
Subjects
Apoptosis Regulatory Proteins/metabolism , BH3 Interacting Domain Death Agonist Protein/metabolism , Membrane Proteins/metabolism , Ovarian Neoplasms/genetics , Proto-Oncogene Proteins/metabolism , bcl-2 Homologous Antagonist-Killer Protein/metabolism , bcl-2-Associated X Protein/metabolism , Apoptosis/drug effects , Apoptosis Regulatory Proteins/genetics , BH3 Interacting Domain Death Agonist Protein/genetics , Bcl-2-Like Protein 11 , Female , Gene Expression Regulation, Neoplastic , HeLa Cells , Humans , Membrane Proteins/genetics , Mitochondria/drug effects , Mitochondria/genetics , Mitochondria/metabolism , Neoplasm Staging , Ovarian Neoplasms/drug therapy , Ovarian Neoplasms/pathology , Proto-Oncogene Proteins/genetics , Topoisomerase Inhibitors/administration & dosage , Transcriptional Activation/drug effects , bcl-2 Homologous Antagonist-Killer Protein/genetics , bcl-2-Associated X Protein/genetics
ABSTRACT
SUMMARY: INDRA-IPM (Interactive Pathway Map) is a web-based pathway map modeling tool that combines natural language processing with automated model assembly and visualization. INDRA-IPM contextualizes models with expression data and exports them to standard formats. AVAILABILITY AND IMPLEMENTATION: INDRA-IPM is available at: http://pathwaymap.indra.bio. Source code is available at http://github.com/sorgerlab/indra_pathway_map. The underlying web service API is available at http://api.indra.bio:8000. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Computers , Software , Natural Language Processing
ABSTRACT
Signal transduction networks allow eukaryotic cells to make decisions based on information about intracellular state and the environment. Biochemical noise significantly diminishes the fidelity of signaling: networks examined to date seem to transmit less than 1 bit of information. It is unclear how networks that control critical cell-fate decisions (e.g., cell division and apoptosis) can function with such low levels of information transfer. Here, we use theory, experiments, and numerical analysis to demonstrate an inherent trade-off between the information transferred in individual cells and the information available to control population-level responses. Noise in receptor-mediated apoptosis reduces information transfer to approximately 1 bit at the single-cell level but allows 3-4 bits of information to be transmitted at the population level. For processes such as eukaryotic chemotaxis, in which single cells are the functional unit, we find high levels of information transmission at a single-cell level. Thus, low levels of information transfer are unlikely to represent a physical limit. Instead, we propose that signaling networks exploit noise at the single-cell level to increase population-level information transfer, allowing extracellular ligands, whose levels are also subject to noise, to incrementally regulate phenotypic changes. This is particularly critical for discrete changes in fate (e.g., life vs. death) for which the key variable is the fraction of cells engaged. Our findings provide a framework for rationalizing the high levels of noise in metazoan signaling networks and have implications for the development of drugs that target these networks in the treatment of cancer and other diseases.
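The trade-off between single-cell and population-level information can be illustrated with a toy calculation. The sketch below is not the authors' analysis: it assumes eight equally likely ligand doses, each with a hypothetical probability that a cell dies (`DEATH_PROBABILITY`), and it treats cells as independent. It then compares the mutual information between dose and the fate of one cell with that between dose and the number of dying cells in a population of 1000.

```python
from math import comb, log2


def mutual_information(p_y_given_x):
    """Return I(X;Y) in bits, assuming every input X is equally likely."""
    n_x = len(p_y_given_x)
    n_y = len(p_y_given_x[0])
    # Marginal output distribution p(y) under uniform inputs.
    p_y = [sum(row[y] for row in p_y_given_x) / n_x for y in range(n_y)]
    info = 0.0
    for row in p_y_given_x:
        for y, p in enumerate(row):
            if p > 0.0:
                info += (p / n_x) * log2(p / p_y[y])
    return info


def death_count_distribution(n_cells, p_die):
    """Binomial distribution of the number of dying cells out of n_cells."""
    return [
        comb(n_cells, k) * p_die**k * (1.0 - p_die) ** (n_cells - k)
        for k in range(n_cells + 1)
    ]


# Hypothetical death probability for each of eight ligand doses.
DEATH_PROBABILITY = [0.02, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.98]

single_cell_bits = mutual_information(
    [death_count_distribution(1, p) for p in DEATH_PROBABILITY]
)
population_bits = mutual_information(
    [death_count_distribution(1000, p) for p in DEATH_PROBABILITY]
)
print(f"single cell: {single_cell_bits:.2f} bits")
print(f"population of 1000 cells: {population_bits:.2f} bits")
```

A binary fate can never carry more than 1 bit per cell, whereas the fraction of responding cells can distinguish many doses; with eight doses the population-level value is bounded by log2(8) = 3 bits.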
Subjects
Models, Biological , Signal Transduction/physiology , Biophysical Phenomena , Cell Communication , Computer Simulation , HeLa Cells , Humans , Information Theory , Ion Channels/drug effects , Ion Channels/physiology , Signal Transduction/drug effects , Systems Biology , TNF-Related Apoptosis-Inducing Ligand/pharmacology , TNF-Related Apoptosis-Inducing Ligand/physiology
ABSTRACT
BACKGROUND: For automated reading of scientific publications to extract useful information about molecular mechanisms it is critical that genes, proteins and other entities be correctly associated with uniform identifiers, a process known as named entity linking or "grounding." Correct grounding is essential for resolving relationships among mined information, curated interaction databases, and biological datasets. The accuracy of this process is largely dependent on the availability of machine-readable resources associating synonyms and abbreviations commonly found in biomedical literature with uniform identifiers. RESULTS: In a task involving automated reading of ~215,000 articles using the REACH event extraction software we found that grounding was disproportionately inaccurate for multi-protein families (e.g., "AKT") and complexes with multiple subunits (e.g., "NF-κB"). To address this problem we constructed FamPlex, a manually curated resource defining protein families and complexes as they are commonly encountered in biomedical text. In FamPlex the gene-level constituents of families and complexes are defined in a flexible format allowing for multi-level, hierarchical membership. To create FamPlex, text strings corresponding to entities were identified empirically from literature and linked manually to uniform identifiers; these identifiers were also mapped to equivalent entries in multiple related databases. FamPlex also includes curated prefix and suffix patterns that improve named entity recognition and event extraction. Evaluation of REACH extractions on a test corpus of ~54,000 articles showed that FamPlex significantly increased grounding accuracy for families and complexes (from 15 to 71%). The hierarchical organization of entities in FamPlex also made it possible to integrate otherwise unconnected mechanistic information across families, subfamilies, and individual proteins. 
Applications of FamPlex to the TRIPS/DRUM reading system and the Biocreative VI Bioentity Normalization Task dataset demonstrated the utility of FamPlex in other settings. CONCLUSION: FamPlex is an effective resource for improving named entity recognition, grounding, and relationship resolution in automated reading of biomedical text. The content in FamPlex is available in both tabular and Open Biomedical Ontology formats at https://github.com/sorgerlab/famplex under the Creative Commons CC0 license and has been integrated into the TRIPS/DRUM and REACH reading systems.
Subjects
Data Mining/methods , Proteins/metabolism , Humans
ABSTRACT
Word models (natural language descriptions of molecular mechanisms) are a common currency in spoken and written communication in biomedicine but are of limited use in predicting the behavior of complex biological networks. We present an approach to building computational models directly from natural language using automated assembly. Molecular mechanisms described in simple English are read by natural language processing algorithms, converted into an intermediate representation, and assembled into executable or network models. We have implemented this approach in the Integrated Network and Dynamical Reasoning Assembler (INDRA), which draws on existing natural language processing systems as well as pathway information in Pathway Commons and other online resources. We demonstrate the use of INDRA and natural language to model three biological processes of increasing scope: (i) p53 dynamics in response to DNA damage, (ii) adaptive drug resistance in BRAF-V600E-mutant melanomas, and (iii) the RAS signaling pathway. The use of natural language makes the task of developing a model more efficient and it increases model transparency, thereby promoting collaboration with the broader biology community.
Subjects
Gene Expression Regulation, Neoplastic , Melanoma/genetics , Models, Genetic , Natural Language Processing , Neural Networks, Computer , Skin Neoplasms/genetics , Antineoplastic Agents/therapeutic use , Cell Line, Tumor , Computer Simulation , DNA Damage , Drug Resistance, Neoplasm/genetics , Enzyme Inhibitors/therapeutic use , Humans , Indoles/therapeutic use , Language , Melanoma/drug therapy , Melanoma/metabolism , Melanoma/pathology , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins B-raf/metabolism , Proto-Oncogene Proteins p21(ras)/genetics , Proto-Oncogene Proteins p21(ras)/metabolism , Signal Transduction , Skin Neoplasms/drug therapy , Skin Neoplasms/metabolism , Skin Neoplasms/pathology , Sulfonamides/therapeutic use , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Vemurafenib
ABSTRACT
Most cellular processes rely on large multiprotein complexes that must assemble into a well-defined quaternary structure in order to function. A number of prominent examples, including the 20S core particle of the proteasome and the AAA+ family of ATPases, contain ring-like structures. Developing an understanding of the complex assembly pathways employed by ring-like structures requires a characterization of the problems these pathways have had to overcome as they evolved. In this work, we use computational models to uncover one such problem: a deadlocked plateau in the assembly dynamics. When the molecular interactions between subunits are too strong, this plateau leads to significant delays in assembly and a reduction in steady-state yield. Conversely, if the interactions are too weak, assembly delays are caused by the instability of crucial intermediates. Intermediate affinities thus maximize the efficiency of assembly for homomeric ring-like structures. In the case of heteromeric rings, we find that rings including at least one weak interaction can assemble efficiently and robustly. Estimation of affinities from solved structures of ring-like complexes indicates that heteromeric rings tend to contain a weak interaction, confirming our prediction. In addition to providing an evolutionary rationale for structural features of rings, our work forms the basis for understanding the complex assembly pathways of stacked rings like the proteasome and suggests principles that would aid in the design of synthetic ring-like structures that self-assemble efficiently.
Subjects
Protein Binding , Models, Molecular , Protein Conformation
ABSTRACT
Mathematical equations are fundamental to modeling biological networks, but as networks get large and revisions frequent, it becomes difficult to manage equations directly or to combine previously developed models. Multiple simultaneous efforts to create graphical standards, rule-based languages, and integrated software workbenches aim to simplify biological modeling but none fully meets the need for transparent, extensible, and reusable models. In this paper we describe PySB, an approach in which models are not only created using programs, they are programs. PySB draws on programmatic modeling concepts from little b and ProMot, the rule-based languages BioNetGen and Kappa and the growing library of Python numerical tools. Central to PySB is a library of macros encoding familiar biochemical actions such as binding, catalysis, and polymerization, making it possible to use a high-level, action-oriented vocabulary to construct detailed models. As Python programs, PySB models leverage tools and practices from the open-source software community, substantially advancing our ability to distribute and manage the work of testing biochemical hypotheses. We illustrate these ideas using new and previously published models of apoptosis.
Subjects
Models, Biological , Programming Languages , Software , Apoptosis/physiology , Computer Simulation , Mitochondria/physiology , Proto-Oncogene Proteins c-bcl-2/physiology
ABSTRACT
Introduction: The COVID-19 Disease Map project is a large-scale community effort uniting 277 scientists from 130 institutions around the globe. We use high-quality, mechanistic content describing SARS-CoV-2-host interactions and develop interoperable bioinformatic pipelines for novel target identification and drug repurposing. Methods: Extensive community work allowed an impressive step forward in building interfaces between Systems Biology tools and platforms. Our framework can link biomolecules from omics data analysis and computational modelling to dysregulated pathways in a cell-, tissue- or patient-specific manner. Drug repurposing using text mining and AI-assisted analysis identified potential drugs, chemicals and microRNAs that could target the identified key factors. Results: Results revealed drugs already tested for anti-COVID-19 efficacy, providing a mechanistic context for their mode of action, and drugs already in clinical trials for treating other diseases, never tested against COVID-19. Discussion: The key advance is that the proposed framework is versatile and expandable, offering a significant upgrade in the arsenal for virus-host interactions and other complex pathologies.
Subjects
COVID-19 , Humans , SARS-CoV-2 , Drug Repositioning , Systems Biology , Computer Simulation
ABSTRACT
BACKGROUND: Mechanistic data is increasingly used in hazard identification of chemicals. However, the volume of data is large, challenging the efficient identification and clustering of relevant data. OBJECTIVES: We investigated whether evidence identification for hazard assessment can become more efficient and informed through an automated approach that combines machine reading of publications with network visualization tools. METHODS: We chose 13 chemicals that were evaluated by the International Agency for Research on Cancer (IARC) Monographs program incorporating the key characteristics of carcinogens (KCCs) approach. Using established literature search terms for KCCs, we retrieved and analyzed literature using the Integrated Network and Dynamical Reasoning Assembler (INDRA). INDRA combines large-scale literature processing with pathway databases and extracts relationships between biomolecules, bioprocesses, and chemicals into statements (e.g., "benzene activates DNA damage"). These statements were subsequently assembled into networks and compared with the KCC evaluation by the IARC, to evaluate the informativeness of our approach. RESULTS: We found, in general, larger networks for those chemicals for which the IARC judged the evidence of KCC induction to be strong. Larger networks were not directly linked to publication count, given that we retrieved small networks for several chemicals with little support for KCC activation according to the IARC, despite the significant volume of literature for these specific chemicals. In addition, interpreting networks for genotoxicity and DNA repair showed concordance with the IARC KCC evaluation. DISCUSSION: Our method is an automated approach to condense mechanistic literature into searchable and interpretable networks based on an a priori ontology. 
The approach is not a replacement for expert evaluation but, instead, provides an informed structure for experts to quickly identify which statements are made in which papers and how these could connect. We focused on the KCCs because these are supported by well-described search terms. The method needs to be tested in other frameworks as well to demonstrate its generalizability. https://doi.org/10.1289/EHP9112.
Subjects
Carcinogens , Neoplasms , Benzene , Carcinogens/toxicity , Databases, Factual , Humans , Neoplasms/chemically induced , Neoplasms/epidemiology , Risk Assessment
ABSTRACT
Individual cancers rely on distinct essential genes for their survival. The Cancer Dependency Map (DepMap) is an ongoing project to uncover these gene dependencies in hundreds of cancer cell lines. To make this drug discovery resource more accessible to the scientific community, we built an easy-to-use browser, shinyDepMap (https://labsyspharm.shinyapps.io/depmap). shinyDepMap combines CRISPR and shRNA data to determine, for each gene, the growth reduction caused by knockout/knockdown and the selectivity of this effect across cell lines. The tool also clusters genes with similar dependencies, revealing functional relationships. shinyDepMap can be used to (1) predict the efficacy and selectivity of drugs targeting particular genes; (2) identify maximally sensitive cell lines for testing a drug; (3) target hop, that is, navigate from an undruggable protein with the desired selectivity profile, such as an activated oncogene, to more druggable targets with a similar profile; and (4) identify novel pathways driving cancer cell growth and survival.
Subjects
Computational Biology/methods , Neoplasms/genetics , Biomarkers, Tumor/genetics , Cell Line, Tumor , Clustered Regularly Interspaced Short Palindromic Repeats , Genes, Essential , Humans , Internet , Neoplasms/metabolism , RNA, Small Interfering/genetics , RNA, Small Interfering/metabolism , Software
ABSTRACT
A bottleneck in high-throughput functional genomics experiments is identifying the most important genes and their relevant functions from a list of gene hits. Gene Ontology (GO) enrichment methods provide insight at the gene set level. Here, we introduce GeneWalk (github.com/churchmanlab/genewalk), which identifies individual genes and their relevant functions critical for the experimental setting under examination. After the automatic assembly of an experiment-specific gene regulatory network, GeneWalk uses representation learning to quantify the similarity between vector representations of each gene and its GO annotations, yielding annotation significance scores that reflect the experimental context. By performing gene- and condition-specific functional analysis, GeneWalk converts a list of genes into data-driven hypotheses.
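The similarity step mentioned above can be sketched as a cosine similarity between a gene vector and the vectors of its GO annotations. This is an illustration only, not the GeneWalk code: the three-dimensional vectors and the labels `GO term A` and `GO term B` are made up, and the sketch stops at ranking, without the significance scoring the abstract describes.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical learned embeddings for one gene and two of its GO annotations.
gene_vector = [0.9, 0.1, 0.3]
go_vectors = {
    "GO term A": [0.8, 0.2, 0.4],
    "GO term B": [-0.5, 0.9, 0.0],
}

# Rank the annotations from most to least similar to the gene.
ranked = sorted(
    go_vectors,
    key=lambda term: cosine_similarity(gene_vector, go_vectors[term]),
    reverse=True,
)
print(ranked)
```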
Subjects
Databases, Genetic , Gene Regulatory Networks , Animals , Biflavonoids , Brain , Gene Ontology , High-Throughput Nucleotide Sequencing , Humans , Mice , RNA-Seq , Transcriptome
ABSTRACT
Making the knowledge contained in scientific papers machine-readable and formally computable would allow researchers to take full advantage of this information by enabling integration with other knowledge sources to support data analysis and interpretation. Here we describe Biofactoid, a web-based platform that allows scientists to specify networks of interactions between genes, their products, and chemical compounds, and then translates this information into a representation suitable for computational analysis, search and discovery. We also report the results of a pilot study to encourage the wide adoption of Biofactoid by the scientific community.
Subjects
Computational Biology/methods , Genomics/methods , Computational Biology/instrumentation , Databases, Factual , Genomics/instrumentation , Pilot Projects
ABSTRACT
The mitogen-activated protein kinase (MAPK) pathway is a critical effector of oncogenic RAS signaling, and MAPK pathway inhibition may be an effective combination treatment strategy. We performed genome-scale loss-of-function CRISPR-Cas9 screens in the presence of a MEK1/2 inhibitor (MEKi) in KRAS-mutant pancreatic and lung cancer cell lines and identified genes that cooperate with MEK inhibition. While we observed heterogeneity in genetic modifiers of MEKi sensitivity across cell lines, several recurrent classes of synthetic lethal vulnerabilities emerged at the pathway level. Multiple members of receptor tyrosine kinase (RTK)-RAS-MAPK pathways scored as sensitizers to MEKi. In particular, we demonstrate that knockout, suppression, or degradation of SHOC2, a positive regulator of MAPK signaling, specifically cooperated with MEK inhibition to impair proliferation in RAS-driven cancer cells. The depletion of SHOC2 disrupted survival pathways triggered by feedback RTK signaling in response to MEK inhibition. Thus, these findings nominate SHOC2 as a potential target for combination therapy.