ABSTRACT
MOTIVATION: The interpretation of high-throughput datasets has remained one of the central challenges of computational biology over the past decade. Furthermore, as the amount of biological knowledge increases, it becomes more and more difficult to integrate this large body of knowledge in a meaningful manner. In this article, we propose a particular solution to both of these challenges. METHODS: We integrate available biological knowledge by constructing a network of molecular interactions of a specific kind: causal interactions. The resulting causal graph can be queried to suggest molecular hypotheses that explain the variations observed in a high-throughput gene expression experiment. We show that a simple scoring function can discriminate between a large number of competing molecular hypotheses about the upstream cause of the changes observed in a gene expression profile. We then develop an analytical method for computing the statistical significance of each score. This analytical method also helps assess the effects of random or adversarial noise on the predictive power of our model. RESULTS: Our results show that the causal graph we constructed from known biological literature is extremely robust to random noise and to missing or spurious information. We demonstrate the power of our causal reasoning model on two specific examples, one from a cancer dataset and the other from a cardiac hypertrophy experiment. We conclude that causal reasoning models provide a valuable addition to the biologist's toolkit for the interpretation of gene expression data. AVAILABILITY AND IMPLEMENTATION: R source code for the method is available upon request.
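The abstract does not give the scoring function or the analytical significance calculation. The following is a minimal sketch that assumes one simple form of causal-reasoning score. Each hypothesis is an upstream regulator assumed to be up- or down-regulated, and it predicts, through the causal graph, the direction of change of its downstream genes. The score counts correct minus incorrect predictions among the genes that actually changed. Instead of the authors' analytical method, the sketch estimates significance with a plain permutation test that shuffles the observed signs across genes. All function names are hypothetical.

```python
import random


def score_hypothesis(predicted, observed):
    """Hypothetical score: correct minus incorrect sign predictions.

    predicted: dict gene -> +1/-1, direction implied by the hypothesis via the causal graph
    observed:  dict gene -> +1/-1/0, measured change (0 = not significantly changed)
    Genes that are unchanged or not measured contribute nothing to the score.
    """
    correct = incorrect = 0
    for gene, sign in predicted.items():
        obs = observed.get(gene, 0)
        if obs == 0:
            continue
        if obs == sign:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect


def rank_hypotheses(hypotheses, observed):
    """Score competing hypotheses (name -> predicted signs) and sort best first."""
    scored = [(name, score_hypothesis(pred, observed)) for name, pred in hypotheses.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def permutation_pvalue(predicted, observed, n_perm=10000, seed=0):
    """Empirical one-sided p-value (a stand-in for the paper's analytical method).

    Shuffles the observed signs across measured genes and counts how often the
    shuffled data score at least as high as the real data.
    """
    rng = random.Random(seed)
    genes = list(observed)
    values = list(observed.values())
    actual = score_hypothesis(predicted, observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(values)
        if score_hypothesis(predicted, dict(zip(genes, values))) >= actual:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_perm + 1)
```

For example, with `observed = {"A": 1, "B": 1, "C": -1, "D": 0, "E": -1, "F": 1}` and a hypothesis predicting `{"A": 1, "B": 1, "C": -1, "E": 1}`, three predictions agree (A, B, C) and one disagrees (E), so `score_hypothesis` returns 2. The opposite hypothesis (regulator down instead of up) scores -2 and ranks below it.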
Subjects
Breast Neoplasms/genetics , Cardiomegaly/genetics , Computational Biology/methods , Gene Expression Profiling , Algorithms , Humans , Biological Models
ABSTRACT
The founder population of Newfoundland and Labrador (NL) is a unique genetic resource, in part due to its geographic and cultural isolation. Historical records describe a migration of European settlers, primarily from Ireland and England, to NL in the 18th and 19th centuries. Whilst its historical isolation and the increased prevalence of certain monogenic disorders are well appreciated, details of the fine-scale genetic structure and ancestry of the population are lacking. Understanding the genetic origins and background of functional, disease-causing genetic variants would aid genetic mapping efforts in the Province. Here, we leverage dense genome-wide SNP data on 1,807 NL individuals to reveal fine-scale genetic structure in NL that is clustered around coastal communities and correlated with Christian denomination. We show that the majority of NL European ancestry can be traced back to the south-east of Ireland and the south-west of England. We date a substantial population size bottleneck to approximately 10-15 generations ago in NL, associated with increased haplotype sharing and autozygosity. Our results provide insights into the population history of NL and show that the population is conducive to further genetic studies and biomarker discovery.
Subjects
Population Genetics , White People , Humans , Newfoundland and Labrador , Ireland , Human Migration
ABSTRACT
MOTIVATION: Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Modelers naturally build large models from smaller components that each represent subsets of reactions within the larger network. To assist modelers in this process, we present model aggregation, which defines models in terms of components that are designed for the purpose of being combined. RESULTS: We have implemented a model editor that incorporates model aggregation, and we suggest supporting extensions to the Systems Biology Markup Language (SBML) Level 3. We illustrate aggregation with a model of the eukaryotic cell cycle 'engine' created from smaller pieces. AVAILABILITY: Java implementations are available in the JigCell Aggregation Connector. See http://jigcell.biol.vt.edu. CONTACT: shaffer@vt.edu
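The idea of defining a model from components built to be combined can be illustrated in a few lines of Python. This is a hypothetical sketch, not the JigCell Aggregation Connector's code and not the proposed SBML Level 3 extension. It assumes each component lists reactions over its own local species names and declares which species are ports. Aggregation maps connected ports onto shared species names, prefixes every other species and rate label with the component name to avoid collisions, and concatenates the reactions. The component, species, and rate names in the usage example are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """Hypothetical model component: reactions over locally named species,
    plus the subset of species (ports) it exposes for connection."""
    name: str
    reactions: list  # (reactants tuple, products tuple, rate-law label)
    ports: set = field(default_factory=set)


def aggregate(components, connections):
    """Flatten components into one reaction list.

    connections maps (component name, port species) -> shared global species name.
    Only declared ports may be connected; everything else is kept private by
    prefixing it with the component name.
    """
    by_name = {c.name: c for c in components}
    for comp_name, species in connections:
        if species not in by_name[comp_name].ports:
            raise ValueError(f"{comp_name}.{species} is not a declared port")

    def rename(comp, species):
        return connections.get((comp.name, species), f"{comp.name}.{species}")

    merged = []
    for comp in components:
        for reactants, products, rate in comp.reactions:
            merged.append((
                tuple(rename(comp, s) for s in reactants),
                tuple(rename(comp, s) for s in products),
                f"{comp.name}.{rate}",
            ))
    return merged
```

For example, a cyclin-synthesis component exposing `CycB` and a complex-formation component exposing `Cyclin` can be wired together by mapping both ports to the shared species `CycB`; the second component's private `Cdk1` and `MPF` become `assoc.Cdk1` and `assoc.MPF`. Rejecting connections to undeclared species is one way to enforce that components are combined only through the interfaces they were designed with.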
Subjects
Computational Biology/methods , Biological Models , Systems Biology/methods , Gene Regulatory Networks , Software
ABSTRACT
We demonstrate how to model macromolecular regulatory networks with JigCell and the Parameter Estimation Toolkit (PET). These software tools are designed specifically to support the process typically used by systems biologists to model complex regulatory circuits. A detailed example illustrates how a model of the cell cycle in frog eggs is created and then refined through comparison of simulation output with experimental data. We show how parameter estimation tools automatically generate rate constants that fit a model to experimental data.
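The fitting step, in which rate constants are adjusted until simulation output matches experimental data, can be shown with a deliberately small example. This sketch is not PET's algorithm. It assumes a hypothetical one-species decay model with a single rate constant `k`, measures misfit as the sum of squared residuals against the data, and uses golden-section search, a standard one-dimensional optimizer, to find the best-fitting `k`. Real cell-cycle models have many coupled ODEs and parameters and need multidimensional optimizers.

```python
import math


def model(k, x0, t):
    # Hypothetical one-species model dx/dt = -k*x, solved analytically: x(t) = x0*exp(-k*t).
    # A real regulatory-network model would be integrated numerically instead.
    return x0 * math.exp(-k * t)


def sse(k, x0, times, data):
    # Sum of squared residuals between simulated output and experimental data.
    return sum((model(k, x0, t) - y) ** 2 for t, y in zip(times, data))


def golden_section_min(f, lo, hi, tol=1e-8):
    # Minimize a unimodal function f on [lo, hi] by golden-section search.
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2


def fit_rate_constant(times, data, x0, lo=0.0, hi=10.0):
    # Estimate k by minimizing the squared misfit over the interval [lo, hi].
    return golden_section_min(lambda k: sse(k, x0, times, data), lo, hi)
```

With noise-free synthetic data generated from `k = 0.3`, `fit_rate_constant` recovers a value very close to 0.3. Golden-section search is used here only because it needs no derivatives and is easy to verify for a single parameter.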
Subjects
Computational Biology/methods , Computer Simulation , Gene Regulatory Networks , Biological Models , Software , Animals , Cell Cycle , Xenopus laevis/physiology
ABSTRACT
Human pluripotent stem cells (hPSCs) can give rise to a variety of disease-relevant cell types that are useful for improving the translation of preclinical research. Despite the potential of hPSCs, their use for genetic screening has been limited by technical challenges. We developed a scalable and renewable Cas9 and sgRNA-hPSC library in which loss-of-function mutations can be induced at will. Our inducible mutant hPSC library can be used for multiple genome-wide CRISPR screens in a variety of hPSC-derived cell types. As proof of concept, we performed three screens for regulators of properties fundamental to hPSCs: their ability to self-renew and/or survive (fitness), their inability to survive as single-cell clones, and their capacity to differentiate. We identified the majority of known genes and pathways involved in these processes, as well as a plethora of genes with previously unidentified roles. This resource will increase our understanding of human development and genetics, and this approach will be a powerful tool for identifying disease-modifying genes and pathways.
Subjects
CRISPR-Cas Systems/genetics , Genetic Testing/methods , Genome/genetics , Pluripotent Stem Cells/metabolism , Humans
ABSTRACT
Here we report Digital RNA with pertUrbation of Genes (DRUG-seq), a high-throughput platform for drug discovery. Pharmaceutical discovery relies on high-throughput screening, yet current platforms have limited readouts. RNA-seq is a powerful tool for investigating drug effects using transcriptome changes as a proxy, yet standard library construction is costly. DRUG-seq captures the transcriptional changes detected by standard RNA-seq at 1/100th of the cost. In proof-of-concept experiments profiling 433 compounds across 8 doses, DRUG-seq transcription profiles successfully grouped compounds into functional clusters by mechanism of action (MoA) based on their intended targets. Perturbation differences reflected in transcriptome changes were detected for compounds engaging the same target, demonstrating the value of DRUG-seq for understanding on- and off-target activities. We demonstrate that DRUG-seq captures common mechanisms, as well as differences between compound treatment and CRISPR perturbation of the same target. DRUG-seq provides a powerful tool for comprehensive transcriptome readout in a high-throughput screening environment.
Subjects
Drug Discovery/methods , Gene Expression Profiling/methods , High-Throughput Screening Assays/methods , RNA Sequence Analysis , Cell Line , Clustered Regularly Interspaced Short Palindromic Repeats , Humans
ABSTRACT
CRISPR/Cas9 has revolutionized our ability to engineer genomes and conduct genome-wide screens in human cells [1-3]. Whereas some cell types are amenable to genome engineering, genomes of human pluripotent stem cells (hPSCs) have been difficult to engineer, with reduced efficiencies relative to tumour cell lines or mouse embryonic stem cells [3-13]. Here, using hPSC lines with stable integration of Cas9 or transient delivery of Cas9 ribonucleoproteins (RNPs), we achieved an average insertion or deletion (indel) efficiency greater than 80%. This high efficiency of indel generation revealed that double-strand breaks (DSBs) induced by Cas9 are toxic and kill most hPSCs. In previous studies, the toxicity of Cas9 in hPSCs was less apparent because of low transfection efficiency and consequently low DSB induction [3]. The toxic response to DSBs was P53/TP53-dependent, such that the efficiency of precise genome engineering in hPSCs with a wild-type P53 gene was severely reduced. Our results indicate that Cas9 toxicity creates an obstacle to the high-throughput use of CRISPR/Cas9 for genome engineering and screening in hPSCs. Moreover, as hPSCs can acquire P53 mutations [14], cell replacement therapies using CRISPR/Cas9-engineered hPSCs should proceed with caution, and such engineered hPSCs should be monitored for P53 function.
Subjects
CRISPR-Associated Protein 9/metabolism , CRISPR-Cas Systems/genetics , Genetic Engineering , Pluripotent Stem Cells/metabolism , Tumor Suppressor Protein p53/metabolism , Cyclin-Dependent Kinase Inhibitor p21/genetics , Cyclin-Dependent Kinase Inhibitor p21/metabolism , Double-Stranded DNA Breaks , Gene Deletion , Humans , Kinetoplastida Guide RNA/metabolism , Messenger RNA/genetics , Messenger RNA/metabolism , Genetic Transcription , fas Receptor/genetics , fas Receptor/metabolism
ABSTRACT
Triglyceride accumulation is associated with obesity and type 2 diabetes. Genetic disruption of diacylglycerol acyltransferase 1 (DGAT1), which catalyzes the final reaction of triglyceride synthesis, confers dramatic resistance to high-fat-diet-induced obesity. Hence, DGAT1 is considered a potential therapeutic target for treating obesity and related metabolic disorders. However, the molecular events underlying the mechanism of action of pharmacological DGAT1 inhibition have not yet been fully explored. Here, we investigate the molecular metabolic mechanisms induced in response to pharmacological inhibition of DGAT1 using a recently developed computational systems biology approach, the Causal Reasoning Engine (CRE). The CRE algorithm uses microarray transcriptomic data and causal statements derived from the biomedical literature to infer the upstream molecular events driving the observed transcriptional changes. The inferred upstream events (also called hypotheses) are aggregated into biological models using a set of analytical tools that allow the hypotheses to be evaluated and integrated in the context of their supporting evidence. In contrast to Gene Ontology enrichment analysis, which pointed to high-level changes in metabolic processes, the CRE results provide detailed molecular hypotheses to explain the measured transcriptional changes. CRE analysis of gene expression changes in high-fat-habituated rats treated with a potent and selective DGAT1 inhibitor demonstrates that the majority of transcriptomic changes support a metabolic network indicative of reversal of high-fat diet effects, which includes molecular hypotheses such as PPARG, HNF4A and SREBPs. Finally, the CRE-generated molecular hypotheses from DGAT1 inhibitor-treated rats were found to capture the major molecular characteristics of DGAT1-deficient mice, supporting a phenotype of decreased lipid and increased insulin sensitivity.
Subjects
Diacylglycerol O-Acyltransferase/antagonists & inhibitors , Enzyme Inhibitors/pharmacology , Theoretical Models , Algorithms , Animals , Feeding Behavior , Oligonucleotide Array Sequence Analysis , Polymerase Chain Reaction , Rats , Sprague-Dawley Rats , Triglycerides/blood
ABSTRACT
Models of regulatory networks become more difficult to construct and understand as they grow in size and complexity. Large models are usually built up from smaller models, representing subsets of reactions within the larger network. To assist modelers in this composition process, we present a formal approach for model composition, a wizard-style program for implementing the approach, and suggested language extensions to the Systems Biology Markup Language to support model composition. To illustrate the features of our approach and how to use the JigCell Composition Wizard, we build up a model of the eukaryotic cell cycle "engine" from smaller pieces.
Subjects
Cyclin-Dependent Kinases/metabolism , Biological Models , Systems Biology/methods , Algorithms , Cell Cycle/physiology , Signal Transduction/physiology , Yeasts/metabolism , Yeasts/physiology
ABSTRACT
Central to the reconstruction of cis-regulatory networks is the identification and classification of naturally occurring transcription factor-binding sites according to the genes that they control. We have examined salient characteristics of 9-mers that occur in various orders and combinations in the proximal promoters of human genes. In evaluations of a dataset defined with respect to experimentally determined transcription initiation sites, we observed in some cases a clear correspondence between highly ranked 9-mers and protein-binding sites in genomic DNA. Evaluations of the larger dataset, defined with respect to the 5' ends of human ESTs, revealed that a subset of the highly ranked 9-mers corresponded to sites for several known transcription factor families (including CREB, ETS, EGR-1, SP1, KLF, MAZ, HIF-1, and STATs) that play important roles in the regulation of vertebrate genes. We identified several highly ranked CpG-containing 9-mers that define sites for interactions with the CREB and ETS families of proteins, and we identified potential target genes for these proteins. These results imply that CpG-containing transcription factor-binding sites regulate the expression of genes with important roles in pathways leading to cell-type-specific gene expression and in pathways controlled by the complex networks of signaling systems.
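The abstract does not state how the 9-mers were ranked. As a minimal, hypothetical sketch, one simple ranking counts how many promoter sequences contain each 9-mer at least once, treating a 9-mer and its reverse complement as the same site, and then filters for CpG-containing 9-mers. The counting criterion and all function names are assumptions for illustration only.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Collapse a k-mer and its reverse complement to one key (the lexicographically smaller)."""
    rc = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, rc)


def rank_kmers(promoters, k=9):
    """Hypothetical ranking: number of promoters containing each canonical k-mer.

    Windows containing anything other than A/C/G/T (e.g. N) are skipped.
    Returns (kmer, count) pairs sorted by count (descending), then alphabetically.
    """
    counts = Counter()
    for seq in promoters:
        seq = seq.upper()
        seen = {
            canonical(seq[i:i + k])
            for i in range(len(seq) - k + 1)
            if set(seq[i:i + k]) <= set("ACGT")
        }
        counts.update(seen)  # each promoter counts at most once per k-mer
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def cpg_kmers(ranked):
    """Keep only k-mers that contain a CpG dinucleotide."""
    return [(kmer, n) for kmer, n in ranked if "CG" in kmer]
```

Because the CpG dinucleotide is its own reverse complement, filtering on the canonical form gives the same result as checking either strand. A CRE-like element such as `TGACGTCA` shared by two promoters therefore surfaces near the top of both the full ranking and the CpG-filtered list.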