1.
BMC Bioinformatics ; 23(1): 37, 2022 Jan 12.
Article in English | MEDLINE | ID: mdl-35021991

ABSTRACT

BACKGROUND: LINCS, "Library of Integrated Network-based Cellular Signatures", and IDG, "Illuminating the Druggable Genome", are both NIH projects and consortia that have generated rich datasets for the study of the molecular basis of human health and disease. LINCS L1000 expression signatures provide unbiased systems/omics experimental evidence. IDG provides compiled and curated knowledge for illumination and prioritization of novel drug target hypotheses. Together, these resources can support a powerful new approach to identifying novel drug targets for complex diseases, such as Parkinson's disease (PD), which continues to inflict severe harm on human health, and resist traditional research approaches. RESULTS: Integrating LINCS and IDG, we built the Knowledge Graph Analytics Platform (KGAP) to support an important use case: identification and prioritization of drug target hypotheses for associated diseases. The KGAP approach includes strong semantics interpretable by domain scientists and a robust, high performance implementation of a graph database and related analytical methods. Illustrating the value of our approach, we investigated results from queries relevant to PD. Approved PD drug indications from IDG's resource DrugCentral were used as starting points for evidence paths exploring chemogenomic space via LINCS expression signatures for associated genes, evaluated as target hypotheses by integration with IDG. The KG-analytic scoring function was validated against a gold standard dataset of genes associated with PD as elucidated, published mechanism-of-action drug targets, also from DrugCentral. IDG's resource TIN-X was used to rank and filter KGAP results for novel PD targets, and one, SYNGR3 (Synaptogyrin-3), was manually investigated further as a case study and plausible new drug target for PD. 
CONCLUSIONS: The synergy of LINCS and IDG, via KG methods, empowers graph analytics methods for the investigation of the molecular basis of complex diseases, and specifically for identification and prioritization of novel drug targets. The KGAP approach enables downstream applications via integration with resources similarly aligned with modern KG methodology. The generality of the approach indicates that KGAP is applicable to many disease areas, in addition to PD, the focus of this paper.


Subjects
Parkinson Disease , Gene Library , Genome , Humans , Lighting , Parkinson Disease/drug therapy , Parkinson Disease/genetics , Pattern Recognition, Automated
2.
Plant J ; 107(5): 1363-1386, 2021 09.
Article in English | MEDLINE | ID: mdl-34160110

ABSTRACT

The photosynthetic capacity of mature leaves increases after several days' exposure to constant or intermittent episodes of high light (HL) and is manifested primarily as changes in chloroplast physiology. How this chloroplast-level acclimation to HL is initiated and controlled is unknown. From expanded Arabidopsis leaves, we determined HL-dependent changes in transcript abundance of 3844 genes in a 0-6 h time-series transcriptomics experiment. It was hypothesized that among such genes were those that contribute to the initiation of HL acclimation. By focusing on differentially expressed transcription (co-)factor genes and applying dynamic statistical modelling to the temporal transcriptomics data, a regulatory network of 47 predominantly photoreceptor-regulated transcription (co-)factor genes was inferred. The most connected gene in this network was B-BOX DOMAIN CONTAINING PROTEIN32 (BBX32). Plants overexpressing BBX32 were strongly impaired in acclimation to HL and displayed perturbed expression of photosynthesis-associated genes under LL and after exposure to HL. These observations led to demonstrating that as well as regulation of chloroplast-level acclimation by BBX32, CRYPTOCHROME1, LONG HYPOCOTYL5, CONSTITUTIVELY PHOTOMORPHOGENIC1 and SUPPRESSOR OF PHYA-105 are important. In addition, the BBX32-centric gene regulatory network provides a view of the transcriptional control of acclimation in mature leaves distinct from other photoreceptor-regulated processes, such as seedling photomorphogenesis.


Subjects
Acclimatization/genetics , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Carrier Proteins/metabolism , Gene Expression Regulation, Plant , Transcriptome , Acclimatization/radiation effects , Arabidopsis/physiology , Arabidopsis/radiation effects , Arabidopsis Proteins/genetics , Bayes Theorem , Carrier Proteins/genetics , Chloroplasts/radiation effects , Gene Expression Profiling , Gene Regulatory Networks , Light , Photosynthesis/radiation effects , Plant Leaves/genetics , Plant Leaves/physiology , Plant Leaves/radiation effects
3.
Bioinformatics ; 37(21): 3865-3873, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34086846

ABSTRACT

MOTIVATION: Genome-wide association studies can reveal important genotype-phenotype associations; however, data quality and interpretability issues must be addressed. For drug discovery scientists seeking to prioritize targets based on the available evidence, these issues go beyond the single study. RESULTS: Here, we describe rational ranking, filtering and interpretation of inferred gene-trait associations and data aggregation across studies by leveraging existing curation and harmonization efforts. Each gene-trait association is evaluated for confidence, with scores derived solely from aggregated statistics, linking a protein-coding gene and phenotype. We propose a method for assessing confidence in gene-trait associations from evidence aggregated across studies, including a bibliometric assessment of scientific consensus based on the iCite relative citation ratio, and meanRank scores, to aggregate multivariate evidence. This method, intended for drug target hypothesis generation, scoring and ranking, has been implemented as an analytical pipeline, available as open source, with public datasets of results, and a web application designed for usability by drug discovery scientists. AVAILABILITY AND IMPLEMENTATION: Web application, datasets and source code via https://unmtid-shinyapps.net/tiga/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
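
Editorial sketch (not from the article): the abstract mentions meanRank scores for aggregating multivariate evidence but does not define them here. One plausible reading of rank-based aggregation is to rank every gene-trait association on each evidence variable and then average the ranks. The variable names `n_studies` and `citation_score`, the tie handling, and the "higher is better" direction are all assumptions made for illustration; this is not the published pipeline.

```python
from statistics import mean

def rank_descending(values):
    """Return 1-based ranks (1 = largest value); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2 + 1  # average of the 1-based positions in the run
        for k in range(pos, end + 1):
            ranks[order[k]] = avg
        pos = end + 1
    return ranks

def mean_rank(evidence):
    """evidence maps a variable name to a list of values, one per association.
    Returns the mean rank per association; lower means stronger aggregate evidence."""
    per_variable = [rank_descending(vals) for vals in evidence.values()]
    return [mean(column) for column in zip(*per_variable)]

# Hypothetical evidence variables for three gene-trait associations.
evidence = {
    "n_studies": [5, 2, 2],
    "citation_score": [1.0, 3.0, 0.5],
}
scores = mean_rank(evidence)
```

Averaging ranks rather than raw values keeps variables on different scales from dominating one another, which is the usual motivation for rank aggregation.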


Subjects
Genome-Wide Association Study , Lighting , Genotype , Polymorphism, Single Nucleotide , Phenotype
4.
J Med Internet Res ; 23(11): e31337, 2021 11 15.
Article in English | MEDLINE | ID: mdl-34581671

ABSTRACT

BACKGROUND: The COVID-19 pandemic has highlighted the inability of health systems to leverage existing system infrastructure in order to rapidly develop and apply broad analytical tools that could inform state- and national-level policymaking, as well as patient care delivery in hospital settings. The COVID-19 pandemic has also highlighted systemic disparities in health outcomes and access to care based on race or ethnicity, gender, income-level, and urban-rural divide. Although the United States seems to be recovering from the COVID-19 pandemic owing to widespread vaccination efforts and increased public awareness, there is an urgent need to address the aforementioned challenges. OBJECTIVE: This study aims to inform the feasibility of leveraging broad, statewide datasets for population health-driven decision-making by developing robust analytical models that predict COVID-19-related health care resource utilization across patients served by Indiana's statewide Health Information Exchange. METHODS: We leveraged comprehensive datasets obtained from the Indiana Network for Patient Care to train decision forest-based models that can predict patient-level need of health care resource utilization. To assess these models for potential biases, we tested model performance against subpopulations stratified by age, race or ethnicity, gender, and residence (urban vs rural). RESULTS: For model development, we identified a cohort of 96,026 patients from across 957 zip codes in Indiana, United States. We trained the decision models that predicted health care resource utilization by using approximately 100 of the most impactful features from a total of 1172 features created. Each model and stratified subpopulation under test reported precision scores >70%, accuracy and area under the receiver operating curve scores >80%, and sensitivity scores approximately >90%. 
We noted statistically significant variations in model performance across stratified subpopulations identified by age, race or ethnicity, gender, and residence (urban vs rural). CONCLUSIONS: This study presents the possibility of developing decision models capable of predicting patient-level health care resource utilization across a broad, statewide region with considerable predictive performance. However, our models present statistically significant variations in performance across stratified subpopulations of interest. Further efforts are necessary to identify root causes of these biases and to rectify them.


Subjects
COVID-19 , Health Information Exchange , Humans , Pandemics , Patient Acceptance of Health Care , SARS-CoV-2 , United States
5.
BMC Bioinformatics ; 20(1): 306, 2019 Jun 10.
Article in English | MEDLINE | ID: mdl-31238875

ABSTRACT

BACKGROUND: Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous graphs, an important current challenge is extending this methodology for richly heterogeneous graphs and knowledge domains. The biomedical sciences are such a domain, reflecting the complexity of biology, with entities such as genes, proteins, drugs, diseases, and phenotypes, and relationships such as gene co-expression, biochemical regulation, and biomolecular inhibition or activation. Therefore, the semantics of edges and nodes are critical for representation learning and knowledge discovery in real world biomedical problems. RESULTS: In this paper, we propose the edge2vec model, which represents graphs considering edge semantics. An edge-type transition matrix is trained by an Expectation-Maximization approach, and a stochastic gradient descent model is employed to learn node embedding on a heterogeneous graph via the trained transition matrix. edge2vec is validated on three biomedical domain tasks: biomedical entity classification, compound-gene bioactivity prediction, and biomedical information retrieval. Results show that by considering edge-types into node embedding learning in heterogeneous graphs, edge2vec significantly outperforms state-of-the-art models on all three tasks. CONCLUSIONS: We propose this method for its added value relative to existing graph analytical methodology, and in the real world context of biomedical knowledge discovery applicability.
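
Editorial sketch (not from the article): the abstract describes node embeddings learned on a heterogeneous graph via an edge-type transition matrix. The fragment below illustrates only the walk-generation idea, where the type of the edge just traversed biases which edge type is taken next. The function name `typed_walk`, the toy graph, and the hand-set weights are hypothetical; the Expectation-Maximization training of the matrix and the embedding step are omitted, so this should not be read as the edge2vec implementation.

```python
import random

def typed_walk(adj, trans, start, length, rng):
    """Random walk biased by edge types.

    adj:   node -> list of (neighbor, edge_type)
    trans: (previous_edge_type, next_edge_type) -> non-negative weight
    """
    walk = [start]
    node, prev_type = start, None
    for _ in range(length - 1):
        edges = adj.get(node, [])
        if not edges:
            break
        if prev_type is None:
            weights = [1.0] * len(edges)  # first step: no previous type, choose uniformly
        else:
            weights = [trans.get((prev_type, t), 0.0) for _, t in edges]
            if sum(weights) == 0:
                break  # no permitted continuation from this edge type
        node, prev_type = rng.choices(edges, weights=weights, k=1)[0]
        walk.append(node)
    return walk

# Toy heterogeneous graph with two edge types (all names are illustrative).
adj = {
    "drugA": [("geneX", "inhibits")],
    "geneX": [("drugA", "inhibits"), ("diseaseY", "associated_with")],
    "diseaseY": [("geneX", "associated_with")],
}
# Hand-set weights standing in for a trained transition matrix.
trans = {("inhibits", "associated_with"): 1.0, ("associated_with", "inhibits"): 1.0}

walk = typed_walk(adj, trans, "drugA", 3, random.Random(0))
```

With these weights the walk can only alternate edge types, so it runs from the drug through the gene to the disease. Walks like this would then be fed to a skip-gram style model to produce node embeddings.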


Subjects
Informatics/methods , Knowledge , Learning , Algorithms , Biomedical Research , Humans , Neural Networks, Computer , Semantics
6.
Bioinformatics ; 34(5): 884-886, 2018 03 01.
Article in English | MEDLINE | ID: mdl-29126246

ABSTRACT

Summary: Every year, a large number of novel algorithms are introduced to the scientific community for a myriad of applications, but using these across different research groups is often troublesome, due to suboptimal implementations and specific dependency requirements. This does not have to be the case, as public cloud computing services can easily house tractable implementations within self-contained dependency environments, making the methods easily accessible to a wider public. We have taken 14 popular methods, the majority related to expression data or promoter analysis, developed these up to a good implementation standard and housed the tools in isolated Docker containers which we integrated into the CyVerse Discovery Environment, making these easily usable for a wide community as part of the CyVerse UK project. Availability and implementation: The integrated apps can be found at http://www.cyverse.org/discovery-environment, while the raw code is available at https://github.com/cyversewarwick and the corresponding Docker images are housed at https://hub.docker.com/r/cyversewarwick/. Contact: info@cyverse.warwick.ac.uk or D.L.Wild@warwick.ac.uk. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Cloud Computing , Computational Biology/methods , Gene Expression Regulation , Promoter Regions, Genetic , Software , Algorithms , Gene Expression Profiling/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods
7.
Plant Cell ; 28(2): 345-66, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26842464

ABSTRACT

In Arabidopsis thaliana, changes in metabolism and gene expression drive increased drought tolerance and initiate diverse drought avoidance and escape responses. To address regulatory processes that link these responses, we set out to identify genes that govern early responses to drought. To do this, a high-resolution time series transcriptomics data set was produced, coupled with detailed physiological and metabolic analyses of plants subjected to a slow transition from well-watered to drought conditions. A total of 1815 drought-responsive differentially expressed genes were identified. The early changes in gene expression coincided with a drop in carbon assimilation, and only in the late stages with an increase in foliar abscisic acid content. To identify gene regulatory networks (GRNs) mediating the transition between the early and late stages of drought, we used Bayesian network modeling of differentially expressed transcription factor (TF) genes. This approach identified AGAMOUS-LIKE22 (AGL22) as a key hub gene in a TF GRN. It has previously been shown that AGL22 is involved in the transition from vegetative state to flowering, but here we show that AGL22 expression influences steady state photosynthetic rates and lifetime water use. This suggests that AGL22 uniquely regulates a transcriptional network during drought stress, linking changes in primary metabolism and the initiation of stress responses.


Subjects
Abscisic Acid/metabolism , Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Growth Regulators/metabolism , Transcription Factors/metabolism , Arabidopsis/growth & development , Arabidopsis/physiology , Arabidopsis Proteins/genetics , Bayes Theorem , Cluster Analysis , Droughts , Gene Regulatory Networks , Mutation , Phenotype , Photosynthesis/physiology , Stress, Physiological , Transcription Factors/genetics
8.
BMC Bioinformatics ; 19(1): 265, 2018 07 16.
Article in English | MEDLINE | ID: mdl-30012095

ABSTRACT

BACKGROUND: Netpredictor is an R package for prediction of missing links in any given unipartite or bipartite network. The package provides utilities to compute missing links in bipartite as well as unipartite networks using the Random Walk with Restart and Network inference algorithms, or a combination of both. The package also allows computation of bipartite network properties, visualization of communities for two different sets of nodes, and calculation of significant interactions between two sets of nodes using permutation-based testing. The application can also be used to search for top-K shortest paths in an interactome and to run enrichment analysis for disease, pathway and ontology. The R standalone package (including detailed introductory vignettes) and the associated R Shiny web application are available under the GPL-2 Open Source license and are free to download. RESULTS: We compared the performance of different algorithms on several small datasets and found that random walk outperforms the rest of the algorithms. The package is developed to perform network-based prediction on unipartite and bipartite networks and to use the results to understand the functionality of proteins in an interactome through enrichment analysis. CONCLUSION: Rapid application development environments such as Shiny help non-programmers develop rich visualization apps quickly, and we believe their use will continue to grow with further enhancements. We plan to update the algorithms in the package in the near future to help scientists analyse data in a more streamlined fashion.


Subjects
Algorithms , Drug Delivery Systems , Gene Ontology , Protein Interaction Maps , Software
9.
Plant Cell ; 27(11): 3038-64, 2015 Nov.
Article in English | MEDLINE | ID: mdl-26566919

ABSTRACT

Transcriptional reprogramming is integral to effective plant defense. Pathogen effectors act transcriptionally and posttranscriptionally to suppress defense responses. A major challenge to understanding disease and defense responses is discriminating between transcriptional reprogramming associated with microbial-associated molecular pattern (MAMP)-triggered immunity (MTI) and that orchestrated by effectors. A high-resolution time course of genome-wide expression changes following challenge with Pseudomonas syringae pv tomato DC3000 and the nonpathogenic mutant strain DC3000hrpA- allowed us to establish causal links between the activities of pathogen effectors and suppression of MTI and infer with high confidence a range of processes specifically targeted by effectors. Analysis of this information-rich data set with a range of computational tools provided insights into the earliest transcriptional events triggered by effector delivery, regulatory mechanisms recruited, and biological processes targeted. We show that the majority of genes contributing to disease or defense are induced within 6 h postinfection, significantly before pathogen multiplication. Suppression of chloroplast-associated genes is a rapid MAMP-triggered defense response, and suppression of genes involved in chromatin assembly and induction of ubiquitin-related genes coincide with pathogen-induced abscisic acid accumulation. Specific combinations of promoter motifs are engaged in fine-tuning the MTI response and active transcriptional suppression at specific promoter configurations by P. syringae.


Subjects
Arabidopsis/immunology , Immunosuppression Therapy , Pathogen-Associated Molecular Pattern Molecules/metabolism , Plant Immunity/genetics , Plant Leaves/immunology , Pseudomonas syringae/physiology , Transcription, Genetic , Arabidopsis/genetics , Arabidopsis/microbiology , Base Sequence , Chromatin/metabolism , Gene Expression Profiling , Gene Expression Regulation, Plant , Gene Ontology , Gene Regulatory Networks , Genes, Plant , Molecular Sequence Data , Nucleotide Motifs/genetics , Plant Diseases/genetics , Plant Diseases/immunology , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Pseudomonas syringae/growth & development , Transcription Factors/metabolism
10.
Stat Appl Genet Mol Biol ; 15(1): 83-6, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26910751

ABSTRACT

The integration of multi-dimensional datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct--but often complementary--information. However, the large amount of data adds burden to any inference task. Flexible Bayesian methods may reduce the necessity for strong modelling assumptions, but can also increase the computational burden. We present an improved implementation of a Bayesian correlated clustering algorithm, that permits integrated clustering to be routinely performed across multiple datasets, each with tens of thousands of items. By exploiting GPU based computation, we are able to improve runtime performance of the algorithm by almost four orders of magnitude. This permits analysis across genomic-scale data sets, greatly expanding the range of applications over those originally possible. MDI is available here: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/.


Subjects
Computational Biology/methods , Genomics/methods , Algorithms , Cluster Analysis , Markov Chains , Monte Carlo Method , Software , Systems Biology/methods
11.
Bioinformatics ; 31(12): i97-105, 2015 Jun 15.
Article in English | MEDLINE | ID: mdl-26072515

ABSTRACT

MOTIVATION: The ability to jointly learn gene regulatory networks (GRNs) in, or leverage GRNs between related species would allow the vast amount of legacy data obtained in model organisms to inform the GRNs of more complex, or economically or medically relevant counterparts. Examples include transferring information from Arabidopsis thaliana into related crop species for food security purposes, or from mice into humans for medical applications. Here we develop two related Bayesian approaches to network inference that allow GRNs to be jointly inferred in, or leveraged between, several related species: in one framework, network information is directly propagated between species; in the second hierarchical approach, network information is propagated via an unobserved 'hypernetwork'. In both frameworks, information about network similarity is captured via graph kernels, with the networks additionally informed by species-specific time series gene expression data, when available, using Gaussian processes to model the dynamics of gene expression. RESULTS: Results on in silico benchmarks demonstrate that joint inference, and leveraging of known networks between species, offers better accuracy than standalone inference. The direct propagation of network information via the non-hierarchical framework is more appropriate when there are relatively few species, while the hierarchical approach is better suited when there are many species. Both methods are robust to small amounts of mislabelling of orthologues. Finally, the use of Saccharomyces cerevisiae data and networks to inform inference of networks in the fission yeast Schizosaccharomyces pombe predicts a novel role in cell cycle regulation for Gas1 (SPAC19B12.02c), a 1,3-beta-glucanosyltransferase. AVAILABILITY AND IMPLEMENTATION: MATLAB code is available from http://go.warwick.ac.uk/systemsbiology/software/.


Subjects
Gene Expression Profiling , Gene Regulatory Networks , Algorithms , Bayes Theorem , Cell Cycle/genetics , Computer Simulation , Models, Genetic , Saccharomyces cerevisiae/genetics , Schizosaccharomyces/genetics , Software
12.
Stat Appl Genet Mol Biol ; 14(3): 307-10, 2015 Jun.
Article in English | MEDLINE | ID: mdl-26030796

ABSTRACT

Here we introduce the causal structure identification (CSI) package, a Gaussian process based approach to inferring gene regulatory networks (GRNs) from multiple time series data. The standard CSI approach infers a single GRN via joint learning from multiple time series datasets; the hierarchical approach (HCSI) infers a separate GRN for each dataset, albeit with the networks constrained to favor similar structures, allowing for the identification of context specific networks. The software is implemented in MATLAB and includes a graphical user interface (GUI) for user friendly inference. Finally the GUI can be connected to high performance computer clusters to facilitate analysis of large genomic datasets.


Subjects
Gene Expression Profiling/methods , Software , Bayes Theorem , Gene Expression Regulation , Gene Regulatory Networks
13.
Plant Cell ; 24(9): 3530-57, 2012 Sep.
Article in English | MEDLINE | ID: mdl-23023172

ABSTRACT

Transcriptional reprogramming forms a major part of a plant's response to pathogen infection. Many individual components and pathways operating during plant defense have been identified, but our knowledge of how these different components interact is still rudimentary. We generated a high-resolution time series of gene expression profiles from a single Arabidopsis thaliana leaf during infection by the necrotrophic fungal pathogen Botrytis cinerea. Approximately one-third of the Arabidopsis genome is differentially expressed during the first 48 h after infection, with the majority of changes in gene expression occurring before significant lesion development. We used computational tools to obtain a detailed chronology of the defense response against B. cinerea, highlighting the times at which signaling and metabolic processes change, and identify transcription factor families operating at different times after infection. Motif enrichment and network inference predicted regulatory interactions, and testing of one such prediction identified a role for TGA3 in defense against necrotrophic pathogens. These data provide an unprecedented level of detail about transcriptional changes during a defense response and are suited to systems biology analyses to generate predictive models of the gene regulatory networks mediating the Arabidopsis response to B. cinerea.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant/genetics , Genome, Plant/genetics , Plant Diseases/immunology , Arabidopsis/immunology , Arabidopsis/metabolism , Arabidopsis/microbiology , Botrytis/growth & development , Gene Expression Profiling , Gene Regulatory Networks , Models, Genetic , Mutation , Nucleotide Motifs , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Immunity , Plant Leaves/genetics , Plant Leaves/metabolism , Plant Leaves/microbiology , Promoter Regions, Genetic/genetics , Signal Transduction , Time Factors , Transcription Factors/genetics , Transcriptome
14.
Plant J ; 75(1): 26-39, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23578292

ABSTRACT

A model is presented describing the gene regulatory network surrounding three similar NAC transcription factors that have roles in Arabidopsis leaf senescence and stress responses. ANAC019, ANAC055 and ANAC072 belong to the same clade of NAC domain genes and have overlapping expression patterns. A combination of promoter DNA/protein interactions identified using yeast 1-hybrid analysis and modelling using gene expression time course data has been applied to predict the regulatory network upstream of these genes. Similarities and divergence in regulation during a variety of stress responses are predicted by different combinations of upstream transcription factors binding and also by the modelling. Mutant analysis with potential upstream genes was used to test and confirm some of the predicted interactions. Gene expression analysis in mutants of ANAC019 and ANAC055 at different times during leaf senescence has revealed a distinctly different role for each of these genes. Yeast 1-hybrid analysis is shown to be a valuable tool that can distinguish clades of binding proteins and be used to test and quantify protein binding to predicted promoter motifs.


Subjects
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Botrytis/physiology , Gene Expression Regulation, Plant , Stress, Physiological , Arabidopsis/physiology , Arabidopsis Proteins/metabolism , Cellular Senescence , Gene Expression Profiling , Gene Regulatory Networks , Mutation , Oligonucleotide Array Sequence Analysis , Plant Diseases/microbiology , Plant Leaves/genetics , Plant Leaves/physiology , Plants, Genetically Modified , Promoter Regions, Genetic/genetics , Protein Binding , Transcription Factors/genetics , Transcription Factors/metabolism , Two-Hybrid System Techniques
15.
Bioinformatics ; 29(5): 580-7, 2013 Mar 01.
Article in English | MEDLINE | ID: mdl-23314126

ABSTRACT

MOTIVATION: The problem of ab initio protein folding is one of the most difficult in modern computational biology. The prediction of residue contacts within a protein provides a more tractable immediate step. Recently introduced maximum entropy-based correlated mutation measures (CMMs), such as direct information, have been successful in predicting residue contacts. However, most correlated mutation studies focus on proteins that have large good-quality multiple sequence alignments (MSA) because the power of correlated mutation analysis falls as the size of the MSA decreases. However, even with small autogenerated MSAs, maximum entropy-based CMMs contain information. To make use of this information, in this article, we focus not on general residue contacts but contacts between residues in β-sheets. The strong constraints and prior knowledge associated with β-contacts are ideally suited for prediction using a method that incorporates an often noisy CMM. RESULTS: Using contrastive divergence, a statistical machine learning technique, we have calculated a maximum entropy-based CMM. We have integrated this measure with a new probabilistic model for β-contact prediction, which is used to predict both residue- and strand-level contacts. Using our model on a standard non-redundant dataset, we significantly outperform a 2D recurrent neural network architecture, achieving a 5% improvement in true positives at the 5% false-positive rate at the residue level. At the strand level, our approach is competitive with the state-of-the-art single methods achieving precision of 61.0% and recall of 55.4%, while not requiring residue solvent accessibility as an input. AVAILABILITY: http://www2.warwick.ac.uk/fac/sci/systemsbiology/research/software/
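
Editorial sketch (not from the article): correlated mutation measures score pairs of alignment columns by how strongly their residues co-vary. The abstract uses a maximum entropy-based measure computed with contrastive divergence. The fragment below deliberately swaps in plain mutual information, the simplest measure of this family, to show what scoring a column pair means. The four-sequence alignment is invented, and no sequence weighting or gap handling is attempted.

```python
from collections import Counter
from math import log2

def mutual_information(col_i, col_j):
    """Mutual information (in bits) between two alignment columns of equal length."""
    n = len(col_i)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_i[a] * count_j[b]))
    return mi

# Invented toy alignment: columns 0 and 1 co-vary perfectly, column 2 is independent of column 0.
msa = ["AKL", "AKV", "GEL", "GEV"]
columns = list(zip(*msa))

mi_01 = mutual_information(columns[0], columns[1])
mi_02 = mutual_information(columns[0], columns[2])
```

Plain mutual information cannot separate direct from indirect couplings between positions, which is the limitation that maximum entropy measures such as direct information were introduced to address.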


Subjects
Artificial Intelligence , Models, Statistical , Protein Structure, Secondary , Entropy , Models, Molecular , Mutation , Neural Networks, Computer , Protein Folding , Proteins/chemistry , Proteins/genetics , Sequence Alignment , Sequence Analysis, Protein
16.
Plant Cell ; 23(3): 873-94, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21447789

ABSTRACT

Leaf senescence is an essential developmental process that impacts dramatically on crop yields and involves altered regulation of thousands of genes and many metabolic and signaling pathways, resulting in major changes in the leaf. The regulation of senescence is complex, and although senescence regulatory genes have been characterized, there is little information on how these function in the global control of the process. We used microarray analysis to obtain a high-resolution time-course profile of gene expression during development of a single leaf over a 3-week period to senescence. A complex experimental design approach and a combination of methods were used to extract high-quality replicated data and to identify differentially expressed genes. The multiple time points enable the use of highly informative clustering to reveal distinct time points at which signaling and metabolic pathways change. Analysis of motif enrichment, as well as comparison of transcription factor (TF) families showing altered expression over the time course, identify clear groups of TFs active at different stages of leaf development and senescence. These data enable connection of metabolic processes, signaling pathways, and specific TF activity, which will underpin the development of network models to elucidate the process of senescence.


Subjects
Arabidopsis Proteins/analysis , Arabidopsis/genetics , Gene Expression Regulation, Plant , Plant Leaves/metabolism , Analysis of Variance , Arabidopsis/growth & development , Arabidopsis/metabolism , Arabidopsis Proteins/genetics , Chlorophyll/analysis , Cluster Analysis , Gene Expression Profiling , Microarray Analysis/methods , Models, Biological , Multigene Family , Plant Growth Regulators/analysis , Plant Leaves/genetics , Plant Leaves/growth & development , Promoter Regions, Genetic , RNA, Plant/genetics , Transcription Factors/metabolism
17.
Tuberculosis (Edinb) ; 146: 102500, 2024 May.
Article in English | MEDLINE | ID: mdl-38432118

ABSTRACT

Tuberculosis (TB) is still a major global health challenge, killing over 1.5 million people each year, and hence, there is a need to identify and develop novel treatments for Mycobacterium tuberculosis (M. tuberculosis). The prevalence of infections caused by nontuberculous mycobacteria (NTM) is also increasing and has overtaken TB cases in the United States and much of the developed world. Mycobacterium abscessus (M. abscessus) is one of the most frequently encountered NTM and is difficult to treat. We describe the use of drug-disease association using a semantic knowledge graph approach combined with machine learning models that has enabled the identification of several molecules for testing anti-mycobacterial activity. We established that niclosamide (M. tuberculosis IC90 2.95 µM; M. abscessus IC90 59.1 µM) and tribromsalan (M. tuberculosis IC90 76.92 µM; M. abscessus IC90 147.4 µM) inhibit M. tuberculosis and M. abscessus in vitro. To investigate the mode of action, we determined the transcriptional response of M. tuberculosis and M. abscessus to both compounds in axenic log phase, demonstrating a broad effect on gene expression that differed from known M. tuberculosis inhibitors. Both compounds elicited transcriptional responses indicative of respiratory pathway stress and the dysregulation of fatty acid metabolism.


Subjects
Mycobacterium Infections, Nontuberculous , Mycobacterium abscessus , Mycobacterium tuberculosis , Salicylanilides , Tuberculosis , Humans , Mycobacterium tuberculosis/genetics , Mycobacterium Infections, Nontuberculous/microbiology , Niclosamide/pharmacology , Drug Repositioning , Nontuberculous Mycobacteria/genetics , Tuberculosis/drug therapy , Tuberculosis/microbiology
18.
Bioinformatics ; 28(12): i233-41, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22689766

ABSTRACT

MOTIVATION: The generation of time series transcriptomic datasets collected under multiple experimental conditions has proven to be a powerful approach for disentangling complex biological processes, allowing for the reverse engineering of gene regulatory networks (GRNs). Most methods for reverse engineering GRNs from multiple datasets assume that each of the time series was generated from networks with identical topology. In this study, we outline a hierarchical, non-parametric Bayesian approach for reverse engineering GRNs using multiple time series that can be applied in a number of novel situations including: (i) where different but overlapping sets of transcription factors are expected to bind in the different experimental conditions; that is, where switching events could potentially arise under the different treatments and (ii) for inference in evolutionarily related species in which orthologous GRNs exist. More generally, the method can be used to identify context-specific regulation by leveraging time series gene expression data alongside methods that can identify putative lists of transcription factors or transcription factor targets. RESULTS: The hierarchical inference outperforms related (but non-hierarchical) approaches when the networks used to generate the data were identical, and performs comparably even when the networks used to generate data were independent. The method was subsequently used alongside yeast one-hybrid and microarray time series data to infer potential transcriptional switches in the Arabidopsis thaliana response to stress. The results confirm previous biological studies and allow for additional insights into gene regulation under various abiotic stresses. AVAILABILITY: The methods outlined in this article have been implemented in Matlab and are available on request.


Subjects
Bayes Theorem , Gene Regulatory Networks , Statistics, Nonparametric , Algorithms , Arabidopsis/genetics , Gene Expression Regulation , Models, Theoretical , Transcription Factors/genetics , Two-Hybrid System Techniques
19.
Bioinformatics ; 28(24): 3290-7, 2012 Dec 15.
Article in English | MEDLINE | ID: mdl-23047558

ABSTRACT

MOTIVATION: The integration of multiple datasets remains a key challenge in systems biology and genomic medicine. Modern high-throughput technologies generate a broad array of different data types, providing distinct (but often complementary) information. We present a Bayesian method for the unsupervised integrative modelling of multiple datasets, which we refer to as MDI (Multiple Dataset Integration). MDI can integrate information from a wide range of different datasets and data types simultaneously (including the ability to model time series data explicitly using Gaussian processes). Each dataset is modelled using a Dirichlet-multinomial allocation (DMA) mixture model, with dependencies between these models captured through parameters that describe the agreement among the datasets. RESULTS: Using a set of six artificially constructed time series datasets, we show that MDI is able to integrate a significant number of datasets simultaneously, and that it successfully captures the underlying structural similarity between the datasets. We also analyse a variety of real Saccharomyces cerevisiae datasets. In the two-dataset case, we show that MDI's performance is comparable with the present state-of-the-art. We then move beyond the capabilities of current approaches and integrate gene expression, chromatin immunoprecipitation-chip and protein-protein interaction data, to identify a set of protein complexes for which genes are co-regulated during the cell cycle. Comparisons to other unsupervised data integration techniques, as well as to non-integrative approaches, demonstrate that MDI is competitive, while also providing information that would be difficult or impossible to extract using other methods.


Subjects
Genomics/methods , Models, Statistical , Bayes Theorem , Chromatin Immunoprecipitation , Cluster Analysis , Gene Expression , Gene Expression Profiling/methods , Normal Distribution , Oligonucleotide Array Sequence Analysis , Protein Interaction Mapping , Saccharomyces cerevisiae/genetics , Systems Biology
20.
PLoS Comput Biol ; 8(7): e1002574, 2012.
Article in English | MEDLINE | ID: mdl-22859915

ABSTRACT

The rapidly increasing amount of public data in chemistry and biology provides new opportunities for large-scale data mining for drug discovery. Systematic integration of these heterogeneous sets and provision of algorithms to data mine the integrated sets would permit investigation of complex mechanisms of action of drugs. In this work we integrated and annotated data from public datasets relating to drugs, chemical compounds, protein targets, diseases, side effects and pathways, building a semantic linked network consisting of over 290,000 nodes and 720,000 edges. We developed a statistical model to assess the association of drug-target pairs based on their relation with other linked objects. Validation experiments demonstrate that the model can correctly identify known direct drug-target pairs with high precision. Indirect drug-target pairs (for example, drugs which change gene expression level) are also identified, but not as strongly as direct pairs. We further calculated the association scores for 157 drugs from 10 disease areas against 1683 human targets, and measured their similarity using a [Formula: see text] score matrix. The similarity network indicates that drugs from the same disease area tend to cluster together in ways that are not captured by structural similarity, with several potential new drug pairings being identified. This work thus provides a novel, validated alternative to existing drug-target prediction algorithms. The web service is freely available at: http://chem2bio2rdf.org/slap.
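
Illustrative note: the idea of scoring a drug-target pair from the objects that link them can be sketched on a toy network. The following is a minimal sketch only, not the statistical model described in the abstract. The edge list, the node names, the `max_edges` limit and the inverse-degree weighting are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical toy edges; none of these come from the published network.
EDGES = [
    ("drug:A", "target:T1"),
    ("drug:A", "pathway:P1"),
    ("target:T2", "pathway:P1"),
    ("drug:A", "side_effect:S1"),
    ("drug:B", "side_effect:S1"),
    ("drug:B", "target:T2"),
]


def build_graph(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    graph = defaultdict(set)
    for u, v in edges:
        graph[u].add(v)
        graph[v].add(u)
    return graph


def simple_paths(graph, start, goal, max_edges):
    """Yield every cycle-free path from start to goal with at most max_edges edges."""
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == goal:
            yield path
            continue
        if len(path) - 1 == max_edges:
            continue  # path is already at the length limit
        for nxt in sorted(graph[node]):
            if nxt not in path:
                stack.append((nxt, path + [nxt]))


def path_score(graph, path):
    """Assumed weighting: divide by the degree of each intermediate node,
    so paths through highly connected hubs contribute less."""
    score = 1.0
    for node in path[1:-1]:
        score /= len(graph[node])
    return score


def association(graph, drug, target, max_edges=3):
    """Sum the scores of all short paths linking a drug to a target."""
    return sum(path_score(graph, p)
               for p in simple_paths(graph, drug, target, max_edges))


if __name__ == "__main__":
    g = build_graph(EDGES)
    print(association(g, "drug:A", "target:T1"))  # directly linked pair
    print(association(g, "drug:A", "target:T2"))  # linked only via other objects
```

In this toy network the directly linked pair receives a higher score than the pair connected only through a shared pathway and a shared side effect. A real model would also need to assess whether a raw score is statistically significant.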


Subjects
Data Mining/methods , Databases, Factual , Drug Discovery/methods , Semantics , Algorithms , Computational Biology/methods , Humans , Models, Theoretical , Reproducibility of Results