Results 1 - 13 of 13
1.
Nucleic Acids Res ; 42(Database issue): D926-32, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24304889

ABSTRACT

Expression Atlas (http://www.ebi.ac.uk/gxa) is a value-added database providing information about gene, protein and splice variant expression in different cell types, organism parts, developmental stages, diseases and other biological and experimental conditions. The database consists of selected high-quality microarray and RNA-sequencing experiments from ArrayExpress that have been manually curated, annotated with Experimental Factor Ontology terms and processed using standardized microarray and RNA-sequencing analysis methods. The new version of Expression Atlas introduces the concept of 'baseline' expression, i.e. gene and splice variant abundance levels in healthy or untreated conditions, such as tissues or cell types. Differential gene expression data benefit from an in-depth curation of experimental intent, resulting in biologically meaningful 'contrasts', i.e. instances of differential pairwise comparisons between two sets of biological replicates. Other novel aspects of Expression Atlas are its strict quality control of raw experimental data, up-to-date RNA-sequencing analysis methods, expression data at the level of gene sets as well as genes, and a more powerful search interface designed to maximize the biological value provided to the user.


Subject(s)
Databases, Genetic , Gene Expression Profiling , Genomics , Humans , Internet , Oligonucleotide Array Sequence Analysis , Proteins/genetics , Proteins/metabolism , RNA Isoforms/metabolism , Sequence Analysis, RNA
2.
Bioinformatics ; 30(7): 1034-5, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24363379

ABSTRACT

SUMMARY: Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java and can be easily embedded into Web sites as an applet. Ondex Web supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL. AVAILABILITY AND IMPLEMENTATION: http://ondex.rothamsted.ac.uk/OndexWeb.


Subject(s)
Biology/methods , Software , Data Mining , Internet , Metabolic Networks and Pathways
3.
BMC Bioinformatics ; 12: 203, 2011 May 25.
Article in English | MEDLINE | ID: mdl-21612636

ABSTRACT

BACKGROUND: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems. RESULTS: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. CONCLUSIONS: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. 
Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and by producing a network whose modular structure more closely reflects the hierarchy in the Gene Ontology.
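
The Markov clustering step mentioned in this abstract can be sketched as follows. This is a minimal textbook-style version of the general technique (alternating expansion and inflation on a column-stochastic matrix), not the authors' implementation; the function names, the inflation value of 2.0 and the thresholds are illustrative assumptions.

```python
from typing import FrozenSet, List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale every column so that it sums to 1 (all-zero columns stay zero)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency: Matrix, inflation: float = 2.0,
                   max_iter: int = 100, tol: float = 1e-9) -> Set[FrozenSet[int]]:
    """Cluster an undirected graph given as a square adjacency matrix.

    Parameter defaults are assumptions chosen for illustration only.
    """
    n = len(adjacency)
    # Add self-loops, then turn each column into a probability distribution.
    m = normalize_columns(
        [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)])
    for _ in range(max_iter):
        # Expansion: two-step random walk (matrix squared).
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
                    for i in range(n)]
        # Inflation: element-wise power, then renormalise the columns.
        inflated = normalize_columns(
            [[x ** inflation for x in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass on their own diagonal act as attractors; the
    # columns each attractor reaches form one cluster.
    clusters: Set[FrozenSet[int]] = set()
    for i in range(n):
        if m[i][i] > 1e-6:
            clusters.add(frozenset(j for j in range(n) if m[i][j] > 1e-6))
    return clusters


# Toy input: two disconnected triangles (nodes 0-2 and nodes 3-5).
triangles = [[1.0 if i != j and i // 3 == j // 3 else 0.0 for j in range(6)]
             for i in range(6)]
clusters = markov_cluster(triangles)
```

On a real relationship network the inflation parameter controls cluster granularity, so it would need to be chosen with care.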


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Metabolomics/methods , Algorithms , Arabidopsis Proteins/genetics , Cluster Analysis , Databases, Genetic , Markov Chains , Metabolic Networks and Pathways
4.
Brief Bioinform ; 10(6): 676-93, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19933213

ABSTRACT

The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.
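
As a rough illustration of how the degree of overlap between three information sources might be quantified, the sketch below applies plain set operations to identifier sets. The source names and identifiers are made-up placeholders, not data from the article, and the article's own graph-based integration in Ondex is not reproduced here.

```python
from typing import Dict, Set


def overlap_summary(sources: Dict[str, Set[str]]) -> Dict[str, object]:
    """Count entries in total, shared by all sources, and unique to each source."""
    union = set().union(*sources.values())
    common = set.intersection(*sources.values())
    unique = {}
    for name, ids in sources.items():
        # Everything that appears in any of the other sources.
        others = set().union(*(s for n, s in sources.items() if n != name))
        unique[name] = len(ids - others)
    return {"total": len(union), "common_to_all": len(common), "unique": unique}


# Hypothetical identifiers standing in for reactions or interactions.
example_sources = {
    "source_a": {"r1", "r2", "r3", "r4"},
    "source_b": {"r2", "r3", "r5"},
    "source_c": {"r3", "r6"},
}
summary = overlap_summary(example_sources)
```

In practice the hard part is deciding when two records from different sources denote the same entity; the sketch assumes identifiers have already been mapped to a common namespace.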


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Chromosome Mapping/methods , Database Management Systems , Databases, Genetic , Genome, Plant/genetics , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Systems Integration
5.
Nucleic Acids Res ; 35(Web Server issue): W148-51, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17439966

ABSTRACT

Wheat biologists face particular problems because of the lack of genomic sequence and the three homoeologous genomes which give rise to three very similar forms for many transcripts. However, over 1.3 million available public-domain Triticeae ESTs (of which approximately 850,000 are wheat) and the full rice genomic sequence can be used to estimate likely transcript sequences present in any wheat cDNA sample to which PCR primers may then be designed. Wheat Estimated Transcript Server (WhETS) is designed to do this in a convenient form, and to provide information on the number of matching EST and high quality cDNA (hq-cDNA) sequences, tissue distribution and likely intron position inferred from rice. Triticeae EST and hq-cDNA sequences are mapped onto rice loci and stored in a database. The user selects a rice locus (directly or via Arabidopsis) and the matching Triticeae sequences are assembled according to user-defined filter and stringency settings. Assembly is achieved initially with the CAP3 program and then with a single nucleotide polymorphism (SNP)-analysis algorithm designed to separate homoeologues. Alignment of the resulting contigs and singlets against the rice template sequence is then displayed. Sequences and assembly details are available for download in fasta and ace formats, respectively. WhETS is accessible at http://www4.rothamsted.bbsrc.ac.uk/whets.


Subject(s)
Chromosome Mapping , Chromosomes, Plant/genetics , Computational Biology/methods , Expressed Sequence Tags , Ploidies , Triticum/genetics , Databases, Genetic , Genes, Plant , Genome, Plant , Models, Genetic , Polymorphism, Single Nucleotide , Sequence Alignment
6.
Drug Discov Today ; 24(4): 933-938, 2019 Apr.
Article in English | MEDLINE | ID: mdl-30690198

ABSTRACT

Biopharmaceutical industry R&D, and indeed other life sciences R&D such as biomedical, environmental, agricultural and food production, is becoming increasingly data-driven and can significantly improve its efficiency and effectiveness by implementing the FAIR (findable, accessible, interoperable, reusable) guiding principles for scientific data management and stewardship. By so doing, the plethora of new and powerful analytical tools such as artificial intelligence and machine learning will be able, automatically and at scale, to access the data from which they learn, and on which they thrive. FAIR is a fundamental enabler for digital transformation.


Subject(s)
Data Management , Drug Industry , Biological Products , Biomedical Research
7.
Appl Transl Genom ; 11: 18-26, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28018846

ABSTRACT

The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.

9.
Front Genet ; 5: 21, 2014.
Article in English | MEDLINE | ID: mdl-24600467

ABSTRACT

Network inference utilizes experimental high-throughput data for the reconstruction of molecular interaction networks where new relationships between the network entities can be predicted. Despite the increasing amount of experimental data, the parameters of each modeling technique cannot be optimized based on the experimental data alone, but need to be qualitatively assessed as to whether the components of the resulting network describe the experimental setting. Candidate list prioritization and validation build upon data integration and data visualization. The application of tools supporting this procedure is limited to the exploration of smaller information networks because the display and interpretation of large amounts of information are challenging in terms of both computational effort and user experience. The Ondex software framework was extended with customizable context-sensitive menus which allow additional integration and data analysis options for a selected set of candidates during interactive data exploration. We provide new functionalities for on-the-fly data integration using InterProScan, PubMed Central literature search, and sequence-based homology search. We applied the Ondex system to the integration of publicly available data for Aspergillus nidulans and analyzed transcriptome data. We demonstrate the advantages of our approach by proposing new hypotheses for the functional annotation of specific genes of differentially expressed fungal gene clusters. Our extension of the Ondex framework makes it possible to overcome the separation between data integration and interactive analysis. More specifically, computationally demanding calculations can be performed on selected sub-networks without losing any information from the whole network. Furthermore, our extensions allow for direct access to online biological databases, which helps to keep the integrated information up-to-date.

10.
Article in English | MEDLINE | ID: mdl-22585140

ABSTRACT

The National Center for Biotechnology Information (NCBI) recently announced the availability of whole genome sequences for more than 1,000 species, and the number of sequenced individual organisms is growing. Ongoing improvement of DNA sequencing technology will further contribute to this, enabling large-scale evolution and population genetics studies. However, the availability of sequence information is only the first step in understanding how cells survive, reproduce, and adjust their behavior. The genetic control behind organized development and adaptation of complex organisms still remains widely undetermined. One major molecular control mechanism is transcriptional gene regulation. The direct juxtaposition of the total number of sequenced species to the handful of model organisms with known regulations is surprising. Here, we investigate how little we know even about these model organisms. We aim to predict the sizes of the whole-organism regulatory networks of seven species. In particular, we provide statistical lower bounds for the expected number of regulations. For Escherichia coli we estimate that at most 37 percent of the expected gene regulatory interactions have already been discovered, 24 percent for Bacillus subtilis, and less than 3 percent for human. We conclude that even for our best researched model organisms we still lack substantial understanding of fundamental molecular control mechanisms, at least on a large scale.


Subject(s)
Computational Biology/methods , Gene Regulatory Networks , Bacillus subtilis/genetics , Bacillus subtilis/metabolism , Escherichia coli/genetics , Escherichia coli/metabolism , Gene Expression Regulation , Humans
11.
J Integr Bioinform ; 8(2): 156, 2011 Jun 27.
Article in English | MEDLINE | ID: mdl-21705808

ABSTRACT

The construction of integrated datasets from potentially hundreds of sources with bespoke formats, and their subsequent visualization and analysis, is a recurring challenge in systems biology. We present WIBL, a visualization and model development environment initially geared towards logic-based modelling of biological systems using integrated datasets. WIBL combines data integration, visualisation and modelling in a single portal-based workbench providing a comprehensive solution for interdisciplinary systems biology projects.


Subject(s)
Software , Systems Biology/methods , Models, Biological
12.
J Integr Bioinform ; 5(2), 2008 Aug 25.
Article in English | MEDLINE | ID: mdl-20134069

ABSTRACT

The automated annotation of data from high throughput sequencing and genomics experiments is a significant challenge for bioinformatics. Most current approaches rely on sequential pipelines of gene finding and gene function prediction methods that annotate a gene with information from different reference data sources. Each function prediction method contributes evidence supporting a functional assignment. Such approaches generally ignore the links between the information in the reference datasets. These links, however, are valuable for assessing the plausibility of a function assignment and can be used to evaluate the confidence in a prediction. We are working towards a novel annotation system that uses the network of information supporting the function assignment to enrich the annotation process for use by expert curators and to predict the function of previously unannotated genes. In this paper we describe our success in the first stages of this development. We present the data integration steps that are needed to create the core database of integrated reference databases (UniProt, PFAM, PDB, GO and the pathway database AraCyc) which has been established in the ONDEX data integration system. We also present a comparison between different methods for integration of GO terms as part of the function assignment pipeline and discuss the consequences of this analysis for improving the accuracy of gene function annotation. The methods and algorithms presented in this publication are an integral part of the ONDEX system which is freely available from http://ondex.sf.net/.


Subject(s)
Computer Graphics , Database Management Systems , Genomics/methods , Algorithms , User-Computer Interface
13.
Bioinformatics ; 22(11): 1383-90, 2006 Jun 01.
Article in English | MEDLINE | ID: mdl-16533819

ABSTRACT

MOTIVATION: Assembling the relevant information needed to interpret the output from high-throughput, genome-scale experiments such as gene expression microarrays is challenging. Analysis reveals genes that show statistically significant changes in expression levels, but more information is needed to determine their biological relevance. The challenge is to bring these genes together with biological information distributed across hundreds of databases or buried in the scientific literature (millions of articles). Software tools are needed to automate this task, which at present is labor-intensive and requires considerable informatics and biological expertise. RESULTS: This article describes ONDEX and how it can be applied to the task of interpreting gene expression results. ONDEX is a database system that combines the features of semantic database integration and text mining with methods for graph-based analysis. An overview of the ONDEX system is presented, concentrating on recently developed features for graph-based analysis and visualization. A case study is used to show how ONDEX can help to identify causal relationships between stress response genes and metabolic pathways from gene expression data. ONDEX also discovered functional annotations for most of the genes that emerged as significant in the microarray experiment but were previously of unknown function.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Algorithms , Arabidopsis/genetics , Automation , Computer Graphics , Data Interpretation, Statistical , Databases, Genetic , Gene Expression Regulation , Natural Language Processing , Oligonucleotide Array Sequence Analysis , Software