Results 1 - 20 of 25
1.
BMC Genomics ; 23(1): 198, 2022 Mar 12.
Article in English | MEDLINE | ID: mdl-35279098

ABSTRACT

BACKGROUND: Sphaerophoria rueppellii, a European species of hoverfly, is a highly effective beneficial predator of hemipteran crop pests including aphids, thrips and coleopteran/lepidopteran larvae in integrated pest management (IPM) programmes. It is also a key pollinator of a wide variety of important agricultural crops. No genomic information is currently available for S. rueppellii. Without genomic information for such beneficial predator species, we are unable to perform comparative analyses of insecticide target-sites and genes encoding metabolic enzymes potentially responsible for insecticide resistance, between crop pests and their predators. These metabolic mechanisms include several gene families - cytochrome P450 monooxygenases (P450s), ATP binding cassette transporters (ABCs), glutathione-S-transferases (GSTs), UDP-glycosyltransferases (UGTs) and carboxyl/choline esterases (CCEs). METHODS AND FINDINGS: In this study, a high-quality near-chromosome level de novo genome assembly (as well as a mitochondrial genome assembly) for S. rueppellii has been generated using a hybrid approach with PacBio long-read and Illumina short-read data, followed by super scaffolding using Hi-C data. The final assembly achieved a scaffold N50 of 87Mb, a total genome size of 537.6Mb and a level of completeness of 96% using a set of 1,658 core insect genes present as full-length genes. The assembly was annotated with 14,249 protein-coding genes. Comparative analysis revealed gene expansions of CYP6Zx P450s, epsilon-class GSTs, dietary CCEs and multiple UGT families (UGT37/302/308/430/431). Conversely, ABCs, delta-class GSTs and non-CYP6Zx P450s showed limited expansion. Differences were seen in the distributions of resistance-associated gene families across subfamilies between S. rueppellii and some hemipteran crop pests. Additionally, S. rueppellii had larger numbers of detoxification genes than other pollinator species. 
CONCLUSION AND SIGNIFICANCE: This assembly is the first published genome for a predatory member of the Syrphidae family and will serve as a useful resource for further research into selectivity and potential tolerance of insecticides by beneficial predators. Furthermore, the expansion of some gene families often linked to insecticide resistance and selectivity may be an indicator of the capacity of this predator to detoxify IPM selective insecticides. These findings could be exploited by targeted insecticide screens and functional studies to increase effectiveness of IPM strategies, which aim to increase crop yields by sustainably and effectively controlling pests without impacting beneficial predator populations.
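
The scaffold N50 quoted above is a standard contiguity statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal Python sketch of that definition follows. The function name `n50` and the toy lengths are illustrative assumptions; they are not taken from the paper or its pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Toy lengths for illustration only (not data from the assembly).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))
```

Because N50 depends only on the length distribution, it says nothing about correctness of the assembly, which is why the abstract reports a gene-completeness figure alongside it.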


Subject(s)
Diptera , Insecticides , Animals , Chromosomes , Diptera/genetics , Genome Size , Humans , Insecticide Resistance/genetics , Insecticides/pharmacology
2.
BMC Genomics ; 23(1): 45, 2022 Jan 11.
Article in English | MEDLINE | ID: mdl-35012450

ABSTRACT

BACKGROUND: Orius laevigatus, a minute pirate bug, is a highly effective beneficial predator of crop pests including aphids, spider mites and thrips in integrated pest management (IPM) programmes. No genomic information is currently available for O. laevigatus, as is the case for the majority of beneficial predators which feed on crop pests. In contrast, genomic information for crop pests is far more readily available. The lack of publicly available genomes for beneficial predators to date has limited our ability to perform comparative analyses of genes encoding potential insecticide resistance mechanisms between crop pests and their predators. These mechanisms include several gene/protein families including cytochrome P450s (P450s), ATP binding cassette transporters (ABCs), glutathione S-transferases (GSTs), UDP-glucosyltransferases (UGTs) and carboxyl/cholinesterases (CCEs). METHODS AND FINDINGS: In this study, a high-quality scaffold level de novo genome assembly for O. laevigatus has been generated using a hybrid approach with PacBio long-read and Illumina short-read data. The final assembly achieved a scaffold N50 of 125,649 bp and a total genome size of 150.98 Mb. The genome assembly achieved a level of completeness of 93.6% using a set of 1658 core insect genes present as full-length genes. Genome annotation identified 15,102 protein-coding genes - 87% of which were assigned a putative function. Comparative analyses revealed gene expansions of sigma class GSTs and CYP3 P450s. Conversely the UGT gene family showed limited expansion. Differences were seen in the distributions of resistance-associated gene families at the subfamily level between O. laevigatus and some of its targeted crop pests. A target site mutation in ryanodine receptors (I4790M, PxRyR) which has strong links to diamide resistance in crop pests and had previously only been identified in lepidopteran species was found to also be present in hemipteran species, including O. laevigatus. 
CONCLUSION AND SIGNIFICANCE: This assembly is the first published genome for the Anthocoridae family and will serve as a useful resource for further research into target-site selectivity issues and potential resistance mechanisms in beneficial predators. Furthermore, the expansion of gene families often linked to insecticide resistance may be an indicator of the capacity of this predator to detoxify selective insecticides. These findings could be exploited by targeted pesticide screens and functional studies to increase effectiveness of IPM strategies, which aim to increase crop yields by controlling pests sustainably, effectively and in an environmentally friendly way without impacting beneficial predator populations.


Subject(s)
Heteroptera , Insecticides , Thysanoptera , Animals , Genome , Humans , Insecticide Resistance
3.
Bioinformatics ; 33(7): 1096-1098, 2017 Apr 01.
Article in English | MEDLINE | ID: mdl-27993779

ABSTRACT

Summary: The goal of this work is to offer a computational framework for exploring data from the Recon2 human metabolic reconstruction model. Advanced user access features have been developed using the Neo4j graph database technology and this paper describes key features such as efficient management of the network data, examples of the network querying for addressing particular tasks, and how query results are converted back to the Systems Biology Markup Language (SBML) standard format. The Neo4j-based metabolic framework facilitates exploration of highly connected and comprehensive human metabolic data and identification of metabolic subnetworks of interest. A Java-based parser component has been developed to convert query results (available in the JSON format) into SBML and SIF formats in order to facilitate further results exploration, enhancement or network sharing. Availability and Implementation: The Neo4j-based metabolic framework is freely available from: https://diseaseknowledgebase.etriks.org/metabolic/browser/ . The Java code files developed for this work are available from the following URL: https://github.com/ibalaur/MetabolicFramework . Contact: ibalaur@eisbm.org. Supplementary information: Supplementary data are available at Bioinformatics online.
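
To make the JSON-to-SIF step concrete, here is a minimal Python sketch of such a conversion. It is not the authors' Java parser. The JSON layout (a list of records with `source`, `relation` and `target` keys), the function name and the node names are all assumptions made for illustration. The only fixed point is the SIF convention of one `source <relation> target` triple per line.

```python
import json


def json_rows_to_sif(json_text):
    """Convert a JSON list of edge records into tab-separated SIF lines.

    Assumed (hypothetical) record shape:
    {"source": ..., "relation": ..., "target": ...}
    """
    rows = json.loads(json_text)
    lines = []
    for row in rows:
        # SIF: source node, interaction type, target node.
        lines.append("\t".join([row["source"], row["relation"], row["target"]]))
    return "\n".join(lines)


# Hypothetical query result, invented for this example.
example = json.dumps([
    {"source": "metabolite_A", "relation": "consumed_by", "target": "reaction_1"},
    {"source": "reaction_1", "relation": "produces", "target": "metabolite_B"},
])
print(json_rows_to_sif(example))
```

SIF keeps only topology, so any node or edge attributes in the JSON would be lost. That is presumably why an SBML export is offered as well.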


Subject(s)
Metabolic Networks and Pathways , Software , Computer Graphics , Database Management Systems , Databases, Factual , Genome , Humans , Metabolic Networks and Pathways/genetics , Models, Biological
4.
Bioinformatics ; 30(7): 1034-5, 2014 Apr 01.
Article in English | MEDLINE | ID: mdl-24363379

ABSTRACT

SUMMARY: Ondex Web is a new web-based implementation of the network visualization and exploration tools from the Ondex data integration platform. New features such as context-sensitive menus and annotation tools provide users with intuitive ways to explore and manipulate the appearance of heterogeneous biological networks. Ondex Web is open source, written in Java and can be easily embedded into Web sites as an applet. Ondex Web supports loading data from a variety of network formats, such as XGMML, NWB, Pajek and OXL. AVAILABILITY AND IMPLEMENTATION: http://ondex.rothamsted.ac.uk/OndexWeb.


Subject(s)
Biology/methods , Software , Data Mining , Internet , Metabolic Networks and Pathways
5.
J Integr Plant Biol ; 54(5): 345-55, 2012 May.
Article in English | MEDLINE | ID: mdl-22494395

ABSTRACT

Associating phenotypic traits and quantitative trait loci (QTL) to causative regions of the underlying genome is a key goal in agricultural research. InterStoreDB is a suite of integrated databases designed to assist in this process. The individual databases are species independent and generic in design, providing access to curated datasets relating to plant populations, phenotypic traits, genetic maps, marker loci and QTL, with links to functional gene annotation and genomic sequence data. Each component database provides access to associated metadata, including data provenance and parameters used in analyses, thus providing users with information to evaluate the relative worth of any associations identified. The databases include CropStoreDB, for management of population, genetic map, QTL and trait measurement data, SeqStoreDB for sequence-related data and AlignStoreDB, which stores sequence alignment information, and allows navigation between genetic and genomic datasets. Genetic maps are visualized and compared using the CMAP tool, and functional annotation from sequenced genomes is provided via an EnsEMBL-based genome browser. This framework facilitates navigation of the multiple biological domains involved in genetics and genomics research in a transparent manner within a single portal. We demonstrate the value of InterStoreDB as a tool for Brassica research. InterStoreDB is available from: http://www.interstoredb.org.


Subject(s)
Databases, Genetic , Genomics , Software , Brassica/genetics , Crops, Agricultural/genetics , Genes, Plant/genetics , Quantitative Trait Loci/genetics , Sequence Alignment
6.
BMC Bioinformatics ; 12: 203, 2011 May 25.
Article in English | MEDLINE | ID: mdl-21612636

ABSTRACT

BACKGROUND: Combining multiple evidence-types from different information sources has the potential to reveal new relationships in biological systems. The integrated information can be represented as a relationship network, and clustering the network can suggest possible functional modules. The value of such modules for gaining insight into the underlying biological processes depends on their functional coherence. The challenges that we wish to address are to define and quantify the functional coherence of modules in relationship networks, so that they can be used to infer function of as yet unannotated proteins, to discover previously unknown roles of proteins in diseases as well as for better understanding of the regulation and interrelationship between different elements of complex biological systems. RESULTS: We have defined the functional coherence of modules with respect to the Gene Ontology (GO) by considering two complementary aspects: (i) the fragmentation of the GO functional categories into the different modules and (ii) the most representative functions of the modules. We have proposed a set of metrics to evaluate these two aspects and demonstrated their utility in Arabidopsis thaliana. We selected 2355 proteins for which experimentally established protein-protein interaction (PPI) data were available. From these we have constructed five relationship networks, four based on single types of data: PPI, co-expression, co-occurrence of protein names in scientific literature abstracts and sequence similarity and a fifth one combining these four evidence types. The ability of these networks to suggest biologically meaningful grouping of proteins was explored by applying Markov clustering and then by measuring the functional coherence of the clusters. CONCLUSIONS: Relationship networks integrating multiple evidence-types are biologically informative and allow more proteins to be assigned to a putative functional module. 
Using additional evidence types concentrates the functional annotations in a smaller number of modules without unduly compromising their consistency. These results indicate that integration of more data sources improves the ability to uncover functional association between proteins, both by allowing more proteins to be linked and producing a network where modular structure more closely reflects the hierarchy in the gene ontology.
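
The Markov clustering step mentioned in the abstract alternates two matrix operations on a column-stochastic matrix: expansion (squaring the matrix, which spreads flow along longer walks) and inflation (raising entries to a power and renormalising, which strengthens strong flows and weakens weak ones). The pure-Python sketch below illustrates the idea under simplifying assumptions. It uses a fixed number of iterations rather than a convergence test and a naive cluster read-out, and the parameter values are arbitrary. It is not the implementation or the settings used in the study.

```python
def normalise_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] for j in range(n)] for i in range(n)]


def multiply(a, b):
    """Plain square-matrix product."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_cluster(adjacency, inflation=2.0, iterations=50):
    """Cluster an undirected graph given as a square adjacency matrix."""
    n = len(adjacency)
    # Self-loops are a common MCL convention; they keep columns non-empty.
    matrix = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    matrix = normalise_columns(matrix)
    for _ in range(iterations):
        matrix = multiply(matrix, matrix)                            # expansion
        matrix = [[v ** inflation for v in row] for row in matrix]   # inflation
        matrix = normalise_columns(matrix)
    # Simplified read-out: each row that still carries flow names one cluster.
    clusters = set()
    for row in matrix:
        members = frozenset(j for j, v in enumerate(row) if v > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(c) for c in clusters)


# Toy graph: two triangles joined by a single edge (nodes 2 and 3).
toy = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(markov_cluster(toy))
```

Higher inflation values generally give finer-grained clusters. Any such setting here is a free choice of the sketch, not a value reported in the abstract.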


Subject(s)
Arabidopsis Proteins/metabolism , Arabidopsis/genetics , Arabidopsis/metabolism , Metabolomics/methods , Algorithms , Arabidopsis Proteins/genetics , Cluster Analysis , Databases, Genetic , Markov Chains , Metabolic Networks and Pathways
7.
BMC Bioinformatics ; 12: 431, 2011 Nov 03.
Article in English | MEDLINE | ID: mdl-22054122

ABSTRACT

BACKGROUND: In response to the rapid growth of available genome sequences, efforts have been made to develop automatic inference methods to functionally characterize them. Pipelines that infer functional annotation are now routinely used to produce new annotations at a genome scale and for a broad variety of species. These pipelines differ widely in their inference algorithms, confidence thresholds and data sources for reasoning. This heterogeneity makes a comparison of the relative merits of each approach extremely complex. The evaluation of the quality of the resultant annotations is also challenging given there is often no existing gold-standard against which to evaluate precision and recall. RESULTS: In this paper, we present a pragmatic approach to the study of functional annotations. An ensemble of 12 metrics, describing various aspects of functional annotations, is defined and implemented in a unified framework, which facilitates their systematic analysis and inter-comparison. The use of this framework is demonstrated on three illustrative examples: analysing the outputs of state-of-the-art inference pipelines, comparing electronic versus manual annotation methods, and monitoring the evolution of publicly available functional annotations. The framework is part of the AIGO library (http://code.google.com/p/aigo) for the Analysis and the Inter-comparison of the products of Gene Ontology (GO) annotation pipelines. The AIGO library also provides functionalities to easily load, analyse, manipulate and compare functional annotations and also to plot and export the results of the analysis in various formats. CONCLUSIONS: This work is a step toward developing a unified framework for the systematic study of GO functional annotations. This framework has been designed so that new metrics on GO functional annotations can be added in a very straightforward way.


Subject(s)
Cattle/genetics , Genomics/methods , Molecular Sequence Annotation , Vocabulary, Controlled , Algorithms , Animals , Chromosome Mapping , Databases, Genetic , Genome , Humans
8.
Brief Bioinform ; 10(6): 676-93, 2009 Nov.
Article in English | MEDLINE | ID: mdl-19933213

ABSTRACT

The development of a systems based approach to problems in plant sciences requires integration of existing information resources. However, the available information is currently often incomplete and dispersed across many sources and the syntactic and semantic heterogeneity of the data is a challenge for integration. In this article, we discuss strategies for data integration and we use a graph based integration method (Ondex) to illustrate some of these challenges with reference to two example problems concerning integration of (i) metabolic pathway and (ii) protein interaction data for Arabidopsis thaliana. We quantify the degree of overlap for three commonly used pathway and protein interaction information sources. For pathways, we find that the AraCyc database contains the widest coverage of enzyme reactions and for protein interactions we find that the IntAct database provides the largest unique contribution to the integrated dataset. For both examples, however, we observe a relatively small amount of data common to all three sources. Analysis and visual exploration of the integrated networks was used to identify a number of practical issues relating to the interpretation of these datasets. We demonstrate the utility of these approaches to the analysis of groups of coexpressed genes from an individual microarray experiment, in the context of pathway information and for the combination of coexpression data with an integrated protein interaction network.
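
The overlap analysis described above can be pictured as plain set arithmetic over the identifiers each source contributes: how many items exist in total, how many are shared by every source, and how many are unique to each. The Python sketch below assumes the records have already been mapped to a common identifier scheme. The source names and identifiers are placeholders, not the databases or counts from the article.

```python
def overlap_summary(sources):
    """Summarise overlap between named collections of identifiers."""
    if not sources:
        raise ValueError("need at least one source")
    sets = {name: set(items) for name, items in sources.items()}
    union = set().union(*sets.values())
    common = set.intersection(*sets.values())
    unique = {}
    for name, items in sets.items():
        # Everything contributed by the other sources.
        others = set().union(*(s for other, s in sets.items() if other != name))
        unique[name] = len(items - others)
    return {"total": len(union), "common_to_all": len(common), "unique": unique}


# Placeholder identifiers, invented for illustration.
toy_sources = {
    "source_a": {"r1", "r2", "r3", "r4"},
    "source_b": {"r3", "r4", "r5"},
    "source_c": {"r4", "r6"},
}
print(overlap_summary(toy_sources))
```

In practice the hard part is the identifier mapping that precedes this step, which is where the syntactic and semantic heterogeneity discussed in the abstract comes in.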


Subject(s)
Arabidopsis Proteins/genetics , Arabidopsis/genetics , Chromosome Mapping/methods , Database Management Systems , Databases, Genetic , Genome, Plant/genetics , Information Storage and Retrieval/methods , Protein Interaction Mapping/methods , Systems Integration
9.
iScience ; 24(6): 102499, 2021 Jun 25.
Article in English | MEDLINE | ID: mdl-34308279

ABSTRACT

Male honeybees (drones) are thought to congregate in large numbers in particular "drone congregation areas" to mate. We used harmonic radar to record the flight paths of individual drones and found that drones favored certain locations within the landscape which were stable over two years. Drones often visit multiple potential lekking sites within a single flight and take shared flight paths between them. Flights between such sites are relatively straight and begin as early as the drone's second flight, indicating familiarity with the sites acquired during initial learning flights. Arriving at congregation areas, drones display convoluted, looping flight patterns. We found a correlation between a drone's distance from the center of each area and its acceleration toward the center, a signature of collective behavior leading to congregation in these areas. Our study reveals the behavior of individual drones as they navigate between and within multiple aerial leks.

10.
Sci Rep ; 11(1): 4087, 2021 Feb 18.
Article in English | MEDLINE | ID: mdl-33602999

ABSTRACT

Despite intensive research, the aetiology of multiple sclerosis (MS) remains unknown. Cerebrospinal fluid proteomics has the potential to reveal mechanisms of MS pathogenesis, but analyses must account for disease heterogeneity. We previously reported explorative multivariate analysis by hierarchical clustering of proteomics data of MS patients and controls, which resulted in two groups of individuals. Grouping reflected increased levels of intrathecal inflammatory response proteins and decreased levels of proteins involved in neural development in one group relative to the other group. MS patients and controls were present in both groups. Here we reanalysed these data and we also reanalysed data from an independent cohort of patients diagnosed with clinically isolated syndrome (CIS), who have symptoms of MS without evidence of dissemination in space and/or time. Some, but not all, CIS patients had intrathecal inflammation. The analyses reported here identified a common protein signature of MS/CIS that was not linked to elevated intrathecal inflammation. The signature included low levels of complement proteins, semaphorin-7A, reelin, neural cell adhesion molecules, inter-alpha-trypsin inhibitor heavy chain H2, transforming growth factor beta 1, follistatin-related protein 1, malate dehydrogenase 1 cytoplasmic, plasma retinol-binding protein, biotinidase, and transferrin, all known to play roles in neural development. Low levels of these proteins suggest that MS/CIS patients suffer from abnormally low oxidative capacity that results in disrupted neural development from an early stage of the disease.


Subject(s)
Cerebrospinal Fluid Proteins/analysis , Multiple Sclerosis/cerebrospinal fluid , Proteome/analysis , Adolescent , Adult , Biomarkers/cerebrospinal fluid , Case-Control Studies , Female , Humans , Male , Middle Aged , Multiple Sclerosis/pathology , Young Adult
11.
Nucleic Acids Res ; 36(Database issue): D572-6, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17942425

ABSTRACT

The pathogen-host interaction database (PHI-base) is a web-accessible database that catalogues experimentally verified pathogenicity, virulence and effector genes from bacterial, fungal and Oomycete pathogens, which infect human, animal, plant, insect, fish and fungal hosts. Plant endophytes are also included. PHI-base is therefore an invaluable resource for the discovery of genes in medically and agronomically important pathogens, which may be potential targets for chemical intervention. The database is freely accessible to both academic and non-academic users. This publication describes recent additions to the database and both current and future applications. The number of fields that characterize PHI-base entries has almost doubled. Important additional fields deal with new experimental methods, strain information, pathogenicity islands and external references that link the database to external resources, for example, gene ontology terms and Locus IDs. Another important addition is the inclusion of anti-infectives and their target genes, which makes it possible to predict the compounds that may interact with newly identified virulence factors. In parallel, the curation process has been improved and now involves several external experts. On the technical side, several new search tools have been provided and the database is also now distributed in XML format. PHI-base is available at: http://www.phi-base.org/.


Subject(s)
Bacteria/pathogenicity , Databases, Genetic , Fungi/pathogenicity , Host-Pathogen Interactions/genetics , Oomycetes/pathogenicity , Virulence Factors/genetics , Anti-Infective Agents/pharmacology , Bacteria/genetics , Fungi/genetics , Genes, Bacterial , Genes, Fungal , Internet , Oomycetes/genetics , User-Computer Interface , Virulence Factors/antagonists & inhibitors
12.
Nucleic Acids Res ; 35(Web Server issue): W148-51, 2007 Jul.
Article in English | MEDLINE | ID: mdl-17439966

ABSTRACT

Wheat biologists face particular problems because of the lack of genomic sequence and the three homoeologous genomes which give rise to three very similar forms for many transcripts. However, over 1.3 million available public-domain Triticeae ESTs (of which approximately 850,000 are wheat) and the full rice genomic sequence can be used to estimate likely transcript sequences present in any wheat cDNA sample to which PCR primers may then be designed. Wheat Estimated Transcript Server (WhETS) is designed to do this in a convenient form, and to provide information on the number of matching EST and high quality cDNA (hq-cDNA) sequences, tissue distribution and likely intron position inferred from rice. Triticeae EST and hq-cDNA sequences are mapped onto rice loci and stored in a database. The user selects a rice locus (directly or via Arabidopsis) and the matching Triticeae sequences are assembled according to user-defined filter and stringency settings. Assembly is achieved initially with the CAP3 program and then with a single nucleotide polymorphism (SNP)-analysis algorithm designed to separate homoeologues. Alignment of the resulting contigs and singlets against the rice template sequence is then displayed. Sequences and assembly details are available for download in fasta and ace formats, respectively. WhETS is accessible at http://www4.rothamsted.bbsrc.ac.uk/whets.


Subject(s)
Chromosome Mapping , Chromosomes, Plant/genetics , Computational Biology/methods , Expressed Sequence Tags , Ploidies , Triticum/genetics , Databases, Genetic , Genes, Plant , Genome, Plant , Models, Genetic , Polymorphism, Single Nucleotide , Sequence Alignment
13.
J Integr Bioinform ; 15(3)2018 Aug 07.
Article in English | MEDLINE | ID: mdl-30085931

ABSTRACT

The speed and accuracy of new scientific discoveries - be it by humans or artificial intelligence - depends on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIRness data principles).


Subject(s)
Computational Biology/methods , Computer Graphics , Gene Regulatory Networks , Genome, Human , Software , Databases, Factual , Genome-Wide Association Study , Humans , Knowledge
14.
F1000Res ; 7: 1651, 2018.
Article in English | MEDLINE | ID: mdl-30755790

ABSTRACT

KnetMaps is a BioJS component for the interactive visualization of biological knowledge networks. It is well suited for applications that need to visualise complementary, connected and content-rich data in a single view in order to help users to traverse pathways linking entities of interest, for example to go from genotype to phenotype. KnetMaps loads data in JSON format, visualizes the structure and content of knowledge networks using lightweight JavaScript libraries, and supports interactive touch gestures. KnetMaps uses effective visualization techniques to prevent information overload and to allow researchers to progressively build their knowledge.


Subject(s)
Biology , Knowledge , Software , User-Computer Interface
15.
Sci Data ; 5: 180072, 2018 May 15.
Article in English | MEDLINE | ID: mdl-29762552

ABSTRACT

The electronic Rothamsted Archive, e-RA (www.era.rothamsted.ac.uk) provides a permanent managed database to both securely store and disseminate data from Rothamsted Research's long-term field experiments (since 1843) and meteorological stations (since 1853). Both historical and contemporary data are made available via this online database which provides the scientific community with access to a unique continuous record of agricultural experiments and weather measured since the mid-19th century. Qualitative information, such as treatment and management practices, plans and soil information, accompanies the data and is made available on the e-RA website. e-RA was released externally to the wider scientific community in 2013 and this paper describes its development, content, curation and the access process for data users. Case studies illustrate the diverse applications of the data, including its original intended purposes and recent unforeseen applications. Usage monitoring demonstrates the data are of increasing interest. Future developments, including adopting FAIR data principles, are proposed as the resource is increasingly recognised as a unique archive of data relevant to sustainable agriculture, agroecology and the environment.

16.
J Integr Bioinform ; 14(1)2017 Jun 13.
Article in English | MEDLINE | ID: mdl-28609292

ABSTRACT

Genetics and "omics" studies designed to uncover genotype to phenotype relationships often identify large numbers of potential candidate genes, among which the causal genes are hidden. Scientists generally lack the time and technical expertise to review all relevant information available from the literature, from key model species and from a potentially wide range of related biological databases in a variety of data formats with variable quality and coverage. Computational tools are needed for the integration and evaluation of heterogeneous information in order to prioritise candidate genes and components of interaction networks that, if perturbed through potential interventions, have a positive impact on the biological outcome in the whole organism without producing negative side effects. Here we review several bioinformatics tools and databases that play an important role in biological knowledge discovery and candidate gene prioritization. We conclude with several key challenges that need to be addressed in order to facilitate biological knowledge discovery in the future.


Subject(s)
Computational Biology/methods , Databases, Factual , Genes , Genetic Association Studies/methods , Genotype , Phenotype , Animals , Humans
17.
J Comput Biol ; 24(10): 969-980, 2017 Oct.
Article in English | MEDLINE | ID: mdl-27627442

ABSTRACT

The development of colorectal cancer (CRC), the third most common cancer type, has been associated with deregulations of cellular mechanisms stimulated by both genetic and epigenetic events. StatEpigen is a manually curated and annotated database, containing information on interdependencies between genetic and epigenetic signals, and currently specialized for CRC research. Although StatEpigen provides a well-developed graphical user interface for information retrieval, advanced queries involving associations between multiple concepts can benefit from more detailed graph representation of the integrated data. This can be achieved by using a graph database (NoSQL) approach. Data were extracted from StatEpigen and imported to our newly developed EpiGeNet, a graph database for storage and querying of conditional relationships between molecular (genetic and epigenetic) events observed at different stages of colorectal oncogenesis. We illustrate the enhanced capability of EpiGeNet for exploration of different queries related to colorectal tumor progression; specifically, we demonstrate the query process for (i) stage-specific molecular events, (ii) most frequently observed genetic and epigenetic interdependencies in colon adenoma, and (iii) paths connecting key genes reported in CRC and associated events. The EpiGeNet framework offers improved capability for management and visualization of data on molecular events specific to CRC initiation and progression.
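
Query type (iii), finding paths that connect genes through associated events, is at heart a graph traversal. The Python sketch below shows the idea with a breadth-first search over an in-memory undirected graph. It does not use EpiGeNet, StatEpigen or any graph database API, and the node names (`gene_A`, `event_1` and so on) are placeholders rather than curated biology.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return one shortest path between two nodes, or None if unconnected."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    if start not in graph or goal not in graph:
        return None
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted neighbours make the result deterministic.
        for neighbour in sorted(graph[node]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Placeholder network, invented for illustration.
toy_edges = [
    ("gene_A", "event_1"),
    ("event_1", "gene_B"),
    ("gene_B", "event_2"),
    ("event_2", "gene_C"),
    ("gene_A", "event_3"),
]
print(shortest_path(toy_edges, "gene_A", "gene_C"))
```

A graph database expresses the same traversal declaratively and can filter on relationship types and properties, which is the advantage the abstract points to for multi-concept queries.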


Subject(s)
Colorectal Neoplasms/genetics , Computational Biology/methods , Computer Graphics , Epigenesis, Genetic , Gene Regulatory Networks , Software , Databases, Factual , Humans
18.
BioData Min ; 9: 23, 2016.
Article in English | MEDLINE | ID: mdl-27462371

ABSTRACT

BACKGROUND: Systems biology experiments generate large volumes of data of multiple modalities and this information presents a challenge for integration due to a mix of complexity together with rich semantics. Here, we describe how graph databases provide a powerful framework for storage, querying and envisioning of biological data. RESULTS: We show how graph databases are well suited for the representation of biological information, which is typically highly connected, semi-structured and unpredictable. We outline an application case that uses the Neo4j graph database for building and querying a prototype network to provide biological context to asthma related genes. CONCLUSIONS: Our study suggests that graph databases provide a flexible solution for the integration of multiple types of biological data and facilitate exploratory data mining to support hypothesis generation.

19.
Appl Transl Genom ; 11: 18-26, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28018846

ABSTRACT

The chances of raising crop productivity to enhance global food security would be greatly improved if we had a complete understanding of all the biological mechanisms that underpinned traits such as crop yield, disease resistance or nutrient and water use efficiency. With more crop genomes emerging all the time, we are nearer having the basic information, at the gene-level, to begin assembling crop gene catalogues and using data from other plant species to understand how the genes function and how their interactions govern crop development and physiology. Unfortunately, the task of creating such a complete knowledge base of gene functions, interaction networks and trait biology is technically challenging because the relevant data are dispersed in myriad databases in a variety of data formats with variable quality and coverage. In this paper we present a general approach for building genome-scale knowledge networks that provide a unified representation of heterogeneous but interconnected datasets to enable effective knowledge mining and gene discovery. We describe the datasets and outline the methods, workflows and tools that we have developed for creating and visualising these networks for the major crop species, wheat and barley. We present the global characteristics of such knowledge networks and with an example linking a seed size phenotype to a barley WRKY transcription factor orthologous to TTG2 from Arabidopsis, we illustrate the value of integrated data in biological knowledge discovery. The software we have developed (www.ondex.org) and the knowledge resources (http://knetminer.rothamsted.ac.uk) we have created are all open-source and provide a first step towards systematic and evidence-based gene discovery in order to facilitate crop improvement.

20.
Front Genet ; 5: 21, 2014.
Article in English | MEDLINE | ID: mdl-24600467

ABSTRACT

Network inference utilizes experimental high-throughput data for the reconstruction of molecular interaction networks where new relationships between the network entities can be predicted. Despite the increasing amount of experimental data, the parameters of each modeling technique cannot be optimized based on the experimental data alone, but need to be assessed qualitatively by checking whether the components of the resulting network describe the experimental setting. Candidate list prioritization and validation build upon data integration and data visualization. The application of tools supporting this procedure is limited to the exploration of smaller information networks because the display and interpretation of large amounts of information is challenging in terms of both computational effort and user experience. The Ondex software framework was extended with customizable context-sensitive menus which allow additional integration and data analysis options for a selected set of candidates during interactive data exploration. We provide new functionalities for on-the-fly data integration using InterProScan, PubMed Central literature search, and sequence-based homology search. We applied the Ondex system to the integration of publicly available data for Aspergillus nidulans and analyzed transcriptome data. We demonstrate the advantages of our approach by proposing new hypotheses for the functional annotation of specific genes of differentially expressed fungal gene clusters. Our extension of the Ondex framework makes it possible to overcome the separation between data integration and interactive analysis. More specifically, computationally demanding calculations can be performed on selected sub-networks without losing any information from the whole network. Furthermore, our extensions allow for direct access to online biological databases which helps to keep the integrated information up-to-date.
