1.
Plant Biotechnol J ; 19(8): 1670-1678, 2021 08.
Article in English | MEDLINE | ID: mdl-33750020

ABSTRACT

The generation of new ideas and scientific hypotheses is often the result of extensive literature and database searches, but, with the growing wealth of public and private knowledge, the process of searching diverse and interconnected data to generate new insights into genes, gene networks, traits and diseases is becoming both more complex and more time-consuming. To guide this technically challenging data integration task and to make gene discovery and hypothesis generation easier for researchers, we have developed a comprehensive software package called KnetMiner which is open-source and containerized for easy use. KnetMiner is an integrated, intelligent, interactive gene and gene network discovery platform that helps scientists explore and understand the biological stories of complex traits and diseases across species. It features fast algorithms for generating rich interactive gene networks and prioritizing candidate genes based on knowledge mining approaches. KnetMiner is used in many plant science institutions and has been adopted by several plant breeding organizations to accelerate gene discovery. The software is generic and customizable and can therefore be readily applied to new species and data types; for example, it has been applied to pest insects and fungal pathogens, and most recently repurposed to support COVID-19 research. Here, we give an overview of the main approaches behind KnetMiner and we report plant-centric case studies for identifying genes, gene networks and trait relationships in Triticum aestivum (bread wheat), as well as an evidence-based approach to rank candidate genes under a large Arabidopsis thaliana QTL. KnetMiner is available at: https://knetminer.org.


Subjects
COVID-19, Multifactorial Inheritance, Genetic Association Studies, Humans, Plant Breeding, SARS-CoV-2
2.
J Integr Bioinform ; 15(3)2018 Aug 07.
Article in English | MEDLINE | ID: mdl-30085931

ABSTRACT

The speed and accuracy of new scientific discoveries, be it by humans or artificial intelligence, depend on the quality of the underlying data and on the technology to connect, search and share the data efficiently. In recent years, we have seen the rise of graph databases and semi-formal data models such as knowledge graphs to facilitate software approaches to scientific discovery. These approaches extend work based on formalised models, such as the Semantic Web. In this paper, we present our developments to connect, search and share data about genome-scale knowledge networks (GSKN). We have developed a simple application ontology based on OWL/RDF with mappings to standard schemas. We are employing the ontology to power data access services like resolvable URIs, SPARQL endpoints, JSON-LD web APIs and Neo4j-based knowledge graphs. We demonstrate how the proposed ontology and graph databases considerably improve search and access to interoperable and reusable biological knowledge (i.e. the FAIR data principles).
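
As a rough illustration of the graph-based data model this abstract describes, the sketch below stores knowledge as subject-predicate-object triples and answers a two-step query by pattern matching, which is the basic idea behind SPARQL-style graph queries. It is a toy in-memory example, not the authors' ontology or services: the identifiers, predicate names, and the helper functions `match` and `genes_in_pathway` are all hypothetical.

```python
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical toy data; identifiers and predicates are illustrative only
# and are not taken from the ontology described in the abstract.
TRIPLES: List[Triple] = [
    ("gene:G1", "encodes", "protein:P1"),
    ("protein:P1", "participates_in", "pathway:W1"),
    ("gene:G2", "encodes", "protein:P2"),
    ("gene:G1", "associated_with", "trait:T1"),
]


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield every triple matching the pattern; None acts as a wildcard."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def genes_in_pathway(triples: List[Triple], pathway: str) -> List[str]:
    """Two-hop query: gene -encodes-> protein -participates_in-> pathway."""
    proteins = {subj for subj, _, _ in match(triples, p="participates_in", o=pathway)}
    return sorted(subj for subj, _, obj in match(triples, p="encodes") if obj in proteins)


if __name__ == "__main__":
    print(genes_in_pathway(TRIPLES, "pathway:W1"))
```

A real deployment would delegate this matching to a triple store or graph database; the sketch only shows why a shared vocabulary of predicates makes such multi-hop questions straightforward to express.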


Subjects
Computational Biology/methods, Computer Graphics, Gene Regulatory Networks, Human Genome, Software, Factual Databases, Genome-Wide Association Study, Humans, Knowledge
3.
BMC Med Inform Decis Mak ; 17(1): 30, 2017 03 23.
Article in English | MEDLINE | ID: mdl-28330491

ABSTRACT

BACKGROUND: Translational researchers need robust IT solutions to access a range of data types, varying from public data sets to pseudonymised patient information with restricted access, provided on a case by case basis. The reason for this complication is that managing access policies to sensitive human data must consider issues of data confidentiality, identifiability, extent of consent, and data usage agreements. All these ethical, social and legal aspects must be incorporated into a differential management of restricted access to sensitive data. METHODS: In this paper we present a pilot system that uses several common open source software components in a novel combination to coordinate access to heterogeneous biomedical data repositories containing open data (open access) as well as sensitive data (restricted access) in the domain of biobanking and biosample research. Our approach is based on a digital identity federation and software to manage resource access entitlements. RESULTS: Open source software components were assembled and configured in such a way that they allow for different ways of restricted access according to the protection needs of the data. We have tested the resulting pilot infrastructure and assessed its performance, feasibility and reproducibility. CONCLUSIONS: Common open source software components are sufficient to allow for the creation of a secure system for differential access to sensitive data. The implementation of this system is exemplary for researchers facing similar requirements for restricted access data. Here we report experience and lessons learnt of our pilot implementation, which may be useful for similar use cases. Furthermore, we discuss possible extensions for more complex scenarios.


Subjects
Biological Specimen Banks/standards, Biomedical Research/standards, Computer Security/standards, Datasets as Topic, Translational Biomedical Research/standards, Humans, Pilot Projects
4.
Nucleic Acids Res ; 43(Database issue): D1113-6, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25361974

ABSTRACT

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is an international functional genomics database at the European Bioinformatics Institute (EMBL-EBI) recommended by most journals as a repository for data supporting peer-reviewed publications. It contains data from over 7000 public sequencing and 42,000 array-based studies comprising over 1.5 million assays in total. The proportion of sequencing-based submissions has grown significantly over the last few years and has doubled in the last 18 months, whilst the rate of microarray submissions is growing slightly. All data in ArrayExpress are available in the MAGE-TAB format, which allows robust linking to data analysis and visualization tools and standardized analysis. The main development over the last two years has been the release of a new data submission tool Annotare, which has reduced the average submission time almost 3-fold. In the near future, Annotare will become the only submission route into ArrayExpress, alongside MAGE-TAB format-based pipelines. ArrayExpress is a stable and highly accessed resource. Our future tasks include automation of data flows and further integration with other EMBL-EBI resources for the representation of multi-omics data.


Subjects
Genetic Databases, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Genomics, High-Throughput Nucleotide Sequencing, Internet, Software
5.
Bioinformatics ; 30(9): 1338-9, 2014 May 01.
Article in English | MEDLINE | ID: mdl-24413672

ABSTRACT

MOTIVATION: Resource description framework (RDF) is an emerging technology for describing, publishing and linking life science data. As a major provider of bioinformatics data and services, the European Bioinformatics Institute (EBI) is committed to making data readily accessible to the community in ways that meet existing demand. The EBI RDF platform has been developed to meet an increasing demand to coordinate RDF activities across the institute and provides a new entry point to querying and exploring integrated resources available at the EBI.


Subjects
Computational Biology/methods, Genetic Databases, Academies and Institutes, Biomedical Research, Internet
6.
Nucleic Acids Res ; 42(Database issue): D50-2, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24265224

ABSTRACT

The BioSamples database at the EBI (http://www.ebi.ac.uk/biosamples) provides an integration point for BioSamples information between technology specific databases at the EBI, projects such as ENCODE and reference collections such as cell lines. The database delivers a unified query interface and API to query sample information across EBI's databases and provides links back to assay databases. Sample groups are used to manage related samples, e.g. those from an experimental submission, or a single reference collection. Infrastructural improvements include a new user interface with ontological and key word queries, a new query API, a new data submission API, complete RDF data download and a supporting SPARQL endpoint, accessioning at the point of submission to the European Nucleotide Archive and European Genotype Phenotype Archives and improved query response times.


Subjects
Genetic Databases, Cell Line, Europe, Humans, Internet, Neoplasms/genetics, Systems Integration
7.
Nucleic Acids Res ; 41(Database issue): D987-90, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23193272

ABSTRACT

The ArrayExpress Archive of Functional Genomics Data (http://www.ebi.ac.uk/arrayexpress) is one of three international functional genomics public data repositories, alongside the Gene Expression Omnibus at NCBI and the DDBJ Omics Archive, supporting peer-reviewed publications. It accepts data generated by sequencing or array-based technologies and currently contains data from almost a million assays, from over 30 000 experiments. The proportion of sequencing-based submissions has grown significantly over the last 2 years and has reached, in 2012, 15% of all new data. All data are available from ArrayExpress in MAGE-TAB format, which allows robust linking to data analysis and visualization tools, including Bioconductor and GenomeSpace. Additionally, R objects, for microarray data, and binary alignment format files, for sequencing data, have been generated for a significant proportion of ArrayExpress data.


Subjects
Genetic Databases, Genomics, Microarray Analysis, Genetic Databases/statistics & numerical data, Genetic Databases/trends, High-Throughput Nucleotide Sequencing, Internet, Software, User-Computer Interface
8.
Bioinformatics ; 28(12): 1665-7, 2012 Jun 15.
Article in English | MEDLINE | ID: mdl-22556367

ABSTRACT

MOTIVATION: Spreadsheet-like tabular formats are ever more popular in the biomedical field as a means of experimental reporting. The problem of converting the graph of an experimental workflow into a table-based representation occurs in many such formats and is not easy to solve. RESULTS: We describe graph2tab, a library that implements methods to realise such a conversion in a size-optimised way. Our solution is generic and can be adapted to specific cases of data exporters or data converters that need to be implemented. AVAILABILITY AND IMPLEMENTATION: The library source code and documentation are available at http://github.com/ISA-tools/graph2tab.
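
To make the graph-to-table problem concrete, here is a minimal sketch that flattens a small workflow graph into rows by listing every path from a source node to a terminal node. This is a naive baseline, not the size-optimised method that graph2tab implements; the function `graph_to_rows`, the `WORKFLOW` data, and the node names are hypothetical, and the graph is assumed to be acyclic.

```python
from typing import Dict, List


def graph_to_rows(edges: Dict[str, List[str]]) -> List[List[str]]:
    """Flatten an acyclic workflow graph into table rows.

    Each row is one path from a source node (no incoming edges) to a
    terminal node (no outgoing edges). Naive enumeration, for illustration.
    """
    targets = {t for successors in edges.values() for t in successors}
    nodes = set(edges) | targets
    sources = sorted(n for n in nodes if n not in targets)

    rows: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        path = path + [node]  # copy so sibling branches do not share state
        successors = edges.get(node, [])
        if not successors:
            rows.append(path)
            return
        for nxt in successors:
            walk(nxt, path)

    for source in sources:
        walk(source, [])
    return rows


# Hypothetical workflow: two samples pooled into one extract, then two assays.
WORKFLOW = {
    "sample1": ["extract1"],
    "sample2": ["extract1"],
    "extract1": ["assay1", "assay2"],
}

if __name__ == "__main__":
    for row in graph_to_rows(WORKFLOW):
        print("\t".join(row))
```

On this example the naive approach emits four rows, although two rows (sample1 to assay1 and sample2 to assay2, both through extract1) would already cover every node and edge. That redundancy is the kind of table growth a size-optimised conversion aims to avoid.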


Subjects
Computer Graphics, Programming Languages, Workflow, Computational Biology/methods, Factual Databases, Oligonucleotide Array Sequence Analysis
9.
Nucleic Acids Res ; 40(Database issue): D64-70, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22096232

ABSTRACT

The BioSample Database (http://www.ebi.ac.uk/biosamples) is a new database at EBI that stores information about biological samples used in molecular experiments, such as sequencing, gene expression or proteomics. The goals of the BioSample Database include: (i) recording and linking of sample information consistently within EBI databases such as ENA, ArrayExpress and PRIDE; (ii) minimizing data entry efforts for EBI database submitters by allowing sample descriptions to be submitted once and referenced later in data submissions to assay databases; and (iii) supporting cross-database queries by sample characteristics. Each sample in the database is assigned an accession number. The database includes a growing set of reference samples, such as cell lines, which are repeatedly used in experiments and can be easily referenced from any database by their accession numbers. Accession numbers for the reference samples will be exchanged with a similar database at NCBI. The samples in the database can be queried by their attributes, such as sample types, disease names or sample providers. A simple tab-delimited format facilitates submissions of sample information to the database, initially via email to biosamples@ebi.ac.uk.
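
The idea of describing samples in a tab-delimited file and querying them by attribute can be sketched as follows. This is only an illustration under stated assumptions: the column names, accession values, and the functions `load_samples` and `query` are hypothetical and do not reproduce the actual BioSample submission format or API.

```python
import csv
import io
from typing import Dict, List

# Hypothetical tab-delimited sample sheet; columns and accessions are
# invented for illustration and are not the real submission format.
SAMPLE_TSV = (
    "accession\tsample_type\tdisease\tprovider\n"
    "S0001\tcell line\tleukemia\tLab A\n"
    "S0002\ttissue\tnone\tLab B\n"
    "S0003\tcell line\tmelanoma\tLab B\n"
)


def load_samples(text: str) -> List[Dict[str, str]]:
    """Parse tab-delimited text into one dictionary per sample row."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def query(samples: List[Dict[str, str]], **criteria: str) -> List[Dict[str, str]]:
    """Return the samples whose attributes equal every given criterion."""
    return [s for s in samples if all(s.get(k) == v for k, v in criteria.items())]


if __name__ == "__main__":
    samples = load_samples(SAMPLE_TSV)
    for sample in query(samples, sample_type="cell line"):
        print(sample["accession"], sample["disease"])
```

Because each sample carries a stable accession, other records only need to store that identifier rather than repeat the full description, which is the data-entry saving the abstract refers to.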


Subjects
Genetic Databases, Cell Line, Gene Expression, Genomics, Proteomics, Sequence Analysis, Systems Integration, User-Computer Interface
10.
Brief Bioinform ; 12(6): 562-75, 2011 Nov.
Article in English | MEDLINE | ID: mdl-21969471

ABSTRACT

Biomedical research relies increasingly on large collections of data sets and knowledge whose generation, representation and analysis often require large collaborative and interdisciplinary efforts. This dimension of 'big data' research calls for the development of computational tools to manage such a vast amount of data, as well as tools that can improve communication and access to information from collaborating researchers and from the wider community. Whenever research projects have a defined temporal scope, an additional issue of data management arises, namely how the knowledge generated within the project can be made available beyond its boundaries and life-time. DC-THERA is a European 'Network of Excellence' (NoE) that spawned a very large collaborative and interdisciplinary research community, focusing on the development of novel immunotherapies derived from fundamental research in dendritic cell immunobiology. In this article we introduce the DC-THERA Directory, which is an information system designed to support knowledge management for this research community and beyond. We present how the use of metadata and Semantic Web technologies can effectively help to organize the knowledge generated by modern collaborative research, how these technologies can enable effective data management solutions during and beyond the project lifecycle, and how resources such as the DC-THERA Directory fit into the larger context of e-science.


Subjects
Information Dissemination/methods, Information Storage and Retrieval/methods, Semantics, Translational Biomedical Research, Database Management Systems, Internet
11.
Nucleic Acids Res ; 39(Database issue): D1002-4, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21071405

ABSTRACT

The ArrayExpress Archive (http://www.ebi.ac.uk/arrayexpress) is one of the three international public repositories of functional genomics data supporting publications. It includes data generated by sequencing or array-based technologies. Data are submitted by users and imported directly from the NCBI Gene Expression Omnibus. The ArrayExpress Archive is closely integrated with the Gene Expression Atlas and the sequence databases at the European Bioinformatics Institute. Advanced queries provided via ontology enabled interfaces include queries based on technology and sample attributes such as disease, cell types and anatomy.


Subjects
Genetic Databases, Gene Expression Profiling, Genomics, High-Throughput Nucleotide Sequencing, Oligonucleotide Array Sequence Analysis, Gene Expression
12.
Immunome Res ; 6: 10, 2010 Nov 19.
Article in English | MEDLINE | ID: mdl-21092113

ABSTRACT

BACKGROUND: The advent of Systems Biology has been accompanied by the blooming of pathway databases. Currently pathways are defined generically with respect to the organ or cell type where a reaction takes place. The cell type specificity of the reactions is the foundation of immunological research, and capturing this specificity is of paramount importance when using pathway-based analyses to decipher complex immunological datasets. Here, we present DC-ATLAS, a novel and versatile resource for the interpretation of high-throughput data generated by perturbing the signaling network of dendritic cells (DCs). RESULTS: Pathways are annotated using a novel data model, the Biological Connection Markup Language (BCML), an SBGN-compliant data format developed to store the large amount of information collected. The application of DC-ATLAS to pathway-based analysis of the transcriptional program of DCs stimulated with agonists of the toll-like receptor family allows an integrated description of the flow of information from the cellular sensors to the functional outcome, capturing the temporal series of activation events by grouping sets of reactions that occur at different time points in well-defined functional modules. CONCLUSIONS: The initiative significantly improves our understanding of DC biology and regulatory networks. Developing a systems biology approach for the immune system holds the promise of translating knowledge on the immune system into more successful immunotherapy strategies.

13.
Bioinformatics ; 26(18): 2354-6, 2010 Sep 15.
Article in English | MEDLINE | ID: mdl-20679334

ABSTRACT

SUMMARY: The first open source software suite for experimentalists and curators that (i) assists in the annotation and local management of experimental metadata from high-throughput studies employing one or a combination of omics and other technologies; (ii) empowers users to uptake community-defined checklists and ontologies; and (iii) facilitates submission to international public repositories. AVAILABILITY AND IMPLEMENTATION: Software, documentation, case studies and implementations at http://www.isa-tools.org.


Subjects
Software, Checklist, Documentation
14.
Nucleic Acids Res ; 37(Database issue): D868-72, 2009 Jan.
Article in English | MEDLINE | ID: mdl-19015125

ABSTRACT

ArrayExpress (http://www.ebi.ac.uk/arrayexpress) consists of three components: the ArrayExpress Repository, a public archive of functional genomics experiments and supporting data; the ArrayExpress Warehouse, a database of gene expression profiles and other bio-measurements; and the ArrayExpress Atlas, a new summary database and meta-analytical tool of ranked gene expression across multiple experiments and different biological conditions. The Repository contains data from over 6000 experiments comprising approximately 200,000 assays, and the database doubles in size every 15 months. The majority of the data are array based, but other data types are included, most recently ultra-high-throughput sequencing transcriptomics and epigenetic data. The Warehouse and Atlas allow users to query for differentially expressed genes by gene names and properties, experimental conditions and sample properties, or a combination of both. In this update, we describe the ArrayExpress developments over the last two years.


Subjects
Genetic Databases, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Genomics
15.
Summit Transl Bioinform ; 2009: 112-5, 2009 Mar 01.
Article in English | MEDLINE | ID: mdl-21347181

ABSTRACT

BACKGROUND: As the size and complexity of scientific datasets and the corresponding information stores grow, standards for collecting, describing, formatting, submitting and exchanging information are playing an increasingly active role. Several initiatives occupy strategic positions in the international scenario, both within and across domains. However, the job of harmonising reporting standards is still very much a work in progress; both software interoperability and the data integration remain challenging as things stand. RESULTS: The status quo with respect to standardization initiatives is summarized here, with particular emphasis on the motivation for, and the challenges of, ongoing synergistic activities amongst the academic community focused on the creation of truly interoperable standards. CONCLUSIONS: Groups generating standards should engage with ongoing cross-domain activities to simplify the integration of heterogeneous data sets to the greatest possible extent.

16.
OMICS ; 12(2): 143-9, 2008 Jun.
Article in English | MEDLINE | ID: mdl-18447634

ABSTRACT

This article summarizes the motivation for, and the proceedings of, the first ISA-TAB workshop held December 6-8, 2007, at the EBI, Cambridge, UK. This exploratory workshop, organized by members of the Microarray Gene Expression Data (MGED) Society's Reporting Structure for Biological Investigations (RSBI) working group, brought together a group of developers of a range of collaborative systems to discuss the use of a common format to address the pressing need of reporting and communicating data and metadata from biological, biomedical, and environmental studies employing combinations of genomics, transcriptomics, proteomics, and metabolomics technologies along with more conventional methodologies. The expertise of the participants comprised database development, data management, and hands-on experience in the development of data communication standards. The workshop's outcomes are set to help formalize the proposed Investigation, Study, Assay (ISA)-TAB tab-delimited format for representing and communicating experimental metadata. This article is part of the special issue of OMICS on the activities of the Genomics Standards Consortium (GSC).


Subjects
Computational Biology, Database Management Systems, Education, Genomics, Proteomics, Messenger RNA/genetics, United Kingdom
17.
BMC Bioinformatics ; 8 Suppl 1: S21, 2007 Mar 08.
Article in English | MEDLINE | ID: mdl-17430566

ABSTRACT

BACKGROUND: Gene expression databases are key resources for microarray data management and analysis and the importance of a proper annotation of their content is well understood. Public repositories as well as microarray database systems that can be implemented by single laboratories exist. However, there is not yet a tool that can easily support a collaborative environment where different users with different rights of access to data can interact to define a common highly coherent content. The scope of the Genopolis database is to provide a resource that allows different groups performing microarray experiments related to a common subject to create a common coherent knowledge base and to analyse it. The Genopolis database has been implemented as a dedicated system for the scientific community studying dendritic cell and macrophage functions and host-parasite interactions. RESULTS: The Genopolis Database system allows the community to build an object-based, MIAME-compliant annotation of their experiments and to store images, raw and processed data from the Affymetrix GeneChip platform. It supports dynamic definition of controlled vocabularies and provides automated and supervised steps to control the coherence of data and annotations. It allows precise control of the visibility of the database content to different subgroups in the community and facilitates exports of its content to public repositories. It provides an interactive user interface for data analysis: this allows users to visualize data matrices based on functional lists and sample characterization, and to navigate to other data matrices defined by similarity of expression values as well as functional characterizations of genes involved. A collaborative environment is also provided for the definition and sharing of functional annotation by users. CONCLUSION: The Genopolis Database supports a community in building a common coherent knowledge base and analysing it. This fills a gap between a local database and a public repository, where the development of a common coherent annotation is important. In its current implementation, it provides a uniform coherently annotated dataset on dendritic cells and macrophage differentiation.


Subjects
Database Management Systems, Genetic Databases, Information Storage and Retrieval/methods, Internet, Oligonucleotide Array Sequence Analysis/methods, User-Computer Interface, Gene Expression Profiling