1.
Nucleic Acids Res ; 52(D1): D938-D949, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-38000386

ABSTRACT

Bridging the gap between genetic variations, environmental determinants, and phenotypic outcomes is critical for supporting clinical diagnosis and understanding mechanisms of diseases. It requires integrating open data at a global scale. The Monarch Initiative advances these goals by developing open ontologies, semantic data models, and knowledge graphs for translational research. The Monarch App is an integrated platform combining data about genes, phenotypes, and diseases across species. Monarch's APIs enable access to carefully curated datasets and advanced analysis tools that support the understanding and diagnosis of disease for diverse applications such as variant prioritization, deep phenotyping, and patient profile-matching. We have migrated our system into a scalable, cloud-based infrastructure; simplified Monarch's data ingestion and knowledge graph integration systems; enhanced data mapping and integration standards; and developed a new user interface with novel search and graph navigation features. Furthermore, we advanced Monarch's analytic tools by developing a customized plugin for OpenAI's ChatGPT to increase the reliability of its responses about phenotypic data, allowing us to interrogate the knowledge in the Monarch graph using state-of-the-art Large Language Models. The resources of the Monarch Initiative can be found at monarchinitiative.org and its corresponding code repository at github.com/monarch-initiative/monarch-app.


Subject(s)
Databases, Factual , Disease , Genes , Phenotype , Humans , Internet , Databases, Factual/standards , Software , Genes/genetics , Disease/genetics
2.
Nucleic Acids Res ; 51(D1): D977-D985, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350656

ABSTRACT

The NHGRI-EBI GWAS Catalog (www.ebi.ac.uk/gwas) is a FAIR knowledgebase providing detailed, structured, standardised and interoperable genome-wide association study (GWAS) data to >200 000 users per year from academic research, healthcare and industry. The Catalog contains variant-trait associations and supporting metadata for >45 000 published GWAS across >5000 human traits, and >40 000 full P-value summary statistics datasets. Content is curated from publications or acquired via author submission of prepublication summary statistics through a new submission portal and validation tool. GWAS data volume has vastly increased in recent years. We have updated our software to meet this scaling challenge and to enable rapid release of submitted summary statistics. The scope of the repository has expanded to include additional data types of high interest to the community, including sequencing-based GWAS, gene-based analyses and copy number variation analyses. Community outreach has increased the number of shared datasets from under-represented traits, e.g. cancer, and we continue to contribute to awareness of the lack of population diversity in GWAS. Interoperability of the Catalog has been enhanced through links to other resources including the Polygenic Score Catalog and the International Mouse Phenotyping Consortium, refinements to GWAS trait annotation, and the development of a standard format for GWAS data.


Subject(s)
Genome-Wide Association Study , Knowledge Bases , Animals , Humans , Mice , DNA Copy Number Variations , National Human Genome Research Institute (U.S.) , Phenotype , Polymorphism, Single Nucleotide , Software , United States
3.
Nucleic Acids Res ; 47(D1): D1172-D1178, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30407529

ABSTRACT

The BioSamples database at EMBL-EBI provides a central hub for sample metadata storage and linkage to other EMBL-EBI resources. BioSamples has recently undergone major changes, both in terms of data content and supporting infrastructure. The data content has more than doubled from around 2 million samples in 2014 to just over 5 million samples in 2018. Fast, reciprocal data exchange was fully established between sister Biosample databases and other INSDC partners, enabling a worldwide common representation and centralization of sample metadata. The BioSamples platform has been upgraded to accommodate anticipated increases in the number of submissions via GA4GH driver projects such as the Human Cell Atlas and the EGA, as well as from mirroring of NCBI dbGaP data. The BioSamples database is now the authoritative repository for all INSDC sample metadata, an ELIXIR Deposition Database for Biomolecular Data and the EMBL-EBI sample metadata hub. To support faster turnaround for sample submission, and to increase scalability and resilience, we have upgraded the BioSamples database backend storage, APIs and user interface. Finally, the website has been redesigned to allow search and retrieval of records based on specific filters, such as 'disease' or 'organism'. These changes are targeted at answering current use cases as well as providing functionalities for future emerging and anticipated developments. Availability: The BioSamples database is freely available at http://www.ebi.ac.uk/biosamples. Content is distributed under the EMBL-EBI Terms of Use available at https://www.ebi.ac.uk/about/terms-of-use.


Subject(s)
Biological Specimen Banks , Computational Biology/methods , Databases, Genetic , Databases, Nucleic Acid , Genomics/methods , Computational Biology/statistics & numerical data , Genomics/statistics & numerical data , Humans , Information Storage and Retrieval/methods , Internet , Metadata/statistics & numerical data , User-Computer Interface
4.
Nucleic Acids Res ; 47(D1): D1005-D1012, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30445434

ABSTRACT

The GWAS Catalog delivers a high-quality curated collection of all published genome-wide association studies enabling investigations to identify causal variants, understand disease mechanisms, and establish targets for novel therapies. The scope of the Catalog has also expanded to targeted and exome arrays with 1000 new associations added for these technologies. As of September 2018, the Catalog contains 5687 GWAS comprising 71673 variant-trait associations from 3567 publications. New content includes 284 full P-value summary statistics datasets for genome-wide and new targeted array studies, representing 6 × 10^9 individual variant-trait statistics. In the last 12 months, the Catalog's user interface was accessed by ∼90000 unique users who viewed >1 million pages. We have improved data access with the release of a new RESTful API to support high-throughput programmatic access, an improved web interface and a new summary statistics database. Summary statistics provision is supported by a new format proposed as a community standard for summary statistics data representation. This format was derived from our experience in standardizing heterogeneous submissions, mapping formats and in harmonizing content. Availability: https://www.ebi.ac.uk/gwas/.


Subject(s)
Databases, Genetic , Genome-Wide Association Study , Disease/genetics , Genetic Variation , Humans , Microarray Analysis , Publications , Software , User-Computer Interface
5.
Bioinformatics ; 29(4): 525-7, 2013 Feb 15.
Article in English | MEDLINE | ID: mdl-23267176

ABSTRACT

MOTIVATION: Data collection in spreadsheets is ubiquitous, but current solutions lack support for collaborative semantic annotation that would promote shared and interdisciplinary annotation practices, supporting geographically distributed players. RESULTS: OntoMaton is an open source solution that brings ontology lookup and tagging capabilities into a cloud-based collaborative editing environment, harnessing Google Spreadsheets and the NCBO Web services. It is a general purpose, format-agnostic tool that may serve as a component of the ISA software suite. OntoMaton can also be used to assist the ontology development process. AVAILABILITY: OntoMaton is freely available from Google widgets under the CPAL open source license; documentation and examples at: https://github.com/ISA-tools/OntoMaton.


Subject(s)
Software , Vocabulary, Controlled , Internet
6.
Nucleic Acids Res ; 39(Web Server issue): W541-5, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21672956

ABSTRACT

The National Center for Biomedical Ontology (NCBO) is one of the National Centers for Biomedical Computing funded under the NIH Roadmap Initiative. Contributing to the national computing infrastructure, NCBO has developed BioPortal, a web portal that provides access to a library of biomedical ontologies and terminologies (http://bioportal.bioontology.org) via the NCBO Web services. BioPortal enables community participation in the evaluation and evolution of ontology content by providing features to add mappings between terms, to add comments linked to specific ontology terms and to provide ontology reviews. The NCBO Web services (http://www.bioontology.org/wiki/index.php/NCBO_REST_services) enable this functionality and provide a uniform mechanism to access ontologies from a variety of knowledge representation formats, such as Web Ontology Language (OWL) and Open Biological and Biomedical Ontologies (OBO) format. The Web services provide multi-layered access to the ontology content, from getting all terms in an ontology to retrieving metadata about a term. Users can easily incorporate the NCBO Web services into software applications to generate semantically aware applications and to facilitate structured data collection.


Subject(s)
Software , Terminology as Topic , Vocabulary, Controlled , Internet
7.
Nucleic Acids Res ; 39(Database issue): D7-10, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097465

ABSTRACT

The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.


Subject(s)
Databases, Factual/standards , Information Dissemination
8.
J Biomed Inform ; 44(1): 137-45, 2011 Feb.
Article in English | MEDLINE | ID: mdl-20955817

ABSTRACT

The biomedical research community relies on a diverse set of resources, both within their own institutions and at other research centers. In addition, an increasing number of shared electronic resources have been developed. Without effective means to locate and query these resources, it is challenging, if not impossible, for investigators to be aware of the myriad resources available, or to effectively perform resource discovery when the need arises. In this paper, we describe the development and use of the Biomedical Resource Ontology (BRO) to enable semantic annotation and discovery of biomedical resources. We also describe the Resource Discovery System (RDS) which is a federated, inter-institutional pilot project that uses the BRO to facilitate resource discovery on the Internet. Through the RDS framework and its associated Biositemaps infrastructure, the BRO facilitates semantic search and discovery of biomedical resources, breaking down barriers and streamlining scientific research that will improve human health.


Subject(s)
Biomedical Research , Database Management Systems , Documentation , Medical Informatics , Translational Research, Biomedical , Animals , Computational Biology , Humans , Internet , Semantics , User-Computer Interface
9.
Nucleic Acids Res ; 37(Web Server issue): W170-3, 2009 Jul.
Article in English | MEDLINE | ID: mdl-19483092

ABSTRACT

Biomedical ontologies provide essential domain knowledge to drive data integration, information retrieval, data annotation, natural-language processing and decision support. BioPortal (http://bioportal.bioontology.org) is an open repository of biomedical ontologies that provides access via Web services and Web browsers to ontologies developed in OWL, RDF, OBO format and Protégé frames. BioPortal functionality includes the ability to browse, search and visualize ontologies. The Web interface also facilitates community-based participation in the evaluation and evolution of ontology content by providing features to add notes to ontology terms, mappings between terms and ontology reviews based on criteria such as usability, domain coverage, quality of content, and documentation and support. BioPortal also enables integrated search of biomedical data resources such as the Gene Expression Omnibus (GEO), ClinicalTrials.gov, and ArrayExpress, through the annotation and indexing of these resources with ontologies in BioPortal. Thus, BioPortal not only provides investigators, clinicians, and developers 'one-stop shopping' to programmatically access biomedical ontologies, but also provides support to integrate data from a variety of biomedical resources.


Subject(s)
Software , Vocabulary, Controlled , Abstracting and Indexing , Biomedical Research , Internet , Natural Language Processing , Systems Integration , User-Computer Interface
10.
Cell Genom ; 1(1), 2021 Oct 13.
Article in English | MEDLINE | ID: mdl-36082306

ABSTRACT

Genome-wide association studies (GWASs) have enabled robust mapping of complex traits in humans. The open sharing of GWAS summary statistics (SumStats) is essential in facilitating the larger meta-analyses needed for increased power in resolving the genetic basis of disease. However, most GWAS SumStats are not readily accessible because of limited sharing and a lack of defined standards. With the aim of increasing the availability, quality, and utility of GWAS SumStats, the National Human Genome Research Institute-European Bioinformatics Institute (NHGRI-EBI) GWAS Catalog organized a community workshop to address the standards, infrastructure, and incentives required to promote and enable sharing. We evaluated the barriers to SumStats sharing, both technological and sociological, and developed an action plan to address those challenges and ensure that SumStats and study metadata are findable, accessible, interoperable, and reusable (FAIR). We encourage early deposition of datasets in the GWAS Catalog as the recognized central repository. We recommend standard requirements for reporting elements and formats for SumStats and accompanying metadata as guidelines for community standards and a basis for submission to the GWAS Catalog. Finally, we provide recommendations to enable, promote, and incentivize broader data sharing, standards and FAIRness in order to advance genomic medicine.

11.
Nat Biotechnol ; 25(11): 1251-5, 2007 Nov.
Article in English | MEDLINE | ID: mdl-17989687

ABSTRACT

The value of any kind of data is greatly enhanced when it exists in a form that allows it to be integrated with other data. One approach to integration is through the annotation of multiple bodies of data using common controlled vocabularies or 'ontologies'. Unfortunately, the very success of this approach has led to a proliferation of ontologies, which itself creates obstacles to integration. The Open Biomedical Ontologies (OBO) consortium is pursuing a strategy to overcome this problem. Existing OBO ontologies, including the Gene Ontology, are undergoing coordinated reform, and new ontologies are being created on the basis of an evolving set of shared principles governing ontology development. The result is an expanding family of ontologies designed to be interoperable and logically well formed and to incorporate accurate representations of biological reality. We describe this OBO Foundry initiative and provide guidelines for those who might wish to become involved.


Subject(s)
Information Storage and Retrieval/standards , Terminology as Topic , Vocabulary, Controlled , Humans , Nervous System/anatomy & histology , Nervous System Physiological Phenomena
12.
Nat Biotechnol ; 25(10): 1127-33, 2007 Oct.
Article in English | MEDLINE | ID: mdl-17921998

ABSTRACT

The Functional Genomics Experiment data model (FuGE) has been developed to facilitate convergence of data standards for high-throughput, comprehensive analyses in biology. FuGE models the components of an experimental activity that are common across different technologies, including protocols, samples and data. FuGE provides a foundation for describing entire laboratory workflows and for the development of new data formats. The Microarray Gene Expression Data society and the Proteomics Standards Initiative have committed to using FuGE as the basis for defining their respective standards, and other standards groups, including the Metabolomics Standards Initiative, are evaluating FuGE in their development efforts. Adoption of FuGE by multiple standards bodies will enable uniform reporting of common parts of functional genomics workflows, simplify data-integration efforts and ease the burden on researchers seeking to fulfill multiple minimum reporting requirements. Such advances are important for transparent data management and mining in functional genomics and systems biology.


Subject(s)
Computational Biology , Computer Simulation/standards , Genomics/standards , Models, Biological , Oligonucleotide Array Sequence Analysis/standards , Proteomics/standards , Databases, Factual
13.
BMC Bioinformatics ; 7: 489, 2006 Nov 06.
Article in English | MEDLINE | ID: mdl-17087822

ABSTRACT

BACKGROUND: Sharing of microarray data within the research community has been greatly facilitated by the development of the disclosure and communication standards MIAME and MAGE-ML by the MGED Society. However, the complexity of the MAGE-ML format has made its use impractical for laboratories lacking dedicated bioinformatics support. RESULTS: We propose a simple tab-delimited, spreadsheet-based format, MAGE-TAB, which will become a part of the MAGE microarray data standard and can be used for annotating and communicating microarray data in a MIAME compliant fashion. CONCLUSION: MAGE-TAB will enable laboratories without bioinformatics experience or support to manage, exchange and submit well-annotated microarray data in a standard format using a spreadsheet. The MAGE-TAB format is self-contained, and does not require an understanding of MAGE-ML or XML.


Subject(s)
Computational Biology/methods , Oligonucleotide Array Sequence Analysis/methods , Software , Databases, Genetic , Humans
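
The abstract above describes MAGE-TAB as a simple tab-delimited, spreadsheet-based format. As a rough illustration of why such a format needs no XML tooling, the following minimal sketch reads a tab-delimited sheet with Python's standard library. The column headers and sample values here are hypothetical placeholders chosen for illustration; they are not taken from the abstract and should not be read as the normative MAGE-TAB column set.

```python
import csv
import io

# Hypothetical two-sample sheet. Headers and values are illustrative
# assumptions, not the official MAGE-TAB specification.
SHEET = (
    "Source Name\tCharacteristics[organism]\tArray Data File\n"
    "sample_1\tHomo sapiens\tsample_1.CEL\n"
    "sample_2\tHomo sapiens\tsample_2.CEL\n"
)


def read_tab_delimited(text):
    """Parse tab-delimited text into a list of dicts keyed by column header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = read_tab_delimited(SHEET)
for row in rows:
    # Link each sample to the data file listed on the same row.
    print(row["Source Name"], "->", row["Array Data File"])
```

Because every row is plain text separated by tabs, the same sheet can be edited in any spreadsheet program and read back with a few lines of code, which is the practical point the abstract makes.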
14.
OMICS ; 10(2): 199-204, 2006.
Article in English | MEDLINE | ID: mdl-16901226

ABSTRACT

The development of the Functional Genomics Investigation Ontology (FuGO) is a collaborative, international effort that will provide a resource for annotating functional genomics investigations, including the study design, protocols and instrumentation used, the data generated and the types of analysis performed on the data. FuGO will contain both terms that are universal to all functional genomics investigations and those that are domain specific. In this way, the ontology will serve as the "semantic glue" to provide a common understanding of data from across these disparate data sources. In addition, FuGO will reference out to existing mature ontologies to avoid the need to duplicate these resources, and will do so in such a way as to enable their ease of use in annotation. This project is in the early stages of development; the paper will describe efforts to initiate the project, the scope and organization of the project, the work accomplished to date, and the challenges encountered, as well as future plans.


Subject(s)
Biomedical Research/standards , Genomics/standards , Biomedical Research/organization & administration , Genomics/organization & administration , Terminology as Topic , Workforce
15.
Methods Enzymol ; 411: 325-39, 2006.
Article in English | MEDLINE | ID: mdl-16939798

ABSTRACT

Consistent annotation of studies using microarrays is critical to optimal management and use of microarray data. Ontologies provide defined and structured terminology suited for this purpose. The Gene Ontology (GO) has aided the analysis of expression studies greatly by providing consistent functional annotation of array sequence features. The intent of the MGED Ontology (MO) is to provide consistent experimental annotation. The MO has been developed as a community effort in support of the Minimum Information About a Microarray Experiment standard and is tied to the Microarray Gene Expression object model. The MO is freely available and has been incorporated into the annotation systems of several public microarray database systems.


Subject(s)
Oligonucleotide Array Sequence Analysis/methods , Terminology as Topic , Animals , Databases, Genetic , Humans , Software
16.
Nucleic Acids Res ; 31(1): 212-5, 2003 Jan 01.
Article in English | MEDLINE | ID: mdl-12519984

ABSTRACT

PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema) employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically-based, queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second Apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).


Subject(s)
Computational Biology , Databases, Genetic , Genome, Protozoan , Plasmodium falciparum/genetics , Animals , Gene Expression , Information Storage and Retrieval , Plasmodium/genetics , Plasmodium falciparum/metabolism , Proteomics , Software
17.
Genome Biol ; 17(1): 91, 2016 05 06.
Article in English | MEDLINE | ID: mdl-27154141

ABSTRACT

Efficient tools for data management and integration are essential for many aspects of high-throughput biology. In particular, annotations of genes and human genetic variants are commonly used but highly fragmented across many resources. Here, we describe MyGene.info and MyVariant.info, high-performance web services for querying gene and variant annotation information. These web services are currently accessed more than three million times per month. They also demonstrate a generalizable cloud-based model for organizing and querying biological annotation information. MyGene.info and MyVariant.info are provided as high-performance web services, accessible at http://mygene.info and http://myvariant.info. Both are offered free of charge to the research community.


Subject(s)
Genetic Variation , Molecular Sequence Annotation , Sequence Analysis, DNA , Software , Database Management Systems , Databases, Genetic , Humans , Internet , User-Computer Interface
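
The abstract above describes MyGene.info as a web service for querying gene annotation. The sketch below shows one way such a query URL might be assembled. The `/v3/query` path and the `q`, `species` and `fields` parameters are assumptions based on the service's public documentation rather than on the abstract, and the helper name `build_gene_query_url` is hypothetical. No request is sent; the code only constructs the URL.

```python
from urllib.parse import urlencode

# Assumed endpoint; the abstract gives only the site address, not the API path.
BASE_URL = "https://mygene.info/v3/query"


def build_gene_query_url(term, species="human",
                         fields=("symbol", "name", "entrezgene")):
    """Return a query URL for a gene search term (hypothetical helper)."""
    params = {
        "q": term,                   # search expression, e.g. "symbol:CDK2"
        "species": species,          # restrict hits to one species
        "fields": ",".join(fields),  # limit the annotation fields returned
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_gene_query_url("symbol:CDK2")
print(url)
```

Keeping URL construction separate from the network call makes the query easy to inspect and test before sending it with any HTTP client.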
18.
PLoS One ; 11(4): e0154556, 2016.
Article in English | MEDLINE | ID: mdl-27128319

ABSTRACT

The Ontology for Biomedical Investigations (OBI) is an ontology that provides terms with precisely defined meanings to describe all aspects of how investigations in the biological and medical domains are conducted. OBI re-uses ontologies that provide a representation of biomedical knowledge from the Open Biological and Biomedical Ontologies (OBO) project and adds the ability to describe how this knowledge was derived. We here describe the state of OBI and several applications that are using it, such as adding semantic expressivity to existing databases, building data entry forms, and enabling interoperability between knowledge resources. OBI covers all phases of the investigation process, such as planning, execution and reporting. It represents information and material entities that participate in these processes, as well as roles and functions. Prior to OBI, it was not possible to use a single internally consistent resource that could be applied to multiple types of experiments for these applications. OBI has made this possible by creating terms for entities involved in biological and medical investigations and by importing parts of other biomedical ontologies such as GO, Chemical Entities of Biological Interest (ChEBI) and Phenotype Attribute and Trait Ontology (PATO) without altering their meaning. OBI is being used in a wide range of projects covering genomics, multi-omics, immunology, and catalogs of services. OBI has also spawned other ontologies (Information Artifact Ontology) and methods for importing parts of ontologies (Minimum information to reference an external ontology term (MIREOT)). The OBI project is an open cross-disciplinary collaborative effort, encompassing multiple research communities from around the globe. To date, OBI has created 2366 classes and 40 relations along with textual and formal definitions. 
The OBI Consortium maintains a web resource (http://obi-ontology.org) providing details on the people, policies, and issues being addressed in association with OBI. The current release of OBI is available at http://purl.obolibrary.org/obo/obi.owl.


Subject(s)
Biological Ontologies , Animals , Biological Ontologies/organization & administration , Biological Ontologies/statistics & numerical data , Biological Ontologies/trends , Computational Biology , Databases, Factual , Humans , Internet , Metadata , Semantics , Software
19.
PLoS One ; 10(9): e0136206, 2015.
Article in English | MEDLINE | ID: mdl-26393351

ABSTRACT

The NIDDK Information Network (dkNET; http://dknet.org) was launched to serve the needs of basic and clinical investigators in metabolic, digestive and kidney disease by facilitating access to research resources that advance the mission of the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK). By research resources, we mean the multitude of data, software tools, materials, services, projects and organizations available to researchers in the public domain. Most of these are accessed via web-accessible databases or web portals, each developed, designed and maintained by numerous different projects, organizations and individuals. While many of the large government funded databases, maintained by agencies such as European Bioinformatics Institute and the National Center for Biotechnology Information, are well known to researchers, many more that have been developed by and for the biomedical research community are unknown or underutilized. At least part of the problem is the nature of dynamic databases, which are considered part of the "hidden" web, that is, content that is not easily accessed by search engines. dkNET was created specifically to address the challenge of connecting researchers to research resources via these types of community databases and web portals. dkNET functions as a "search engine for data", searching across millions of database records contained in hundreds of biomedical databases developed and maintained by independent projects around the world. A primary focus of dkNET are centers and projects specifically created to provide high quality data and resources to NIDDK researchers. Through the novel data ingest process used in dkNET, additional data sources can easily be incorporated, allowing it to scale with the growth of digital data and the needs of the dkNET community. Here, we provide an overview of the dkNET portal and its functions. 
We show how dkNET can be used to address a variety of use cases that involve searching for research resources.


Subject(s)
Diabetes Mellitus/pathology , Digestive System Diseases/pathology , Information Services , Kidney Diseases/pathology , Animals , Databases, Factual , Humans , Models, Animal , National Institute of Diabetes and Digestive and Kidney Diseases (U.S.) , Research , Search Engine , United States