Búsqueda | Portal Regional de la BVS

1.

Specimen, biological structure, and spatial ontologies in support of a Human Reference Atlas.

Herr, Bruce W; Hardi, Josef; Quardokus, Ellen M; Bueckle, Andreas; Chen, Lu; Wang, Fusheng; Caron, Anita R; Osumi-Sutherland, David; Musen, Mark A; Börner, Katy.

Sci Data ; 10(1): 171, 2023 03 27.

Artículo en Inglés | MEDLINE | ID: mdl-36973309

RESUMEN

The Human Reference Atlas (HRA) is defined as a comprehensive, three-dimensional (3D) atlas of all the cells in the healthy human body. It is compiled by an international team of experts who develop standard terminologies that they link to 3D reference objects, describing anatomical structures. The third HRA release (v1.2) covers spatial reference data and ontology annotations for 26 organs. Experts access the HRA annotations via spreadsheets and view reference object models in 3D editing tools. This paper introduces the Common Coordinate Framework (CCF) Ontology v2.0.1 that interlinks specimen, biological structure, and spatial data, together with the CCF API that makes the HRA programmatically accessible and interoperable with Linked Open Data (LOD). We detail how real-world user needs and experimental data guide CCF Ontology design and implementation, present CCF Ontology classes and properties together with exemplary usage, and report on validation methods. The CCF Ontology graph database and API are used in the HuBMAP portal, HRA Organ Gallery, and other applications that support data queries across multiple, heterogeneous sources.

Asunto(s)

Células , Bases de Datos Factuales , Humanos

2.

The scalable precision medicine open knowledge engine (SPOKE): a massive knowledge graph of biomedical information.

Morris, John H; Soman, Karthik; Akbas, Rabia E; Zhou, Xiaoyuan; Smith, Brett; Meng, Elaine C; Huang, Conrad C; Cerono, Gabriel; Schenk, Gundolf; Rizk-Jackson, Angela; Harroud, Adil; Sanders, Lauren; Costes, Sylvain V; Bharat, Krish; Chakraborty, Arjun; Pico, Alexander R; Mardirossian, Taline; Keiser, Michael; Tang, Alice; Hardi, Josef; Shi, Yongmei; Musen, Mark; Israni, Sharat; Huang, Sui; Rose, Peter W; Nelson, Charlotte A; Baranzini, Sergio E.

Bioinformatics ; 39(2)2023 02 03.

Artículo en Inglés | MEDLINE | ID: mdl-36759942

RESUMEN

MOTIVATION: Knowledge graphs (KGs) are being adopted in industry, commerce and academia. Biomedical KG presents a challenge due to the complexity, size and heterogeneity of the underlying information. RESULTS: In this work, we present the Scalable Precision Medicine Open Knowledge Engine (SPOKE), a biomedical KG connecting millions of concepts via semantically meaningful relationships. SPOKE contains 27 million nodes of 21 different types and 53 million edges of 55 types downloaded from 41 databases. The graph is built on the framework of 11 ontologies that maintain its structure, enable mappings and facilitate navigation. SPOKE is built weekly by python scripts which download each resource, check for integrity and completeness, and then create a 'parent table' of nodes and edges. Graph queries are translated by a REST API and users can submit searches directly via an API or a graphical user interface. Conclusions/Significance: SPOKE enables the integration of seemingly disparate information to support precision medicine efforts. AVAILABILITY AND IMPLEMENTATION: The SPOKE neighborhood explorer is available at https://spoke.rbvi.ucsf.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Asunto(s)

Reconocimiento de Normas Patrones Automatizadas , Medicina de Precisión , Bases de Datos Factuales

3.

Modeling community standards for metadata as templates makes data FAIR.

Musen, Mark A; O'Connor, Martin J; Schultes, Erik; Martínez-Romero, Marcos; Hardi, Josef; Graybeal, John.

Sci Data ; 9(1): 696, 2022 11 12.

Artículo en Inglés | MEDLINE | ID: mdl-36371407

RESUMEN

It is challenging to determine whether datasets are findable, accessible, interoperable, and reusable (FAIR) because the FAIR Guiding Principles refer to highly idiosyncratic criteria regarding the metadata used to annotate datasets. Specifically, the FAIR principles require metadata to be "rich" and to adhere to "domain-relevant" community standards. Scientific communities should be able to define their own machine-actionable templates for metadata that encode these "rich," discipline-specific elements. We have explored this template-based approach in the context of two software systems. One system is the CEDAR Workbench, which investigators use to author new metadata. The other is the FAIRware Workbench, which evaluates the metadata of archived datasets for their adherence to community standards. Benefits accrue when templates for metadata become central elements in an ecosystem of tools to manage online datasets-both because the templates serve as a community reference for what constitutes FAIR data, and because they embody that perspective in a form that can be distributed among a variety of software applications to assist with data stewardship and data sharing.

4.

Without appropriate metadata, data-sharing mandates are pointless.

Musen, Mark A.

Nature ; 609(7926): 222, 2022 09.

Artículo en Inglés | MEDLINE | ID: mdl-36064801

Asunto(s)

Difusión de la Información , Metadatos , Difusión de la Información/legislación & jurisprudencia , Metadatos/normas

5.

A biomedical open knowledge network harnesses the power of AI to understand deep human biology.

Baranzini, Sergio E; Börner, Katy; Morris, John; Nelson, Charlotte A; Soman, Karthik; Schleimer, Erica; Keiser, Michael; Musen, Mark; Pearce, Roger; Reza, Tahsin; Smith, Brett; Herr, Bruce W; Oskotsky, Boris; Rizk-Jackson, Angela; Rankin, Katherine P; Sanders, Stephan J; Bove, Riley; Rose, Peter W; Israni, Sharat; Huang, Sui.

AI Mag ; 43(1): 46-58, 2022.

Artículo en Inglés | MEDLINE | ID: mdl-36093122

RESUMEN

Knowledge representation and reasoning (KR&R) has been successfully implemented in many fields to enable computers to solve complex problems with AI methods. However, its application to biomedicine has been lagging in part due to the daunting complexity of molecular and cellular pathways that govern human physiology and pathology. In this article we describe concrete uses of SPOKE, an open knowledge network that connects curated information from 37 specialized and human-curated databases into a single property graph, with 3 million nodes and 15 million edges to date. Applications discussed in this article include drug discovery, COVID-19 research and chronic disease diagnosis and management.

6.

Design of a FAIR digital data health infrastructure in Africa for COVID-19 reporting and research.

van Reisen, Mirjam; Oladipo, Francisca; Stokmans, Mia; Mpezamihgo, Mouhamed; Folorunso, Sakinat; Schultes, Erik; Basajja, Mariam; Aktau, Aliya; Amare, Samson Yohannes; Taye, Getu Tadele; Purnama Jati, Putu Hadi; Chindoza, Kudakwashe; Wirtz, Morgane; Ghardallou, Meriem; van Stam, Gertjan; Ayele, Wondimu; Nalugala, Reginald; Abdullahi, Ibrahim; Osigwe, Obinna; Graybeal, John; Medhanyie, Araya Abrha; Kawu, Abdullahi Abubakar; Liu, Fenghong; Wolstencroft, Katy; Flikkenschild, Erik; Lin, Yi; Stocker, Joëlle; Musen, Mark A.

Adv Genet (Hoboken) ; 2(2): e10050, 2021 Jun.

Artículo en Inglés | MEDLINE | ID: mdl-34514430

RESUMEN

The limited volume of COVID-19 data from Africa raises concerns for global genome research, which requires a diversity of genotypes for accurate disease prediction, including on the provenance of the new SARS-CoV-2 mutations. The Virus Outbreak Data Network (VODAN)-Africa studied the possibility of increasing the production of clinical data, finding concerns about data ownership, and the limited use of health data for quality treatment at point of care. To address this, VODAN Africa developed an architecture to record clinical health data and research data collected on the incidence of COVID-19, producing these as human- and machine-readable data objects in a distributed architecture of locally governed, linked, human- and machine-readable data. This architecture supports analytics at the point of care and-through data visiting, across facilities-for generic analytics. An algorithm was run across FAIR Data Points to visit the distributed data and produce aggregate findings. The FAIR data architecture is deployed in Uganda, Ethiopia, Liberia, Nigeria, Kenya, Somalia, Tanzania, Zimbabwe, and Tunisia.

7.

Using ethnographic methods to classify the human experience in medicine: a case study of the presence ontology.

Maitra, Amrapali; Kamdar, Maulik R; Zulman, Donna M; Haverfield, Marie C; Brown-Johnson, Cati; Schwartz, Rachel; Israni, Sonoo Thadaney; Verghese, Abraham; Musen, Mark A.

J Am Med Inform Assoc ; 28(9): 1900-1909, 2021 08 13.

Artículo en Inglés | MEDLINE | ID: mdl-34151988

RESUMEN

OBJECTIVE: Although social and environmental factors are central to provider-patient interactions, the data that reflect these factors can be incomplete, vague, and subjective. We sought to create a conceptual framework to describe and classify data about presence, the domain of interpersonal connection in medicine. METHODS: Our top-down approach for ontology development based on the concept of "relationality" included the following: 1) a broad survey of the social sciences literature and a systematic literature review of >20 000 articles around interpersonal connection in medicine, 2) relational ethnography of clinical encounters (n = 5 pilot, 27 full), and 3) interviews about relational work with 40 medical and nonmedical professionals. We formalized the model using the Web Ontology Language in the Protégé ontology editor. We iteratively evaluated and refined the Presence Ontology through manual expert review and automated annotation of literature. RESULTS AND DISCUSSION: The Presence Ontology facilitates the naming and classification of concepts that would otherwise be vague. Our model categorizes contributors to healthcare encounters and factors such as communication, emotions, tools, and environment. Ontology evaluation indicated that cognitive models (both patients' explanatory models and providers' caregiving approaches) influenced encounters and were subsequently incorporated. We show how ethnographic methods based in relationality can aid the representation of experiential concepts (eg, empathy, trust). Our ontology could support investigative methods to improve healthcare processes for both patients and healthcare providers, including annotation of videotaped encounters, development of clinical instruments to measure presence, or implementation of electronic health record-based reminders for providers. CONCLUSION: The Presence Ontology provides a model for using ethnographic approaches to classify interpersonal data.

Asunto(s)

Antropología Cultural , Comunicación , Personal de Salud , Humanos , Lenguaje , Confianza

8.

An empirical meta-analysis of the life sciences linked open data on the web.

Kamdar, Maulik R; Musen, Mark A.

Sci Data ; 8(1): 24, 2021 01 21.

Artículo en Inglés | MEDLINE | ID: mdl-33479214

RESUMEN

While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 biomedical linked open data sources into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from a biomedical perspective. We envision that the LSLOD schema graph and the findings from this research will aid researchers who wish to query and integrate data and knowledge from multiple biomedical sources simultaneously on the Web.

Asunto(s)

Disciplinas de las Ciencias Biológicas , Almacenamiento y Recuperación de la Información , Animales , Humanos , Metaanálisis como Asunto , Semántica

9.

Obstacles to the reuse of study metadata in ClinicalTrials.gov.

Miron, Laura; Gonçalves, Rafael S; Musen, Mark A.

Sci Data ; 7(1): 443, 2020 12 18.

Artículo en Inglés | MEDLINE | ID: mdl-33339830

RESUMEN

Metadata that are structured using principled schemas and that use terms from ontologies are essential to making biomedical data findable and reusable for downstream analyses. The largest source of metadata that describes the experimental protocol, funding, and scientific leadership of clinical studies is ClinicalTrials.gov. We evaluated whether values in 302,091 trial records adhere to expected data types and use terms from biomedical ontologies, whether records contain fields required by government regulations, and whether structured elements could replace free-text elements. Contact information, outcome measures, and study design are frequently missing or underspecified. Important fields for search, such as condition and intervention, are not restricted to ontologies, and almost half of the conditions are not denoted by MeSH terms, as recommended. Eligibility criteria are stored as semi-structured free text. Enforcing the presence of all required elements, requiring values for certain fields to be drawn from ontologies, and creating a structured eligibility criteria element would improve the reusability of data from ClinicalTrials.gov in systematic reviews, metanalyses, and matching of eligible patients to trials.

Asunto(s)

Ensayos Clínicos como Asunto , Bases de Datos Factuales , Metadatos , Proyectos de Investigación/normas , Conjuntos de Datos como Asunto

10.

OrderRex clinical user testing: a randomized trial of recommender system decision support on simulated cases.

Kumar, Andre; Aikens, Rachael C; Hom, Jason; Shieh, Lisa; Chiang, Jonathan; Morales, David; Saini, Divya; Musen, Mark; Baiocchi, Michael; Altman, Russ; Goldstein, Mary K; Asch, Steven; Chen, Jonathan H.

J Am Med Inform Assoc ; 27(12): 1850-1859, 2020 12 09.

Artículo en Inglés | MEDLINE | ID: mdl-33106874

RESUMEN

OBJECTIVE: To assess usability and usefulness of a machine learning-based order recommender system applied to simulated clinical cases. MATERIALS AND METHODS: 43 physicians entered orders for 5 simulated clinical cases using a clinical order entry interface with or without access to a previously developed automated order recommender system. Cases were randomly allocated to the recommender system in a 3:2 ratio. A panel of clinicians scored whether the orders placed were clinically appropriate. Our primary outcome included the difference in clinical appropriateness scores. Secondary outcomes included total number of orders, case time, and survey responses. RESULTS: Clinical appropriateness scores per order were comparable for cases randomized to the order recommender system (mean difference -0.11 order per score, 95% CI: [-0.41, 0.20]). Physicians using the recommender placed more orders (median 16 vs 15 orders, incidence rate ratio 1.09, 95%CI: [1.01-1.17]). Case times were comparable with the recommender system. Order suggestions generated from the recommender system were more likely to match physician needs than standard manual search options. Physicians used recommender suggestions in 98% of available cases. Approximately 95% of participants agreed the system would be useful for their workflows. DISCUSSION: User testing with a simulated electronic medical record interface can assess the value of machine learning and clinical decision support tools for clinician usability and acceptance before live deployments. CONCLUSIONS: Clinicians can use and accept machine learned clinical order recommendations integrated into an electronic order entry interface in a simulated setting. The clinical appropriateness of orders entered was comparable even when supported by automated recommendations.

Asunto(s)

Sistemas de Apoyo a Decisiones Clínicas , Registros Electrónicos de Salud , Sistemas de Entrada de Órdenes Médicas , Interfaz Usuario-Computador , Humanos , Almacenamiento y Recuperación de la Información/métodos , Aprendizaje Automático

11.

Toward a Harmonized WHO Family of International Classifications Content Model.

Tu, Samson W; Nyulas, Csongor I; Tudorache, Tania; Musen, Mark A; Martinuzzi, Andrea; van Gool, Coen; Mea, Vincenzo Della; Chute, Christopher G; Frattura, Lucilla; Hardiker, Nick; Napel, Huib Ten; Madden, Richard; Almborg, Ann-Helene; Ginige, Jeewani Anupama; Sykes, Catherine; Cekik, Can; Jakob, Robert.

Stud Health Technol Inform ; 270: 1409-1410, 2020 Jun 16.

Artículo en Inglés | MEDLINE | ID: mdl-32570683

RESUMEN

An overarching WHO-FIC Content Model will allow uniform modeling of classifications in the WHO Family of International Classifications (WHO-FIC) and promote their joint use. We provide an initial conceptualization of such a model.

Asunto(s)

Clasificación Internacional de Enfermedades , Organización Mundial de la Salud

12.

Physician Usage and Acceptance of a Machine Learning Recommender System for Simulated Clinical Order Entry.

Chiang, Jonathan; Kumar, Andre; Morales, David; Saini, Divya; Hom, Jason; Shieh, Lisa; Musen, Mark; Goldstein, Mary K; Chen, Jonathan H.

AMIA Jt Summits Transl Sci Proc ; 2020: 89-97, 2020.

Artículo en Inglés | MEDLINE | ID: mdl-32477627

RESUMEN

Clinical decision support tools that automatically disseminate patterns of clinical orders have the potential to improve patient care by reducing errors of omission and streamlining physician workflows. However, it is unknown if physicians will accept such tools or how their behavior will be affected. In this randomized controlled study, we exposed 34 licensed physicians to a clinical order entry interface and five simulated emergency cases, with randomized availability of a previously developed clinical order recommender system. With the recommender available, physicians spent similar time per case (6.7 minutes), but placed more total orders (17.1 vs. 15.8). The recommender demonstrated superior recall (59% vs 41%) and precision (25% vs 17%) compared to manual search results, and was positively received by physicians recognizing workflow benefits. Further studies must assess the potential clinical impact towards a future where electronic health records automatically anticipate clinical needs.

13.

Enabling Web-scale data integration in biomedicine through Linked Open Data.

Kamdar, Maulik R; Fernández, Javier D; Polleres, Axel; Tudorache, Tania; Musen, Mark A.

NPJ Digit Med ; 2: 90, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-31531395

RESUMEN

The biomedical data landscape is fragmented with several isolated, heterogeneous data and knowledge sources, which use varying formats, syntaxes, schemas, and entity notations, existing on the Web. Biomedical researchers face severe logistical and technical challenges to query, integrate, analyze, and visualize data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid toward Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles to publish data and knowledge on the Web as linked graphs and ontologies, hence creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities proffered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We will discuss some of the major challenges that hinder the wide-spread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Eventually, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence to achieve better clinical outcomes for patients, to enhance the quality of biomedical research, and to improve our understanding of living systems.

14.

Using association rule mining and ontologies to generate metadata recommendations from multiple biomedical databases.

Martínez-Romero, Marcos; O'Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Hardi, Josef; Graybeal, John; Musen, Mark A.

Database (Oxford) ; 20192019 01 01.

Artículo en Inglés | MEDLINE | ID: mdl-31210270

RESUMEN

Metadata-the machine-readable descriptions of the data-are increasingly seen as crucial for describing the vast array of biomedical datasets that are currently being deposited in public repositories. While most public repositories have firm requirements that metadata must accompany submitted datasets, the quality of those metadata is generally very poor. A key problem is that the typical metadata acquisition process is onerous and time consuming, with little interactive guidance or assistance provided to users. Secondary problems include the lack of validation and sparse use of standardized terms or ontologies when authoring metadata. There is a pressing need for improvements to the metadata acquisition process that will help users to enter metadata quickly and accurately. In this paper, we outline a recommendation system for metadata that aims to address this challenge. Our approach uses association rule mining to uncover hidden associations among metadata values and to represent them in the form of association rules. These rules are then used to present users with real-time recommendations when authoring metadata. The novelties of our method are that it is able to combine analyses of metadata from multiple repositories when generating recommendations and can enhance those recommendations by aligning them with ontology terms. We implemented our approach as a service integrated into the CEDAR Workbench metadata authoring platform, and evaluated it using metadata from two public biomedical repositories: US-based National Center for Biotechnology Information BioSample and European Bioinformatics Institute BioSamples. The results show that our approach is able to use analyses of previously entered metadata coupled with ontology-based mappings to present users with accurate recommendations when authoring metadata.

Asunto(s)

Minería de Datos/métodos , Minería de Datos/normas , Bases de Datos Factuales/normas , Metadatos , Biología Computacional/normas

15.

The variable quality of metadata about biological samples used in biomedical experiments.

Gonçalves, Rafael S; Musen, Mark A.

Sci Data ; 6: 190021, 2019 02 19.

Artículo en Inglés | MEDLINE | ID: mdl-30778255

RESUMEN

We present an analytical study of the quality of metadata about samples used in biomedical experiments. The metadata under analysis are stored in two well-known databases: BioSample-a repository managed by the National Center for Biotechnology Information (NCBI), and BioSamples-a repository managed by the European Bioinformatics Institute (EBI). We tested whether 11.4 M sample metadata records in the two repositories are populated with values that fulfill the stated requirements for such values. Our study revealed multiple anomalies in the metadata. Most metadata field names and their values are not standardized or controlled. Even simple binary or numeric fields are often populated with inadequate values of different data types. By clustering metadata field names, we discovered there are often many distinct ways to represent the same aspect of a sample. Overall, the metadata we analyzed reveal that there is a lack of principled mechanisms to enforce and validate metadata requirements. The significant aberrancies that we found in the metadata are likely to impede search and secondary use of the associated datasets.

Asunto(s)

Bancos de Muestras Biológicas , Metadatos/normas , Exactitud de los Datos

16.

Unleashing the value of Common Data Elements through the CEDAR Workbench.

O'Connor, Martin J; Warzel, Denise B; Martínez-Romero, Marcos; Hardi, Josef; Willrett, Debra; Egyedi, Attila L; Eftekhari, Aras; Graybeal, John; Musen, Mark A.

AMIA Annu Symp Proc ; 2019: 681-690, 2019.

Artículo en Inglés | MEDLINE | ID: mdl-32308863

RESUMEN

Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs)-precise specifications of questions and the set of allowable answers to each question-are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot be easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third-parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform to simplify the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms to acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.

Asunto(s)

Investigación Biomédica , Elementos de Datos Comunes , Recolección de Datos/normas , Manejo de Datos/métodos , Elementos de Datos Comunes/normas , Manejo de Datos/normas , Humanos , Internet , Metadatos , National Institutes of Health (U.S.) , Sistema de Registros , Estados Unidos , Interfaz Usuario-Computador

17.

The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.

Bukhari, Syed Ahmad Chan; O'Connor, Martin J; Martínez-Romero, Marcos; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Rubelt, Florian; Cheung, Kei-Hoi; Kleinstein, Steven H.

Front Immunol ; 9: 1877, 2018.

Artículo en Inglés | MEDLINE | ID: mdl-30166985

RESUMEN

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase the understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to access, find, interoperate, and reuse the data sets. In order to improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which develops technologies involving the use of data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. This pipeline is available at http://cairr.miairr.org, and will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.

Asunto(s)

Biología Computacional/métodos , Bases de Datos de Ácidos Nucleicos , Receptores de Antígenos de Linfocitos B/genética , Receptores de Antígenos de Linfocitos T/genética , Programas Informáticos , Biología Computacional/organización & administración , Minería de Datos , Ontología de Genes , Humanos , Metadatos , Reproducibilidad de los Resultados , Interfaz Usuario-Computador , Flujo de Trabajo

18.

CEDAR OnDemand: a browser extension to generate ontology-based scientific metadata.

Bukhari, Syed Ahmad Chan; Martínez-Romero, Marcos; O' Connor, Martin J; Egyedi, Attila L; Willrett, Debra; Graybeal, John; Musen, Mark A; Cheung, Kei-Hoi; Kleinstein, Steven H.

BMC Bioinformatics ; 19(1): 268, 2018 07 16.

Artículo en Inglés | MEDLINE | ID: mdl-30012108

RESUMEN

BACKGROUND: Public biomedical data repositories often provide web-based interfaces to collect experimental metadata. However, these interfaces typically reflect the ad hoc metadata specification practices of the associated repositories, leading to a lack of standardization in the collected metadata. This lack of standardization limits the ability of the source datasets to be broadly discovered, reused, and integrated with other datasets. To increase reuse, discoverability, and reproducibility of the described experiments, datasets should be appropriately annotated by using agreed-upon terms, ideally from ontologies or other controlled term sources. RESULTS: This work presents "CEDAR OnDemand", a browser extension powered by the NCBO (National Center for Biomedical Ontology) BioPortal that enables users to seamlessly enter ontology-based metadata through existing web forms native to individual repositories. CEDAR OnDemand analyzes the web page contents to identify the text input fields and associate them with relevant ontologies which are recommended automatically based upon input fields' labels (using the NCBO ontology recommender) and a pre-defined list of ontologies. These field-specific ontologies are used for controlling metadata entry. CEDAR OnDemand works for any web form designed in the HTML format. We demonstrate how CEDAR OnDemand works through the NCBI (National Center for Biotechnology Information) BioSample web-based metadata entry. CONCLUSION: CEDAR OnDemand helps lower the barrier of incorporating ontologies into standardized metadata entry for public data repositories. CEDAR OnDemand is available freely on the Google Chrome store https://chrome.google.com/webstore/search/CEDAROnDemand.

Asunto(s)

Ontologías Biológicas , Internet , Metadatos , Programas Informáticos , Algoritmos , Humanos

19.

Analyzing user interactions with biomedical ontologies: A visual perspective.

Kamdar, Maulik R; Walk, Simon; Tudorache, Tania; Musen, Mark A.

Web Semant ; 49: 16-30, 2018 Mar.

Artículo en Inglés | MEDLINE | ID: mdl-29657560

RESUMEN

Biomedical ontologies are large: Several ontologies in the BioPortal repository contain thousands or even hundreds of thousands of entities. The development and maintenance of such large ontologies is difficult. To support ontology authors and repository developers in their work, it is crucial to improve our understanding of how these ontologies are explored, queried, reused, and used in downstream applications by biomedical researchers. We present an exploratory empirical analysis of user activities in the BioPortal ontology repository by analyzing BioPortal interaction logs across different access modes over several years. We investigate how users of BioPortal query and search for ontologies and their classes, how they explore the ontologies, and how they reuse classes from different ontologies. Additionally, through three real-world scenarios, we not only analyze the usage of ontologies for annotation tasks but also compare it to the browsing and querying behaviors of BioPortal users. For our investigation, we use several different visualization techniques. To inspect large amounts of interaction, reuse, and real-world usage data at a glance, we make use of and extend PolygOnto, a visualization method that has been successfully used to analyze reuse of ontologies in previous work. Our results show that exploration, query, reuse, and actual usage behaviors rarely align, suggesting that different users tend to explore, query and use different parts of an ontology. Finally, we highlight and discuss differences and commonalities among users of BioPortal.

20.

Interpretation of biological experiments changes with evolution of the Gene Ontology and its annotations.

Tomczak, Aurelie; Mortensen, Jonathan M; Winnenburg, Rainer; Liu, Charles; Alessi, Dominique T; Swamy, Varsha; Vallania, Francesco; Lofgren, Shane; Haynes, Winston; Shah, Nigam H; Musen, Mark A; Khatri, Purvesh.

Sci Rep ; 8(1): 5115, 2018 03 23.

Artículo en Inglés | MEDLINE | ID: mdl-29572502

RESUMEN

Gene Ontology (GO) enrichment analysis is ubiquitously used for interpreting high throughput molecular data and generating hypotheses about underlying biological phenomena of experiments. However, the two building blocks of this analysis - the ontology and the annotations - evolve rapidly. We used gene signatures derived from 104 disease analyses to systematically evaluate how enrichment analysis results were affected by evolution of the GO over a decade. We found low consistency between enrichment analyses results obtained with early and more recent GO versions. Furthermore, there continues to be a strong annotation bias in the GO annotations where 58% of the annotations are for 16% of the human genes. Our analysis suggests that GO evolution may have affected the interpretation and possibly reproducibility of experiments over time. Hence, researchers must exercise caution when interpreting GO enrichment analyses and should reexamine previous analyses with the most recent GO version.

Asunto(s)

Biología Computacional , Bases de Datos Genéticas , Evolución Molecular , Ontología de Genes , Modelos Genéticos , Anotación de Secuencia Molecular , Humanos , Reproducibilidad de los Resultados

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

Asunto(s)

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA