Results 1 - 12 of 12
1.
NPJ Digit Med ; 2: 90, 2019.
Article in English | MEDLINE | ID: mdl-31531395

ABSTRACT

The biomedical data landscape is fragmented into many isolated, heterogeneous data and knowledge sources on the Web, which use varying formats, syntaxes, schemas, and entity notations. Biomedical researchers face severe logistical and technical challenges when querying, integrating, analyzing, and visualizing data from multiple diverse sources in the context of available biomedical knowledge. Semantic Web technologies and Linked Data principles may aid Web-scale semantic processing and data integration in biomedicine. The biomedical research community has been one of the earliest adopters of these technologies and principles, publishing data and knowledge on the Web as linked graphs and ontologies and thereby creating the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we provide our perspective on some opportunities offered by the use of LSLOD to integrate biomedical data and knowledge in three domains: (1) pharmacology, (2) cancer research, and (3) infectious diseases. We discuss some of the major challenges that hinder the widespread use and consumption of LSLOD by the biomedical research community. Finally, we provide a few technical solutions and insights that can address these challenges. Ultimately, LSLOD can enable the development of scalable, intelligent infrastructures that support artificial intelligence methods for augmenting human intelligence, helping to achieve better clinical outcomes for patients, enhance the quality of biomedical research, and improve our understanding of living systems.
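The LSLOD sources described above are typically published as RDF graphs and queried through SPARQL. As a rough illustration of that access pattern, the following sketch queries a SPARQL endpoint with Python's SPARQLWrapper; the endpoint URL and the predicate are placeholders, not a specific LSLOD source.

```python
# Minimal sketch of querying a Linked Open Data SPARQL endpoint.
# The endpoint URL and predicate are illustrative placeholders only.
from SPARQLWrapper import SPARQLWrapper, JSON

endpoint = SPARQLWrapper("https://example.org/lslod/sparql")  # hypothetical endpoint
endpoint.setQuery("""
    SELECT ?drug ?target
    WHERE {
        ?drug <http://example.org/vocab/interactsWith> ?target .
    }
    LIMIT 10
""")
endpoint.setReturnFormat(JSON)

results = endpoint.query().convert()
for binding in results["results"]["bindings"]:
    print(binding["drug"]["value"], "->", binding["target"]["value"])
```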

2.
AMIA Annu Symp Proc ; 2019: 681-690, 2019.
Article in English | MEDLINE | ID: mdl-32308863

ABSTRACT

Developing promising treatments in biomedicine often requires aggregation and analysis of data from disparate sources across the healthcare and research spectrum. To facilitate these approaches, there is a growing focus on supporting the interoperation of datasets by standardizing data-capture and reporting requirements. Common Data Elements (CDEs), precise specifications of questions and the set of allowable answers to each question, are increasingly being adopted to help meet these standardization goals. While CDEs can provide a strong conceptual foundation for interoperation, there are no widely recognized serialization or interchange formats to describe and exchange their definitions. As a result, CDEs defined in one system cannot easily be reused by other systems. An additional problem is that current CDE-based systems tend to be rather heavyweight and cannot be easily adopted and used by third parties. To address these problems, we developed extensions to a metadata management system called the CEDAR Workbench to provide a platform that simplifies the creation, exchange, and use of CDEs. We show how the resulting system allows users to quickly define and share CDEs and to immediately use these CDEs to build and deploy Web-based forms that acquire conforming metadata. We also show how we incorporated a large CDE library from the National Cancer Institute's caDSR system and made these CDEs publicly available for general use.
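The abstract characterizes a CDE as a question plus a closed set of allowable answers. The sketch below expresses that idea in plain Python; the field names and value domain are illustrative and do not follow the CEDAR or caDSR serialization.

```python
# Illustrative sketch of a Common Data Element as a question plus a closed
# set of allowable answers; this is not the CEDAR or caDSR format.
from dataclasses import dataclass

@dataclass(frozen=True)
class CommonDataElement:
    name: str
    question: str
    allowed_values: frozenset

    def validate(self, answer: str) -> bool:
        """Return True if the answer conforms to the element's value domain."""
        return answer in self.allowed_values

smoking_status = CommonDataElement(
    name="smoking_status",
    question="What is the patient's smoking status?",
    allowed_values=frozenset({"Never", "Former", "Current", "Unknown"}),
)

print(smoking_status.validate("Former"))  # True
print(smoking_status.validate("Daily"))   # False: not in the value domain
```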


Subjects
Biomedical Research, Common Data Elements, Data Collection/standards, Data Management/methods, Common Data Elements/standards, Data Management/standards, Humans, Internet, Metadata, National Institutes of Health (U.S.), Registries, United States, User-Computer Interface
3.
Front Immunol ; 9: 1877, 2018.
Article in English | MEDLINE | ID: mdl-30166985

ABSTRACT

The adaptation of high-throughput sequencing to the B cell receptor and T cell receptor has made it possible to characterize the adaptive immune receptor repertoire (AIRR) at unprecedented depth. These AIRR sequencing (AIRR-seq) studies offer tremendous potential to increase our understanding of adaptive immune responses in vaccinology, infectious disease, autoimmunity, and cancer. The increasingly wide application of AIRR-seq is leading to a critical mass of studies being deposited in the public domain, offering the possibility of novel scientific insights through secondary analyses and meta-analyses. However, effective sharing of these large-scale data remains a challenge. The AIRR community has proposed minimal information about adaptive immune receptor repertoire (MiAIRR), a standard for reporting AIRR-seq studies. The MiAIRR standard has been operationalized using the National Center for Biotechnology Information (NCBI) repositories. Submissions of AIRR-seq data to the NCBI repositories typically use a combination of web-based and flat-file templates and include only a minimal amount of terminology validation. As a result, AIRR-seq studies at the NCBI are often described using inconsistent terminologies, limiting scientists' ability to find, access, interoperate with, and reuse the data sets. To improve metadata quality and ease submission of AIRR-seq studies to the NCBI, we have leveraged the software framework developed by the Center for Expanded Data Annotation and Retrieval (CEDAR), which builds technologies that use data standards and ontologies to improve metadata quality. The resulting CEDAR-AIRR (CAIRR) pipeline enables data submitters to: (i) create web-based templates whose entries are controlled by ontology terms, (ii) generate and validate metadata, and (iii) submit the ontology-linked metadata and sequence files (FASTQ) to the NCBI BioProject, BioSample, and Sequence Read Archive databases. Overall, CAIRR provides a web-based metadata submission interface that supports compliance with the MiAIRR standard. The pipeline is available at http://cairr.miairr.org; it will facilitate the NCBI submission process and improve the metadata quality of AIRR-seq studies.
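A central function of such a pipeline is validating metadata before submission. The following sketch shows the general idea, checking a sample record against required fields and a small controlled vocabulary; the field names and terms are illustrative, not the MiAIRR specification or the CAIRR implementation.

```python
# Minimal sketch of pre-submission metadata validation: check that required
# fields are present and that controlled fields use allowed terms.
# Field names and vocabularies are illustrative, not the MiAIRR standard.
REQUIRED_FIELDS = {"study_id", "organism", "tissue", "library_strategy"}
CONTROLLED_VOCAB = {
    "organism": {"Homo sapiens", "Mus musculus"},
    "library_strategy": {"AMPLICON", "RNA-Seq"},
}

def validate_sample(record: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - record.keys()]
    for field, allowed in CONTROLLED_VOCAB.items():
        value = record.get(field)
        if value is not None and value not in allowed:
            errors.append(f"{field}: '{value}' is not an allowed term")
    return errors

sample = {"study_id": "S1", "organism": "human", "tissue": "PBMC",
          "library_strategy": "AMPLICON"}
print(validate_sample(sample))  # flags 'human' as an uncontrolled label
```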


Subjects
Computational Biology/methods, Nucleic Acid Databases, B-Cell Antigen Receptors/genetics, T-Cell Antigen Receptors/genetics, Software, Computational Biology/organization & administration, Data Mining, Gene Ontology, Humans, Metadata, Reproducibility of Results, User-Computer Interface, Workflow
4.
Sci Data ; 4: 170125, 2017 09 19.
Article in English | MEDLINE | ID: mdl-28925997

ABSTRACT

The Gene Expression Omnibus (GEO) contains more than two million digital samples from functional genomics experiments amassed over almost two decades. However, individual sample metadata remain poorly described by unstructured free-text attributes, preventing large-scale reanalysis. We introduce the Search Tag Analyze Resource for GEO as a web application (http://STARGEO.org) to curate better annotations of sample phenotypes uniformly across different studies, and to use these sample annotations to define robust genomic signatures of disease pathology by meta-analysis. In this paper, we engaged a small group of biomedical graduate students to show rapid crowd-curation of precise sample annotations across all phenotypes, and we demonstrate the biological validity of these crowd-curated annotations for breast cancer. STARGEO.org makes GEO data findable, accessible, interoperable, and reusable (i.e., FAIR) to ultimately facilitate knowledge discovery. Our work demonstrates the utility of crowd-curation and interpretation of open 'big data' under FAIR principles as a first step toward realizing an ideal paradigm of precision medicine.
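Once samples are curated into case and control groups, per-study effects can be combined by meta-analysis. The sketch below shows a generic fixed-effect (inverse-variance) combination for a single gene; the numbers are made up and this is not necessarily the method used by STARGEO.

```python
# Sketch of a fixed-effect (inverse-variance) meta-analysis combining
# per-study effect sizes for a single gene across curated studies.
# The effect sizes and variances below are made-up illustrative numbers.
import math

def fixed_effect_meta(effects, variances):
    """Combine per-study effects weighted by inverse variance."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log fold-change estimates for one gene in three studies.
effects = [0.8, 1.1, 0.6]
variances = [0.04, 0.09, 0.05]
pooled, se = fixed_effect_meta(effects, variances)
print(f"pooled effect = {pooled:.3f} +/- {1.96 * se:.3f} (95% CI half-width)")
```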


Subjects
Data Curation, Genetic Databases, Gene Expression, Humans
5.
J Biomed Inform ; 68: 20-34, 2017 04.
Article in English | MEDLINE | ID: mdl-28192233

ABSTRACT

The International Classification of Diseases (ICD) is the de facto standard international classification for mortality reporting and for many epidemiological, clinical, and financial use cases. The next version of ICD, ICD-11, will be submitted for approval by the World Health Assembly in 2018. Unlike previous versions of ICD, where coders mostly select single codes from pre-enumerated disease and disorder codes, ICD-11 coding will allow extensive use of multiple codes to give more detailed disease descriptions. For example, "severe malignant neoplasms of left breast" may be coded using the combination of a "stem code" (e.g., the code for malignant neoplasms of breast) with a variety of "extension codes" (e.g., codes for laterality and severity). The use of multiple codes (a process called post-coordination), while avoiding the pitfall of having to pre-enumerate a vast number of possible disease and qualifier combinations, risks the creation of meaningless expressions that combine stem codes with inappropriate qualifiers. To prevent that from happening, "sanctioning rules" that define legal combinations are necessary. In this work, we developed a crowdsourcing method for obtaining sanctioning rules for the post-coordination of concepts in ICD-11. Our method utilized the hierarchical structures in the domain to improve the accuracy of the sanctioning rules and to lower the crowdsourcing cost. We used Bayesian networks to model crowd workers' skills, the accuracy of their responses, and our confidence in the acquired sanctioning rules. We applied reinforcement learning to develop an agent that constantly adjusted the confidence cutoffs during the crowdsourcing process to maximize the overall quality of the sanctioning rules under a fixed budget. Finally, we performed formative evaluations using a skin-disease branch of the draft ICD-11 and demonstrated that the crowdsourced sanctioning rules replicated those defined by an expert dermatologist with high precision and recall. This work demonstrates that a crowdsourcing approach can offer a reasonably efficient method for generating a first draft of sanctioning rules that subject matter experts can verify and edit, thus relieving them of the tedium and cost of formulating the initial set of rules.
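As a simplified illustration of the confidence-tracking idea (not the paper's Bayesian-network model), the sketch below sequentially updates the probability that a stem-code/extension-code combination is valid, given yes/no votes from workers with assumed known accuracies.

```python
# Simplified sketch (not the paper's model): sequentially update the
# probability that a post-coordination combination is valid, given yes/no
# votes from workers with assumed known accuracies.
def update_rule_confidence(prior: float, votes) -> float:
    """votes: iterable of (worker_accuracy, voted_valid) pairs."""
    odds = prior / (1.0 - prior)
    for accuracy, voted_valid in votes:
        likelihood_ratio = accuracy / (1.0 - accuracy)
        odds *= likelihood_ratio if voted_valid else 1.0 / likelihood_ratio
    return odds / (1.0 + odds)

votes = [(0.8, True), (0.7, True), (0.6, False)]  # hypothetical crowd responses
confidence = update_rule_confidence(prior=0.5, votes=votes)
print(f"P(combination is valid | votes) = {confidence:.3f}")
# A controller could stop soliciting votes once confidence crosses a cutoff.
```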


Subjects
Bayes Theorem, Crowdsourcing, International Classification of Diseases, Humans, Neoplasms
6.
J Biomed Inform ; 51: 254-71, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24953242

ABSTRACT

Biomedical taxonomies, thesauri, and ontologies, such as the International Classification of Diseases (a taxonomy) and the National Cancer Institute Thesaurus (an OWL-based ontology), play a critical role in acquiring, representing, and processing information about human health. With increasing adoption and relevance, biomedical ontologies have also grown significantly in size. For example, the 11th revision of the International Classification of Diseases, which is currently under active development by the World Health Organization, contains nearly 50,000 classes representing a vast variety of diseases and causes of death. This evolution in size has been accompanied by an evolution in the way ontologies are engineered. Because no single individual has the expertise to develop such large-scale ontologies, ontology-engineering projects have evolved from small-scale efforts involving just a few domain experts to large-scale projects that require effective collaboration between dozens or even hundreds of experts, practitioners, and other stakeholders. Understanding how these different stakeholders collaborate will enable us to improve editing environments that support such collaborations. In this paper, we uncover how large ontology-engineering projects, such as the 11th revision of the International Classification of Diseases, unfold by analyzing usage logs of five biomedical ontology-engineering projects of varying sizes and scopes using Markov chains. We discover intriguing interaction patterns (e.g., which properties users frequently change after specific given ones) that suggest that large collaborative ontology-engineering projects are governed by a few general principles that determine and drive development. From our analysis, we identify commonalities and differences between projects that have implications for project managers, ontology editors, developers, and contributors working on collaborative ontology-engineering projects and tools in the biomedical domain.
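A first-order Markov chain over edited properties can be estimated directly from such usage logs. The sketch below shows the basic estimation step on an illustrative log; it is not the paper's analysis pipeline.

```python
# Sketch of fitting a first-order Markov chain to an ontology-editing log:
# estimate P(next edited property | current edited property) from counts.
# The example log is illustrative, not real usage data.
from collections import Counter, defaultdict

edit_log = ["title", "definition", "synonym", "definition", "parent",
            "definition", "synonym", "parent", "title", "definition"]

transition_counts = defaultdict(Counter)
for current, nxt in zip(edit_log, edit_log[1:]):
    transition_counts[current][nxt] += 1

transition_probs = {
    state: {nxt: c / sum(counts.values()) for nxt, c in counts.items()}
    for state, counts in transition_counts.items()
}
print(transition_probs["definition"])  # which property tends to be edited next
```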


Subjects
Biological Ontologies, Cooperative Behavior, Markov Chains, Statistical Models, Natural Language Processing, Automated Pattern Recognition/methods, Artificial Intelligence, Computer Simulation, Statistical Data Interpretation, International Classification of Diseases/classification, International Classification of Diseases/organization & administration, Internationality, Semantics
7.
BMC Bioinformatics ; 10 Suppl 2: S1, 2009 Feb 05.
Article in English | MEDLINE | ID: mdl-19208184

ABSTRACT

The volume of publicly available genomic-scale data is increasing. Genomic datasets in public repositories are annotated with free-text fields describing the pathological state of the studied sample. These annotations are not mapped to concepts in any ontology, making it difficult to integrate these datasets across repositories. We have previously developed methods to map text annotations of tissue microarrays to concepts in the NCI Thesaurus and SNOMED-CT. In this work we generalize our methods to map text annotations of gene expression datasets to concepts in the UMLS. We demonstrate the utility of our methods by processing annotations of datasets in the Gene Expression Omnibus, and show that we enable ontology-based querying and integration of tissue and gene expression microarray data, as well as identification of datasets on specific diseases across both repositories. Our approach provides the basis for ontology-driven data integration for translational research on gene and protein expression data. Based on this work we have built a prototype system for ontology-based annotation and indexing of biomedical data. The system processes the text metadata of diverse resource elements, such as gene expression data sets, descriptions of radiology images, clinical-trial reports, and PubMed article abstracts, to annotate and index them with concepts from appropriate ontologies. The key functionality of this system is to enable users to locate biomedical data resources related to particular ontology concepts.
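A minimal form of this kind of annotation is dictionary-based concept recognition. The sketch below scans free-text annotations for known synonyms and emits concept identifiers; the lexicon and identifiers are illustrative rather than actual UMLS content.

```python
# Minimal sketch of dictionary-based concept recognition: scan free-text
# sample annotations for known synonyms and emit concept identifiers.
# The lexicon and identifiers are illustrative, not real UMLS content.
LEXICON = {
    "breast carcinoma": "C-0001",
    "carcinoma": "C-0002",
    "adenocarcinoma": "C-0003",
}

def annotate(text: str) -> set:
    """Return the set of concept IDs whose synonyms occur in the text."""
    lowered = text.lower()
    return {cid for synonym, cid in LEXICON.items() if synonym in lowered}

annotation = "Invasive breast carcinoma, grade 2"
print(annotate(annotation))  # matches both 'breast carcinoma' and 'carcinoma'
```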


Subjects
Computational Biology/methods, Gene Expression Profiling, Oligonucleotide Array Sequence Analysis, Controlled Vocabulary, Factual Databases, Genomics, Information Storage and Retrieval, Unified Medical Language System
8.
AMIA Annu Symp Proc ; : 520-4, 2008 Nov 06.
Article in English | MEDLINE | ID: mdl-18998901

ABSTRACT

The development of ontologies that define entities and the relationships among them has become essential for modern work in biomedicine. Ontologies are becoming so large in their coverage that no single centralized group of people can develop them effectively, and ontology development becomes a community-based enterprise. In this paper we present Collaborative Protégé, a prototype tool that supports many aspects of community-based development, such as discussions integrated with the ontology-editing process, chats, and annotation of changes. We have evaluated Collaborative Protégé in the context of NCI Thesaurus development. Users have found the tool effective for carrying out discussions and recording design rationale.


Subjects
Cooperative Behavior, Information Storage and Retrieval/methods, Natural Language Processing, Neoplasms/classification, Automated Pattern Recognition/methods, Software, Subject Headings, Controlled Vocabulary, Algorithms, Artificial Intelligence, Humans, United States
9.
Appl Ontol ; 3(3): 173-190, 2008 Jan 01.
Article in English | MEDLINE | ID: mdl-19789731

ABSTRACT

The National Cancer Institute's (NCI) Thesaurus is a biomedical reference ontology. The NCI Thesaurus is represented using Description Logic, more specifically Ontylog, a description logic implemented by Apelon, Inc. We are exploring the use of the DL species of the Web Ontology Language (OWL DL), a W3C-recommended standard for ontology representation, instead of Ontylog for representing the NCI Thesaurus. We have studied the requirements for knowledge representation of the NCI Thesaurus and considered how OWL DL (and its implementation in Protégé-OWL) satisfies these requirements. In this paper, we discuss the areas where OWL DL was sufficient for representing the required components, where tool support would be needed to hide some of the complexity and extra levels of indirection, and where language expressiveness is not sufficient given the representation requirements. Because many of the knowledge-representation issues that we encountered are very similar to the issues in representing other biomedical terminologies and ontologies in general, we believe that the lessons we learned and the approaches we developed will prove useful and informative for other researchers.
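To illustrate the kind of OWL DL modeling discussed here, the sketch below uses the owlready2 library to define a class through a role restriction; the class and property names are illustrative and do not reproduce the actual NCI Thesaurus model.

```python
# Sketch of an OWL DL defined class using owlready2. The class and property
# names are illustrative and do not reproduce the actual NCI Thesaurus model.
from owlready2 import Thing, ObjectProperty, get_ontology

onto = get_ontology("http://example.org/nci-style-demo.owl")

with onto:
    class Neoplasm(Thing): pass
    class AnatomicStructure(Thing): pass
    class Breast(AnatomicStructure): pass

    class disease_has_primary_anatomic_site(ObjectProperty):
        domain = [Neoplasm]
        range = [AnatomicStructure]

    # A defined (equivalent) class: a neoplasm whose primary site is the breast.
    class BreastNeoplasm(Neoplasm):
        equivalent_to = [Neoplasm & disease_has_primary_anatomic_site.some(Breast)]

print(BreastNeoplasm.equivalent_to)
```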

10.
BMC Bioinformatics ; 8: 296, 2007 Aug 08.
Article in English | MEDLINE | ID: mdl-17686183

ABSTRACT

BACKGROUND: The Stanford Tissue Microarray Database (TMAD) is a repository of data serving a consortium of pathologists and biomedical researchers. The tissue samples in TMAD are annotated with multiple free-text fields specifying the pathological diagnoses for each sample. These text annotations are not structured according to any ontology, making future integration of this resource with other biological and clinical data difficult. RESULTS: We developed methods to map these annotations to the NCI Thesaurus. Using the NCI Thesaurus (NCI-T), we can effectively represent annotations for about 86% of the samples. We demonstrate how this mapping enables ontology-driven integration and querying of tissue microarray data. We have deployed the mapping and ontology-driven querying tools at the TMAD site for general use. CONCLUSION: We have demonstrated that we can effectively map the diagnosis-related terms describing a sample in TMAD to the NCI-T. The NCI Thesaurus has wide coverage, providing terms for about 86% of the samples. In our opinion, the NCI Thesaurus can facilitate integration of this resource with other biological data.
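Ontology-driven querying here amounts to expanding a query concept to its descendants in the is-a hierarchy and retrieving samples annotated with any of them. The sketch below shows that expansion on toy data; it is not the deployed TMAD tooling.

```python
# Sketch of ontology-driven querying: expand a query concept to all of its
# descendants in an is-a hierarchy, then retrieve samples annotated with any.
# The hierarchy and sample annotations are illustrative toy data.
CHILDREN = {
    "Carcinoma": ["Adenocarcinoma", "Squamous Cell Carcinoma"],
    "Adenocarcinoma": ["Ductal Breast Carcinoma"],
}
SAMPLE_ANNOTATIONS = {
    "TMA-001": "Ductal Breast Carcinoma",
    "TMA-002": "Squamous Cell Carcinoma",
    "TMA-003": "Melanoma",
}

def descendants(concept: str) -> set:
    """Return the concept plus all of its transitive subclasses."""
    result, stack = {concept}, [concept]
    while stack:
        for child in CHILDREN.get(stack.pop(), []):
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

query = descendants("Carcinoma")
matches = [s for s, dx in SAMPLE_ANNOTATIONS.items() if dx in query]
print(matches)  # TMA-001 and TMA-002, but not the melanoma sample
```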


Subjects
Factual Databases, Documentation/methods, Information Storage and Retrieval/methods, National Library of Medicine (U.S.), Neoplasm Proteins/classification, Neoplasm Proteins/metabolism, Neoplasms/metabolism, Tissue Array Analysis/methods, Controlled Vocabulary, Database Management Systems, Gene Expression Profiling/methods, United States
11.
AMIA Annu Symp Proc ; : 709-13, 2006.
Article in English | MEDLINE | ID: mdl-17238433

ABSTRACT

The Stanford Tissue Microarray Database (TMAD) is a repository of data amassed by a consortium of pathologists and biomedical researchers. The TMAD data are annotated with multiple free-text fields specifying the pathological diagnoses for each tissue sample. These annotations are spread over multiple text fields and are not structured according to any ontology, making it difficult to integrate this resource with other biological and clinical data. We developed methods to map these annotations to the NCI Thesaurus and SNOMED-CT ontologies. Using these two ontologies, we can effectively represent about 80% of the annotations in a structured manner. This mapping offers the ability to perform ontology-driven querying of the TMAD data. We also found that 40% of annotations can be mapped to terms from both ontologies, providing the potential to align the two ontologies based on experimental data. Our approach provides the basis for data-driven ontology alignment by mapping annotations of experimental data.
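The observation that many annotations map to terms in both ontologies suggests a simple alignment heuristic: count how often an NCI Thesaurus term and a SNOMED-CT term co-annotate the same sample and treat frequent pairs as alignment candidates. The sketch below illustrates this on toy data; it is not the method evaluated in the paper.

```python
# Sketch of data-driven ontology alignment: when the same sample annotation
# maps to a term in each ontology, count the co-occurring pairs and propose
# the most frequent ones as alignment candidates. Mappings are toy data.
from collections import Counter

# (sample_id, NCI Thesaurus term, SNOMED-CT term) for dually mapped annotations.
dual_mappings = [
    ("TMA-001", "Breast Carcinoma", "Carcinoma of breast"),
    ("TMA-004", "Breast Carcinoma", "Carcinoma of breast"),
    ("TMA-007", "Prostate Carcinoma", "Carcinoma of prostate"),
]

pair_counts = Counter((ncit, snomed) for _, ncit, snomed in dual_mappings)
for (ncit, snomed), count in pair_counts.most_common():
    print(f"candidate alignment: {ncit!r} <-> {snomed!r} (support: {count})")
```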


Subjects
Microarray Analysis, Neoplasms/classification, Systematized Nomenclature of Medicine, Controlled Vocabulary, Humans, Immunohistochemistry, National Institutes of Health (U.S.), Neoplasms/diagnosis, United States
12.
BMJ ; 324(7337): 577-81, 2002 Mar 09.
Article in English | MEDLINE | ID: mdl-11884322

ABSTRACT

OBJECTIVES: To determine the characteristics of popular breast cancer related websites and whether more popular sites are of higher quality. DESIGN: The search engine Google was used to generate a list of websites about breast cancer. Google ranks search results by measures of link popularity, that is, the number of links to a site from other sites. The top 200 sites returned in response to the query "breast cancer" were divided into "more popular" and "less popular" subgroups by three different measures of link popularity: Google rank and the number of links reported independently by Google and by AltaVista (another search engine). MAIN OUTCOME MEASURES: Type and quality of content. RESULTS: More popular sites according to Google rank were more likely than less popular ones to contain information on ongoing clinical trials (27% v 12%, P=0.01), results of trials (12% v 3%, P=0.02), and opportunities for psychosocial adjustment (48% v 23%, P<0.01). These characteristics were also associated with a higher number of links as reported by Google and AltaVista. More popular sites by number of linking sites were also more likely to provide updates on other breast cancer research, information on legislation and advocacy, and a message board service. Measures of quality such as display of authorship, attribution or references, currency of information, and disclosure did not differ between the groups. CONCLUSIONS: Popularity of websites is associated with type rather than quality of content. Sites that include content correlated with popularity may best meet the public's desire for information about breast cancer.
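The comparisons reported above are two-proportion contrasts. The sketch below reproduces the shape of one such comparison (27% v 12% for clinical-trial information), assuming an even 100/100 split of the 200 sites; the group sizes and the chi-square test are assumptions for illustration, not necessarily the paper's exact analysis.

```python
# Sketch of a two-proportion comparison like those in the abstract
# (e.g., 27% vs 12% of sites containing clinical-trial information).
# The even 100/100 split and the chi-square test are assumptions for
# illustration, not necessarily the paper's exact group sizes or method.
from scipy.stats import chi2_contingency

#              has trial info, lacks trial info
more_popular = [27, 73]
less_popular = [12, 88]

chi2, p_value, dof, expected = chi2_contingency([more_popular, less_popular])
print(f"chi-square = {chi2:.2f}, p = {p_value:.3f}")  # p on the order of 0.01
```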


Subjects
Breast Neoplasms/therapy, Information Services/standards, Internet/standards, Cross-Sectional Studies, Female, Health Education/methods, Humans, Information Services/statistics & numerical data, Internet/statistics & numerical data