Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 4 de 4
Filtrar
Mais filtros

Base de dados
País/Região como assunto
Tipo de documento
Intervalo de ano de publicação
1.
BMC Bioinformatics ; 16: 138, 2015 Apr 30.
Artigo em Inglês | MEDLINE | ID: mdl-25925131

RESUMO

BACKGROUND: This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies. RESULTS: The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a participants were asked to automatically annotate new PUBMED documents with MESH headings. Twelve teams participated in Task 1a, with a total of 46 system runs submitted, and one of the teams performing consistently better than the MTI indexer used by NLM to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers, prepared by a team of biomedical experts from around Europe and participants had to automatically produce answers. Three teams participated in Task 1b, with 11 system runs. The BIOASQ infrastructure, including benchmark datasets, evaluation mechanisms, and the results of the participants and baseline methods, is publicly available. CONCLUSIONS: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed, which includes benchmark datasets, and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, relevant articles and snippets from PUBMED Central; produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better from the NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.


Assuntos
Indexação e Redação de Resumos/métodos , Algoritmos , Medical Subject Headings , Processamento de Linguagem Natural , PubMed , Semântica , Software , Humanos , National Library of Medicine (U.S.) , Estados Unidos
2.
Sci Data ; 10(1): 318, 2023 05 25.
Artigo em Inglês | MEDLINE | ID: mdl-37231010

RESUMO

Graffiti is an urban phenomenon that is increasingly attracting the interest of the sciences. To the best of our knowledge, no suitable data corpora are available for systematic research until now. The Information System Graffiti in Germany project (INGRID) closes this gap by dealing with graffiti image collections that have been made available to the project for public use. Within INGRID, the graffiti images are collected, digitized and annotated. With this work, we aim to support the rapid access to a comprehensive data source on INGRID targeted especially by researchers. In particular, we present INGRIDKG, an RDF knowledge graph of annotated graffiti, abides by the Linked Data and FAIR principles. We weekly update INGRIDKG by augmenting the new annotated graffiti to our knowledge graph. Our generation pipeline applies RDF data conversion, link discovery and data fusion approaches to the original data. The current version of INGRIDKG contains 460,640,154 triples and is linked to 3 other knowledge graphs by over 200,000 links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications.

3.
Sci Data ; 9(1): 389, 2022 07 08.
Artigo em Inglês | MEDLINE | ID: mdl-35803947

RESUMO

The rapid generation of large amounts of information about the coronavirus SARS-CoV-2 and the disease COVID-19 makes it increasingly difficult to gain a comprehensive overview of current insights related to the disease. With this work, we aim to support the rapid access to a comprehensive data source on COVID-19 targeted especially at researchers. Our knowledge graph, COVIDPUBGRAPH, an RDF knowledge graph of scientific publications, abides by the Linked Data and FAIR principles. The base dataset for the extraction is CORD-19, a dataset of COVID-19-related publications, which is updated regularly. Consequently, COVIDPUBGRAPH is updated biweekly. Our generation pipeline applies named entity recognition, entity linking and link discovery approaches to the original data. The current version of COVIDPUBGRAPH contains 268,108,670 triples and is linked to 9 other datasets by over 1 million links. In our use case studies, we demonstrate the usefulness of our knowledge graph for different applications. COVIDPUBGRAPH is publicly available under the Creative Commons Attribution 4.0 International license.


Assuntos
COVID-19 , Humanos , Conhecimento , Publicações , SARS-CoV-2
4.
J Biomed Semantics ; 5: 47, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25937882

RESUMO

BACKGROUD: The Cancer Genome Atlas (TCGA) is a multidisciplinary, multi-institutional effort to catalogue genetic mutations responsible for cancer using genome analysis techniques. One of the aims of this project is to create a comprehensive and open repository of cancer related molecular analysis, to be exploited by bioinformaticians towards advancing cancer knowledge. However, devising bioinformatics applications to analyse such large dataset is still challenging, as it often requires downloading large archives and parsing the relevant text files. Therefore, it is making it difficult to enable virtual data integration in order to collect the critical co-variates necessary for analysis. METHODS: We address these issues by transforming the TCGA data into the Semantic Web standard Resource Description Format (RDF), link it to relevant datasets in the Linked Open Data (LOD) cloud and further propose an efficient data distribution strategy to host the resulting 20.4 billion triples data via several SPARQL endpoints. Having the TCGA data distributed across multiple SPARQL endpoints, we enable biomedical scientists to query and retrieve information from these SPARQL endpoints by proposing a TCGA tailored federated SPARQL query processing engine named TopFed. RESULTS: We compare TopFed with a well established federation engine FedX in terms of source selection and query execution time by using 10 different federated SPARQL queries with varying requirements. Our evaluation results show that TopFed selects on average less than half of the sources (with 100% recall) with query execution time equal to one third to that of FedX. CONCLUSION: With TopFed, we aim to offer biomedical scientists a single-point-of-access through which distributed TCGA data can be accessed in unison. We believe the proposed system can greatly help researchers in the biomedical domain to carry out their research effectively with TCGA as the amount and diversity of data exceeds the ability of local resources to handle its retrieval and parsing.

SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA