1.
J Biomed Semantics ; 13(1): 14, 2022 05 23.
Article in English | MEDLINE | ID: mdl-35606797

ABSTRACT

BACKGROUND: The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. This can be facilitated and partially automated by information extraction systems. In order to support the development of systems that can extract information from published clinical trials at a fine-grained and comprehensive level to populate a knowledge base, we present a corpus richly annotated at two levels. At the first level, entities that describe components of the PICO elements (e.g., population's age and pre-conditions, dosage of a treatment, etc.) are annotated. The second level comprises schema-level (i.e., slot-filling templates) annotations corresponding to complex PICO elements and other concepts related to a clinical trial (e.g., the relation between an intervention and an arm, the relation between an outcome and an intervention, etc.). RESULTS: The final corpus includes 211 annotated clinical trial abstracts with substantial agreement between annotators at the entity and schema levels. The mean Kappa values for the glaucoma and T2DM corpora were 0.74 and 0.68, respectively, for single entities. The micro-averaged F1 score to measure inter-annotator agreement for complex entities (i.e., slot-filling templates) was 0.81. The BERT-base baseline method for entity recognition achieved average micro-F1 scores of 0.76 for glaucoma and 0.77 for diabetes with exact matching. CONCLUSIONS: In this work, we have created a corpus that goes beyond the existing clinical trial corpora, since it is annotated in a schematic way that represents the classes and properties defined in an ontology. Although the corpus is small, it has fine-grained annotations and could be used to fine-tune pre-trained machine learning models and transformers to the specific task of extracting information from clinical trial abstracts. For future work, we will use the corpus for training information extraction systems that extract single entities and predict template slot-fillers (i.e., class data/object properties) to populate a knowledge base that relies on the C-TrO ontology for the description of clinical trials. The resulting corpus and the code to measure inter-annotator agreement and the baseline method are publicly available at https://zenodo.org/record/6365890.
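
The two agreement measures reported above can be illustrated with a minimal sketch. It is not the authors' released code (that is available at the Zenodo link); the function names, the span representation `(start, end, label)`, and the example labels are assumptions made here for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def micro_f1(reference_spans, candidate_spans):
    """Micro-averaged F1 with exact matching over (start, end, label) spans.

    Counts are pooled over all labels before computing precision and recall.
    """
    reference, candidate = set(reference_spans), set(candidate_spans)
    true_positives = len(reference & candidate)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(candidate)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Hypothetical token labels from two annotators.
annotator_1 = ["Drug", "Drug", "Dose", "O"]
annotator_2 = ["Drug", "Dose", "Dose", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Hypothetical spans; the second candidate span has a different end offset,
# so it does not count under exact matching.
f1 = micro_f1(
    [(0, 5, "Drug"), (10, 15, "Dose"), (20, 25, "Outcome")],
    [(0, 5, "Drug"), (10, 14, "Dose")],
)
```

Under exact matching a span counts only if its boundaries and label both agree, which is why partially overlapping annotations lower the score.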


Subjects
Clinical Trials as Topic , Glaucoma , Information Storage and Retrieval , Machine Learning , Humans , Knowledge Bases , Natural Language Processing
2.
J Biomed Semantics ; 13(1): 16, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35659056

ABSTRACT

BACKGROUND: Evidence-based medicine holds that medical/clinical decisions should be made by taking into account high-quality evidence, most notably in the form of randomized clinical trials. Evidence-based decision-making requires aggregating the evidence available in multiple trials to reach, by means of systematic reviews, a conclusive recommendation on which treatment is best suited for a given patient population. However, it is challenging to produce systematic reviews that keep up with the ever-growing number of published clinical trials. Therefore, new computational approaches are necessary to support the creation of systematic reviews that include the most up-to-date evidence. We propose a method to synthesize the evidence available in clinical trials in an ad-hoc and on-demand manner by automatically arranging such evidence in the form of a hierarchical argument that recommends a therapy as being superior to some other therapy along a number of key dimensions corresponding to the clinical endpoints of interest. The method has also been implemented as a web tool that allows users to explore the effects of excluding different points of evidence and to indicate relative preferences on the endpoints. RESULTS: Through two use cases, our method was shown to be able to generate conclusions similar to those of published systematic reviews. To evaluate our method implemented as a web tool, we carried out a survey and usability analysis with medical professionals. The results show that the tool was perceived as valuable, with respondents acknowledging its potential to inform clinical decision-making and to complement the information from existing medical guidelines. CONCLUSIONS: The method presented is a simple yet effective argumentation-based method that helps support the synthesis of clinical trial evidence. A current limitation of the method is that it relies on a manually populated knowledge base. This problem could be alleviated by deploying natural language processing methods to extract the relevant information from publications.


Subjects
Evidence-Based Medicine , Trees , Humans , Research Design , Systematic Reviews as Topic
3.
Brief Funct Genomic Proteomic ; 8(3): 199-212, 2009 May.
Article in English | MEDLINE | ID: mdl-19734302

ABSTRACT

We describe various types of outliers seen in Affymetrix GeneChip data. We have been able to utilise the data in the Gene Expression Omnibus to screen GeneChips across a range of scales, from single probes, to spatially adjacent fractions of arrays, to whole arrays, to whole experiments. In this review we describe a number of reasons why some reported intensities on GeneChips might be misleading.


Subjects
Artifacts , Oligonucleotide Array Sequence Analysis/methods , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Base Sequence , DNA Probes/metabolism , Humans , Molecular Sequence Data , Statistics as Topic
4.
Bioinformatics ; 23(13): i424-32, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646327

ABSTRACT

MOTIVATION: Negative information about protein-protein interactions (ranging from uncertainty about the occurrence of an interaction to knowledge that it did not occur) is often of great use to biologists and could lead to important discoveries. Yet, to our knowledge, no approaches focusing on extracting such information have been proposed in the text mining literature. RESULTS: In this work, we present an analysis of the types of negative information that are reported, and a heuristic-based system using a full dependency parser to extract such information. We performed a preliminary evaluation study that shows encouraging results for our system. Finally, we have obtained an initial corpus of negative protein-protein interactions as a basis for the construction of larger ones. AVAILABILITY: The corpus is available by request from the authors.
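
To make the distinction between negated and uncertain interaction statements concrete, here is a deliberately simplified sketch. It uses surface cue phrases only and is not the dependency-parser heuristics described in the abstract; the cue lists, the function name, and the placeholder protein names are all assumptions introduced for illustration.

```python
import re

# Hypothetical cue phrases; a real system would need a curated, evaluated list.
NEGATION_CUES = ("did not", "does not", "do not", "failed to", "unable to", "not")
UNCERTAINTY_CUES = ("may", "might", "possibly", "unclear whether")


def _has_cue(text, cues):
    """True if any cue occurs in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(cue) + r"\b", text) for cue in cues)


def classify_interaction_sentence(sentence, protein_a, protein_b):
    """Label a sentence as 'negative', 'uncertain' or 'asserted'.

    Returns None when the sentence does not mention both proteins.
    Cue scope is ignored here, which is exactly the weakness a
    dependency parse is meant to address.
    """
    text = sentence.lower()
    if protein_a.lower() not in text or protein_b.lower() not in text:
        return None
    if _has_cue(text, NEGATION_CUES):
        return "negative"
    if _has_cue(text, UNCERTAINTY_CUES):
        return "uncertain"
    return "asserted"


# Placeholder protein names, not real findings.
label_negative = classify_interaction_sentence(
    "ProtA did not interact with ProtB in this assay.", "ProtA", "ProtB")
label_uncertain = classify_interaction_sentence(
    "ProtA may bind ProtB.", "ProtA", "ProtB")
label_asserted = classify_interaction_sentence(
    "ProtA binds ProtB.", "ProtA", "ProtB")
```

Because the sketch ignores which verb or argument a cue attaches to, a sentence such as "ProtA, which is not phosphorylated, binds ProtB" would be mislabelled as negative; resolving that kind of scope is the motivation for using syntactic structure.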


Subjects
Algorithms , Information Storage and Retrieval/methods , Models, Biological , Natural Language Processing , Periodicals as Topic , Protein Interaction Mapping/methods , Proteome/metabolism , Signal Transduction/physiology
5.
J Integr Bioinform ; 7(2): 111, 2010 Jan 15.
Article in English | MEDLINE | ID: mdl-20134078

ABSTRACT

A tetramer quadruplex structure is formed by four parallel strands of DNA/RNA containing runs of guanine. These quadruplexes are able to form because guanine can Hoogsteen hydrogen bond to other guanines, and a tetrad of guanines can form a stable arrangement. Recently we have discovered that probes on Affymetrix GeneChips that contain runs of guanine do not measure gene expression reliably. We associate this finding with the likelihood that quadruplexes are forming on the surface of GeneChips. In order to cope with the rapidly expanding size of GeneChip array datasets in the public domain, we are exploring the use of cloud computing to replicate our experiments on 3' arrays to look at the effect of the location of G-spots (runs of guanines). Cloud computing is a recently introduced high-performance solution that takes advantage of the computational infrastructure of large organisations such as Amazon and Google. We expect that cloud computing will become widely adopted because it enables bioinformaticians to avoid capital expenditure on expensive computing resources and to pay a cloud computing provider only for what is used. Moreover, beyond financial efficiency, cloud computing is an ecologically friendly technology that enables efficient data-sharing, and we expect it to be faster for development purposes. Here we propose the advantageous use of cloud computing to perform a large data-mining analysis of public domain 3' arrays.
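
Locating runs of guanine within a probe sequence, and recording where in the probe they fall, can be sketched as follows. The abstract does not state how long a run must be to count as a G-spot, so the `min_run_length` default of 4 is an assumption, as are the function name and the made-up 25-base example sequence.

```python
import re


def find_g_runs(probe_sequence, min_run_length=4):
    """Return (start_index, run_length) for each run of consecutive G bases.

    start_index is 0-based from the first base of the sequence as given.
    min_run_length is an assumed threshold, not a value from the abstract.
    """
    pattern = re.compile("G{%d,}" % min_run_length)
    return [
        (match.start(), len(match.group()))
        for match in pattern.finditer(probe_sequence.upper())
    ]


# Made-up 25-base sequence for illustration only.
example_probe = "ATCGGGGTTACGATCGGGGGATCCA"
g_runs = find_g_runs(example_probe)  # [(3, 4), (15, 5)]
```

Keeping the start position alongside the run length makes it possible to group probes by where the G-run sits, which is the location effect the abstract sets out to study.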


Subjects
G-Quadruplexes , Guanine/chemistry , Oligonucleotide Array Sequence Analysis/methods , Computational Biology/methods , DNA/chemistry , Databases, Genetic , RNA/chemistry
6.
J Integr Bioinform ; 7(2), 2010 Feb 19.
Article in English | MEDLINE | ID: mdl-20167985

ABSTRACT

We have used large surveys of Affymetrix GeneChip data in the public domain to conduct a study of antisense expression across diverse conditions. We derive correlations between groups of probes which map uniquely to the same exon in the antisense direction. When there are no probes assigned to an exon in the sense direction, we find that many of the antisense groups fail to detect a coherent block of transcription. We find that only a minority of these groups contain coherent blocks of antisense expression suggesting transcription. We also derive correlations between groups of probes which map uniquely to the same exon in both sense and antisense directions. In some of these cases the locations of sense probes overlap with the antisense probes, and the sense and antisense probe intensities are correlated with each other. This configuration suggests the existence of a Natural Antisense Transcript (NAT) pair. We find that the majority of such NAT pairs detected by GeneChips are formed by a transcript of an established gene and either an EST or an mRNA. In order to determine the exact antisense regulatory mechanism indicated by the correlation of sense probes with antisense probes, a further investigation is necessary for every particular case of interest. However, the analysis of microarray data has proved to be a good method to reconfirm known NATs, to discover new ones, and to flag possible problems in the annotation of antisense transcripts.
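
The comparison of a sense probe group with an antisense probe group across many samples can be sketched as below. The abstract does not specify the correlation measure or any intensity transformation, so the use of Pearson correlation on per-sample group means is an assumption, and the function names and intensity values are hypothetical.

```python
import math


def group_mean_profile(probe_intensities):
    """Average intensity per sample over all probes in a group.

    probe_intensities: list of rows, one row per probe, one column per sample.
    """
    return [sum(column) / len(column) for column in zip(*probe_intensities)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return covariance / math.sqrt(var_x * var_y)


# Hypothetical intensities: two probes per group, three samples.
sense_group = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
antisense_group = [[2.0, 4.0, 6.0], [2.0, 4.0, 6.0]]

sense_profile = group_mean_profile(sense_group)
antisense_profile = group_mean_profile(antisense_group)
sense_antisense_r = pearson(sense_profile, antisense_profile)
```

A high correlation between overlapping sense and antisense groups is only suggestive of a NAT pair; as the abstract notes, each case of interest still needs further investigation.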


Subjects
Antisense Elements (Genetics)/chemistry , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis , Algorithms , Animals , Base Sequence , Databases, Genetic , Expressed Sequence Tags , Humans , Molecular Sequence Data , RNA, Messenger/genetics
7.
J Integr Bioinform ; 5(2), 2008 Aug 25.
Article in English | MEDLINE | ID: mdl-20134059

ABSTRACT

We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, for example NCBI's Gene Expression Omnibus. GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic biases associated with the GeneChip technology. Our bioinformatics pipeline integrates the mapping of probes to exons, quality control checks on each GeneChip that identify flaws in hybridization quality, and the mining of correlations in intensities between groups of probes. The output from our pipeline has enabled us to identify systematic biases in GeneChip data. We are also able to use the pipeline as a discovery tool for biology. We have discovered that in the majority of cases, Affymetrix probesets on Human GeneChips do not measure one unique block of transcription. Instead we see numerous examples of outlier probes. Our study has also identified that in a number of probesets the mismatch probes are an informative diagnostic of expression, rather than providing a measure of background contamination. We report evidence for systematic biases in GeneChip technology associated with probe-probe interactions. We also see signatures associated with post-transcriptional processing of RNA, such as alternative polyadenylation.


Subjects
Genomics/instrumentation , Oligonucleotide Array Sequence Analysis/instrumentation , Databases, Genetic , Exons , Gene Expression Profiling , Genomics/methods