1.
J Biomed Inform ; 117: 103733, 2021 05.
Article in English | MEDLINE | ID: mdl-33737205

ABSTRACT

The context of medical conditions is an important feature to consider when processing clinical narratives. NegEx and its extension ConText became the most well-known rule-based systems that allow determining whether a medical condition is negated, historical or experienced by someone other than the patient in English clinical text. In this paper, we present a French adaptation and enrichment of FastContext which is the most recent, n-trie engine-based implementation of the ConText algorithm. We compiled an extensive list of French lexical cues by automatic and manual translation and enrichment. To evaluate French FastContext, we manually annotated the context of medical conditions present in two types of clinical narratives: (i) death certificates and (ii) electronic health records. Results show good performance across different context values on both types of clinical notes (on average 0.93 and 0.86 F1, respectively). Furthermore, French FastContext outperforms previously reported French systems for negation detection when compared on the same datasets and it is the first implementation of contextual temporality and experiencer identification reported for French. Finally, French FastContext has been implemented within the SIFR Annotator: a publicly accessible Web service to annotate French biomedical text data (http://bioportal.lirmm.fr/annotator). To our knowledge, this is the first implementation of a Web-based ConText-like system in a publicly accessible platform allowing non-natural-language-processing experts to both annotate and contextualize medical conditions in clinical notes.


Subject(s)
Language; Natural Language Processing; Algorithms; Electronic Health Records; Humans
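
To illustrate the kind of rule the abstract of record 1 describes (a lexical cue that negates a medical condition within a bounded scope), here is a minimal sketch. It is not the FastContext implementation: the cue lists, the function name `is_negated`, and the simple "cue before the condition, no terminating cue in between" rule are all assumptions made for illustration, and the real French lexicon is far larger.

```python
import re

# Hypothetical, tiny cue lists chosen for illustration only.
NEGATION_CUES = ["pas de", "absence de", "sans"]
TERMINATION_CUES = ["mais", "cependant"]


def is_negated(sentence: str, condition: str) -> bool:
    """Return True if a negation cue precedes the condition and no
    terminating cue occurs between that cue and the condition."""
    text = sentence.lower()
    target = text.find(condition.lower())
    if target == -1:
        raise ValueError("condition not found in sentence")
    before = text[:target]

    # Locate the end of the last negation cue that precedes the condition.
    last_neg_end = -1
    for cue in NEGATION_CUES:
        for match in re.finditer(r"\b" + re.escape(cue) + r"\b", before):
            last_neg_end = max(last_neg_end, match.end())
    if last_neg_end == -1:
        return False

    # A terminating cue between the negation cue and the condition closes the scope.
    scope = before[last_neg_end:]
    return not any(
        re.search(r"\b" + re.escape(term) + r"\b", scope)
        for term in TERMINATION_CUES
    )
```

With these assumed cues, `is_negated("Pas de fièvre mais toux persistante", "fièvre")` is true, while the same call for `"toux"` is false because `mais` ends the negation scope. Historical and experiencer contexts could presumably be handled with further cue lists in the same way.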
2.
J Med Internet Res ; 22(8): e20773, 2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32759101

ABSTRACT

BACKGROUND: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. OBJECTIVE: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). METHODS: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. RESULTS: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. CONCLUSIONS: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable.


Subject(s)
Betacoronavirus; Calcium Channel Blockers/therapeutic use; Coronavirus Infections/drug therapy; Hypertension/complications; Natural Language Processing; Pneumonia, Viral/drug therapy; COVID-19; Coronavirus Infections/complications; Data Mining; Electronic Health Records; Humans; Pandemics; Pneumonia, Viral/complications; SARS-CoV-2; Time Factors; COVID-19 Drug Treatment
3.
Stud Health Technol Inform ; 290: 56-60, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672970

ABSTRACT

Primary Immunodeficiencies (PIDs) are associated with more than 400 rare monogenic diseases affecting various biological functions (e.g., development, regulation of the immune response) with a heterogeneous clinical expression (from no symptoms to severe manifestations). To better understand PIDs, the ATRACTion project aims to perform a multi-omics analysis of PID cases versus a control group of patients, including single-cell transcriptomics, epigenetics, proteomics, metabolomics, metagenomics and lipidomics. In this study, our goal is to develop a common data model integrating clinical and omics data, which can be used to obtain the standardized information necessary for the characterization of PID patients and for further systematic analysis. For that purpose, we extend the OMOP Common Data Model (CDM) and propose a multi-omics ATRACTion OMOP-CDM to integrate multi-omics data. This model, available to the community, is customizable for other types of rare diseases (https://framagit.org/imagine-plateforme-bdd/pub-rhu4-atraction).


Subject(s)
Metabolomics; Proteomics; Humans; Rare Diseases; Transcriptome
4.
Stud Health Technol Inform ; 294: 834-838, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612221

ABSTRACT

INTRODUCTION: The implication of viruses in human cancers, together with the emergence of next-generation sequencing, has made it possible to investigate further their role and pathophysiology in the development of this disease. One such mechanism is the integration of portions of viral genomes in the human genome, as well as the specific action of viral oncogenes. Finding integration sites and preserved oncogenes still relies on heavy manual intervention. METHODS: We developed an analysis and interpretation pipeline to determine viral insertions. Using data from directed viral capture, the pipeline conducts a crude genotyping phase to select reference viral genomes, identifies chimeric reads, extracts the putative human sequences to locate in the human reference genome, scores and ranks candidate junctions, and exports tabular and visual results. RESULTS: We leverage common bioinformatics tools (bowtie2, samtools, blat), and a dedicated filtering and ranking algorithm, implemented in R, to infer candidate junctions and insertions. Static results (tables, figures) are produced, as well as an interactive interpretation tool developed as a shiny web app. DISCUSSION: We validated this pipeline against published results of HPV, HBV, and AAV2 insertions and show good information retrieval.


Subject(s)
Computational Biology; Viruses; Algorithms; Computational Biology/methods; Genome, Human/genetics; High-Throughput Nucleotide Sequencing/methods; Humans
5.
J Am Med Inform Assoc ; 28(3): 504-515, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33319904

ABSTRACT

BACKGROUND: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. OBJECTIVE: To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. MATERIALS AND METHODS: Based on the literature across multiple research fields (NLP, bioinformatics and clinical informatics) we selected articles which (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregate insight from the literature to define reproducibility recommendations. Finally, we assess the compliance of 7 NLP frameworks with the recommendations. RESULTS: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP frameworks (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. DISCUSSION: 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP framework. Nevertheless, frameworks based on WMS had better compliance with the features. CONCLUSION: NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance reproducibility in a clinical setting.


Subject(s)
Natural Language Processing; Reproducibility of Results; Computational Biology; Database Management Systems; Medical Informatics
6.
Sci Data ; 7(1): 3, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high-quality annotated corpus focusing on the PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


Subject(s)
Data Curation; Pharmacogenetics; Supervised Machine Learning; Humans; PubMed
7.
Stud Health Technol Inform ; 264: 103-107, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437894

ABSTRACT

A significant part of medical knowledge is stored as unstructured free text. However, clinical narratives are known to contain duplicated sections because clinicians copy and paste parts of a former report into a new one. In this study, we aim to evaluate the duplications found within patient records in more than 650,000 French clinical narratives. We adapted a method to identify duplicated zones efficiently, in a reasonable time. We evaluated the potential impact of duplications in two use cases: the presence of (i) treatments and/or (ii) relative dates. We identified an average rate of duplication of 33%. We found that 20% of the documents contained drugs mentioned only in duplicated zones and that 1.45% of the documents contained mentions of relative dates in duplicated zones, which could potentially lead to erroneous interpretation. We suggest the systematic identification and annotation of duplicated zones in clinical narratives for information extraction and temporal-oriented tasks.


Subject(s)
Information Storage and Retrieval; Electronic Health Records; Humans; Language; Narration; Natural Language Processing
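
The abstract of record 7 does not say which method was adapted to find duplicated zones. As a rough illustration of the general idea only, the sketch below marks the words of a newer report that fall inside a word n-gram already present in an earlier report. The function names, the whitespace tokenization, and the window size `n=5` are assumptions, not details from the study.

```python
def shingles(text: str, n: int = 5) -> set:
    """Set of all consecutive word n-grams in the text (lowercased)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def duplicated_fraction(previous: str, current: str, n: int = 5) -> float:
    """Fraction of words in `current` covered by at least one n-gram
    that also occurs in `previous`."""
    previous_shingles = shingles(previous, n)
    words = current.lower().split()
    if not words:
        return 0.0
    covered = [False] * len(words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) in previous_shingles:
            # Mark every word of the shared n-gram as duplicated.
            for j in range(i, i + n):
                covered[j] = True
    return sum(covered) / len(words)
```

A document-level duplication rate like the 33% average reported above could then be approximated by comparing each report with the patient's earlier reports, though the published method may differ substantially.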
8.
J Mol Diagn ; 20(4): 550-564, 2018 07.
Article in English | MEDLINE | ID: mdl-29787863

ABSTRACT

Theranostic assays are based on single-gene testing, but the ability of next-generation sequencing (NGS) to interrogate numerous genetic alterations will progressively replace single-gene assays. Although NGS was evaluated to screen for theranostic mutations, its usefulness in clinical practice on large series of samples remains to be demonstrated. NGS performance was assessed following guidelines. TaqMan probes and NGS were compared for their ability to detect EGFR and KRAS mutations, and NGS mutation profiles were analyzed on a large series of non-small-cell lung cancers (n = 1343). The R² correlation between expected and measured allelic ratio, using commercial samples, was >0.96. Mutation detection threshold was 2% for 10 ng of DNA input. κ scores for TaqMan versus NGS were 0.99 (95% CI, 0.97-1.00) for EGFR and 0.98 (95% CI, 0.97-1.00) for KRAS after exclusion of rare EGFR (n = 40) and KRAS (n = 60) mutations. NGS identified 693 and 292 mutations in validated and potential oncogenic drivers, respectively. Significant associations were found between EGFR and PI3KCA or CTNNB1 and between KRAS and STK11. Potential oncogenic driver mutations or gene amplifications were more frequent in validated oncogenic driver nonmutated samples. This work is a proof of concept that targeted NGS is accessible in routine screening, including large screening, at reasonable cost. Clinical data should be collected and implemented in specific databases to make molecular data meaningful for direct patients' benefit.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics; High-Throughput Nucleotide Sequencing/methods; Lung Neoplasms/genetics; Humans; Mutation/genetics; Oncogenes; Proto-Oncogene Proteins p21(ras)/genetics; Reproducibility of Results
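
The abstract of record 8 reports agreement between TaqMan and NGS as κ scores. For readers unfamiliar with the statistic, here is a minimal sketch of Cohen's kappa for two methods that each assign one label per sample. The function name and the toy labels are illustrative assumptions; the study's data and its confidence-interval computation are not reproduced here.

```python
def cohen_kappa(calls_a: list, calls_b: list) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e the agreement expected by chance."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("both methods must label the same non-empty set of samples")
    n = len(calls_a)
    labels = set(calls_a) | set(calls_b)
    # Observed agreement: share of samples where both methods give the same call.
    p_o = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    # Chance agreement: product of each method's marginal rate, summed over labels.
    p_e = sum((calls_a.count(l) / n) * (calls_b.count(l) / n) for l in labels)
    if p_e == 1.0:
        return 1.0  # both methods always give the same single label
    return (p_o - p_e) / (1 - p_e)
```

For example, with hypothetical calls `[1, 1, 0, 0]` and `[1, 1, 0, 1]` (1 = mutated, 0 = wild type), observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5. Values near 1, such as those reported above, indicate near-perfect agreement beyond chance.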