1.
Stud Health Technol Inform ; 290: 56-60, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35672970

ABSTRACT

Primary Immunodeficiencies (PIDs) are associated with more than 400 rare monogenic diseases affecting various biological functions (e.g., development, regulation of the immune response) with a heterogeneous clinical expression (from no symptom to severe manifestations). To better understand PIDs, the ATRACTion project aims to perform a multi-omics analysis of PID cases versus a control group of patients, including single-cell transcriptomics, epigenetics, proteomics, metabolomics, metagenomics and lipidomics. In this study, our goal is to develop a common data model integrating clinical and omics data, which can be used to obtain standardized information necessary for characterization of PID patients and for further systematic analysis. For that purpose, we extend the OMOP Common Data Model (CDM) and propose a multi-omics ATRACTion OMOP-CDM to integrate multi-omics data. This model, available for the community, is customizable for other types of rare diseases (https://framagit.org/imagine-plateforme-bdd/pub-rhu4-atraction).


Subject(s)
Metabolomics , Proteomics , Humans , Rare Diseases , Transcriptome
2.
Stud Health Technol Inform ; 294: 834-838, 2022 May 25.
Article in English | MEDLINE | ID: mdl-35612221

ABSTRACT

INTRODUCTION: The implication of viruses in human cancers, as well as the emergence of next generation sequencing, has made it possible to investigate further their role and pathophysiology in the development of this disease. One such mechanism is the integration of portions of viral genomes in the human genome, as well as the specific action of viral oncogenes. Finding integration sites and preserved oncogenes still relies on heavy manual intervention. METHODS: We developed an analysis and interpretation pipeline to determine viral insertions. Using data from directed viral capture, the pipeline conducts a crude genotyping phase to select reference viral genomes, identifies chimeric reads, extracts the putative human sequences to locate in the human reference genome, scores and ranks candidate junctions, and exports tabular and visual results. RESULTS: We leverage common bioinformatics tools (bowtie2, samtools, blat), and a dedicated filtering and ranking algorithm, implemented in R, to infer candidate junctions and insertions. Static results (tables, figures) are produced, as well as an interactive interpretation tool developed as a shiny web app. DISCUSSION: We validated this pipeline against published results of HPV, HBV, and AAV2 insertions and show good information retrieval.


Subject(s)
Computational Biology , Viruses , Algorithms , Computational Biology/methods , Genome, Human/genetics , High-Throughput Nucleotide Sequencing/methods , Humans
3.
J Biomed Inform ; 117: 103733, 2021 05.
Article in English | MEDLINE | ID: mdl-33737205

ABSTRACT

The context of medical conditions is an important feature to consider when processing clinical narratives. NegEx and its extension ConText became the most well-known rule-based systems that allow determining whether a medical condition is negated, historical or experienced by someone other than the patient in English clinical text. In this paper, we present a French adaptation and enrichment of FastContext which is the most recent, n-trie engine-based implementation of the ConText algorithm. We compiled an extensive list of French lexical cues by automatic and manual translation and enrichment. To evaluate French FastContext, we manually annotated the context of medical conditions present in two types of clinical narratives: (i) death certificates and (ii) electronic health records. Results show good performance across different context values on both types of clinical notes (on average 0.93 and 0.86 F1, respectively). Furthermore, French FastContext outperforms previously reported French systems for negation detection when compared on the same datasets and it is the first implementation of contextual temporality and experiencer identification reported for French. Finally, French FastContext has been implemented within the SIFR Annotator: a publicly accessible Web service to annotate French biomedical text data (http://bioportal.lirmm.fr/annotator). To our knowledge, this is the first implementation of a Web-based ConText-like system in a publicly accessible platform allowing non-natural-language-processing experts to both annotate and contextualize medical conditions in clinical notes.


Subject(s)
Language , Natural Language Processing , Algorithms , Electronic Health Records , Humans
4.
J Am Med Inform Assoc ; 28(3): 504-515, 2021 03 01.
Article in English | MEDLINE | ID: mdl-33319904

ABSTRACT

BACKGROUND: The increasing complexity of data streams and computational processes in modern clinical health information systems makes reproducibility challenging. Clinical natural language processing (NLP) pipelines are routinely leveraged for the secondary use of data. Workflow management systems (WMS) have been widely used in bioinformatics to handle the reproducibility bottleneck. OBJECTIVE: To evaluate if WMS and other bioinformatics practices could impact the reproducibility of clinical NLP frameworks. MATERIALS AND METHODS: Based on the literature across multiple research fields (NLP, bioinformatics and clinical informatics) we selected articles which (1) review reproducibility practices and (2) highlight a set of rules or guidelines to ensure tool or pipeline reproducibility. We aggregate insight from the literature to define reproducibility recommendations. Finally, we assess the compliance of 7 NLP frameworks to the recommendations. RESULTS: We identified 40 reproducibility features from 8 selected articles. Frameworks based on WMS match more than 50% of features (26 features for LAPPS Grid, 22 features for OpenMinted) compared to 18 features for current clinical NLP framework (cTakes, CLAMP) and 17 features for GATE, ScispaCy, and Textflows. DISCUSSION: 34 recommendations are endorsed by at least 2 articles from our selection. Overall, 15 features were adopted by every NLP Framework. Nevertheless, frameworks based on WMS had a better compliance with the features. CONCLUSION: NLP frameworks could benefit from lessons learned from the bioinformatics field (eg, public repositories of curated tools and workflows or use of containers for shareability) to enhance the reproducibility in a clinical setting.


Subject(s)
Natural Language Processing , Reproducibility of Results , Computational Biology , Database Management Systems , Medical Informatics
5.
J Med Internet Res ; 22(8): e20773, 2020 Aug 14.
Article in English | MEDLINE | ID: mdl-32759101

ABSTRACT

BACKGROUND: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. OBJECTIVE: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). METHODS: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. RESULTS: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. CONCLUSIONS: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable.


Subject(s)
Betacoronavirus , Calcium Channel Blockers/therapeutic use , Coronavirus Infections/drug therapy , Hypertension/complications , Natural Language Processing , Pneumonia, Viral/drug therapy , COVID-19 , Coronavirus Infections/complications , Data Mining , Electronic Health Records , Humans , Pandemics , Pneumonia, Viral/complications , SARS-CoV-2 , Time Factors , COVID-19 Drug Treatment
6.
Sci Data ; 7(1): 3, 2020 01 02.
Article in English | MEDLINE | ID: mdl-31896797

ABSTRACT

Pharmacogenomics (PGx) studies how individual gene variations impact drug response phenotypes, which makes PGx-related knowledge a key component towards precision medicine. A significant part of the state-of-the-art knowledge in PGx is accumulated in scientific publications, where it is hardly reusable by humans or software. Natural language processing techniques have been developed to guide experts who curate this amount of knowledge. But existing works are limited by the absence of a high quality annotated corpus focusing on PGx domain. In particular, this absence restricts the use of supervised machine learning. This article introduces PGxCorpus, a manually annotated corpus, designed to fill this gap and to enable the automatic extraction of PGx relationships from text. It comprises 945 sentences from 911 PubMed abstracts, annotated with PGx entities of interest (mainly gene variations, genes, drugs and phenotypes), and relationships between those. In this article, we present the corpus itself, its construction and a baseline experiment that illustrates how it may be leveraged to synthesize and summarize PGx knowledge.


Subject(s)
Data Curation , Pharmacogenetics , Supervised Machine Learning , Humans , PubMed
7.
Stud Health Technol Inform ; 264: 103-107, 2019 Aug 21.
Article in English | MEDLINE | ID: mdl-31437894

ABSTRACT

A significant part of medical knowledge is stored as unstructured free text. However, clinical narratives are known to contain duplicated sections due to clinicians copying and pasting parts of a former report into a new one. In this study, we aim at evaluating the duplications found within patient records in more than 650,000 French clinical narratives. We adapted a method to efficiently identify duplicated zones in a reasonable time. We evaluated the potential impact of duplications in two use cases: the presence of (i) treatments and/or (ii) relative dates. We identified an average rate of duplication of 33%. We found that 20% of the documents contained drugs mentioned only in duplicated zones and that 1.45% of the documents contained mentions of relative dates in duplicated zones, which could potentially lead to erroneous interpretation. We suggest the systematic identification and annotation of duplicated zones in clinical narratives for information extraction and temporal-oriented tasks.


Subject(s)
Information Storage and Retrieval , Electronic Health Records , Humans , Language , Narration , Natural Language Processing
8.
J Mol Diagn ; 20(4): 550-564, 2018 07.
Article in English | MEDLINE | ID: mdl-29787863

ABSTRACT

Theranostic assays are based on single-gene testing, but the ability of next-generation sequencing (NGS) to interrogate numerous genetic alterations will progressively replace single-gene assays. Although NGS was evaluated to screen for theranostic mutations, its usefulness in clinical practice on large series of samples remains to be demonstrated. NGS performance was assessed following guidelines. TaqMan probes and NGS were compared for their ability to detect EGFR and KRAS mutations, and NGS mutation profiles were analyzed on a large series of non-small-cell lung cancers (n = 1343). The R² correlation between expected and measured allelic ratio, using commercial samples, was >0.96. Mutation detection threshold was 2% for 10 ng of DNA input. κ Scores for TaqMan versus NGS were 0.99 (95% CI, 0.97-1.00) for EGFR and 0.98 (95% CI, 0.97-1.00) for KRAS after exclusion of rare EGFR (n = 40) and KRAS (n = 60) mutations. NGS identified 693 and 292 mutations in validated and potential oncogenic drivers, respectively. Significant associations were found between EGFR and PIK3CA or CTNNB1 and between KRAS and STK11. Potential oncogenic driver mutations or gene amplifications were more frequent in validated oncogenic driver nonmutated samples. This work is a proof of concept that targeted NGS is accessible in routine screening, including large screening, at reasonable cost. Clinical data should be collected and implemented in specific databases to make molecular data meaningful for direct patients' benefit.


Subject(s)
Carcinoma, Non-Small-Cell Lung/genetics , High-Throughput Nucleotide Sequencing/methods , Lung Neoplasms/genetics , Humans , Mutation/genetics , Oncogenes , Proto-Oncogene Proteins p21(ras)/genetics , Reproducibility of Results