Search | VHL Search Portal

1.

COVoc and COVTriage: novel resources to support literature triage.

Caucheteur, Déborah; May Pendlington, Zoë; Roncaglia, Paola; Gobeill, Julien; Mottin, Luc; Matentzoglu, Nicolas; Agosti, Donat; Osumi-Sutherland, David; Parkinson, Helen; Ruch, Patrick.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36511598

ABSTRACT

MOTIVATION: Since early 2020, the coronavirus disease 2019 (COVID-19) pandemic has confronted the biomedical community with an unprecedented challenge. The rapid spread of COVID-19 and ease of transmission seen worldwide is due to increased population flow and international trade. Front-line medical care, treatment research and vaccine development also require rapid and informative interpretation of the literature and COVID-19 data produced around the world, with 177 500 papers published between January 2020 and November 2021, i.e. almost 8500 papers per month. To extract knowledge and enable interoperability across resources, we developed the COVID-19 Vocabulary (COVoc), an application ontology related to the research on this pandemic. The main objective of COVoc development was to enable seamless navigation from biomedical literature to core databases and tools of ELIXIR, a European-wide intergovernmental organization for life sciences. RESULTS: This collaborative work provided data integration into SIB Literature services, an application ontology (COVoc) and a triage service named COVTriage and based on annotation processing to search for COVID-related information across pre-defined aspects with daily updates. Thanks to its interoperability potential, COVoc lends itself to wider applications, hopefully through further connections with other novel COVID-19 ontologies as has been established with Coronavirus Infectious Disease Ontology. AVAILABILITY AND IMPLEMENTATION: The data at https://github.com/EBISPOT/covoc and the service at https://candy.hesge.ch/COVTriage.

Subject(s)

COVID-19 , Humans , COVID-19/diagnosis , Triage , Commerce , Internationality

2.

Variomes: a high recall search engine to support the curation of genomic variants.

Pasche, Emilie; Mottaz, Anaïs; Caucheteur, Déborah; Gobeill, Julien; Michel, Pierre-André; Ruch, Patrick.

Bioinformatics ; 38(9): 2595-2601, 2022 04 28.

Article in English | MEDLINE | ID: mdl-35274687

ABSTRACT

MOTIVATION: Identification and interpretation of clinically actionable variants is a critical bottleneck. Searching for evidence in the literature is mandatory according to ASCO/AMP/CAP practice guidelines; however, it is both labor-intensive and error-prone. We developed a system to perform triage of publications relevant to support an evidence-based decision. The system is also able to prioritize variants. Our system searches within pre-annotated collections such as MEDLINE and PubMed Central. RESULTS: We assess the search effectiveness of the system using three different experimental settings: literature triage; variant prioritization and comparison of Variomes with LitVar. Almost two-thirds of the publications returned in the top-5 are relevant for clinical decision-support. Our approach enabled identifying 81.8% of clinically actionable variants in the top-3. Variomes retrieves on average +21.3% more articles than LitVar and returns the same number of results or more results than LitVar for 90% of the queries when tested on a set of 803 queries; thus, establishing a new baseline for searching the literature about variants. AVAILABILITY AND IMPLEMENTATION: Variomes is publicly available at https://candy.hesge.ch/Variomes. Source code is freely available at https://github.com/variomes/sibtm-variomes. SynVar is publicly available at https://goldorak.hesge.ch/synvar. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Genomics , Search Engine , Genomics/methods , Genome , PubMed , Software

3.

SIB Literature Services: RESTful customizable search engines in biomedical literature, enriched with automatically mapped biomedical concepts.

Gobeill, Julien; Caucheteur, Déborah; Michel, Pierre-André; Mottin, Luc; Pasche, Emilie; Ruch, Patrick.

Nucleic Acids Res ; 48(W1): W12-W16, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32379317

ABSTRACT

Thanks to recent efforts by the text mining community, biocurators now have access to plenty of good tools and Web interfaces for identifying and visualizing biomedical entities in literature. Yet, many of these systems start with a PubMed query, which is limited by strong Boolean constraints. Some semantic search engines exploit entities for Information Retrieval, and/or deliver relevance-based ranked results. Yet, they are not designed for supporting a specific curation workflow, and allow very limited control over the search process. The Swiss Institute of Bioinformatics Literature Services (SIBiLS) provide personalized Information Retrieval in the biological literature. Indeed, SIBiLS allow fully customizable search in semantically enriched contents, based on keywords and/or mapped biomedical entities from a growing set of standardized and legacy vocabularies. The services have been used and favourably evaluated to assist the curation of genes and gene products, by delivering customized literature triage engines to different curation teams. SIBiLS (https://candy.hesge.ch/SIBiLS) are freely accessible via REST APIs and are ready to empower any curation workflow, built on modern technologies scalable with big data: MongoDB and Elasticsearch. They cover MEDLINE and PubMed Central Open Access enriched by nearly 2 billion mapped biomedical entities, and are updated daily.

Subject(s)

Data Mining/methods , Search Engine , MEDLINE , Precision Medicine

4.

Combining an Integrated Sensor Array with Machine Learning for the Simultaneous Quantification of Multiple Cations in Aqueous Mixtures.

Gabrieli, Gianmarco; Hu, Rui; Matsumoto, Keiji; Temiz, Yuksel; Bissig, Sacha; Cox, Aaron; Heller, Ralph; López, Antonio; Barroso, Jorge; Kaneda, Kitahiro; Orii, Yasumitsu; Ruch, Patrick W.

Anal Chem ; 93(50): 16853-16861, 2021 12 21.

Article in English | MEDLINE | ID: mdl-34890188

ABSTRACT

The direct quantification of multiple ions in aqueous mixtures is achieved by combining an automated machine learning pipeline with transient potentiometric data obtained from a single miniaturized array of polymeric sensors electrodeposited on a conventional printed circuit board (PCB) substrate. A proof-of-concept system was demonstrated by employing 16 polymeric sensors in combination with features extracted from the transient differential voltages produced by these sensors when transitioning from a reference solution to a test solution, thereby obviating the need for a conventional reference electrode. A tree-based regression model enabled concentrations of various metal cations in pure solutions to be determined in less than 2 min. In a model mixture comprising Al³⁺, Cu²⁺, Na⁺, and Fe³⁺, the mean relative error was found to depend on the type of ion and varied between 1% for Fe³⁺ and 44% for Na⁺ in the concentration range 1-10 mg/L. Overall, a mean relative error of 16% was obtained for quantification of these four ions across a total of 124 tests in different solutions spanning concentrations between 2 and 360 mg/L. These results demonstrate how the analytical capability of a multiselective sensor array can leverage data-driven approaches through training by examples for accelerated testing and can be proposed to complement traditional analytical tools to meet industrial demands, including traceability of chemicals.
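
The mean relative error quoted in this abstract can be illustrated with a short calculation. The sketch below assumes the usual definition (the absolute difference between predicted and true concentration, divided by the true concentration, averaged over tests), which the abstract does not spell out. The function name and the concentration values are hypothetical and are not data from the study.

```python
def mean_relative_error(true_values, predicted_values):
    """Average of |predicted - true| / true over paired measurements."""
    if len(true_values) != len(predicted_values):
        raise ValueError("inputs must have the same length")
    if not true_values:
        raise ValueError("inputs must not be empty")
    errors = [abs(p - t) / t for t, p in zip(true_values, predicted_values)]
    return sum(errors) / len(errors)


# Hypothetical concentrations in mg/L (illustrative only, not study data).
true_mg_per_l = [2.0, 5.0, 10.0]
predicted_mg_per_l = [2.2, 4.5, 10.5]

mre = mean_relative_error(true_mg_per_l, predicted_mg_per_l)
print(f"mean relative error: {mre:.1%}")
```

Because each error is divided by the true value, the same absolute deviation weighs more at low concentrations, which is one reason a per-ion breakdown can differ widely from the overall figure.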

Subject(s)

Machine Learning , Cations

5.

Development and tuning of an original search engine for patent libraries in medicinal chemistry.

Pasche, Emilie; Gobeill, Julien; Kreim, Olivier; Oezdemir-Zaech, Fatma; Vachon, Therese; Lovis, Christian; Ruch, Patrick.

BMC Bioinformatics ; 15 Suppl 1: S15, 2014.

Article in English | MEDLINE | ID: mdl-24564220

ABSTRACT

BACKGROUND: The large increase in the size of patent collections has led to the need of efficient search strategies. But the development of advanced text-mining applications dedicated to patents of the biomedical field remains rare, in particular to address the needs of the pharmaceutical & biotech industry, which intensively uses patent libraries for competitive intelligence and drug development. METHODS: We describe here the development of an advanced retrieval engine to search information in patent collections in the field of medicinal chemistry. We investigate and combine different strategies and evaluate their respective impact on the performance of the search engine applied to various search tasks, which covers the putatively most frequent search behaviours of intellectual property officers in medical chemistry: 1) a prior art search task; 2) a technical survey task; and 3) a variant of the technical survey task, sometimes called known-item search task, where a single patent is targeted. RESULTS: The optimal tuning of our engine resulted in a top-precision of 6.76% for the prior art search task, 23.28% for the technical survey task and 46.02% for the variant of the technical survey task. We observed that co-citation boosting was an appropriate strategy to improve prior art search tasks, while IPC classification of queries was improving retrieval effectiveness for technical survey tasks. Surprisingly, the use of the full body of the patent was always detrimental for search effectiveness. It was also observed that normalizing biomedical entities using curated dictionaries had simply no impact on the search tasks we evaluate. The search engine was finally implemented as a web-application within Novartis Pharma. The application is briefly described in the report. CONCLUSIONS: We have presented the development of a search engine dedicated to patent search, based on state of the art methods applied to patent corpora. We have shown that a proper tuning of the system to adapt to the various search tasks clearly increases the effectiveness of the system. We conclude that different search tasks demand different information retrieval engines' settings in order to yield optimal end-user retrieval.

Subject(s)

Chemistry, Pharmaceutical , Patents as Topic , Search Engine/methods , Algorithms , Information Storage and Retrieval , Internet , Small Molecule Libraries

6.

Application of text-mining for updating protein post-translational modification annotation in UniProtKB.

Veuthey, Anne-Lise; Bridge, Alan; Gobeill, Julien; Ruch, Patrick; McEntyre, Johanna R; Bougueleret, Lydie; Xenarios, Ioannis.

BMC Bioinformatics ; 14: 104, 2013 Mar 22.

Article in English | MEDLINE | ID: mdl-23517090

ABSTRACT

BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
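
A pattern-matching approach of the kind described in this abstract can be sketched with a regular expression that looks for a modification keyword and a residue position in the same sentence. This is a minimal illustration only: the pattern, the function names and the sample sentences are assumptions, not the rules used by the authors, and a real system would need many more patterns. The second function shows how precision and recall are computed against a gold-standard set.

```python
import re

# Hypothetical rule: a modification keyword followed, within the same sentence,
# by a residue and position such as "Ser-473" or "Lys 120".
PTM_PATTERN = re.compile(
    r"\b(?P<type>phosphorylat\w*|acetylat\w*|methylat\w*)\b"
    r"[^.]*?"
    r"\b(?P<residue>Ser|Thr|Tyr|Lys|Arg)[- ]?(?P<position>\d+)\b",
    re.IGNORECASE,
)


def extract_ptm_sites(text):
    """Return (modification keyword, residue, position) for each matching sentence."""
    sites = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        match = PTM_PATTERN.search(sentence)
        if match:
            sites.append((
                match.group("type").lower(),
                match.group("residue").capitalize(),
                int(match.group("position")),
            ))
    return sites


def precision_recall(predicted, gold):
    """Precision and recall of a predicted set against a gold-standard set."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative sentences, not taken from the evaluation corpus.
sample = (
    "Akt is phosphorylated at Ser-473 by mTORC2. "
    "The protein was purified by chromatography. "
    "Acetylation of Lys 120 regulates p53 activity."
)
found = extract_ptm_sites(sample)
print(found)
```

Restricting the match to a single sentence keeps the type and the site together, at the cost of missing sites that are mentioned in a neighbouring sentence.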

Subject(s)

Data Mining/methods , Databases, Protein , Knowledge Bases , Protein Processing, Post-Translational , Humans , Molecular Sequence Annotation , Proteomics

7.

Utilization of ontology look-up services in information retrieval for biomedical literature.

Vishnyakova, Dina; Pasche, Emilie; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 186: 155-9, 2013.

Article in English | MEDLINE | ID: mdl-23542988

ABSTRACT

With the vast amount of biomedical data, we face the necessity to improve information retrieval processes in the biomedical domain. The use of biomedical ontologies has facilitated the combination of various data sources (e.g. scientific literature, clinical data repositories) by increasing the quality of information retrieval and reducing the maintenance effort. In this context, we developed Ontology Look-up Services (OLS), based on the NEWT and MeSH vocabularies. Our services were involved in information retrieval tasks such as gene/disease normalization. The implementation of OLS services significantly accelerated the extraction of particular biomedical facts by structuring and enriching the data context. Precision in normalization tasks was improved by about 20%.

Subject(s)

Abstracting and Indexing/methods , Data Mining/methods , Database Management Systems , Databases, Bibliographic , Medical Subject Headings , Natural Language Processing , Periodicals as Topic , Semantics , User-Computer Interface

8.

Assessing the use of supplementary materials to improve genomic variant discovery.

Pasche, Emilie; Mottaz, Anaïs; Gobeill, Julien; Michel, Pierre-André; Caucheteur, Déborah; Naderi, Nona; Ruch, Patrick.

Database (Oxford) ; 2023, 2023 03 31.

Article in English | MEDLINE | ID: mdl-37002680

ABSTRACT

The curation of genomic variants requires collecting evidence not only in variant knowledge bases but also in the literature. However, some variants result in no match when searched in the scientific literature. Indeed, it has been reported that a significant subset of information related to genomic variants is not reported in the full text, but only in the supplementary materials associated with a publication. In this study, we present an evaluation of the use of supplementary data (SD) to improve the retrieval of relevant scientific publications for variant curation. Our experiments show that searching SD significantly increases the volume of documents retrieved for a variant, thus reducing by ∼63% the number of variants for which no match is found in the scientific literature. SD thus represent a paramount source of information for curating variants of unknown significance and should receive more attention from global research infrastructures, which maintain literature search engines. Database URL https://www.expasy.org/resources/variomes.

Subject(s)

Genomics , Search Engine , Databases, Factual

9.

Multilingual RECIST classification of radiology reports using supervised learning.

Mottin, Luc; Goldman, Jean-Philippe; Jäggli, Christoph; Achermann, Rita; Gobeill, Julien; Knafou, Julien; Ehrsam, Julien; Wicky, Alexandre; Gérard, Camille L; Schwenk, Tanja; Charrier, Mélinda; Tsantoulis, Petros; Lovis, Christian; Leichtle, Alexander; Kiessling, Michael K; Michielin, Olivier; Pradervand, Sylvain; Foufi, Vasiliki; Ruch, Patrick.

Front Digit Health ; 5: 1195017, 2023.

Article in English | MEDLINE | ID: mdl-37388252

ABSTRACT

Objectives: The objective of this study is the exploration of Artificial Intelligence and Natural Language Processing techniques to support the automatic assignment of the four Response Evaluation Criteria in Solid Tumors (RECIST) scales based on radiology reports. We also aim at evaluating how languages and institutional specificities of Swiss teaching hospitals are likely to affect the quality of the classification in French and German. Methods: In our approach, 7 machine learning methods were evaluated to establish a strong baseline. Then, robust models were built, fine-tuned according to the language (French and German), and compared with the expert annotation. Results: The best strategies yield average F1-scores of 90% and 86% respectively for the 2-class (Progressive/Non-progressive) and the 4-class (Progressive Disease, Stable Disease, Partial Response, Complete Response) RECIST classification tasks. Conclusions: These results are competitive with manual labeling as measured by the Matthews correlation coefficient and Cohen's Kappa (79% and 76%). On this basis, we confirm the capacity of specific models to generalize on new unseen data and we assess the impact of using Pre-trained Language Models (PLMs) on the accuracy of the classifiers.
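
The three agreement measures named in this abstract (F1-score, Matthews correlation coefficient and Cohen's Kappa) can all be derived from a confusion matrix. The sketch below uses their standard textbook definitions for the two-class case. The counts are hypothetical and are not results from the study, and the function names are chosen for illustration.

```python
import math


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def matthews_corrcoef(tp, fp, fn, tn):
    """Correlation between predicted and true binary labels."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (observed - expected) / (1 - expected) if expected != 1 else 0.0


# Hypothetical counts for a Progressive / Non-progressive classifier.
tp, fp, fn, tn = 40, 5, 5, 50

f1 = f1_score(tp, fp, fn)
mcc = matthews_corrcoef(tp, fp, fn, tn)
kappa = cohen_kappa(tp, fp, fn, tn)
print(f"F1={f1:.3f}  MCC={mcc:.3f}  kappa={kappa:.3f}")
```

Unlike F1, the Matthews coefficient and Kappa also take true negatives into account, which is why they are often reported alongside F1 when classes are imbalanced.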

10.

Building a transnational biosurveillance network using semantic web technologies: requirements, design, and preliminary evaluation.

Teodoro, Douglas; Pasche, Emilie; Gobeill, Julien; Emonet, Stéphane; Ruch, Patrick; Lovis, Christian.

J Med Internet Res ; 14(3): e73, 2012 May 29.

Article in English | MEDLINE | ID: mdl-22642960

ABSTRACT

BACKGROUND: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is available publicly. OBJECTIVE: To design and implement an architecture that can provide real-time and source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases. METHODS: Following the agile software development methodology, we derived the main requirements needed for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local computing-formalized laboratory information in the Web. A central source mediator, based on local reasoning, coordinates the access to the semantic end points. On the user side, a user-friendly Web interface provides access and graphical visualization to the integrated views. RESULTS: We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing 70+ million triples of information about drug resistance and consumption. Evaluation of the computing performance of the mediator demonstrated that, on average, query response time was a few seconds (mean 4.3, SD 0.1 × 10² seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval -0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval -0.004 to 0.028; P = .004). CONCLUSIONS: We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. Results indicated that the Semantic Web-based approach provided an efficient and reliable solution for development of eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In future, we expect that more health care institutions can join the ARTEMIS network so that it can provide a large European and wider biosurveillance network that can be used to detect emerging bacterial resistance in a multinational context and support public health actions.

Subject(s)

International Cooperation , Internet , Population Surveillance , Computer Simulation , Drug Resistance, Microbial , Software , User-Computer Interface

11.

Pathogens and gene product normalization in the biomedical literature.

Vishnyakova, Dina; Pasche, Emilie; Teodoro, Douglas; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 174: 89-93, 2012.

Article in English | MEDLINE | ID: mdl-22491118

ABSTRACT

We present a new approach for pathogens and gene product normalization in the biomedical literature. The idea of this approach was motivated by needs such as literature curation, in particular applied to the field of infectious diseases and thus to variants of bacterial species (S. aureus, Staphylococcus aureus, ...) and their gene products (protein ArsC, Arsenical pump modifier, Arsenate reductase, ...). Our approach is based on the use of an Ontology Look-up Service (OLS), a Gene Ontology Categorizer (GOCat) and Gene Normalization (GN) methods. In the pathogen detection task, the use of OLS disambiguates the pathogen names found. GOCat results are incorporated into the overall scoring system to support and confirm decision-making in the normalization process of pathogens and their genomes. The evaluation was done on two test sets of the BioCreativeIII benchmark: a gold standard of manual curation (50 articles) and a silver standard (507 articles) curated from the collective results of BCIII participants. For the cross-species GN we achieved a precision of 46% for the silver set and 27% for the gold set. Pathogen normalization results showed 95% precision and 93% recall. The use of GOCat improves the results of pathogen and gene normalization, essentially by confirming identified pathogens and boosting correct gene identifiers to the top of the result list ranked by confidence. A correct identification of the pathogen is able to significantly improve normalization effectiveness and to solve the disambiguation problem of genes.

Subject(s)

Bacteria/classification , Bacterial Proteins/classification , Data Mining/methods , Periodicals as Topic , Vocabulary, Controlled , Humans

12.

An advanced search engine for patent analytics in medicinal chemistry.

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 180: 204-9, 2012.

Article in English | MEDLINE | ID: mdl-22874181

ABSTRACT

Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query and a related patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prime importance to cope with the various ways of naming biomedical entities. While the related patent search showed promising performance, the ad hoc search yielded fairly contrasted results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition and experts appreciated its usability.

Subject(s)

Chemistry, Pharmaceutical/methods , Data Mining/methods , Database Management Systems , Databases, Pharmaceutical , Internet , Patents as Topic , Search Engine/methods , User-Computer Interface

13.

Classification and prioritization of biomedical literature for the comparative toxicogenomics database.

Vishnyakova, Dina; Pasche, Emilie; Gobeill, Julien; Gaudinat, Arnaud; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 180: 210-4, 2012.

Article in English | MEDLINE | ID: mdl-22874182

ABSTRACT

We present a new approach to perform biomedical documents classification and prioritization for the Comparative Toxicogenomics Database (CTD). This approach is motivated by needs such as literature curation, in particular applied to the human health environment domain. The unique integration of chemical, genes/proteins and disease data in the biomedical literature may advance the identification of exposure and disease biomarkers, mechanisms of chemical actions, and the complex aetiologies of chronic diseases. Our approach aims to assist biomedical researchers when searching for relevant articles for CTD. The task is functionally defined as a binary classification task, where selected articles must also be ranked by order of relevance. We design a SVM classifier, which combines three main feature sets: an information retrieval system (EAGLi), a biomedical named-entity recognizer (MeSH term extraction), a gene normalization (GN) service (NormaGene) and an ad-hoc keyword recognizer for diseases and chemicals. The evaluation of the gene identification module was done on BioCreativeIII test data. Disease normalization is achieved with 95% precision and 93% of recall. The evaluation of the classification was done on the corpus provided by BioCreative organizers in 2012. The approach showed promising performance on the test data.

Subject(s)

Abstracting and Indexing/methods , Data Mining/methods , Databases, Chemical , Databases, Genetic , Drug-Related Side Effects and Adverse Reactions/classification , Periodicals as Topic/classification , Toxicogenetics/methods , Database Management Systems , Humans , User-Computer Interface

14.

A user-friendly tool for medical-related patent retrieval.

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnyakova, Dina; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 174: 121-5, 2012.

Article in English | MEDLINE | ID: mdl-22491124

ABSTRACT

Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name the same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aiming at facilitating health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query, normalization and expansion. While the functionality of related article search showed promising performance, the ad hoc search yielded fairly contrasted results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be balanced by the limited evaluation sample. We can also assume that it can be customized to be applied in corporate search environments to process domain and company-specific vocabularies, including non-English literature and patent reports.

Subject(s)

Information Storage and Retrieval/methods , Internet , Patents as Topic , Search Engine/methods , User-Computer Interface , Artificial Intelligence , Humans

15.

Analyzing the Information Content of Text-Based Files in Supplementary Materials of Biomedical Literature.

Naderi, Nona; Mottaz, Anaïs; Teodoro, Douglas; Ruch, Patrick.

Stud Health Technol Inform ; 294: 876-877, 2022 May 25.

Article in English | MEDLINE | ID: mdl-35612233

ABSTRACT

We present an analysis of supplementary materials of PubMed Central (PMC) articles and show their importance in indexing and searching biomedical literature, in particular for the emerging genomic medicine field. On a subset of articles from PubMed Central, we use text mining methods to extract MeSH terms from abstracts, full texts, and text-based supplementary materials. We find that the recall of MeSH annotations increases by about 5.9 percentage points (+20% in relative terms) when considering supplementary materials compared to using only abstracts. We further compare the supplementary material annotations with full-text annotations and find that the recall of MeSH terms increases by 1.5 percentage points (+3% in relative terms). Additionally, we analyze genetic variant mentions in abstracts and full texts and compare them with mentions found in supplementary text-based files. We find that the majority (about 99%) of variants are found in text-based supplementary files. In conclusion, we suggest that supplementary data should receive more attention from the information retrieval community, in particular in life and health sciences.
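
This abstract reports gains both in percentage points and as relative percentages, two figures that are easy to confuse. The sketch below shows how the two relate, using hypothetical recall values chosen only to make the arithmetic easy to follow. They are not the baselines of the study, which the abstract does not report.

```python
def recall_gain(baseline_percent, new_percent):
    """Return the gain as (percentage points, relative change)."""
    points = new_percent - baseline_percent
    relative = points / baseline_percent
    return points, relative


# Hypothetical values: recall rising from 30% to 36%.
points, relative = recall_gain(30.0, 36.0)
print(f"+{points:.1f} percentage points, +{relative:.0%} relative")
```

The same gain in percentage points therefore corresponds to a smaller relative gain when the baseline recall is higher.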

Subject(s)

Medical Subject Headings , Text Messaging , Data Mining/methods , PubMed , Records

16.

Designing an Optimal Expansion Method to Improve the Recall of a Genomic Variant Curation-Support Service.

Mottaz, Anaïs; Pasche, Emilie; Michel, Pierre-André; Mottin, Luc; Teodoro, Douglas; Ruch, Patrick.

Stud Health Technol Inform ; 294: 839-843, 2022 May 25.

Article in English | MEDLINE | ID: mdl-35612222

ABSTRACT

The importance of genomic data for health is rapidly growing, but accessing and gathering information about variants from different sources is hindered by highly heterogeneous representations of variants, as outlined by clinical associations (AMP/ASCO/CAP) in their recommendations. To enable a smooth and effective retrieval of variant-containing documents from different resources, we developed a tool (https://goldorak.hesge.ch/synvar/) that generates for any given SNP, including variants not present in existing databases, its corresponding description at the genome, transcript and protein levels. It provides variant descriptions in the HGVS format as well as in many non-standard formats found in the literature, along with database identifiers. We present the SynVar service and evaluate its impact on the recall of a genomic variant curation-support service. Using SynVar to search variants in the literature increases the recall by +133.8% without a strong impact on precision (i.e. 93%).
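
The idea of generating several written forms of one variant for literature search can be sketched for the simplest case, a protein-level substitution. The code below only rewrites a three-letter HGVS-style string into a few common spellings. It is an illustrative assumption, not the SynVar implementation, and it does not map between genome, transcript and protein levels as the service described above does. The function name and the chosen output formats are hypothetical.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Matches protein substitutions written like "p.Val600Glu".
HGVS_PROTEIN = re.compile(r"^p\.([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def expand_protein_variant(hgvs):
    """Return alternative spellings of a protein substitution for use as search terms."""
    match = HGVS_PROTEIN.match(hgvs)
    if not match:
        raise ValueError(f"unsupported variant description: {hgvs!r}")
    ref3, position, alt3 = match.groups()
    if ref3 not in THREE_TO_ONE or alt3 not in THREE_TO_ONE:
        raise ValueError(f"unknown amino acid code in {hgvs!r}")
    ref1, alt1 = THREE_TO_ONE[ref3], THREE_TO_ONE[alt3]
    return [
        f"p.{ref3}{position}{alt3}",
        f"{ref3}{position}{alt3}",
        f"p.{ref1}{position}{alt1}",
        f"{ref1}{position}{alt1}",
    ]


synonyms = expand_protein_variant("p.Val600Glu")
print(" OR ".join(synonyms))
```

Joining the spellings with OR widens the query, which raises recall. Precision is kept in check only if each generated form is specific enough to the variant.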

Subject(s)

Genomics , Databases, Factual

17.

Classification of Oncology Treatment Responses from French Radiology Reports with Supervised Machine Learning.

Goldman, Jean-Philippe; Mottin, Luc; Zaghir, Jamil; Keszthelyi, Daniel; Lokaj, Belinda; Turbé, Hugues; Gobeil, Julien; Ruch, Patrick; Ehrsam, Julien; Lovis, Christian.

Stud Health Technol Inform ; 294: 849-853, 2022 May 25.

Article in English | MEDLINE | ID: mdl-35612224

ABSTRACT

The present study shows first attempts to automatically classify oncology treatment responses on the basis of the textual conclusion sections of radiology reports according to the RECIST classification. After a robust and extended manual annotation of 543 conclusion sections (5 to 50 words long), and after the training of several machine learning techniques (from traditional machine learning to deep learning), the best results show an accuracy score of 0.90 for a two-class classification (non-progressive vs. progressive disease) and of 0.82 for a four-class classification (complete response, partial response, stable disease, progressive disease), both with a Logistic Regression approach. Some innovative solutions are further suggested to improve these scores in the future.

Subject(s)

Radiology , Machine Learning , Natural Language Processing , Radiography , Research Report , Supervised Machine Learning

18.

The gene normalization task in BioCreative III.

Lu, Zhiyong; Kao, Hung-Yu; Wei, Chih-Hsuan; Huang, Minlie; Liu, Jingchen; Kuo, Cheng-Ju; Hsu, Chun-Nan; Tsai, Richard Tzong-Han; Dai, Hong-Jie; Okazaki, Naoaki; Cho, Han-Cheol; Gerner, Martin; Solt, Illes; Agarwal, Shashank; Liu, Feifan; Vishnyakova, Dina; Ruch, Patrick; Romacker, Martin; Rinaldi, Fabio; Bhattacharya, Sanmitra; Srinivasan, Padmini; Liu, Hongfang; Torii, Manabu; Matos, Sergio; Campos, David; Verspoor, Karin; Livingston, Kevin M; Wilbur, W John.

BMC Bioinformatics ; 12 Suppl 8: S2, 2011 Oct 03.

Article in English | MEDLINE | ID: mdl-22151901

ABSTRACT

BACKGROUND: We report the Gene Normalization (GN) challenge in BioCreative III where participating teams were asked to return a ranked list of identifiers of the genes detected in full-text articles. For training, 32 fully and 500 partially annotated articles were prepared. A total of 507 articles were selected as the test set. Due to the high annotation cost, it was not feasible to obtain gold-standard human annotations for all test articles. Instead, we developed an Expectation Maximization (EM) algorithm approach for choosing a small number of test articles for manual annotation that were most capable of differentiating team performance. Moreover, the same algorithm was subsequently used for inferring ground truth based solely on team submissions. We report team performance on both gold standard and inferred ground truth using a newly proposed metric called Threshold Average Precision (TAP-k). RESULTS: We received a total of 37 runs from 14 different teams for the task. When evaluated using the gold-standard annotations of the 50 articles, the highest TAP-k scores were 0.3297 (k=5), 0.3538 (k=10), and 0.3535 (k=20), respectively. Higher TAP-k scores of 0.4916 (k=5, 10, 20) were observed when evaluated using the inferred ground truth over the full test set. When combining team results using machine learning, the best composite system achieved TAP-k scores of 0.3707 (k=5), 0.4311 (k=10), and 0.4477 (k=20) on the gold standard, representing improvements of 12.4%, 21.8%, and 26.6% over the best team results, respectively. CONCLUSIONS: By using full text and being species non-specific, the GN task in BioCreative III has moved closer to a real literature curation task than similar tasks in the past and presents additional challenges for the text mining community, as revealed in the overall team results. 
By evaluating teams using the gold standard, we show that the EM algorithm allows team submissions to be differentiated while keeping the manual annotation effort feasible. Using the inferred ground truth we show measures of comparative performance between teams. Finally, by comparing team rankings on gold standard vs. inferred ground truth, we further demonstrate that the inferred ground truth is as effective as the gold standard for detecting good team performance.
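
The improvement percentages quoted in the RESULTS can be rechecked from the reported TAP-k scores. The short sketch below does only that arithmetic; the function name `relative_improvement` is hypothetical, and the TAP-k metric itself is not reimplemented here.

```python
def relative_improvement(baseline, improved):
    """Percentage gain of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100.0

# (k, best single-team TAP-k, best composite TAP-k) on the gold standard,
# as reported in the abstract.
reported = [(5, 0.3297, 0.3707), (10, 0.3538, 0.4311), (20, 0.3535, 0.4477)]

gains = {k: round(relative_improvement(team, combo), 1) for k, team, combo in reported}
print(gains)  # {5: 12.4, 10: 21.8, 20: 26.6}
```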

Subject(s)

Algorithms , Data Mining/methods , Genes , Animals , Data Mining/standards , Humans , National Library of Medicine (U.S.) , Periodicals as Topic , United States

19.

Interoperability driven integration of biomedical data sources.

Teodoro, Douglas; Choquet, Rémy; Schober, Daniel; Mels, Giovanni; Pasche, Emilie; Ruch, Patrick; Lovis, Christian.

Stud Health Technol Inform ; 169: 185-9, 2011.

Article in English | MEDLINE | ID: mdl-21893739

ABSTRACT

In this paper, we introduce a data integration methodology that promotes technical, syntactic and semantic interoperability for operational healthcare data sources. ETL processes provide access to different operational databases at the technical level. Furthermore, data instances have their syntax aligned according to biomedical terminologies using natural language processing. Finally, semantic web technologies are used to ensure common meaning and to provide ubiquitous access to the data. The system's performance and solvability assessments were carried out using clinical questions against seven healthcare institutions distributed across Europe. The architecture managed to provide interoperability within the limited heterogeneous grid of hospitals. Preliminary scalability test results are provided.
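
The three interoperability levels named in the abstract can be pictured with a small sketch. Everything in it is an assumption made for illustration: the field names, the two hospital record layouts, the tiny term map standing in for a biomedical terminology, and the triple format are hypothetical and are not taken from the paper.

```python
# Hypothetical local-to-preferred term map standing in for a biomedical terminology.
TERMINOLOGY = {
    "staph. aureus": "Staphylococcus aureus",
    "s. aureus": "Staphylococcus aureus",
    "e. coli": "Escherichia coli",
}

def extract(source_rows, field_map):
    """Technical level: rename source-specific fields to shared field names."""
    return [
        {field_map[key]: value for key, value in row.items() if key in field_map}
        for row in source_rows
    ]

def align(record):
    """Syntactic level: map a free-text organism name to its preferred term."""
    name = record["organism"].strip().lower()
    return {**record, "organism": TERMINOLOGY.get(name, record["organism"])}

def to_triples(record, subject):
    """Semantic level: express each field as a (subject, predicate, object) triple."""
    return [(subject, key, value) for key, value in sorted(record.items())]

# Two invented sources that describe the same finding with different layouts.
hospital_a = [{"germ": "Staph. aureus", "yr": 2009}]
hospital_b = [{"organism_name": "S. aureus", "year": 2009}]

records = (
    extract(hospital_a, {"germ": "organism", "yr": "year"})
    + extract(hospital_b, {"organism_name": "organism", "year": "year"})
)
aligned = [align(record) for record in records]
triples = to_triples(aligned[0], "isolate:1")
print(aligned)
print(triples)
```

After the alignment step both records are identical, so a single query over the shared representation can reach both sources.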

Subject(s)

Data Collection/methods , Information Storage and Retrieval/methods , Medical Informatics/methods , Systems Integration , Cross Infection/epidemiology , Cross Infection/microbiology , Database Management Systems , Databases, Factual , Europe , Humans , Internet , Natural Language Processing , Programming Languages , Semantics , Terminology as Topic , Vocabulary, Controlled

20.

Using multimodal mining to drive clinical guidelines development.

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Vishnyakova, Dina; Gaudinat, Arnaud; Ruch, Patrick; Lovis, Christian.

Stud Health Technol Inform ; 169: 477-81, 2011.

Article in English | MEDLINE | ID: mdl-21893795

ABSTRACT

We present exploratory investigations of multimodal mining to help design clinical guidelines for antibiotherapy. Our approach is based on the assumption that combining various sources of data, such as the literature, a clinical data warehouse, and information regarding costs, will result in better recommendations. Compared to our baseline recommendation system based on a question-answering engine built on top of PubMed, an improvement of +16% is observed when clinical data (i.e. resistance profiles) are injected into the model. As a complement to PubMed, an alternative search strategy is reported, which is significantly improved by the use of the combined multimodal approach. These results suggest that combining literature-based discovery with structured data mining can significantly improve the effectiveness of decision-support systems for authors of clinical practice guidelines.

Subject(s)

Anti-Bacterial Agents/therapeutic use , Practice Guidelines as Topic , Statistics as Topic/methods , Algorithms , Anti-Bacterial Agents/economics , Computer Systems , Decision Support Systems, Clinical , Drug Costs , Humans , National Institutes of Health (U.S.) , PubMed , Staphylococcus aureus/metabolism , Staphylococcus epidermidis/metabolism , United States
