Pesquisa | Biblioteca Virtual em Saúde

1.

PRIMIS: Privacy-preserving medical image sharing via deep sparsifying transform learning with obfuscation.

Shiri, Isaac; Razeghi, Behrooz; Ferdowsi, Sohrab; Salimi, Yazdan; Gündüz, Deniz; Teodoro, Douglas; Voloshynovskiy, Slava; Zaidi, Habib.

J Biomed Inform ; 150: 104583, 2024 02.

Artigo em Inglês | MEDLINE | ID: mdl-38191010

RESUMO

OBJECTIVE: The primary objective of our study is to address the challenge of confidentially sharing medical images across different centers. This is often a critical necessity in both clinical and research environments, yet restrictions typically exist due to privacy concerns. Our aim is to design a privacy-preserving data-sharing mechanism that allows medical images to be stored as encoded and obfuscated representations in the public domain without revealing any useful or recoverable content from the images. In tandem, we aim to provide authorized users with compact private keys that could be used to reconstruct the corresponding images. METHOD: Our approach involves utilizing a neural auto-encoder. The convolutional filter outputs are passed through sparsifying transformations to produce multiple compact codes. Each code is responsible for reconstructing different attributes of the image. The key privacy-preserving element in this process is obfuscation through the use of specific pseudo-random noise. When applied to the codes, it becomes computationally infeasible for an attacker to guess the correct representation for all the codes, thereby preserving the privacy of the images. RESULTS: The proposed framework was implemented and evaluated using chest X-ray images for different medical image analysis tasks, including classification, segmentation, and texture analysis. Additionally, we thoroughly assessed the robustness of our method against various attacks using both supervised and unsupervised algorithms. CONCLUSION: This study provides a novel, optimized, and privacy-assured data-sharing mechanism for medical images, enabling multi-party sharing in a secure manner. While we have demonstrated its effectiveness with chest X-ray images, the mechanism can be utilized in other medical images modalities as well.

Assuntos

Algoritmos , Privacidade , Disseminação de Informação

2.

Transformer Performance for Chemical Reactions: Analysis of Different Predictive and Evaluation Scenarios.

Jaume-Santero, Fernando; Bornet, Alban; Valery, Alain; Naderi, Nona; Vicente Alvarez, David; Proios, Dimitrios; Yazdani, Anthony; Bournez, Colin; Fessard, Thomas; Teodoro, Douglas.

J Chem Inf Model ; 63(7): 1914-1924, 2023 04 10.

Artigo em Inglês | MEDLINE | ID: mdl-36952584

RESUMO

The prediction of chemical reaction pathways has been accelerated by the development of novel machine learning architectures based on the deep learning paradigm. In this context, deep neural networks initially designed for language translation have been used to accurately predict a wide range of chemical reactions. Among models suited for the task of language translation, the recently introduced molecular transformer reached impressive performance in terms of forward-synthesis and retrosynthesis predictions. In this study, we first present an analysis of the performance of transformer models for product, reactant, and reagent prediction tasks under different scenarios of data availability and data augmentation. We find that the impact of data augmentation depends on the prediction task and on the metric used to evaluate the model performance. Second, we probe the contribution of different combinations of input formats, tokenization schemes, and embedding strategies to model performance. We find that less stable input settings generally lead to better performance. Lastly, we validate the superiority of round-trip accuracy over simpler evaluation metrics, such as top-k accuracy, using a committee of human experts and show a strong agreement for predictions that pass the round-trip test. This demonstrates the usefulness of more elaborate metrics in complex predictive scenarios and highlights the limitations of direct comparisons to a predefined database, which may include a limited number of chemical reaction pathways.

Assuntos

Benchmarking , Fontes de Energia Elétrica , Humanos , Bases de Dados Factuais , Aprendizado de Máquina , Redes Neurais de Computação

3.

Utilizing Artificial Intelligence to Manage COVID-19 Scientific Evidence Torrent with Risklick AI: A Critical Tool for Pharmacology and Therapy Development.

Haas, Quentin; Alvarez, David Vicente; Borissov, Nikolay; Ferdowsi, Sohrab; von Meyenn, Leonhard; Trelle, Sven; Teodoro, Douglas; Amini, Poorya.

Pharmacology ; 106(5-6): 244-253, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33910199

RESUMO

INTRODUCTION: The SARS-CoV-2 pandemic has led to one of the most critical and boundless waves of publications in the history of modern science. The necessity to find and pursue relevant information and quantify its quality is broadly acknowledged. Modern information retrieval techniques combined with artificial intelligence (AI) appear as one of the key strategies for COVID-19 living evidence management. Nevertheless, most AI projects that retrieve COVID-19 literature still require manual tasks. METHODS: In this context, we pre-sent a novel, automated search platform, called Risklick AI, which aims to automatically gather COVID-19 scientific evidence and enables scientists, policy makers, and healthcare professionals to find the most relevant information tailored to their question of interest in real time. RESULTS: Here, we compare the capacity of Risklick AI to find COVID-19-related clinical trials and scientific publications in comparison with clinicaltrials.gov and PubMed in the field of pharmacology and clinical intervention. DISCUSSION: The results demonstrate that Risklick AI is able to find COVID-19 references more effectively, both in terms of precision and recall, compared to the baseline platforms. Hence, Risklick AI could become a useful alternative assistant to scientists fighting the COVID-19 pandemic.

Assuntos

Inteligência Artificial/tendências , COVID-19/terapia , Interpretação Estatística de Dados , Desenvolvimento de Medicamentos/tendências , Medicina Baseada em Evidências/tendências , Farmacologia/tendências , Inteligência Artificial/estatística & dados numéricos , COVID-19/diagnóstico , COVID-19/epidemiologia , Ensaios Clínicos como Assunto/estatística & dados numéricos , Desenvolvimento de Medicamentos/estatística & dados numéricos , Medicina Baseada em Evidências/estatística & dados numéricos , Humanos , Farmacologia/estatística & dados numéricos , Sistema de Registros

4.

Information Retrieval in an Infodemic: The Case of COVID-19 Publications.

Teodoro, Douglas; Ferdowsi, Sohrab; Borissov, Nikolay; Kashani, Elham; Vicente Alvarez, David; Copara, Jenny; Gouareb, Racha; Naderi, Nona; Amini, Poorya.

J Med Internet Res ; 23(9): e30161, 2021 09 17.

Artigo em Inglês | MEDLINE | ID: mdl-34375298

RESUMO

BACKGROUND: The COVID-19 global health crisis has led to an exponential surge in published scientific literature. In an attempt to tackle the pandemic, extremely large COVID-19-related corpora are being created, sometimes with inaccurate information, which is no longer at scale of human analyses. OBJECTIVE: In the context of searching for scientific evidence in the deluge of COVID-19-related literature, we present an information retrieval methodology for effective identification of relevant sources to answer biomedical queries posed using natural language. METHODS: Our multistage retrieval methodology combines probabilistic weighting models and reranking algorithms based on deep neural architectures to boost the ranking of relevant documents. Similarity of COVID-19 queries is compared to documents, and a series of postprocessing methods is applied to the initial ranking list to improve the match between the query and the biomedical information source and boost the position of relevant documents. RESULTS: The methodology was evaluated in the context of the TREC-COVID challenge, achieving competitive results with the top-ranking teams participating in the competition. Particularly, the combination of bag-of-words and deep neural language models significantly outperformed an Okapi Best Match 25-based baseline, retrieving on average, 83% of relevant documents in the top 20. CONCLUSIONS: These results indicate that multistage retrieval supported by deep learning could enhance identification of literature for COVID-19-related questions posed using natural language.

Assuntos

COVID-19 , Algoritmos , Humanos , Armazenamento e Recuperação da Informação , Idioma , SARS-CoV-2

5.

Online Health Search Via Multidimensional Information Quality Assessment Based on Deep Language Models: Algorithm Development and Validation.

Zhang, Boya; Naderi, Nona; Mishra, Rahul; Teodoro, Douglas.

JMIR AI ; 3: e42630, 2024 May 02.

Artigo em Inglês | MEDLINE | ID: mdl-38875551

RESUMO

BACKGROUND: Widespread misinformation in web resources can lead to serious implications for individuals seeking health advice. Despite that, information retrieval models are often focused only on the query-document relevance dimension to rank results. OBJECTIVE: We investigate a multidimensional information quality retrieval model based on deep learning to enhance the effectiveness of online health care information search results. METHODS: In this study, we simulated online health information search scenarios with a topic set of 32 different health-related inquiries and a corpus containing 1 billion web documents from the April 2019 snapshot of Common Crawl. Using state-of-the-art pretrained language models, we assessed the quality of the retrieved documents according to their usefulness, supportiveness, and credibility dimensions for a given search query on 6030 human-annotated, query-document pairs. We evaluated this approach using transfer learning and more specific domain adaptation techniques. RESULTS: In the transfer learning setting, the usefulness model provided the largest distinction between help- and harm-compatible documents, with a difference of +5.6%, leading to a majority of helpful documents in the top 10 retrieved. The supportiveness model achieved the best harm compatibility (+2.4%), while the combination of usefulness, supportiveness, and credibility models achieved the largest distinction between help- and harm-compatibility on helpful topics (+16.9%). In the domain adaptation setting, the linear combination of different models showed robust performance, with help-harm compatibility above +4.4% for all dimensions and going as high as +6.8%. CONCLUSIONS: These results suggest that integrating automatic ranking models created for specific information quality dimensions can increase the effectiveness of health-related information retrieval. Thus, our approach could be used to enhance searches made by individuals seeking online health information.

6.

Did high frequency phone surveys during the COVID-19 pandemic include disability questions? An assessment of COVID-19 surveys from March 2020 to December 2022.

Nascimento Dial, Amanda; Vicente, David; Mitra, Sophie; Teodoro, Douglas; Rivas Velarde, Minerva.

BMJ Open ; 14(7): e079760, 2024 Jul 11.

Artigo em Inglês | MEDLINE | ID: mdl-38991678

RESUMO

OBJECTIVES: In the midst of the pandemic, face-to-face data collection for national censuses and surveys was suspended due to limitations on mobility and social distancing, limiting the collection of already scarce disability data. Responses to these constraints were met with a surge of high-frequency phone surveys (HFPSs) that aimed to provide timely data for understanding the socioeconomic impacts of and responses to the pandemic. This paper provides an assessment of HFPS datasets and their inclusion of disability questions to evaluate the visibility of persons with disabilities during the COVID-19 pandemic. DESIGN: We collected HFPS questionnaires conducted globally from the onset of the pandemic emergency in March 2020 until December 2022 from various online survey repositories. Each HFPS questionnaire was searched using a set of keywords for inclusion of different types of disability questions. Results were recorded in an Excel review log, which was manually reviewed by two researchers. METHODS: The review of HFPS datasets involved two stages: (1) a main review of 294 HFPS dataset-waves and (2) a semiautomated review of the same dataset-waves using a search engine-powered questionnaire review tool developed by our team. The results from the main review were compared with those of a sensitivity analysis using and testing the tool as an alternative to manual search. RESULTS: Roughly half of HFPS datasets reviewed and 60% of the countries included in this study had some type of question on disability. While disability questions were not widely absent from HFPS datasets, only 3% of HFPS datasets included functional difficulty questions that meet international standards. The search engine-powered questionnaire review tool proved to be able to streamline the search process for future research on inclusive data. CONCLUSIONS: The dearth of functional difficulty questions and the Washington-Group Short Set in particular in HFPS has contributed to the relative invisibility of persons with disabilities during the pandemic emergency, the lingering effects of which could impede policy-making, monitoring and advocacy on behalf of persons with disabilities.

Assuntos

COVID-19 , Pessoas com Deficiência , SARS-CoV-2 , Humanos , COVID-19/epidemiologia , Pessoas com Deficiência/estatística & dados numéricos , Inquéritos e Questionários , Pandemias , Telefone

7.

Exploring Medical Career Choice to Better Inform Swiss Physician Workforce Planning: Protocol for a National Cohort Study.

Abbiati, Milena; Nendaz, Mathieu R; Cerutti, Bernard; Brodmann Mäder, Monika; Spinas, Giatgen A; Vicente Alvarez, David; Teodoro, Douglas; Savoldelli, Georges L; Bajwa, Nadia M.

JMIR Res Protoc ; 13: e53138, 2024 Jan 17.

Artigo em Inglês | MEDLINE | ID: mdl-38231561

RESUMO

BACKGROUND: A medical student's career choice directly influences the physician workforce shortage and the misdistribution of resources. First, individual and contextual factors related to career choice have been evaluated separately, but their interaction over time is unclear. Second, actual career choice, reasons for this choice, and the influence of national political strategies are currently unknown in Switzerland. OBJECTIVE: The overall objective of this study is to better understand the process of Swiss medical students' career choice and to predict this choice. Our specific aims will be to examine the predominately static (ie, sociodemographic and personality traits) and predominately dynamic (ie, learning context perceptions, anxiety state, motivation, and motives for career choice) variables that predict the career choice of Swiss medical school students, as well as their interaction, and to examine the evolution of Swiss medical students' career choice and their ultimate career path, including an international comparison with French medical students. METHODS: The Swiss Medical Career Choice study is a national, multi-institution, and longitudinal study in which all medical students at all medical schools in Switzerland are eligible to participate. Data will be collected over 4 years for 4 cohorts of medical students using questionnaires in years 4 and 6. We will perform a follow-up during postgraduate training year 2 for medical graduates between 2018 and 2022. We will compare the different Swiss medical schools and a French medical school (the University of Strasbourg Faculty of Medicine). We will also examine the effect of new medical master's programs in terms of career choice and location of practice. For aim 2, in collaboration with the Swiss Institute for Medical Education, we will implement a national career choice tracking system and identify the final career choice of 2 cohorts of medical students who graduated from 4 Swiss medical schools from 2010 to 2012. We will also develop a model to predict their final career choice. Data analysis will be conducted using inferential statistics, and machine learning approaches will be used to refine the predictive model. RESULTS: This study was funded by the Swiss National Science Foundation in January 2023. Recruitment began in May 2023. Data analysis will begin after the completion of the first cohort data collection. CONCLUSIONS: Our research will inform national stakeholders and medical schools on the prediction of students' future career choice and on key aspects of physician workforce planning. We will identify targeted actions that may be implemented during medical school and may ultimately influence career choice and encourage the correct number of physicians in the right specialties to fulfill the needs of currently underserved regions. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/53138.

8.

A Dataset for Evaluating Contextualized Representation of Biomedical Concepts in Language Models.

Rouhizadeh, Hossein; Nikishina, Irina; Yazdani, Anthony; Bornet, Alban; Zhang, Boya; Ehrsam, Julien; Gaudet-Blavignac, Christophe; Naderi, Nona; Teodoro, Douglas.

Sci Data ; 11(1): 455, 2024 May 04.

Artigo em Inglês | MEDLINE | ID: mdl-38704422

RESUMO

Due to the complexity of the biomedical domain, the ability to capture semantically meaningful representations of terms in context is a long-standing challenge. Despite important progress in the past years, no evaluation benchmark has been developed to evaluate how well language models represent biomedical concepts according to their corresponding context. Inspired by the Word-in-Context (WiC) benchmark, in which word sense disambiguation is reformulated as a binary classification task, we propose a novel dataset, BioWiC, to evaluate the ability of language models to encode biomedical terms in context. BioWiC comprises 20'156 instances, covering over 7'400 unique biomedical terms, making it the largest WiC dataset in the biomedical domain. We evaluate BioWiC both intrinsically and extrinsically and show that it could be used as a reliable benchmark for evaluating context-dependent embeddings in biomedical corpora. In addition, we conduct several experiments using a variety of discriminative and generative large language models to establish robust baselines that can serve as a foundation for future research.

Assuntos

Processamento de Linguagem Natural , Semântica , Idioma

9.

Detection of Patients at Risk of Multidrug-Resistant Enterobacteriaceae Infection Using Graph Neural Networks: A Retrospective Study.

Gouareb, Racha; Bornet, Alban; Proios, Dimitrios; Pereira, Sónia Gonçalves; Teodoro, Douglas.

Health Data Sci ; 3: 0099, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-38487204

RESUMO

Background: While Enterobacteriaceae bacteria are commonly found in the healthy human gut, their colonization of other body parts can potentially evolve into serious infections and health threats. We investigate a graph-based machine learning model to predict risks of inpatient colonization by multidrug-resistant (MDR) Enterobacteriaceae. Methods: Colonization prediction was defined as a binary task, where the goal is to predict whether a patient is colonized by MDR Enterobacteriaceae in an undesirable body part during their hospital stay. To capture topological features, interactions among patients and healthcare workers were modeled using a graph structure, where patients are described by nodes and their interactions are described by edges. Then, a graph neural network (GNN) model was trained to learn colonization patterns from the patient network enriched with clinical and spatiotemporal features. Results: The GNN model achieves performance between 0.91 and 0.96 area under the receiver operating characteristic curve (AUROC) when trained in inductive and transductive settings, respectively, up to 8% above a logistic regression baseline (0.88). Comparing network topologies, the configuration considering ward-related edges (0.91 inductive, 0.96 transductive) outperforms the configurations considering caregiver-related edges (0.88, 0.89) and both types of edges (0.90, 0.94). For the top 3 most prevalent MDR Enterobacteriaceae, the AUROC varies from 0.94 for Citrobacter freundii up to 0.98 for Enterobacter cloacae using the best-performing GNN model. Conclusion: Topological features via graph modeling improve the performance of machine learning models for Enterobacteriaceae colonization prediction. GNNs could be used to support infection prevention and control programs to detect patients at risk of colonization by MDR Enterobacteriaceae and other bacteria families.

10.

Deep learning-based risk prediction for interventional clinical trials based on protocol design: A retrospective study.

Ferdowsi, Sohrab; Knafou, Julien; Borissov, Nikolay; Vicente Alvarez, David; Mishra, Rahul; Amini, Poorya; Teodoro, Douglas.

Patterns (N Y) ; 4(3): 100689, 2023 Mar 10.

Artigo em Inglês | MEDLINE | ID: mdl-36960445

RESUMO

Success rate of clinical trials (CTs) is low, with the protocol design itself being considered a major risk factor. We aimed to investigate the use of deep learning methods to predict the risk of CTs based on their protocols. Considering protocol changes and their final status, a retrospective risk assignment method was proposed to label CTs according to low, medium, and high risk levels. Then, transformer and graph neural networks were designed and combined in an ensemble model to learn to infer the ternary risk categories. The ensemble model achieved robust performance (area under the receiving operator characteristic curve [AUROC] of 0.8453 [95% confidence interval: 0.8409-0.8495]), similar to the individual architectures but significantly outperforming a baseline based on bag-of-words features (0.7548 [0.7493-0.7603] AUROC). We demonstrate the potential of deep learning in predicting the risk of CTs from their protocols, paving the way for customized risk mitigation strategies during protocol design.

11.

Ensemble of deep learning language models to support the creation of living systematic reviews for the COVID-19 literature.

Knafou, Julien; Haas, Quentin; Borissov, Nikolay; Counotte, Michel; Low, Nicola; Imeri, Hira; Ipekci, Aziz Mert; Buitrago-Garcia, Diana; Heron, Leonie; Amini, Poorya; Teodoro, Douglas.

Syst Rev ; 12(1): 94, 2023 06 05.

Artigo em Inglês | MEDLINE | ID: mdl-37277872

RESUMO

BACKGROUND: The COVID-19 pandemic has led to an unprecedented amount of scientific publications, growing at a pace never seen before. Multiple living systematic reviews have been developed to assist professionals with up-to-date and trustworthy health information, but it is increasingly challenging for systematic reviewers to keep up with the evidence in electronic databases. We aimed to investigate deep learning-based machine learning algorithms to classify COVID-19-related publications to help scale up the epidemiological curation process. METHODS: In this retrospective study, five different pre-trained deep learning-based language models were fine-tuned on a dataset of 6365 publications manually classified into two classes, three subclasses, and 22 sub-subclasses relevant for epidemiological triage purposes. In a k-fold cross-validation setting, each standalone model was assessed on a classification task and compared against an ensemble, which takes the standalone model predictions as input and uses different strategies to infer the optimal article class. A ranking task was also considered, in which the model outputs a ranked list of sub-subclasses associated with the article. RESULTS: The ensemble model significantly outperformed the standalone classifiers, achieving a F1-score of 89.2 at the class level of the classification task. The difference between the standalone and ensemble models increases at the sub-subclass level, where the ensemble reaches a micro F1-score of 70% against 67% for the best-performing standalone model. For the ranking task, the ensemble obtained the highest recall@3, with a performance of 89%. Using an unanimity voting rule, the ensemble can provide predictions with higher confidence on a subset of the data, achieving detection of original papers with a F1-score up to 97% on a subset of 80% of the collection instead of 93% on the whole dataset. CONCLUSION: This study shows the potential of using deep learning language models to perform triage of COVID-19 references efficiently and support epidemiological curation and review. The ensemble consistently and significantly outperforms any standalone model. Fine-tuning the voting strategy thresholds is an interesting alternative to annotate a subset with higher predictive confidence.

Assuntos

COVID-19 , Aprendizado Profundo , Humanos , Pandemias , Estudos Retrospectivos , Idioma

12.

Building a transnational biosurveillance network using semantic web technologies: requirements, design, and preliminary evaluation.

Teodoro, Douglas; Pasche, Emilie; Gobeill, Julien; Emonet, Stéphane; Ruch, Patrick; Lovis, Christian.

J Med Internet Res ; 14(3): e73, 2012 May 29.

Artigo em Inglês | MEDLINE | ID: mdl-22642960

RESUMO

BACKGROUND: Antimicrobial resistance has reached globally alarming levels and is becoming a major public health threat. Lack of efficacious antimicrobial resistance surveillance systems was identified as one of the causes of increasing resistance, due to the lag time between new resistances and alerts to care providers. Several initiatives to track drug resistance evolution have been developed. However, no effective real-time and source-independent antimicrobial resistance monitoring system is available publicly. OBJECTIVE: To design and implement an architecture that can provide real-time and source-independent antimicrobial resistance monitoring to support transnational resistance surveillance. In particular, we investigated the use of a Semantic Web-based model to foster integration and interoperability of interinstitutional and cross-border microbiology laboratory databases. METHODS: Following the agile software development methodology, we derived the main requirements needed for effective antimicrobial resistance monitoring, from which we proposed a decentralized monitoring architecture based on the Semantic Web stack. The architecture uses an ontology-driven approach to promote the integration of a network of sentinel hospitals or laboratories. Local databases are wrapped into semantic data repositories that automatically expose local computing-formalized laboratory information in the Web. A central source mediator, based on local reasoning, coordinates the access to the semantic end points. On the user side, a user-friendly Web interface provides access and graphical visualization to the integrated views. RESULTS: We designed and implemented the online Antimicrobial Resistance Trend Monitoring System (ARTEMIS) in a pilot network of seven European health care institutions sharing 70+ million triples of information about drug resistance and consumption. Evaluation of the computing performance of the mediator demonstrated that, on average, query response time was a few seconds (mean 4.3, SD 0.1 × 10(2) seconds). Clinical pertinence assessment showed that resistance trends automatically calculated by ARTEMIS had a strong positive correlation with the European Antimicrobial Resistance Surveillance Network (EARS-Net) (ρ = .86, P < .001) and the Sentinel Surveillance of Antibiotic Resistance in Switzerland (SEARCH) (ρ = .84, P < .001) systems. Furthermore, mean resistance rates extracted by ARTEMIS were not significantly different from those of either EARS-Net (∆ = ±0.130; 95% confidence interval -0 to 0.030; P < .001) or SEARCH (∆ = ±0.042; 95% confidence interval -0.004 to 0.028; P = .004). CONCLUSIONS: We introduce a distributed monitoring architecture that can be used to build transnational antimicrobial resistance surveillance networks. Results indicated that the Semantic Web-based approach provided an efficient and reliable solution for development of eHealth architectures that enable online antimicrobial resistance monitoring from heterogeneous data sources. In future, we expect that more health care institutions can join the ARTEMIS network so that it can provide a large European and wider biosurveillance network that can be used to detect emerging bacterial resistance in a multinational context and support public health actions.

Assuntos

Cooperação Internacional , Internet , Vigilância da População , Simulação por Computador , Resistência Microbiana a Medicamentos , Software , Interface Usuário-Computador

13.

Pathogens and gene product normalization in the biomedical literature.

Vishnyakova, Dina; Pasche, Emilie; Teodoro, Douglas; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 174: 89-93, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-22491118

RESUMO

We present a new approach for pathogens and gene product normalization in the biomedical literature. The idea of this approach was motivated by needs such as literature curation, in particular applied to the field of infectious diseases thus, variants of bacterial species (S. aureus, Staphyloccocus aureus, ...) and their gene products (protein ArsC, Arsenical pump modifier, Arsenate reductase, ...). Our approach is based on the use of an Ontology Look-up Service, a Gene Ontology Categorizer (GOCat) and Gene Normalization methods. In the pathogen detection task the use of OLS disambiguates found pathogen names. GOCat results are incorporated into overall score system to support and to confirm the decisionmaking in normalization process of pathogens and their genomes. The evaluation was done on two test sets of BioCreativeIII benchmark: gold standard of manual curation (50 articles) and silver standard (507 articles) curated by collective results of BCIII participants. For the cross-species GN we achieved the precision of 46% for silver and 27% for gold sets. Pathogen normalization results showed 95% of precision and 93% of recall. The impact of GOCat explicitly improves results of pathogen and gene normalization, basically confirming identified pathogens and boosting correct gene identifiers on the top of the results' list ranked by confidence. A correct identification of the pathogen is able to improve significantly normalization effectiveness and to solve the disambiguation problem of genes.

Assuntos

Bactérias/classificação , Proteínas de Bactérias/classificação , Mineração de Dados/métodos , Publicações Periódicas como Assunto , Vocabulário Controlado , Humanos

14.

A user-friendly tool for medical-related patent retrieval.

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnyakova, Dina; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 174: 121-5, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-22491124

RESUMO

Health-related information retrieval is complicated by the variety of nomenclatures available to name entities, since different communities of users will use different ways to name a same entity. We present in this report the development and evaluation of a user-friendly interactive Web application aiming at facilitating health-related patent search. Our tool, called TWINC, relies on a search engine tuned during several patent retrieval competitions, enhanced with intelligent interaction modules, such as chemical query, normalization and expansion. While the functionality of related article search showed promising performances, the ad hoc search results in fairly contrasted results. Nonetheless, TWINC performed well during the PatOlympics competition and was appreciated by intellectual property experts. This result should be balanced by the limited evaluation sample. We can also assume that it can be customized to be applied in corporate search environments to process domain and company-specific vocabularies, including non-English literature and patents reports.

Assuntos

Armazenamento e Recuperação da Informação/métodos , Internet , Patentes como Assunto , Ferramenta de Busca/métodos , Interface Usuário-Computador , Inteligência Artificial , Humanos

15.

Data Definition Ontology for clinical data integration and querying.

Assélé Kama, Ariane; Primadhanty, Audi; Choquet, Rémy; Teodoro, Douglas; Enders, Frank; Duclos, Catherine; Jaulent, Marie-Christine.

Stud Health Technol Inform ; 180: 38-42, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-22874148

RESUMO

This paper describes an approach to build a Data Definition Ontology (DDO) in the context of full domain ontology integration with datasets in order to share and query clinical heterogeneous data repositories. We have adapted an existing semantic web tool (D2RQ) to implement a process that automatically generates the DDO from a database information model, thanks to reverse engineering and schema mapping approaches. This study has been performed in the context of the DebugIT European project (Detecting and Eliminating Bacteria UsinG Information Technology) that aims to control and monitor the bacterial growth via a semantic interoperability platform (IP). The evaluation of the process is based, first, on the accuracy of the produced DDO for different samples of database storage and second, by checking the congruency between the DDO and the D2RQ database mapping file.

Assuntos

Mineração de Dados/métodos , Sistemas de Gerenciamento de Base de Dados , Bases de Dados Factuais , Registros Eletrônicos de Saúde/classificação , Prontuários Médicos/classificação , Processamento de Linguagem Natural , Terminologia como Assunto , Documentação/métodos , Integração de Sistemas

16.

An advanced search engine for patent analytics in medicinal chemistry.

Pasche, Emilie; Gobeill, Julien; Teodoro, Douglas; Gaudinat, Arnaud; Vishnykova, Dina; Lovis, Christian; Ruch, Patrick.

Stud Health Technol Inform ; 180: 204-9, 2012.

Artigo em Inglês | MEDLINE | ID: mdl-22874181

RESUMO

Patent collections contain an important amount of medical-related knowledge, but existing tools were reported to lack of useful functionalities. We present here the development of TWINC, an advanced search engine dedicated to patent retrieval in the domain of health and life sciences. Our tool embeds two search modes: an ad hoc search to retrieve relevant patents given a short query and a related patent search to retrieve similar patents given a patent. Both search modes rely on tuning experiments performed during several patent retrieval competitions. Moreover, TWINC is enhanced with interactive modules, such as chemical query expansion, which is of prior importance to cope with various ways of naming biomedical entities. While the related patent search showed promising performances, the ad-hoc search resulted in fairly contrasted results. Nonetheless, TWINC performed well during the Chemathlon task of the PatOlympics competition and experts appreciated its usability.

Assuntos

Química Farmacêutica/métodos , Mineração de Dados/métodos , Sistemas de Gerenciamento de Base de Dados , Bases de Dados de Produtos Farmacêuticos , Internet , Patentes como Assunto , Ferramenta de Busca/métodos , Interface Usuário-Computador

17.

Analyzing the Information Content of Text-Based Files in Supplementary Materials of Biomedical Literature.

Naderi, Nona; Mottaz, Anaïs; Teodoro, Douglas; Ruch, Patrick.

Stud Health Technol Inform ; 294: 876-877, 2022 May 25.

Artigo em Inglês | MEDLINE | ID: mdl-35612233

RESUMO

We present an analysis of supplementary materials of PubMed Central (PMC) articles and show their importance in indexing and searching biomedical literature, in particular for the emerging genomic medicine field. On a subset of articles from PubMed Central, we use text mining methods to extract MeSH terms from abstracts, full texts, and text-based supplementary materials. We find that the recall of MeSH annotations increases by about 5.9 percentage points (+20% on relative percentage) when considering supplementary materials compared to using only abstracts. We further compare the supplementary material annotations with full-text annotations and we find out that the recall of MeSH terms increases by 1.5 percentage point (+3% on relative percentage). Additionally, we analyze genetic variant mentions in abstracts and full-texts and compare them with mentions found in supplementary text-based files. We find that the majority (about 99%) of variants are found in text-based supplementary files. In conclusion, we suggest that supplementary data should receive more attention from the information retrieval community, in particular in life and health sciences.

Assuntos

Medical Subject Headings , Envio de Mensagens de Texto , Mineração de Dados/métodos , PubMed , Registros

18.

Designing an Optimal Expansion Method to Improve the Recall of a Genomic Variant Curation-Support Service.

Mottaz, Anaïs; Pasche, Emilie; Michel, Pierre-André; Mottin, Luc; Teodoro, Douglas; Ruch, Patrick.

Stud Health Technol Inform ; 294: 839-843, 2022 May 25.

Artigo em Inglês | MEDLINE | ID: mdl-35612222

RESUMO

The importance of genomic data for health is rapidly growing but accessing and gathering information about variants from different sources is hindered by highly heterogeneous representations of variants, as outlined by clinical associations (AMP/ASCO/CAP) in their recommendations. To enable a smooth and effective retrieval of variant-containing documents from different resources, we developed a tool (https://goldorak.hesge.ch/synvar/) that generates for any given SNP - including variant not present in existing databases - its corresponding description at the genome, transcript and protein levels. It provides variant descriptions in the HGVS format as well as in many non-standard formats found in the literature along with database identifiers. We present the SynVar service and evaluate its impact on the recall of a genomic variant curation-support service. Using SynVar to search variants in the literature enables to increase the recall by +133.8% without a strong impact on precision (i.e. 93%).

Assuntos

Genômica , Bases de Dados Factuais

19.

Reducing systematic review burden using Deduklick: a novel, automated, reliable, and explainable deduplication algorithm to foster medical research.

Borissov, Nikolay; Haas, Quentin; Minder, Beatrice; Kopp-Heim, Doris; von Gernler, Marc; Janka, Heidrun; Teodoro, Douglas; Amini, Poorya.

Syst Rev ; 11(1): 172, 2022 08 17.

Artigo em Inglês | MEDLINE | ID: mdl-35978441

RESUMO

BACKGROUND: Identifying and removing reference duplicates when conducting systematic reviews (SRs) remain a major, time-consuming issue for authors who manually check for duplicates using built-in features in citation managers. To address issues related to manual deduplication, we developed an automated, efficient, and rapid artificial intelligence-based algorithm named Deduklick. Deduklick combines natural language processing algorithms with a set of rules created by expert information specialists. METHODS: Deduklick's deduplication uses a multistep algorithm of data normalization, calculates a similarity score, and identifies unique and duplicate references based on metadata fields, such as title, authors, journal, DOI, year, issue, volume, and page number range. We measured and compared Deduklick's capacity to accurately detect duplicates with the information specialists' standard, manual duplicate removal process using EndNote on eight existing heterogeneous datasets. Using a sensitivity analysis, we manually cross-compared the efficiency and noise of both methods. DISCUSSION: Deduklick achieved average recall of 99.51%, average precision of 100.00%, and average F1 score of 99.75%. In contrast, the manual deduplication process achieved average recall of 88.65%, average precision of 99.95%, and average F1 score of 91.98%. Deduklick achieved equal to higher expert-level performance on duplicate removal. It also preserved high metadata quality and drastically reduced time spent on analysis. Deduklick represents an efficient, transparent, ergonomic, and time-saving solution for identifying and removing duplicates in SRs searches. Deduklick could therefore simplify SRs production and represent important advantages for scientists, including saving time, increasing accuracy, reducing costs, and contributing to quality SRs.

Assuntos

Algoritmos , Inteligência Artificial , Revisões Sistemáticas como Assunto , Pesquisa Biomédica , Humanos , Processamento de Linguagem Natural

20.

Interoperability driven integration of biomedical data sources.

Teodoro, Douglas; Choquet, Rémy; Schober, Daniel; Mels, Giovanni; Pasche, Emilie; Ruch, Patrick; Lovis, Christian.

Stud Health Technol Inform ; 169: 185-9, 2011.

Artigo em Inglês | MEDLINE | ID: mdl-21893739

RESUMO

In this paper, we introduce a data integration methodology that promotes technical, syntactic and semantic interoperability for operational healthcare data sources. ETL processes provide access to different operational databases at the technical level. Furthermore, data instances have they syntax aligned according to biomedical terminologies using natural language processing. Finally, semantic web technologies are used to ensure common meaning and to provide ubiquitous access to the data. The system's performance and solvability assessments were carried out using clinical questions against seven healthcare institutions distributed across Europe. The architecture managed to provide interoperability within the limited heterogeneous grid of hospitals. Preliminary scalability result tests are provided.

Assuntos

Coleta de Dados/métodos , Armazenamento e Recuperação da Informação/métodos , Informática Médica/métodos , Integração de Sistemas , Infecção Hospitalar/epidemiologia , Infecção Hospitalar/microbiologia , Sistemas de Gerenciamento de Base de Dados , Bases de Dados Factuais , Europa (Continente) , Humanos , Internet , Processamento de Linguagem Natural , Linguagens de Programação , Semântica , Terminologia como Assunto , Vocabulário Controlado

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA