Results 1 - 13 of 13
1.
J Biomed Inform ; 127: 104005, 2022 03.
Article in English | MEDLINE | ID: mdl-35144000

ABSTRACT

Consumers from non-medical backgrounds often look for information regarding a specific medical information need; however, they are limited by their lack of medical knowledge and may not be able to find reputable resources. As a case study, we investigate reducing this knowledge barrier to allow consumers to achieve search effectiveness comparable to that of an expert, or a medical professional, for COVID-19 related questions. We introduce and evaluate a hybrid index model that allows a consumer to formulate queries using consumer language to find relevant answers to COVID-19 questions. Our aim is to reduce performance degradation between medical professional queries and those of a consumer. We use a universal sentence embedding model to project consumer queries into the same semantic space as professional queries. We then incorporate sentence embeddings into a search framework alongside an inverted index. Documents from this index are retrieved using a novel scoring function that considers sentence embeddings and BM25 scoring. We find that our framework alleviates the expertise disparity, which we validate using an additional set of crowdsourced consumer queries, even in an unsupervised setting. We also propose an extension of our method, where the sentence encoder is optimised in a supervised setup. Our framework allows a consumer to search using consumer queries and match the search performance of a professional.
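The abstract does not give the form of the scoring function, so the following is only a minimal sketch of one common way to combine the two signals: a linear interpolation of max-normalised BM25 scores and embedding cosine similarity. The function names, the `alpha` weight, and the assumption that BM25 scores and embedding vectors are already computed are all illustrative and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def hybrid_scores(bm25, query_vec, doc_vecs, alpha=0.5):
    """Blend lexical and semantic evidence per document.

    bm25:      dict doc_id -> precomputed BM25 score for the query (assumed given)
    query_vec: sentence embedding of the query (assumed given)
    doc_vecs:  dict doc_id -> sentence embedding of the document (assumed given)
    alpha:     hypothetical weight on the lexical part, between 0 and 1
    """
    top = max(bm25.values(), default=0.0)
    scores = {}
    for doc_id, raw in bm25.items():
        lexical = raw / top if top > 0 else 0.0  # scale BM25 into [0, 1]
        semantic = cosine(query_vec, doc_vecs[doc_id])
        scores[doc_id] = alpha * lexical + (1 - alpha) * semantic
    return scores


def rank(scores):
    """Document ids ordered from highest to lowest blended score."""
    return sorted(scores, key=scores.get, reverse=True)
```

With `alpha=1.0` this reduces to plain BM25 ranking, and with `alpha=0.0` to pure embedding similarity, which makes the weight a convenient knob for comparing the two extremes.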


Subjects
COVID-19 , Information Storage and Retrieval , Humans , Natural Language Processing , SARS-CoV-2 , Unified Medical Language System
2.
BMC Bioinformatics ; 21(Suppl 19): 572, 2020 Dec 21.
Article in English | MEDLINE | ID: mdl-33349237

ABSTRACT

BACKGROUND: Finding relevant literature is crucial for many biomedical research activities and in the practice of evidence-based medicine. Search engines such as PubMed provide a means to search and retrieve published literature, given a query. However, they are limited in how users can control the processing of queries and articles (or, as we call them, documents) by the search engine. To give this control to both biomedical researchers and computer scientists working in biomedical information retrieval, we introduce a public online tool for searching over biomedical literature. Our setup is guided by the NIST setup of the relevant TREC evaluation tasks in genomics, clinical decision support, and precision medicine. RESULTS: To provide benchmark results for some of the most common biomedical information retrieval strategies, such as querying MeSH subject headings with a specific weight or querying over the title of the articles only, we present our evaluations on public datasets. Our experiments report well-known information retrieval metrics such as precision at a cutoff of ranked documents. CONCLUSIONS: We introduce the A2A search and benchmarking tool, which is publicly available for researchers who want to explore different search strategies over published biomedical literature. We outline several query formulation strategies and present their evaluations with known human judgements for a large pool of topics, from genomics to precision medicine.
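Precision at a cutoff, the metric named above, can be illustrated with a short sketch. The function name and the document identifiers are illustrative only; this is the standard textbook definition (relevant documents among the top k, divided by k), not code from the A2A tool.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are judged relevant.

    ranked_ids:   list of document ids, best first
    relevant_ids: set of ids judged relevant for the topic
    k:            cutoff; the denominator stays k even if fewer docs were returned
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked_ids[:k]
    hits = sum(1 for doc_id in top_k if doc_id in relevant_ids)
    return hits / k
```

For example, if two of the first four returned documents are judged relevant, `precision_at_k(ranking, judged_relevant, 4)` evaluates to 0.5.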


Subjects
Information Storage and Retrieval/methods , Software , Biomedical Research , Databases, Factual , Humans , Medical Subject Headings
3.
Epidemiology ; 31(1): 90-97, 2020 01.
Article in English | MEDLINE | ID: mdl-31651659

ABSTRACT

BACKGROUND: Melbourne, Australia, witnessed a thunderstorm asthma outbreak on 21 November 2016, resulting in over 8,000 hospital admissions by 6 P.M. This is a typical acute disease event. Because the time to respond is short for acute disease events, an algorithm based on time between events has shown promise. The shorter the time between consecutive incidents of the disease, the more likely an outbreak. Social media posts such as tweets can be used as input to the monitoring algorithm. However, due to the large volume of tweets, a large number of alerts may be produced. We refer to this problem as alert swamping. METHODS: We present a four-step architecture for the early detection of the acute disease event, using social media posts (tweets) on Twitter. To curb alert swamping, the first three steps of the algorithm ensure the relevance of the tweets. The fourth step is a monitoring algorithm based on time between events. We experiment with a dataset of tweets posted in Melbourne from 2014 to 2016, focusing on the thunderstorm asthma outbreak in Melbourne in November 2016. RESULTS: Out of our 18 experiment combinations, three detected the thunderstorm asthma outbreak up to 9 hours before the time mentioned in the official report, and five were able to detect it before the first news report. CONCLUSIONS: With appropriate checks against alert swamping in place and the use of a monitoring algorithm based on time between events, tweets can provide early alerts for an acute disease event such as thunderstorm asthma.
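The idea of monitoring the time between events can be sketched as follows. The abstract does not describe the actual monitoring algorithm, so this is a deliberately simplified stand-in: it raises an alert once several consecutive gaps between relevant tweets fall at or below a threshold. The function names and the `max_gap` and `run_length` parameters are assumptions made for illustration.

```python
def time_between_events(timestamps):
    """Gaps between consecutive event times (timestamps must be sorted ascending)."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def first_alert(timestamps, max_gap, run_length=3):
    """Return the time of the first alert, or None if no alert is raised.

    An alert fires when `run_length` consecutive gaps are all <= `max_gap`.
    Both parameters are hypothetical tuning knobs, not values from the study.
    """
    streak = 0
    for index, gap in enumerate(time_between_events(timestamps)):
        streak = streak + 1 if gap <= max_gap else 0
        if streak >= run_length:
            # The gap at `index` ends at the event with position index + 1.
            return timestamps[index + 1]
    return None
```

Requiring a run of short gaps, rather than a single one, is one simple guard against isolated bursts; the relevance-filtering steps described in the abstract address alert swamping before this stage.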


Subjects
Asthma , Disease Outbreaks , Public Health Surveillance , Social Media , Acute Disease , Algorithms , Asthma/epidemiology , Australia/epidemiology , Humans , Public Health Surveillance/methods
4.
J Biomed Inform ; 109: 103530, 2020 09.
Article in English | MEDLINE | ID: mdl-32818666

ABSTRACT

Bidirectional Encoder Representations from Transformers (BERT) have achieved state-of-the-art effectiveness in some of the biomedical information processing applications. We investigate the effectiveness of these techniques for clinical trial search systems. In precision medicine, matching patients to relevant experimental evidence or prospective treatments is a complex task which requires both clinical and biological knowledge. To assist in this complex decision making, we investigate the effectiveness of different ranking models based on the BERT models under the same retrieval platform to ensure fair comparisons. An evaluation on the TREC Precision Medicine benchmarks indicates that our approach using the BERT model pre-trained on scientific abstracts and clinical notes achieves state-of-the-art results, on par with highly specialised, manually optimised heuristic models. We also report the best results to date on the TREC Precision Medicine 2017 ad hoc retrieval task for clinical trial search.


Subjects
Language , Natural Language Processing , Humans , Precision Medicine , Prospective Studies
5.
J Biomed Inform ; 105: 103406, 2020 05.
Article in English | MEDLINE | ID: mdl-32169670

ABSTRACT

Recruiting eligible patients for clinical trials is crucial for reliably answering specific questions about medical interventions and evaluation. However, clinical trial recruitment is a bottleneck in clinical research and drug development. Our goal is to provide an approach towards automating this manual and time-consuming patient recruitment task using natural language processing and machine learning techniques. Specifically, our approach extracts key information from a series of narrative clinical documents in patients' records and collates helpful evidence to make decisions on the eligibility of patients according to certain inclusion and exclusion criteria. Challenges in applying narrative clinical documents, such as differences in reporting styles and sub-languages, are addressed by enriching them with knowledge from domain ontologies in the form of semantic vector representations. We show that a machine learning model based on a Multi-Layer Perceptron (MLP) is more effective for the task than five other neural networks and four conventional machine learning models. Our approach achieves an overall micro-F1 score of 84% for 13 different eligibility criteria. Our experiments also indicate that semantically enriched documents are more effective than original documents for cohort selection. Our system provides an end-to-end machine learning-based solution that achieves results comparable with the state of the art, which relies on hand-crafted rules or data-centric engineered features.
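To make the reported metric concrete, the sketch below shows how a micro-averaged F1 score pools counts across criteria before computing precision and recall. The function name and the example counts are illustrative; they are not the study's data.

```python
def micro_f1(counts):
    """Micro-averaged F1 over several criteria.

    counts: list of (true_positives, false_positives, false_negatives) tuples,
            one tuple per eligibility criterion.
    Counts are summed across criteria first, so frequent criteria weigh more.
    """
    tp = sum(c[0] for c in counts)
    fp = sum(c[1] for c in counts)
    fn = sum(c[2] for c in counts)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because the counts are pooled, a criterion with many decisions influences the score more than a rare one, which is the main difference from macro-averaging.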


Subjects
Machine Learning , Natural Language Processing , Humans , Language , Neural Networks, Computer , Semantics
6.
J Biomed Inform ; 85: 68-79, 2018 09.
Article in English | MEDLINE | ID: mdl-30026067

ABSTRACT

OBJECTIVE: Application of machine learning techniques for automatic and reliable classification of clinical documents has shown promising results. However, machine learning models require abundant training data specific to each target hospital and may not be able to benefit from available labeled data from each of the hospitals due to data variations. Such training data limitations have presented one of the major obstacles for maximising potential application of machine learning approaches in the healthcare domain. We investigated transferability of artificial neural network models across hospitals from different domains representing various age demographic groups (i.e., children, adults, and mixed) in order to cope with such limitations. MATERIALS AND METHODS: We explored the transferability of artificial neural networks for clinical document classification. Our case study was to detect abnormalities from limb X-ray reports obtained from the emergency department (ED) of three hospitals within different domains. Different transfer learning scenarios were investigated in order to employ a source hospital's trained model for addressing a target hospital's abnormality detection problem. RESULTS: A Convolutional Neural Network (CNN) model exhibited the best effectiveness compared to other networks when employing an embedding model trained on a large corpus of clinical documents. Furthermore, CNN models derived from a source hospital outperformed a conventional machine learning approach based on Support Vector Machines (SVM) when applied to a different (target) hospital. These models were further improved by leveraging available training data in target hospitals and outperformed the models that used only the target hospital data with F1-Score of 0.92-0.96 across three hospitals. DISCUSSION: Our transfer learning model used only simple vector representations of documents without any task-specific feature engineering.
Transferring the CNN model significantly improved (approx. 10% in F1-Score) the state-of-the-art approach for clinical document classification based on a trivial transferred model. In addition, the results showed that transfer learning techniques can further improve a CNN model that is trained only on either a source or target hospital's data. CONCLUSION: Transferring a pre-trained CNN model generated in one hospital to another facilitates application of machine learning approaches that alleviate both hospital-specific feature engineering and training data.


Subjects
Neural Networks, Computer , Radiographic Image Interpretation, Computer-Assisted/statistics & numerical data , Radiography/statistics & numerical data , Algorithms , Computational Biology , Databases, Factual/statistics & numerical data , Deep Learning/statistics & numerical data , Humans , Machine Learning , Support Vector Machine
7.
J Biomed Inform ; 55: 73-81, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25817970

ABSTRACT

CSIRO Adverse Drug Event Corpus (Cadec) is a new, richly annotated corpus of medical forum posts on patient-reported Adverse Drug Events (ADEs). The corpus is sourced from posts on social media, and contains text that is largely written in colloquial language and often deviates from formal English grammar and punctuation rules. Annotations contain mentions of concepts such as drugs, adverse effects, symptoms, and diseases linked to their corresponding concepts in controlled vocabularies, i.e., SNOMED Clinical Terms and MedDRA. The quality of the annotations is ensured by annotation guidelines, multi-stage annotations, measuring inter-annotator agreement, and final review of the annotations by a clinical terminologist. This corpus is useful for studies in the area of information extraction, or more generally text mining, from social media to detect possible adverse drug reactions from direct patient reports. The corpus is publicly available at https://data.csiro.au.


Subjects
Adverse Drug Reaction Reporting Systems/organization & administration , Consumer Health Information/organization & administration , Data Mining/methods , Drug-Related Side Effects and Adverse Reactions/classification , Social Media/organization & administration , Vocabulary, Controlled , Datasets as Topic/statistics & numerical data , Drug-Related Side Effects and Adverse Reactions/epidemiology , Guidelines as Topic , Humans , Machine Learning , Natural Language Processing , Social Media/classification , Terminology as Topic
8.
BMC Med Inform Decis Mak ; 15: 53, 2015 Jul 15.
Article in English | MEDLINE | ID: mdl-26174442

ABSTRACT

BACKGROUND: Death certificates provide an invaluable source for mortality statistics which can be used for surveillance and early warnings of increases in disease activity and to support the development and monitoring of prevention or response strategies. However, their value can be realised only if accurate, quantitative data can be extracted from death certificates, an aim hampered by both the volume and variable nature of certificates written in natural language. This study aims to develop a set of machine learning and rule-based methods to automatically classify death certificates according to four high impact diseases of interest: diabetes, influenza, pneumonia and HIV. METHODS: Two classification methods are presented: i) a machine learning approach, where detailed features (terms, term n-grams and SNOMED CT concepts) are extracted from death certificates and used to train a set of supervised machine learning models (Support Vector Machines); and ii) a set of keyword-matching rules. These methods were used to identify the presence of diabetes, influenza, pneumonia and HIV in a death certificate. An empirical evaluation was conducted using 340,142 death certificates, divided between training and test sets, covering deaths from 2000-2007 in New South Wales, Australia. Precision and recall (positive predictive value and sensitivity) were used as evaluation measures, with F-measure providing a single, overall measure of effectiveness. A detailed error analysis was performed on classification errors. RESULTS: Classification of diabetes, influenza, pneumonia and HIV was highly accurate (F-measure 0.96). More fine-grained ICD-10 classification effectiveness was more variable but still high (F-measure 0.80). The error analysis revealed that word variations as well as certain word combinations adversely affected classification. In addition, anomalies in the ground truth likely led to an underestimation of the effectiveness. 
CONCLUSIONS: The high accuracy and low cost of the classification methods allow for an effective means for automatic and real-time surveillance of diabetes, influenza, pneumonia and HIV deaths. In addition, the methods are generally applicable to other diseases of interest and to other sources of medical free-text besides death certificates.
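The keyword-matching approach and the F-measure used to evaluate it can be sketched as below. The keyword lists are hypothetical and much smaller than anything a real rule set would need; they are not the rules from the study. Whole-word matching is an assumption made here for simplicity.

```python
import re

# Hypothetical keyword rules, for illustration only.
RULES = {
    "diabetes": ["diabetes", "diabetic"],
    "influenza": ["influenza"],
    "pneumonia": ["pneumonia"],
    "HIV": ["hiv", "human immunodeficiency virus"],
}


def classify(certificate_text):
    """Return the set of diseases whose keywords occur as whole words in the text."""
    text = certificate_text.lower()
    found = set()
    for disease, keywords in RULES.items():
        for keyword in keywords:
            if re.search(r"\b" + re.escape(keyword) + r"\b", text):
                found.add(disease)
                break
    return found


def f_measure(predicted, gold):
    """Balanced F-measure for one disease, given parallel lists of booleans."""
    tp = sum(1 for p, g in zip(predicted, gold) if p and g)
    fp = sum(1 for p, g in zip(predicted, gold) if p and not g)
    fn = sum(1 for p, g in zip(predicted, gold) if g and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)  # positive predictive value
    recall = tp / (tp + fn)     # sensitivity
    return 2 * precision * recall / (precision + recall)
```

Note that whole-word matching misses variants such as "bronchopneumonia", which is the kind of word variation the error analysis above reports as a source of classification errors.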


Subjects
Classification , Death Certificates , Epidemiological Monitoring , Machine Learning , Humans , New South Wales
9.
JMIR Med Inform ; 9(5): e30153, 2021 May 03.
Article in English | MEDLINE | ID: mdl-33939618

ABSTRACT

[This corrects the article DOI: 10.2196/24020.].

10.
JMIR Med Inform ; 9(4): e24020, 2021 Apr 30.
Article in English | MEDLINE | ID: mdl-33664015

ABSTRACT

BACKGROUND: The prognosis, diagnosis, and treatment of many genetic disorders and familial diseases significantly improve if the family history (FH) of a patient is known. Such information is often written in the free text of clinical notes. OBJECTIVE: The aim of this study is to develop automated methods that enable access to FH data through natural language processing. METHODS: We performed information extraction by using transformers to extract disease mentions from notes. We also experimented with rule-based methods for extracting family member (FM) information from text and coreference resolution techniques. We evaluated different transfer learning strategies to improve the annotation of diseases. We provided a thorough error analysis of the contributing factors that affect such information extraction systems. RESULTS: Our experiments showed that the combination of domain-adaptive pretraining and intermediate-task pretraining achieved an F1 score of 81.63% for the extraction of diseases and FMs from notes when it was tested on a public shared task data set from the National Natural Language Processing Clinical Challenges (N2C2), providing a statistically significant improvement over the baseline (P<.001). In comparison, in the 2019 N2C2/Open Health Natural Language Processing Shared Task, the median F1 score of all 17 participating teams was 76.59%. CONCLUSIONS: Our approach, which leverages a state-of-the-art named entity recognition model for disease mention detection coupled with a hybrid method for FM mention detection, achieved an effectiveness that was close to that of the top 3 systems participating in the 2019 N2C2 FH extraction challenge, with only the top system convincingly outperforming our approach in terms of precision.

11.
BMC Med Inform Decis Mak ; 10: 58, 2010 Oct 12.
Article in English | MEDLINE | ID: mdl-20937152

ABSTRACT

BACKGROUND: The process of constructing a systematic review, a document that compiles the published evidence pertaining to a specified medical topic, is intensely time-consuming, often taking a team of researchers over a year, with the identification of relevant published research comprising a substantial portion of the effort. The standard paradigm for this information-seeking task is to use Boolean search; however, this leaves the user(s) the requirement of examining every returned result. Further, our experience is that effective Boolean queries for this specific task are extremely difficult to formulate and typically require multiple iterations of refinement before being finalized. METHODS: We explore the effectiveness of using ranked retrieval as compared to Boolean querying for the purpose of constructing a systematic review. We conduct a series of experiments involving ranked retrieval, using queries defined methodologically, in an effort to understand the practicalities of incorporating ranked retrieval into the systematic search task. RESULTS: Our results show that ranked retrieval by itself is not viable for this search task requiring high recall. However, we describe a refinement of the standard Boolean search process and show that ranking within a Boolean result set can improve the overall search performance by providing early indication of the quality of the results, thereby speeding up the iterative query-refinement process. CONCLUSIONS: Outcomes of experiments suggest that an interactive query-development process using a hybrid ranked and Boolean retrieval system has the potential for significant time savings over the current search process in systematic reviewing.
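Ranking within a Boolean result set can be sketched in two steps: filter with a conjunctive Boolean query, then order the surviving documents. The abstract does not say which ranking function was used, so the term-count score below is a placeholder assumption, as are the function names and the whitespace tokenisation.

```python
def boolean_and(documents, required_terms):
    """Ids of documents that contain every required term (simple AND query).

    documents: dict doc_id -> text; tokenisation is plain whitespace splitting.
    """
    required = {term.lower() for term in required_terms}
    return [
        doc_id
        for doc_id, text in documents.items()
        if required <= set(text.lower().split())
    ]


def rank_within(documents, result_ids, ranking_terms):
    """Order a Boolean result set by how often the ranking terms occur.

    The term-count score is an illustrative stand-in for a real ranking model.
    Ties are broken by document id so the order is deterministic.
    """
    terms = [term.lower() for term in ranking_terms]

    def score(doc_id):
        tokens = documents[doc_id].lower().split()
        return sum(tokens.count(term) for term in terms)

    return sorted(result_ids, key=lambda doc_id: (-score(doc_id), doc_id))
```

The Boolean step preserves the recall guarantee reviewers rely on, while the ranking step only changes the order in which results are inspected, so a poorly performing query can be spotted after reading the first few documents.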


Subjects
Information Storage and Retrieval/methods , Review Literature as Topic
12.
PLoS One ; 15(3): e0230322, 2020.
Article in English | MEDLINE | ID: mdl-32182277

ABSTRACT

First reported in March 2014, an Ebola epidemic impacted West Africa, most notably Liberia, Guinea and Sierra Leone. We demonstrate the value of social media for automated surveillance of infectious diseases such as the West Africa Ebola epidemic. We experiment with two variations of an existing surveillance architecture: the first aggregates tweets related to different symptoms together, while the second considers tweets about each symptom separately and then aggregates the set of alerts generated by the architecture. Using a dataset of tweets posted from the affected region from 2011 to 2014, we obtain alerts in December 2013, which is three months prior to the official announcement of the epidemic. Among the two variations, the second, which produces a restricted but useful set of alerts, can potentially be applied to other infectious disease surveillance and alert systems.


Subjects
Data Mining/methods , Epidemics/prevention & control , Epidemiological Monitoring , Hemorrhagic Fever, Ebola/epidemiology , Social Media/statistics & numerical data , Datasets as Topic , Ebolavirus , Epidemics/statistics & numerical data , Guinea/epidemiology , Hemorrhagic Fever, Ebola/diagnosis , Hemorrhagic Fever, Ebola/virology , Humans , Liberia/epidemiology , Sierra Leone/epidemiology
13.
AMIA Annu Symp Proc ; 2018: 807-816, 2018.
Article in English | MEDLINE | ID: mdl-30815123

ABSTRACT

Computer-assisted (diagnostic) coding (CAC) aims to improve the operational productivity and accuracy of clinical coders. The level of accuracy, especially for a wide range of complex and less prevalent clinical cases, remains an open research problem. This study investigates this problem on a broad spectrum of diagnostic codes and, in particular, investigates the effectiveness of utilising SNOMED CT for ICD-10 diagnosis coding. Hospital progress notes were used to provide the narrative-rich electronic patient records for the investigation. A natural language processing (NLP) approach using mappings between SNOMED CT and ICD-10-AM (Australian Modification) was used to guide the coding. The proposed approach achieved 54.1% sensitivity and 70.2% positive predictive value. This was encouraging given the complexity of the task, the simplicity of the approach, and what was projected as possible from a manual diagnosis code validation study (76.3% sensitivity). The results show the potential for advanced NLP-based approaches that leverage SNOMED CT to ICD-10 mapping for hospital in-patient coding.
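The mapping-guided coding step and its two evaluation measures can be sketched as follows. The identifiers such as `SCT-A` and `ICD-X1` are placeholders, not real SNOMED CT or ICD-10-AM codes, and the lookup-table design is an assumption; the abstract does not describe how the mappings were applied.

```python
def map_codes(concepts, mapping):
    """Translate extracted source concepts into target diagnosis codes.

    concepts: iterable of concept ids found in a patient's notes
    mapping:  dict concept id -> list of target codes (hypothetical table)
    Concepts without a mapping entry are silently skipped.
    """
    codes = set()
    for concept in concepts:
        codes.update(mapping.get(concept, ()))
    return codes


def sensitivity_and_ppv(predicted, gold):
    """Sensitivity (recall) and positive predictive value (precision) of a code set."""
    true_positives = len(predicted & gold)
    sensitivity = true_positives / len(gold) if gold else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv
```

Sensitivity answers how many of the coder-assigned codes were recovered, while positive predictive value answers how many of the suggested codes were correct, which is why the abstract reports the two separately.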


Subjects
Clinical Coding/methods , International Classification of Diseases , Natural Language Processing , Systematized Nomenclature of Medicine , Australia , Electronic Health Records , Hospitals , Humans , Unified Medical Language System