Results 1 - 20 of 58
1.
J Biomed Inform ; 153: 104640, 2024 May.
Article in English | MEDLINE | ID: mdl-38608915

ABSTRACT

Evidence-based medicine promises to improve the quality of healthcare by empowering medical decisions and practices with the best available evidence. The rapid growth of medical evidence, which can be obtained from various sources, poses a challenge in collecting, appraising, and synthesizing the evidential information. Recent advancements in generative AI, exemplified by large language models, hold promise in facilitating the arduous task. However, developing accountable, fair, and inclusive models remains a complicated undertaking. In this perspective, we discuss the trustworthiness of generative AI in the context of automated summarization of medical evidence.


Subjects
Artificial Intelligence , Evidence-Based Medicine , Humans , Trust , Natural Language Processing
2.
Ann Intern Med ; 175(7): 1001-1009, 2022 07.
Article in English | MEDLINE | ID: mdl-35635850

ABSTRACT

BACKGROUND: Automation is a proposed solution for the increasing difficulty of maintaining up-to-date, high-quality health evidence. Evidence assessing the effectiveness of semiautomated data synthesis, such as risk-of-bias (RoB) assessments, is lacking. OBJECTIVE: To determine whether RobotReviewer-assisted RoB assessments are noninferior in accuracy and efficiency to assessments conducted with human effort only. DESIGN: Two-group, parallel, noninferiority, randomized trial. (Monash Research Office Project 11256). SETTING: Health-focused systematic reviews using Covidence. PARTICIPANTS: Systematic reviewers, who had not previously used RobotReviewer, completing Cochrane RoB assessments between February 2018 and May 2020. INTERVENTION: In the intervention group, reviewers received an RoB form prepopulated by RobotReviewer; in the comparison group, reviewers received a blank form. Studies were assigned in a 1:1 ratio via simple randomization to receive RobotReviewer assistance for either Reviewer 1 or Reviewer 2. Participants were blinded to study allocation before starting work on each RoB form. MEASUREMENTS: Co-primary outcomes were the accuracy of individual reviewer RoB assessments and the person-time required to complete individual assessments. Domain-level RoB accuracy was a secondary outcome. RESULTS: Of the 15 recruited review teams, 7 completed the trial (145 included studies). Integration of RobotReviewer resulted in noninferior overall RoB assessment accuracy (risk difference, -0.014 [95% CI, -0.093 to 0.065]; intervention group: 88.8% accurate assessments; control group: 90.2% accurate assessments). Data were inconclusive for the person-time outcome (RobotReviewer saved 1.40 minutes [CI, -5.20 to 2.41 minutes]). LIMITATION: Variability in user behavior and a limited number of assessable reviews led to an imprecise estimate of the time outcome. 
CONCLUSION: In health-related systematic reviews, RoB assessments conducted with RobotReviewer assistance are noninferior in accuracy to those conducted without RobotReviewer assistance. PRIMARY FUNDING SOURCE: University College London and Monash University.
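The headline accuracy comparison can be checked arithmetically from the reported group figures. A minimal sketch (variable names are ours; the trial's confidence interval comes from its own variance model and is not reproduced here):

```python
# Overall RoB assessment accuracy reported for each trial arm.
intervention_accuracy = 0.888  # RobotReviewer-assisted reviewers
control_accuracy = 0.902       # reviewers working from a blank form

# Risk difference: intervention minus control.
risk_difference = intervention_accuracy - control_accuracy
print(round(risk_difference, 3))  # -0.014, the reported point estimate
```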


Subjects
Machine Learning , Research Design , Bias , Humans , Risk Assessment
3.
BMC Med Inform Decis Mak ; 19(1): 96, 2019 05 08.
Article in English | MEDLINE | ID: mdl-31068178

ABSTRACT

OBJECTIVE: Assessing risks of bias in randomized controlled trials (RCTs) is an important but laborious task when conducting systematic reviews. RobotReviewer (RR), an open-source machine learning (ML) system, semi-automates bias assessments. We conducted a user study of RobotReviewer, evaluating time saved and usability of the tool. MATERIALS AND METHODS: Systematic reviewers applied the Cochrane Risk of Bias tool to four randomly selected RCT articles. Reviewers judged whether an RCT was at low or high/unclear risk of bias for each bias domain in the Cochrane tool (Version 1), and highlighted article text justifying their decision. For a random two of the four articles, the process was semi-automated: users were provided with ML-suggested bias judgments and text highlights. Participants could amend the suggestions if necessary. We measured the time taken for the task (with and without ML suggestions), assessed usability via the System Usability Scale (SUS), and collected qualitative feedback. RESULTS: For 41 volunteers, semi-automation was quicker than manual assessment (mean 755 vs. 824 s; relative time 0.75, 95% CI 0.62-0.92). Reviewers accepted 301/328 (91%) of the ML Risk of Bias (RoB) judgments, and 202/328 (62%) of text highlights without change. Overall, ML suggested text highlights had a recall of 0.90 (SD 0.14) and precision of 0.87 (SD 0.21) with respect to the users' final versions. Reviewers assigned the system a mean 77.7 SUS score, corresponding to a rating between "good" and "excellent". CONCLUSIONS: Semi-automation (where humans validate machine learning suggestions) can improve the efficiency of evidence synthesis. Our system was rated highly usable, and expedited bias assessment of RCTs.


Subjects
Bias , Machine Learning , Randomized Controlled Trials as Topic , Feedback , Humans , Prospective Studies , Risk Assessment
4.
J Med Internet Res ; 20(5): e164, 2018 05 04.
Article in English | MEDLINE | ID: mdl-29728351

ABSTRACT

BACKGROUND: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. OBJECTIVE: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved. METHODS: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. In PIVET's framework, we also introduce a statistical model trained on domain expert-verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.
RESULTS: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. CONCLUSIONS: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
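PIVET's evidence sets rest on counting how often phenotype terms co-occur in the same articles. A naive document-level co-occurrence counter illustrates the idea; it is a stand-in for the paper's optimized Aho-Corasick-style multi-pattern matcher, and the function name and toy corpus are ours:

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(documents, terms):
    """For each pair of terms, count how many documents mention both.

    Naive substring matching stands in for PIVET's optimized,
    Aho-Corasick-inspired matcher; the counting logic is the same idea.
    """
    counts = Counter()
    for doc in documents:
        text = doc.lower()
        present = sorted(t for t in terms if t.lower() in text)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts

corpus = [
    "Hypertension and diabetes frequently co-occur in older adults.",
    "Diabetes management guidelines were updated this year.",
    "Hypertension was the primary outcome; diabetes was an exclusion criterion.",
]
counts = cooccurrence_counts(corpus, ["hypertension", "diabetes"])
print(counts[("diabetes", "hypertension")])  # 2
```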


Subjects
Information Storage and Retrieval/methods , Internet/instrumentation , MEDLARS/standards , Algorithms , Humans , Phenotype
5.
J Biomed Inform ; 61: 77-86, 2016 06.
Article in English | MEDLINE | ID: mdl-27001195

ABSTRACT

OBJECTIVE: To evaluate whether vector representations encoding latent topic proportions that capture similarities to MeSH terms can improve performance on biomedical document retrieval and classification tasks, compared to using MeSH terms. MATERIALS AND METHODS: We developed the TopicalMeSH representation, which exploits the 'correspondence' between topics generated using latent Dirichlet allocation (LDA) and MeSH terms to create new document representations that combine MeSH terms and latent topic vectors. We used 15 systematic drug review corpora to evaluate performance on information retrieval and classification tasks using this TopicalMeSH representation, compared to using standard encodings that rely on either (1) the original MeSH terms, (2) the text, or (3) their combination. For the document retrieval task, we compared the precision and recall achieved by ranking citations using MeSH and TopicalMeSH representations, respectively. For the classification task, we considered three supervised machine learning approaches, Support Vector Machines (SVMs), logistic regression, and decision trees. We used these to classify documents as relevant or irrelevant using (independently) MeSH, TopicalMeSH, Words (i.e., n-grams extracted from citation titles and abstracts, encoded via bag-of-words representation), a combination of MeSH and Words, and a combination of TopicalMeSH and Words. We also used SVM to compare the classification performance of tf-idf weighted MeSH terms, LDA Topics, a combination of Topics and MeSH, and TopicalMeSH to supervised LDA's classification performance. RESULTS: For the document retrieval task, using the TopicalMeSH representation resulted in higher precision than MeSH in 11 of 15 corpora while achieving the same recall. For the classification task, use of TopicalMeSH features realized a higher F1 score in 14 of 15 corpora when used by SVMs, 12 of 15 corpora using logistic regression, and 12 of 15 corpora using decision trees. 
TopicalMeSH also had better document classification performance on 12 of 15 corpora when compared to Topics, tf-idf weighted MeSH terms, and a combination of Topics and MeSH using SVMs. Supervised LDA achieved the worst performance in most of the corpora. CONCLUSION: The proposed TopicalMeSH representation (which combines MeSH terms with latent topics) consistently improved performance on document retrieval and classification tasks, compared to standard representations using MeSH terms alone, as well as several alternative approaches.
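The hybrid representation idea (latent topic proportions alongside MeSH features) can be illustrated with a toy encoder. This sketch only concatenates topic proportions with binary MeSH indicators; the actual TopicalMeSH construction exploits the learned correspondence between LDA topics and MeSH terms, which is not reproduced here, and all names are ours:

```python
def topical_mesh_vector(topic_proportions, mesh_terms, vocabulary):
    """Toy hybrid encoding: LDA topic proportions concatenated with a
    binary MeSH indicator vector. Illustrative only; TopicalMeSH itself
    exploits a learned topic-MeSH correspondence."""
    mesh_indicators = [1.0 if term in mesh_terms else 0.0 for term in vocabulary]
    return list(topic_proportions) + mesh_indicators

vocab = ["Humans", "Machine Learning", "Risk Assessment"]
vec = topical_mesh_vector([0.7, 0.2, 0.1], {"Humans", "Machine Learning"}, vocab)
print(vec)  # [0.7, 0.2, 0.1, 1.0, 1.0, 0.0]
```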


Subjects
Information Storage and Retrieval , Medical Subject Headings , Support Vector Machine , Decision Trees , Humans
6.
J Am Med Inform Assoc ; 31(4): 1009-1024, 2024 Apr 03.
Article in English | MEDLINE | ID: mdl-38366879

ABSTRACT

OBJECTIVES: Question answering (QA) systems have the potential to improve the quality of clinical care by providing health professionals with the latest and most relevant evidence. However, QA systems have not been widely adopted. This systematic review aims to characterize current medical QA systems, assess their suitability for healthcare, and identify areas of improvement. MATERIALS AND METHODS: We searched PubMed, IEEE Xplore, ACM Digital Library, ACL Anthology, and forward and backward citations on February 7, 2023. We included peer-reviewed journal and conference papers describing the design and evaluation of biomedical QA systems. Two reviewers screened titles, abstracts, and full-text articles. We conducted a narrative synthesis and risk of bias assessment for each study. We assessed the utility of biomedical QA systems. RESULTS: We included 79 studies and identified themes, including question realism, answer reliability, answer utility, clinical specialism, systems, usability, and evaluation methods. Clinicians' questions used to train and evaluate QA systems were restricted to certain sources, types and complexity levels. No system communicated confidence levels in the answers or sources. Many studies suffered from high risks of bias and applicability concerns. Only 8 studies completely satisfied any criterion for clinical utility, and only 7 reported user evaluations. Most systems were built with limited input from clinicians. DISCUSSION: While machine learning methods have led to increased accuracy, most studies imperfectly reflected real-world healthcare information needs. Key research priorities include developing more realistic healthcare QA datasets and considering the reliability of answer sources, rather than merely focusing on accuracy.


Subjects
Health Personnel , Point-of-Care Systems , Humans , Reproducibility of Results , PubMed , Machine Learning
7.
Proc Conf Assoc Comput Linguist Meet ; 2023: 15566-15589, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37674787

ABSTRACT

Relation extraction (RE) is the core NLP task of inferring semantic relationships between entities from text. Standard supervised RE techniques entail training modules to tag tokens comprising entity spans and then predict the relationship between them. Recent work has instead treated the problem as a sequence-to-sequence task, linearizing relations between entities as target strings to be generated conditioned on the input. Here we push the limits of this approach, using larger language models (GPT-3 and Flan-T5 large) than considered in prior work and evaluating their performance on standard RE tasks under varying levels of supervision. We address issues inherent to evaluating generative approaches to RE by doing human evaluations, in lieu of relying on exact matching. Under this refined evaluation, we find that: (1) Few-shot prompting with GPT-3 achieves near SOTA performance, i.e., roughly equivalent to existing fully supervised models; (2) Flan-T5 is not as capable in the few-shot setting, but supervising and fine-tuning it with Chain-of-Thought (CoT) style explanations (generated via GPT-3) yields SOTA results. We release this model as a new baseline for RE tasks.

8.
Proc Conf Assoc Comput Linguist Meet ; 2023: 236-247, 2023 May.
Article in English | MEDLINE | ID: mdl-37483390

ABSTRACT

We present TrialsSummarizer, a system that aims to automatically summarize evidence presented in the set of randomized controlled trials most relevant to a given query. Building on prior work (Marshall et al., 2020), the system retrieves trial publications matching a query specifying a combination of condition, intervention(s), and outcome(s), and ranks these according to sample size and estimated study quality. The top-k such studies are passed through a neural multi-document summarization system, yielding a synopsis of these trials. We consider two architectures: a standard sequence-to-sequence model based on BART (Lewis et al., 2019), and a multi-headed architecture intended to provide greater transparency to end-users. Both models produce fluent and relevant summaries of evidence retrieved for queries, but their tendency to introduce unsupported statements renders them inappropriate for use in this domain at present. The proposed architecture may help users verify outputs by allowing them to trace generated tokens back to inputs. The demonstration video is available at https://vimeo.com/735605060. The prototype, source code, and model weights are available at https://sanjanaramprasad.github.io/trials-summarizer/.

9.
J Clin Epidemiol ; 153: 26-33, 2023 01.
Article in English | MEDLINE | ID: mdl-36150548

ABSTRACT

OBJECTIVES: The aim of this study is to describe and pilot a novel method for continuously identifying newly published trials relevant to a systematic review, enabled by combining artificial intelligence (AI) with human expertise. STUDY DESIGN AND SETTING: We used RobotReviewer LIVE to keep a review of COVID-19 vaccination trials updated from February to August 2021. We compared the papers identified by the system with those found by the conventional manual process by the review team. RESULTS: The manual update searches (last search date July 2021) retrieved 135 abstracts, of which 31 were included after screening (23% precision, 100% recall). By the same date, the automated system retrieved 56 abstracts, of which 31 were included after manual screening (55% precision, 100% recall). Key limitations of the system include that it is limited to searches of PubMed/MEDLINE, and considers only randomized controlled trial reports. We aim to address these limitations in future. The system is available as open-source software for further piloting and evaluation. CONCLUSION: Our system identified all relevant studies, reduced manual screening work, and enabled rolling updates on publication of new primary research.
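The reported precision and recall figures follow directly from the screening counts; a quick check (the helper name is ours, and recall is 100% in both arms because all 31 relevant trials were retrieved by both processes):

```python
def precision_recall(n_included, n_retrieved, n_relevant_total):
    """Screening precision (includes / retrieved) and recall (includes / all relevant)."""
    return n_included / n_retrieved, n_included / n_relevant_total

# Manual update search: 31 included studies among 135 retrieved abstracts.
manual_p, manual_r = precision_recall(31, 135, 31)
# RobotReviewer LIVE: the same 31 included studies among 56 retrieved abstracts.
auto_p, auto_r = precision_recall(31, 56, 31)
print(f"manual:    {manual_p:.0%} precision, {manual_r:.0%} recall")
print(f"automated: {auto_p:.0%} precision, {auto_r:.0%} recall")
# manual:    23% precision, 100% recall
# automated: 55% precision, 100% recall
```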


Subjects
Artificial Intelligence , COVID-19 , Humans , Pilot Projects , COVID-19 Vaccines , COVID-19/epidemiology , COVID-19/prevention & control , PubMed
10.
J Biol Chem ; 286(24): 21623-32, 2011 Jun 17.
Article in English | MEDLINE | ID: mdl-21527637

ABSTRACT

Bacterial communication via quorum sensing has been extensively investigated in recent years. Bacteria communicate in a complex manner through the production, release, and reception of diffusible low molecular weight chemical signaling molecules. Much work has focused on understanding the basic mechanisms of quorum sensing. As more and more bacteria grow resistant to conventional antibiotics, the development of drugs that do not kill bacteria but instead interrupt their communication is of increasing interest. This study presents a method for analyzing bacterial communication by investigating single cell responses. Most conventional analysis methods for bacterial communication are based on the averaged response from many bacteria, masking how individual cells respond to their immediate environment. We applied a fiber-optic microarray to record cellular communication from single cells. Single cell quorum sensing systems have previously been employed, but the highly ordered array reported here is an improvement because it allows us to simultaneously investigate cellular communication in many different environments with known cellular densities and configurations. We employed this method to detect how genes under quorum regulation are induced or repressed over time on the single cell level and to determine whether cellular density and configuration are indicative of the single cell temporal patterns of gene expression.


Subjects
Gene Expression Regulation, Bacterial , Quorum Sensing/physiology , Bacterial Proteins/metabolism , Biophysics/methods , Cell Communication , Escherichia coli/metabolism , Fiber Optic Technology , Models, Biological , Models, Chemical , Oligonucleotide Array Sequence Analysis , Time Factors , Transcription, Genetic
11.
Hum Mol Genet ; 19(3): 526-34, 2010 Feb 01.
Article in English | MEDLINE | ID: mdl-19933216

ABSTRACT

Chronic obstructive pulmonary disease (COPD) is a major cause of morbidity and mortality worldwide. COPD is thought to arise from the interaction of environmental exposures and genetic susceptibility, and major research efforts are underway to identify genetic determinants of COPD susceptibility. With the exception of SERPINA1, genetic associations with COPD identified by candidate gene studies have been inconsistently replicated, and this literature is difficult to interpret. We conducted a systematic review and meta-analysis of all population-based, case-control candidate gene COPD studies indexed in PubMed before 16 July 2008. We stored our findings in an online database, which serves as an up-to-date compendium of COPD genetic associations and cumulative meta-analysis estimates. On the basis of our systematic review, the vast majority of COPD candidate gene era studies are underpowered to detect genetic effect odds ratios of 1.2-1.5. We identified 27 genetic variants with adequate data for quantitative meta-analysis. Of these variants, four were significantly associated with COPD susceptibility in random effects meta-analysis: the GSTM1 null variant (OR 1.45, CI 1.09-1.92), rs1800470 in TGFB1 (OR 0.73, CI 0.64-0.83), rs1800629 in TNF (OR 1.19, CI 1.01-1.40) and rs1799896 in SOD3 (OR 1.97, CI 1.24-3.13). In summary, most COPD candidate gene era studies are underpowered to detect moderate-sized genetic effects. Quantitative meta-analysis identified four variants in GSTM1, TGFB1, TNF and SOD3 that show statistically significant evidence of association with COPD susceptibility.


Subjects
Databases, Genetic , Pulmonary Disease, Chronic Obstructive/genetics , Adult , Aged , Case-Control Studies , Female , Genetic Predisposition to Disease , Genetics, Population , Glutathione Transferase/genetics , Humans , Male , Middle Aged , Online Systems , Superoxide Dismutase/genetics , Transforming Growth Factor beta1/genetics
12.
Genet Med ; 14(7): 663-9, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22481134

ABSTRACT

PURPOSE: The aim of this study was to demonstrate that modern data mining tools can be used as one step in reducing the labor necessary to produce and maintain systematic reviews. METHODS: We used four continuously updated, manually curated resources that summarize MEDLINE-indexed articles in entire fields using systematic review methods (PDGene, AlzGene, and SzGene for genetic determinants of Parkinson disease, Alzheimer disease, and schizophrenia, respectively; and the Tufts Cost-Effectiveness Analysis (CEA) Registry for cost-effectiveness analyses). In each data set, we trained a classification model on citations screened up until 2009. We then evaluated the ability of the model to classify citations published in 2010 as "relevant" or "irrelevant" using human screening as the gold standard. RESULTS: Classification models did not miss any of the 104, 65, and 179 eligible citations in PDGene, AlzGene, and SzGene, respectively, and missed only 1 of 79 in the CEA Registry (100% sensitivity for the first three and 99% for the fourth). The respective specificities were 90, 93, 90, and 73%. Had the semiautomated system been used in 2010, a human would have needed to read only 605/5,616 citations to update the PDGene registry (11%) and 555/7,298 (8%), 717/5,381 (13%), and 334/1,015 (33%) for the other three databases. CONCLUSION: Data mining methodologies can reduce the burden of updating systematic reviews, without missing more papers than humans.
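The reported workload reductions are simple ratios of citations a human still reads to citations retrieved; a quick check of the percentages (the helper name is ours):

```python
def fraction_read(n_read, n_total):
    """Share of retrieved citations a human still screens after automated triage."""
    return n_read / n_total

# Citations read vs. retrieved for the 2010 update of each registry.
registries = {
    "PDGene": (605, 5616),
    "AlzGene": (555, 7298),
    "SzGene": (717, 5381),
    "CEA Registry": (334, 1015),
}
for name, (n_read, n_total) in registries.items():
    print(f"{name}: {fraction_read(n_read, n_total):.0%} of citations read")
# PDGene: 11%, AlzGene: 8%, SzGene: 13%, CEA Registry: 33%
```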


Subjects
Data Mining , Systematic Reviews as Topic , Humans , Alzheimer Disease/genetics , Cost-Benefit Analysis , Data Mining/methods , Databases, Factual , Empirical Research , Meta-Analysis as Topic , Parkinson Disease/genetics , Periodicals as Topic , Schizophrenia/genetics , Software , Technology Assessment, Biomedical
13.
Methods Mol Biol ; 2345: 17-40, 2022.
Article in English | MEDLINE | ID: mdl-34550582

ABSTRACT

Traditionally, literature identification for systematic reviews has relied on a two-step process: first, searching databases to identify potentially relevant citations, and then manually screening those citations. A number of tools have been developed to streamline and semi-automate this process, including tools to generate terms; to visualize and evaluate search queries; to trace citation linkages; to deduplicate, limit, or translate searches across databases; and to prioritize relevant abstracts for screening. Research is ongoing into tools that can unify searching and screening into a single step, and several prototype tools have been developed. As this field grows, it is becoming increasingly important to develop and codify methods for evaluating the extent to which these tools fulfill their purpose.


Subjects
Databases, Factual , Automation , Mass Screening , Publications , Systematic Reviews as Topic
14.
Proc Conf Assoc Comput Linguist Meet ; 2022: 341-350, 2022 Nov.
Article in English | MEDLINE | ID: mdl-37484061

ABSTRACT

We provide a quantitative and qualitative analysis of self-repetition in the output of neural summarizers. We measure self-repetition as the number of n-grams of length four or longer that appear in multiple outputs of the same system. We analyze the behavior of three popular architectures (BART, T5 and Pegasus), fine-tuned on five datasets. In a regression analysis, we find that the three architectures have different propensities for repeating content across output summaries for inputs, with BART being particularly prone to self-repetition. Fine-tuning on more abstractive data, and on data featuring formulaic language, is associated with a higher rate of self-repetition. In qualitative analysis we find systems produce artefacts such as ads and disclaimers unrelated to the content being summarized, as well as formulaic phrases common in the fine-tuning domain. Our approach to corpus level analysis of self-repetition may help practitioners clean up training data for summarizers and ultimately support methods for minimizing the amount of self-repetition.
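The self-repetition measure can be sketched directly: count n-grams that recur across different outputs of the same system. This sketch restricts itself to exactly length-four grams for brevity (the paper counts n-grams of length four or longer), and all names and example summaries are ours:

```python
from collections import Counter

def ngrams(text, n):
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def self_repetition(outputs, n=4):
    """Number of distinct n-grams appearing in more than one output."""
    counts = Counter()
    for summary in outputs:
        for gram in ngrams(summary, n):  # a set, so each gram counts once per output
            counts[gram] += 1
    return sum(1 for c in counts.values() if c > 1)

summaries = [
    "the study found no significant effect on mortality",
    "the study found no significant effect on readmission",
    "results were mixed across sites",
]
print(self_repetition(summaries))  # 4: the 4-grams shared by the first two outputs
```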

15.
Proc Conf Assoc Comput Linguist Meet ; 2022: 7331-7345, 2022 May.
Article in English | MEDLINE | ID: mdl-36404800

ABSTRACT

Automated simplification models aim to make input texts more readable. Such methods have the potential to make complex information accessible to a wider audience, e.g., providing access to recent medical literature which might otherwise be impenetrable for a lay reader. However, such models risk introducing errors into automatically simplified texts, for instance by inserting statements unsupported by the corresponding original text, or by omitting key information. Providing more readable but inaccurate versions of texts may in many cases be worse than providing no such access at all. The problem of factual accuracy (and the lack thereof) has received heightened attention in the context of summarization models, but the factuality of automatically simplified texts has not been investigated. We introduce a taxonomy of errors that we use to analyze both references drawn from standard simplification datasets and state-of-the-art model outputs. We find that errors often appear in both, and that these errors are not captured by existing evaluation metrics, motivating a need for research into ensuring the factual accuracy of automated simplification models.

16.
Proc Conf Empir Methods Nat Lang Process ; 2022: 3626-3648, 2022 Dec.
Article in English | MEDLINE | ID: mdl-37103483

ABSTRACT

Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenomena described in free-text. While past work has suggested that attention "heatmaps" can be interpreted in this manner, there has been little evaluation of such alignments. We compare alignments from a state-of-the-art multimodal (image and text) model for EHR with human annotations that link image regions to sentences. Our main finding is that the text has an often weak or unintuitive influence on attention; alignments do not consistently reflect basic anatomical information. Moreover, synthetic modifications - such as substituting "left" for "right" - do not substantially influence highlights. Simple techniques such as allowing the model to opt out of attending to the image and few-shot finetuning show promise in terms of their ability to improve alignments with very little or no supervision. We make our code and checkpoints open-source.

17.
Article in English | MEDLINE | ID: mdl-35663506

ABSTRACT

Medical question answering (QA) systems have the potential to answer clinicians' uncertainties about treatment and diagnosis on-demand, informed by the latest evidence. However, despite the significant progress in general QA made by the NLP community, medical QA systems are still not widely used in clinical environments. One likely reason for this is that clinicians may not readily trust QA system outputs, in part because transparency, trustworthiness, and provenance have not been key considerations in the design of such models. In this paper we discuss a set of criteria that, if met, we argue would likely increase the utility of biomedical QA systems, which may in turn lead to adoption of such systems in practice. We assess existing models, tasks, and datasets with respect to these criteria, highlighting shortcomings of previously proposed approaches and pointing toward what might be more usable QA systems.

18.
Proc Conf ; 2021: 4972-4984, 2021 Jun.
Article in English | MEDLINE | ID: mdl-35663507

ABSTRACT

We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to the lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, motivating the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing 'jargon' terms; we find that this yields improvements over baselines in terms of readability.
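One way to read the jargon penalty is as an extra term added to the decoder's training loss for each jargon token it produces; a toy sketch under that assumption (the paper's exact formulation may differ, and all names here are ours):

```python
def penalized_loss(token_nlls, tokens, jargon, weight=1.0):
    """Toy training objective: token-level negative log-likelihood plus a
    flat penalty per generated jargon token. Illustrative only; the
    penalty's exact form in the paper may differ."""
    nll = sum(token_nlls)
    penalty = weight * sum(1 for t in tokens if t in jargon)
    return nll + penalty

nlls = [0.5, 1.25, 0.25]  # exactly representable floats, summing to 2.0
loss_plain = penalized_loss(nlls, ["the", "drug", "works"], jargon={"pharmacotherapy"})
loss_jargon = penalized_loss(nlls, ["the", "pharmacotherapy", "works"], jargon={"pharmacotherapy"})
print(loss_plain, loss_jargon)  # 2.0 3.0
```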

19.
Syst Rev ; 10(1): 16, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33419479

ABSTRACT

BACKGROUND: The increasingly rapid rate of evidence publication has made it difficult for evidence synthesis-systematic reviews and health guidelines-to be continually kept up to date. One proposed solution for this is the use of automation in health evidence synthesis. Guideline developers are key gatekeepers in the acceptance and use of evidence, and therefore, their opinions on the potential use of automation are crucial. METHODS: The objective of this study was to analyze the attitudes of guideline developers towards the use of automation in health evidence synthesis. The Diffusion of Innovations framework was chosen as an initial analytical framework because it encapsulates some of the core issues which are thought to affect the adoption of new innovations in practice. This well-established theory posits five dimensions which affect the adoption of novel technologies: Relative Advantage, Compatibility, Complexity, Trialability, and Observability. Eighteen interviews were conducted with individuals who were currently working, or had previously worked, in guideline development. After transcription, a multiphase mixed deductive and grounded approach was used to analyze the data. First, transcripts were coded with a deductive approach using Rogers' Diffusion of Innovation as the top-level themes. Second, sub-themes within the framework were identified using a grounded approach. RESULTS: Participants were consistently most concerned with the extent to which an innovation is in line with current values and practices (i.e., Compatibility in the Diffusion of Innovations framework). Participants were also concerned with Relative Advantage and Observability, which were discussed in approximately equal amounts. For the latter, participants expressed a desire for transparency in the methodology of automation software. Participants were noticeably less interested in Complexity and Trialability, which were discussed infrequently. 
These results were reasonably consistent across all participants. CONCLUSIONS: If machine learning and other automation technologies are to be used more widely and to their full potential in systematic reviews and guideline development, it is crucial to ensure new technologies are in line with current values and practice. It will also be important to maximize the transparency of the methods of these technologies to address the concerns of guideline developers.


Subjects
Systematic Reviews as Topic , Automation , Humans
20.
AMIA Jt Summits Transl Sci Proc ; 2021: 605-614, 2021.
Article in English | MEDLINE | ID: mdl-34457176

ABSTRACT

We consider the problem of automatically generating a narrative biomedical evidence summary from multiple trial reports. We evaluate modern neural models for abstractive summarization of relevant article abstracts from systematic reviews previously conducted by members of the Cochrane Collaboration, using the authors' conclusions section of the review abstract as our target. We enlist medical professionals to evaluate generated summaries, and we find that summarization systems yield consistently fluent and relevant synopses, but these often contain factual inaccuracies. We propose new approaches that capitalize on domain-specific models to inform summarization, e.g., by explicitly demarcating snippets of inputs that convey key findings, and emphasizing the reports of large and high-quality trials. We find that these strategies modestly improve the factual accuracy of generated summaries. Finally, we propose a new method for automatically evaluating the factuality of generated narrative evidence syntheses using models that infer the directionality of reported findings.
