1.
J Biomed Inform ; 149: 104578, 2024 01.
Article in English | MEDLINE | ID: mdl-38122841

ABSTRACT

OBJECTIVE: Coreference resolution (CR) is a natural language processing (NLP) task that is concerned with finding all expressions within a single document that refer to the same entity. This makes it crucial in supporting downstream NLP tasks such as summarization, question answering and information extraction. Despite great progress in CR, our experiments have highlighted a substandard performance of the existing open-source CR tools in the clinical domain. We set out to explore some practical solutions to fine-tune their performance on clinical data. METHODS: We first explored the possibility of automatically producing silver standards following the success of such an approach in other clinical NLP tasks. We designed an ensemble approach that leverages multiple models to automatically annotate co-referring mentions. Subsequently, we looked into other ways of incorporating human feedback to improve the performance of an existing neural network approach. We proposed a semi-automatic annotation process to facilitate the manual annotation process. We also compared the effectiveness of active learning relative to random sampling in an effort to further reduce the cost of manual annotation. RESULTS: Our experiments demonstrated that the silver standard approach was ineffective in fine-tuning the CR models. Our results indicated that active learning should also be applied with caution. The semi-automatic annotation approach combined with continued training was found to be well suited for the rapid transfer of CR models under low-resource conditions. The ensemble approach demonstrated a potential to further improve accuracy by leveraging multiple fine-tuned models. CONCLUSION: Overall, we have effectively transferred a general CR model to a clinical domain. Our findings based on extensive experimentation have been summarized into practical suggestions for rapid transferring of CR models across different styles of clinical narratives.
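
The abstract does not say how the ensemble combines its member models, so the following is only a minimal sketch under an assumed scheme: each model proposes coreference clusters, every pair of mentions inside a cluster counts as one vote for a link, and links with enough votes are merged back into clusters with a union-find structure. The function names, the `min_votes` parameter and the toy mentions are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cluster_links(clusters):
    """Turn coreference clusters into a set of unordered mention pairs."""
    links = set()
    for cluster in clusters:
        for a, b in combinations(sorted(cluster), 2):
            links.add((a, b))
    return links


def ensemble_clusters(model_outputs, min_votes):
    """Keep links proposed by at least `min_votes` models, then merge them."""
    votes = Counter()
    for clusters in model_outputs:
        votes.update(cluster_links(clusters))

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), count in votes.items():
        if count >= min_votes:
            parent[find(a)] = find(b)

    groups = {}
    for mention in list(parent):
        groups.setdefault(find(mention), set()).add(mention)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy outputs from three hypothetical models (mentions shown as strings).
model_a = [{"the patient", "she", "her"}]
model_b = [{"the patient", "she"}, {"her", "Dr Smith"}]
model_c = [{"the patient", "she", "her"}]
merged = ensemble_clusters([model_a, model_b, model_c], min_votes=2)
```

In practice mentions would be identified by character or token offsets rather than surface strings, and merging by transitive closure can join mentions that no single model linked directly.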


Subjects
Information Storage and Retrieval; Neural Networks, Computer; Humans; Natural Language Processing; Narration; Empirical Research
2.
J Mammary Gland Biol Neoplasia ; 28(1): 6, 2023 03 24.
Article in English | MEDLINE | ID: mdl-36961631

ABSTRACT

Mammary cancer is one of the most common neoplasms of dogs, primarily bitches. While studies have been carried out identifying differing risk of mammary neoplasia in different dog breeds, few studies have reported associations between dog breeds and clinical features such as number of neoplastic lesions found in an individual case or the likelihood of lesions being benign or malignant. Such epidemiological studies are essential as a foundation for exploring potential genetic drivers of mammary tumour behaviour. Here, we have examined associations between breed, age and neuter status and the odds of a diagnosis of a mammary epithelial-origin neoplastic lesion (as opposed to any other histopathological diagnosis from a biopsied lesion) as well as the odds of a bitch presenting with either a single mammary lesion or multiple lesions, and the odds that those lesions are benign or malignant. The study population consisted of 129,258 samples from bitches, including 13,401 mammary epithelial neoplasms, submitted for histological assessment to a single histopathology laboratory between 2008 and 2021. In multivariable analysis, breed, age and neuter status were all significantly associated with the odds of a diagnosis of a mammary epithelial-origin neoplastic lesion. Smaller breeds were more likely to receive such a diagnosis. In cases diagnosed with a mammary epithelial neoplasm, these three factors were also significantly associated with the odds of diagnosis with a malignant lesion and of diagnosis with multiple lesions. Notably, while neutered animals were less likely to have a mammary epithelial neoplasm diagnosed, and were less likely to have multiple neoplasms, they were more likely to have malignant disease. Exploration of the patterns of risk of developing malignant disease, or multiple lesions, across individual breeds showed no breed with increased odds of both outcomes.
Breeds with altered odds compared to the Crossbreed baseline were either at increased risk of malignant disease and decreased risk of multiple lesions, or vice versa, or they were at significantly altered odds of one outcome with no change in the other outcome. Our analysis supports the hypothesis that age, neuter status and intrinsic biological and genetic factors all combine to influence the biological heterogeneity of canine mammary neoplasia.


Subjects
Breast Neoplasms; Carcinoma; Mammary Neoplasms, Animal; Female; Dogs; Humans; Animals; Mammary Neoplasms, Animal/diagnosis; Mammary Neoplasms, Animal/epidemiology; Mammary Neoplasms, Animal/pathology; Carcinoma/pathology; Epidemiologic Studies; Breeding
3.
Bioinformatics ; 38(11): 3136-3138, 2022 May 26.
Article in English | MEDLINE | ID: mdl-35482480

ABSTRACT

MOTIVATION: Global acronyms are used in written text without their formal definitions. This makes it difficult to automatically interpret their sense as acronyms tend to be ambiguous. Supervised machine learning approaches to sense disambiguation require large training datasets. In clinical applications, large datasets are difficult to obtain due to patient privacy. Manual data annotation creates an additional bottleneck. RESULTS: We proposed an approach to automatically modifying scientific abstracts to (i) simulate global acronym usage and (ii) annotate their senses without the need for external sources or manual intervention. We implemented it as a web-based application, which can create large datasets that in turn can be used to train supervised approaches to word sense disambiguation of biomedical acronyms. AVAILABILITY AND IMPLEMENTATION: The datasets will be generated on demand based on a user query and will be downloadable from https://datainnovation.cardiff.ac.uk/acronyms/.
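
The abstract describes the idea but not the implementation, so the snippet below is only a rough sketch of one way such a transformation could work: find a locally defined acronym of the form "long form (ACR)", check that the initials of the preceding words spell the acronym, then delete the definition and replace every long form with the bare acronym while recording the sense. The regular expression, the initial-letter heuristic and the function name are assumptions and are far cruder than a production acronym detector.

```python
import re

# One to six words followed by an upper-case acronym in parentheses.
DEFINITION = re.compile(r"((?:[A-Za-z][A-Za-z-]*\s+){1,6})\(([A-Z]{2,6})\)")


def simulate_global_acronyms(text):
    """Strip local acronym definitions and return the text plus sense labels."""
    senses = {}
    for match in DEFINITION.finditer(text):
        words, acronym = match.group(1).split(), match.group(2)
        candidate = words[-len(acronym):]
        initials = "".join(word[0].upper() for word in candidate)
        # Accept the definition only if the initials spell the acronym.
        if len(candidate) == len(acronym) and initials == acronym:
            senses[acronym] = " ".join(candidate)

    for acronym, long_form in senses.items():
        # Remove the explicit definition, then replace remaining long forms.
        text = text.replace(f"{long_form} ({acronym})", acronym)
        text = re.sub(re.escape(long_form), acronym, text, flags=re.IGNORECASE)
    return text, senses


sample = (
    "We use natural language processing (NLP) here. "
    "Natural language processing is fun."
)
modified, labels = simulate_global_acronyms(sample)
```

The modified text now uses the acronym as if it were global (undefined), while `labels` keeps the sense that a supervised disambiguation model could be trained to recover.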

4.
BMC Musculoskelet Disord ; 21(1): 66, 2020 Feb 03.
Article in English | MEDLINE | ID: mdl-32013997

ABSTRACT

BACKGROUND: Referral letters from primary care contain a large amount of information that could be used to improve the appropriateness of the referral pathway for individuals seeking specialist opinion for knee or hip pain. The primary aim of this study was to evaluate the content of the referral letters to identify information that can independently predict an optimal care pathway. METHODS: Using a prospective longitudinal design, a convenience sample of patients with hip or knee pain were recruited from orthopaedic, specialist general practice and advanced physiotherapy practitioner clinics. Individuals completed a Knee or hip Osteoarthritis Outcome Score at initial consultation and after 6 months. Participant demographics, body mass index, medication and co-morbidity data were extracted from the referral letters. Free text of the referral letters was mapped automatically onto the Unified Medical Language System to identify relevant clinical variables. Treatment outcomes were extracted from the consultation letters. Each outcome was classified as being an optimal or sub-optimal pathway, where an optimal pathway was defined as the one that results in the right treatment at the right time. Logistic regression was used to identify variables that were independently associated with an optimal pathway. RESULTS: A total of 643 participants were recruited, 419 (66.7%) were classified as having an optimal pathway. Variables independently associated with having an optimal care pathway were lower body mass index (OR 1.0, 95% CI 0.9 to 1.0 p = 0.004), named disease or syndromes (OR 1.8, 95% CI 1.1 to 2.8, p = 0.02) and taking pharmacologic substances (OR 1.8, 95% CI 1.0 to 3.3, p = 0.02). Having a single diagnostic procedure was associated with a suboptimal pathway (OR 0.5, 95% CI 0.3 to 0.9 p < 0.001). Neither Knee nor Hip Osteoarthritis Outcome scores were associated with an optimal pathway. 
Body mass index was found to be a good predictor of patient-rated function (coefficient -0.8, 95% CI -1.1 to -0.4, p < 0.001). CONCLUSION: Over 30% of patients followed a sub-optimal care pathway, which represents potential inefficiency and wasted healthcare resources. A core data set including body mass index should be considered, as this was a predictor of optimal care and patient-rated pain and function.
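
For readers less familiar with how logistic regression output becomes the odds ratios quoted above, here is a small illustrative sketch. It relies on the standard relationship OR = exp(coefficient), with a Wald 95% confidence interval of exp(coefficient ± 1.96 × standard error). The standard error used in the example is hypothetical and is not taken from the study.

```python
import math


def odds_ratio_with_ci(coefficient, standard_error, z=1.96):
    """Convert a logistic regression coefficient to an odds ratio and Wald CI."""
    return (
        math.exp(coefficient),
        math.exp(coefficient - z * standard_error),
        math.exp(coefficient + z * standard_error),
    )


# A coefficient of ln(1.8) corresponds to an odds ratio of 1.8.
# The standard error of 0.25 is a made-up value for illustration only.
odds_ratio, lower, upper = odds_ratio_with_ci(math.log(1.8), 0.25)
```

An interval that excludes 1.0 indicates an association at the conventional 5% level; an odds ratio below 1.0, such as the 0.5 reported for a single diagnostic procedure, indicates lower odds of an optimal pathway.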


Subjects
Arthralgia/therapy; Health Services Accessibility/organization & administration; Osteoarthritis, Hip/therapy; Osteoarthritis, Knee/therapy; Referral and Consultation/statistics & numerical data; Adult; Aged; Arthralgia/diagnosis; Arthralgia/etiology; Body Mass Index; Critical Pathways/organization & administration; Datasets as Topic; Female; Follow-Up Studies; General Practitioners/statistics & numerical data; Humans; Longitudinal Studies; Male; Middle Aged; Osteoarthritis, Hip/complications; Osteoarthritis, Hip/diagnosis; Osteoarthritis, Knee/complications; Osteoarthritis, Knee/diagnosis; Pain Measurement; Physical Therapy Specialty/organization & administration; Physical Therapy Specialty/statistics & numerical data; Primary Health Care/statistics & numerical data; Prospective Studies; Treatment Outcome
5.
BMC Musculoskelet Disord ; 18(1): 471, 2017 Nov 21.
Article in English | MEDLINE | ID: mdl-29162071

ABSTRACT

BACKGROUND: Physiotherapy rehabilitation following surgical reconstruction to the Anterior Cruciate Ligament (ACL) can take up to 12 months to complete. Given the lengthy rehabilitation process, a blended intervention can be used to complement face-to-face physiotherapy with a digital exercise intervention. In this study, we used TRAK, a web-based tool that has been developed to support knee rehabilitation, which provides individually tailored exercise programs with videos, instructions and progress logs for each exercise, relevant health information and a contact option that allows a patient to email a physiotherapist for additional support. The aim of this study was to evaluate the acceptability of TRAK-based blended intervention in post ACL reconstruction rehabilitation. METHODS: A qualitative research design using semi-structured interviews was used on a convenience sample of participants following an ACL reconstruction, and their treating physiotherapists, in a London NHS hospital. Participants were asked to use TRAK alongside face-to-face physiotherapy for 16 weeks. Interviews were carried out, audio recorded, transcribed verbatim and coded by two researchers independently. Data were analyzed using thematic analysis. RESULTS: Of the 25 individuals that were approached to be part of the study, 24 consented, comprising 8 females and 16 males, mean age 30 years. 17 individuals used TRAK for 16 weeks and were available for interview. Four physiotherapists were also interviewed. The six main themes identified from patients were: the experience of TRAK rehabilitation, personal characteristics for engagement, strengths and weaknesses of the intervention, TRAK in the future and attitudes to digital healthcare. The main themes from the physiotherapist interviews were: potential benefits, availability of resources and service organization to support use of TRAK.
CONCLUSIONS: TRAK was found to be an acceptable method of delivering ACL rehabilitation alongside face-to-face physiotherapy. Patients reported that TRAK, specifically the videos, increased their confidence and motivation with their rehabilitation. They identified ways in which TRAK could be developed in the future to meet technological expectations and further support rehabilitation. For physiotherapists, time and the availability of computers affected acceptability. Organization of care to support integration of digital exercise interventions such as TRAK into a blended approach to rehabilitation is required.


Subjects
Anterior Cruciate Ligament Injuries/surgery; Anterior Cruciate Ligament Reconstruction; Exercise Therapy/methods; Patient Acceptance of Health Care; Telerehabilitation/methods; Adult; Anterior Cruciate Ligament/physiology; Anterior Cruciate Ligament/surgery; Female; Humans; Knee Joint/physiology; Knee Joint/surgery; Male; Physical Therapists; Program Evaluation; Qualitative Research; Range of Motion, Articular; Time Factors
6.
BMC Public Health ; 14: 21, 2014 Jan 10.
Article in English | MEDLINE | ID: mdl-24405575

ABSTRACT

BACKGROUND: Alcohol-related violence in and in the vicinity of licensed premises continues to place a considerable burden on the United Kingdom's (UK) health services. Robust interventions targeted at licensed premises are therefore required to reduce the costs of alcohol-related harm. Previous evaluations of interventions in licensed premises have a number of methodological limitations and none have been conducted in the UK. The aim of the trial was to determine the effectiveness of the Safety Management in Licensed Environments intervention designed to reduce alcohol-related violence in licensed premises, delivered by Environmental Health Officers, under their statutory authority to intervene in cases of violence in the workplace. METHODS/DESIGN: A national randomised controlled trial, with licensed premises as the unit of allocation. Premises were identified from all 22 Local Authorities in Wales. Eligible premises were those with identifiable violent incidents on premises, using police recorded violence data. Premises were allocated to intervention or control by optimally balancing by Environmental Health Officer capacity in each Local Authority, number of violent incidents in the 12 months leading up to the start of the project and opening hours. The primary outcome measure is the difference in frequency of violence between intervention and control premises over a 12 month follow-up period, based on a recurrent event model. The trial incorporates an embedded process evaluation to assess intervention implementation, fidelity, reach and reception, and to interpret outcome effects, as well as investigate its economic impact. DISCUSSION: The results of the trial will be applicable to all statutory authorities directly involved with managing violence in the night time economy and will provide the first formal test of Health and Safety policy in this environment. If successful, opportunities for replication and generalisation will be considered. 
TRIAL REGISTRATION: UKCRN 14077; ISRCTN78924818.


Subjects
Alcohol Drinking/adverse effects; Health Promotion; Violence/prevention & control; Alcohol Drinking/psychology; Humans; Licensure; Police; Restaurants/legislation & jurisprudence; Wales
7.
Front Digit Health ; 6: 1282043, 2024.
Article in English | MEDLINE | ID: mdl-38482049

ABSTRACT

Clinical narratives commonly use acronyms without explicitly defining their long forms. This makes it difficult to automatically interpret their sense as acronyms tend to be highly ambiguous. Supervised learning approaches to their disambiguation in the clinical domain are hindered by issues associated with patient privacy and manual annotation, which limit the size and diversity of training data. In this study, we demonstrate how scientific abstracts can be utilised to overcome these issues by creating a large automatically annotated dataset of artificially simulated global acronyms. A neural network trained on such a dataset achieved the F1-score of 95% on disambiguation of acronym mentions in scientific abstracts. This network was integrated with multi-word term recognition to extract a sense inventory of acronyms from a corpus of clinical narratives on the fly. Acronym sense extraction achieved the F1-score of 74% on a corpus of radiology reports. In clinical practice, the suggested approach can be used to facilitate development of institution-specific inventories.

8.
J Biomed Inform ; 46(4): 615-25, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23665300

ABSTRACT

In this paper we discuss the design and development of TRAK (Taxonomy for RehAbilitation of Knee conditions), an ontology that formally models information relevant for the rehabilitation of knee conditions. TRAK provides the framework that can be used to collect coded data in sufficient detail to support epidemiologic studies so that the most effective treatment components can be identified, new interventions developed and the quality of future randomized control trials improved to incorporate a control intervention that is well defined and reflects clinical practice. TRAK follows design principles recommended by the Open Biomedical Ontologies (OBO) Foundry. TRAK uses the Basic Formal Ontology (BFO) as the upper-level ontology and refers to other relevant ontologies such as Information Artifact Ontology (IAO), Ontology for General Medical Science (OGMS) and Phenotype And Trait Ontology (PATO). TRAK is orthogonal to other bio-ontologies and represents domain-specific knowledge about treatments and modalities used in rehabilitation of knee conditions. Definitions of typical exercises used as treatment modalities are supported with appropriate illustrations, which can be viewed in the OBO-Edit ontology editor. The vast majority of other classes in TRAK are cross-referenced to the Unified Medical Language System (UMLS) to facilitate future integration with other terminological sources. TRAK is implemented in OBO, a format widely used by the OBO community. TRAK is available for download from http://www.cs.cf.ac.uk/trak. In addition, its public release can be accessed through BioPortal, where it can be browsed, searched and visualized.


Subjects
Knee Injuries/rehabilitation; Vocabulary, Controlled; Humans
9.
Bioinformatics ; 26(7): 932-8, 2010 Apr 01.
Article in English | MEDLINE | ID: mdl-20176582

ABSTRACT

MOTIVATION: Research in systems biology is carried out through a combination of experiments and models. Several data standards have been adopted for representing models (Systems Biology Markup Language) and various types of relevant experimental data (such as FuGE and those of the Proteomics Standards Initiative). However, until now, there has been no standard way to associate a model and its entities to the corresponding datasets, or vice versa. Such a standard would provide a means to represent computational simulation results as well as to frame experimental data in the context of a particular model. Target applications include model-driven data analysis, parameter estimation, and sharing and archiving model simulations. RESULTS: We propose the Systems Biology Results Markup Language (SBRML), an XML-based language that associates a model with several datasets. Each dataset is represented as a series of values associated with model variables, and their corresponding parameter values. SBRML provides a flexible way of indexing the results to model parameter values, which supports both spreadsheet-like data and multidimensional data cubes. We present and discuss several examples of SBRML usage in applications such as enzyme kinetics, microarray gene expression and various types of simulation results. AVAILABILITY AND IMPLEMENTATION: The XML Schema file for SBRML is available at http://www.comp-sys-bio.org/SBRML under the Academic Free License (AFL) v3.0.


Subjects
Software; Systems Biology/methods; Computational Biology/methods; Databases, Factual; Oligonucleotide Array Sequence Analysis
10.
Artif Intell Med ; 119: 102138, 2021 09.
Article in English | MEDLINE | ID: mdl-34531007

ABSTRACT

Aspect-based sentiment analysis is a natural language processing task whose aim is to automatically classify the sentiment associated with a specific aspect of a written text. In this study, we propose a novel model for aspect-based sentiment analysis, which exploits the dependency parse tree of a sentence using graph convolution to classify the sentiment of a given aspect. To evaluate this model in the domain of health and well-being, where this task is biased toward negative sentiment, we used a corpus of drug reviews. Specific aspects were grounded in the Unified Medical Language System, a large repository of inter-related biomedical concepts and the corresponding terminology. Our experiments demonstrated that the graph convolution approach outperforms standard deep learning architectures on the task of aspect-based sentiment analysis. Moreover, graph convolution over dependency parse trees (F-score of 0.8179) outperforms the same approach over a flat sequence representation of sentences (F-score of 0.7332). These results bring the performance of sentiment analysis in health and well-being in line with the state of the art in other domains.
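
To make "graph convolution over a dependency parse tree" concrete, the sketch below builds an undirected adjacency matrix from head indices and applies one generic graph-convolution layer: each token averages its own features with those of its syntactic neighbours, applies a linear map, then ReLU. This is a common textbook formulation written in plain Python for readability. It is not the architecture from the paper, whose details the abstract does not give, and the three-token sentence, its parse and the identity weights are invented for illustration.

```python
def dependency_adjacency(heads):
    """Build a symmetric adjacency matrix from head indices (-1 marks the root)."""
    n = len(heads)
    adjacency = [[0] * n for _ in range(n)]
    for child, head in enumerate(heads):
        if head >= 0:
            adjacency[child][head] = adjacency[head][child] = 1
    return adjacency


def graph_convolution(adjacency, features, weights):
    """One layer: mean over self and neighbours, linear map, then ReLU."""
    n = len(adjacency)
    dim_in, dim_out = len(weights), len(weights[0])
    output = []
    for i in range(n):
        neighbours = [j for j in range(n) if j == i or adjacency[i][j]]
        mean = [
            sum(features[j][k] for j in neighbours) / len(neighbours)
            for k in range(dim_in)
        ]
        output.append([
            max(0.0, sum(mean[k] * weights[k][m] for k in range(dim_in)))
            for m in range(dim_out)
        ])
    return output


# Hypothetical parse of "drug eased pain": both nouns attach to the verb.
heads = [1, -1, 1]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
hidden = graph_convolution(dependency_adjacency(heads), identity, identity)
```

With one-hot inputs and identity weights, each output row shows which tokens contributed to a node: "drug" mixes with "eased" but not with "pain", which is the structural signal that a flat sequence representation lacks.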


Subjects
Natural Language Processing; Unified Medical Language System; Language
11.
BMJ Open Sport Exerc Med ; 7(2): e001002, 2021.
Article in English | MEDLINE | ID: mdl-34035951

ABSTRACT

OBJECTIVES: To evaluate the feasibility of trialling taxonomy for the rehabilitation of knee conditions-ACL (TRAK-ACL), a digital health intervention that provides health information, personalised exercise plans and remote clinical support combined with treatment as usual (TAU), for people following ACL reconstruction. METHODS: The study design was a two-arm parallel randomised controlled trial (RCT). Eligible participants were English-speaking adults who had undergone ACL reconstruction within the last 12 weeks, had access to the internet and could provide informed consent. Recruitment took place at three sites in the UK. TRAK-ACL intervention was an interactive website informed by behaviour change technique combined with TAU. The comparator was TAU. Outcomes were: recruitment and retention; completeness of outcome measures at follow-up; fidelity of intervention delivery and engagement with the intervention. Individuals were randomised using a computer-generated random number sequence. Blinded assessors allocated groups and collected outcome measures. RESULTS: Fifty-nine people were assessed for eligibility at two of the participating sites, and 51 were randomised; 26 were allocated to TRAK-ACL and 25 to TAU. Follow-up data were collected on 44 and 40 participants at 3 and 6 months, respectively. All outcome measures were completed fully at 6 months except the Client Service Receipt Inventory. Two patients in each arm did not receive the treatment they were randomised to. Engagement with TRAK-ACL intervention was a median of 5 logins (IQR 3-13 logins), over 18 weeks (SD 12.2 weeks). CONCLUSION: TRAK-ACL would be suitable for evaluation of effectiveness in a fully powered RCT.

12.
JMIR Med Inform ; 9(12): e28632, 2021 Dec 24.
Article in English | MEDLINE | ID: mdl-34951601

ABSTRACT

BACKGROUND: Pharmacovigilance and safety reporting, which involve processes for monitoring the use of medicines in clinical trials, play a critical role in the identification of previously unrecognized adverse events or changes in the patterns of adverse events. OBJECTIVE: This study aims to demonstrate the feasibility of automating the coding of adverse events described in the narrative section of the serious adverse event report forms to enable statistical analysis of the aforementioned patterns. METHODS: We used the Unified Medical Language System (UMLS) as the coding scheme, which integrates 217 source vocabularies, thus enabling coding against other relevant terminologies such as the International Classification of Diseases-10th Revision, Medical Dictionary for Regulatory Activities, and Systematized Nomenclature of Medicine. We used MetaMap, a highly configurable dictionary lookup software, to identify the mentions of the UMLS concepts. We trained a binary classifier using Bidirectional Encoder Representations from Transformers (BERT), a transformer-based language model that captures contextual relationships, to differentiate between mentions of the UMLS concepts that represented adverse events and those that did not. RESULTS: The model achieved a high F1 score of 0.8080, despite the class imbalance. This is 10.15 percentage points lower than human-like performance but also 17.45 percentage points higher than that of the baseline approach. CONCLUSIONS: These results confirmed that automated coding of adverse events described in the narrative section of serious adverse event reports is feasible. Once coded, adverse events can be statistically analyzed so that any correlations with the trialed medicines can be estimated in a timely fashion.
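
As a worked illustration of the reported figures, the snippet below defines the F1 score from confusion counts and derives the human-like and baseline scores implied by the abstract. It assumes the stated differences are absolute differences in F1 expressed in percentage points; the derived values are arithmetic consequences of that reading and are not numbers quoted in the abstract. The confusion counts in the example are invented.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """Harmonic mean of precision and recall for the positive class."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Invented counts: 8 correct adverse-event mentions, 2 spurious, 2 missed.
example_f1 = f1_score(8, 2, 2)

# Scores implied by the abstract if the gaps are absolute percentage points.
model_f1 = 0.8080
implied_human_f1 = round(model_f1 + 10.15 / 100, 4)
implied_baseline_f1 = round(model_f1 - 17.45 / 100, 4)
```

F1 is a more informative summary than accuracy under class imbalance because it ignores true negatives, which dominate when most concept mentions are not adverse events.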

13.
BMC Bioinformatics ; 11: 582, 2010 Nov 29.
Article in English | MEDLINE | ID: mdl-21114840

ABSTRACT

BACKGROUND: The behaviour of biological systems can be deduced from their mathematical models. However, multiple sources of data in diverse forms are required in the construction of a model in order to define its components and their biochemical reactions, and corresponding parameters. Automating the assembly and use of systems biology models is dependent upon data integration processes involving the interoperation of data and analytical resources. RESULTS: Taverna workflows have been developed for the automated assembly of quantitative parameterised metabolic networks in the Systems Biology Markup Language (SBML). An SBML model is built in a systematic fashion by the workflows, which start with the construction of a qualitative network using data from a MIRIAM-compliant genome-scale model of yeast metabolism. This is followed by parameterisation of the SBML model with experimental data from two repositories, the SABIO-RK enzyme kinetics database and a database of quantitative experimental results. The models are then calibrated and simulated in workflows that call out to COPASIWS, the web service interface to the COPASI software application for analysing biochemical networks. These systems biology workflows were evaluated for their ability to construct a parameterised model of yeast glycolysis. CONCLUSIONS: Distributed information about metabolic reactions that have been described to MIRIAM standards enables the automated assembly of quantitative systems biology models of metabolic networks based on user-defined criteria. Such data integration processes can be implemented as Taverna workflows to provide a rapid overview of the components and their relationships within a biochemical system.


Subjects
Metabolic Networks and Pathways; Systems Biology/methods; Databases, Factual; Models, Biological
14.
Bioinformatics ; 25(11): 1404-11, 2009 Jun 01.
Article in English | MEDLINE | ID: mdl-19336445

ABSTRACT

MOTIVATION: Most experimental evidence on kinetic parameters is buried in the literature, whose manual searching is complex, time consuming and partial. These shortcomings become particularly acute in systems biology, where these parameters need to be integrated into detailed, genome-scale, metabolic models. These problems are addressed by KiPar, a dedicated information retrieval system designed to facilitate access to the literature relevant for kinetic modelling of a given metabolic pathway in yeast. Searching for kinetic data in the context of an individual pathway offers modularity as a way of tackling the complexity of developing a full metabolic model. It is also suitable for large-scale mining, since multiple reactions and their kinetic parameters can be specified in a single search request, rather than one reaction at a time, which is unsuitable given the size of genome-scale models. RESULTS: We developed an integrative approach, combining public data and software resources for the rapid development of large-scale text mining tools targeting complex biological information. The user supplies input in the form of identifiers used in relevant data resources to refer to the concepts of interest, e.g. EC numbers, GO and SBO identifiers. By doing so, the user is freed from providing any other knowledge or terminology concerned with these concepts and their relations, since they are retrieved from these and cross-referenced resources automatically. The terminology acquired is used to index the literature by mapping concepts to their synonyms, and then to textual documents mentioning them. The indexing results and the previously acquired knowledge about relations between concepts are used to formulate complex search queries aiming at documents relevant to the user's information needs. The conceptual approach is demonstrated in the implementation of KiPar. Evaluation reveals that KiPar performs better than a Boolean search. 
The precision achieved for abstracts (60%) and full-text articles (48%) is considerably better than the baseline precision (44% and 24%, respectively). The baseline recall is improved by 36% for abstracts and by 100% for full text. It appears that full-text articles are a much richer source of information on kinetic data than are their abstracts. Finally, the combined results for abstracts and full text compared with the curated literature provide high values for relative recall (88%) and novelty ratio (92%), suggesting that the system is able to retrieve a high proportion of new documents. AVAILABILITY: Source code and documentation are available at: (http://www.mcisb.org/resources/kipar/).
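
The evaluation measures named above can be sketched with simple set arithmetic. The definitions used here are the usual information-retrieval ones and are assumed rather than taken from the paper: precision is the share of retrieved documents that are relevant, relative recall is the share of a pooled set of known relevant documents that the system retrieved, and novelty ratio is the share of relevant retrieved documents not already known (for example, not already in a curated collection). The document identifiers are hypothetical.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant."""
    return len(retrieved & relevant) / len(retrieved)


def relative_recall(retrieved, pooled_relevant):
    """Fraction of the pooled relevant documents that the system retrieved."""
    return len(retrieved & pooled_relevant) / len(pooled_relevant)


def novelty_ratio(retrieved, relevant, already_known):
    """Fraction of relevant retrieved documents that were not previously known."""
    hits = retrieved & relevant
    return len(hits - already_known) / len(hits)


# Hypothetical document identifiers for illustration only.
retrieved = {"d1", "d2", "d3", "d4", "d5"}
relevant = {"d1", "d2", "d3", "d9"}
curated = {"d1"}

p = precision(retrieved, relevant)
rr = relative_recall(retrieved, relevant)
nr = novelty_ratio(retrieved, relevant, curated)
```

Relative recall is used when the full set of relevant documents in the literature is unknown, so recall can only be estimated against a pool of documents judged relevant.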


Subjects
Computational Biology/methods; Information Systems; Saccharomyces cerevisiae/metabolism; Software; Information Systems/standards; Metabolic Networks and Pathways; Systems Biology
15.
JMIR Med Inform ; 8(11): e21252, 2020 Nov 06.
Article in English | MEDLINE | ID: mdl-33155985

ABSTRACT

BACKGROUND: Musculoskeletal conditions are managed within primary care, but patients can be referred to secondary care if a specialist opinion is required. The ever-increasing demand for health care resources emphasizes the need to streamline care pathways with the ultimate aim of ensuring that patients receive timely and optimal care. Information contained in referral letters underpins the referral decision-making process but is yet to be explored systematically for the purposes of treatment prioritization for musculoskeletal conditions. OBJECTIVE: This study aims to explore the feasibility of using natural language processing and machine learning to automate the triage of patients with musculoskeletal conditions by analyzing information from referral letters. Specifically, we aim to determine whether referral letters can be automatically assorted into latent topics that are clinically relevant, that is, considered relevant when prescribing treatments. Here, clinical relevance is assessed by posing 2 research questions. Can latent topics be used to automatically predict treatment? Can clinicians interpret latent topics as cohorts of patients who share common characteristics or experiences such as medical history, demographics, and possible treatments? METHODS: We used latent Dirichlet allocation to model each referral letter as a finite mixture over an underlying set of topics and model each topic as an infinite mixture over an underlying set of topic probabilities. The topic model was evaluated in the context of automating patient triage. Given a set of treatment outcomes, a binary classifier was trained for each outcome using previously extracted topics as the input features of the machine learning algorithm. In addition, a qualitative evaluation was performed to assess the human interpretability of topics. 
RESULTS: The prediction accuracy of binary classifiers outperformed the stratified random classifier by a large margin, indicating that topic modeling could be used to predict the treatment, thus effectively supporting patient triage. The qualitative evaluation confirmed the high clinical interpretability of the topic model. CONCLUSIONS: The results established the feasibility of using natural language processing and machine learning to automate triage of patients with knee or hip pain by analyzing information from their referral letters.

16.
JMIR Med Inform ; 8(3): e17984, 2020 Mar 31.
Article in English | MEDLINE | ID: mdl-32229465

ABSTRACT

BACKGROUND: Clinical narratives represent the main form of communication within health care, providing a personalized account of patient history and assessments, and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. OBJECTIVE: The main aim of this study was to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigated the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. METHODS: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified 110 relevant studies and extracted information about text data used to support machine learning, NLP tasks supported, and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation, and any relevant statistics. RESULTS: The majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. 
Supervised learning was successfully used where clinical codes integrated with free-text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable because of the sensitive nature of data considered. Besides the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The majority of studies focused on text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management, and surveillance. CONCLUSIONS: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which do not require data annotation.
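The active learning strategy summarized above selects, in each round, the documents whose annotation is expected to help the model most. One common selection rule is uncertainty sampling, sketched below under the assumption of a binary classifier that outputs a probability for the positive class. The review does not state which selection rules the included studies used, and the probabilities and function name here are hypothetical.

```python
def select_most_uncertain(probabilities, batch_size):
    """Return indices of the unlabeled documents whose predicted probability
    of the positive class is closest to 0.5 (the least confident predictions)."""
    ranked = sorted(range(len(probabilities)),
                    key=lambda i: abs(probabilities[i] - 0.5))
    return ranked[:batch_size]


# Hypothetical model outputs for five unlabeled clinical notes.
pool_probabilities = [0.97, 0.52, 0.10, 0.45, 0.80]
to_annotate = select_most_uncertain(pool_probabilities, batch_size=2)
print(to_annotate)  # indices of the notes to send for manual annotation
```

In a full loop, the selected notes would be annotated, added to the training set, and the model retrained before the next round of selection.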

17.
JMIR Med Inform ; 8(1): e16023, 2020 Jan 28.
Article in English | MEDLINE | ID: mdl-32012057

ABSTRACT

BACKGROUND: Sentiment analysis (SA) is a subfield of natural language processing whose aim is to automatically classify the sentiment expressed in a free text. It has found practical applications across a wide range of societal contexts including marketing, economy, and politics. This review focuses specifically on applications related to health, which is defined as "a state of complete physical, mental, and social well-being and not merely the absence of disease or infirmity." OBJECTIVE: This study aimed to establish the state of the art in SA related to health and well-being by conducting a systematic review of the recent literature. To capture the perspective of those individuals whose health and well-being are affected, we focused specifically on spontaneously generated content and not necessarily that of health care professionals. METHODS: Our methodology is based on the guidelines for performing systematic reviews. In January 2019, we used PubMed, a multifaceted interface, to perform a literature search against MEDLINE. We identified a total of 86 relevant studies and extracted data about the datasets analyzed, discourse topics, data creators, downstream applications, algorithms used, and their evaluation. RESULTS: The majority of data were collected from social networking and Web-based retailing platforms. The primary purpose of online conversations is to exchange information and provide social support online. These communities tend to form around health conditions with high severity and chronicity rates. Different treatments and services discussed include medications, vaccination, surgery, orthodontic services, individual physicians, and health care services in general. We identified 5 roles with respect to health and well-being among the authors of the types of spontaneously generated narratives considered in this review: a sufferer, an addict, a patient, a carer, and a suicide victim. 
Out of 86 studies considered, only 4 reported the demographic characteristics. A wide range of methods were used to perform SA. Most common choices included support vector machines, naïve Bayesian learning, decision trees, logistic regression, and adaptive boosting. In contrast with general trends in SA research, only 1 study used deep learning. The performance lags behind the state of the art achieved in other domains when measured by F-score, which was found to be below 60% on average. In the context of SA, the domain of health and well-being was found to be resource poor: few domain-specific corpora and lexica are shared publicly for research purposes. CONCLUSIONS: SA results in the area of health and well-being lag behind those in other domains. It is yet unclear if this is because of the intrinsic differences between the domains and their respective sublanguages, the size of training datasets, the lack of domain-specific sentiment lexica, or the choice of algorithms.

18.
J Am Med Inform Assoc ; 16(4): 596-600, 2009.
Article in English | MEDLINE | ID: mdl-19390098

ABSTRACT

OBJECTIVE: The authors present a system developed for the Challenge in Natural Language Processing for Clinical Data (the i2b2 obesity challenge), whose aim was to automatically identify the status of obesity and 15 related co-morbidities in patients using their clinical discharge summaries. The challenge consisted of two tasks, textual and intuitive. The textual task was to identify explicit references to the diseases, whereas the intuitive task focused on the prediction of the disease status when the evidence was not explicitly asserted. DESIGN: The authors assembled a set of resources to lexically and semantically profile the diseases and their associated symptoms, treatments, etc. These features were explored in a hybrid text mining approach, which combined dictionary look-up, rule-based, and machine-learning methods. MEASUREMENTS: The methods were applied to a set of 507 previously unseen discharge summaries, and the predictions were evaluated against a manually prepared gold standard. The overall ranking of the participating teams was primarily based on the macro-averaged F-measure. RESULTS: The implemented method achieved a macro-averaged F-measure of 81% for the textual task (the highest achieved in the challenge) and 63% for the intuitive task (ranked 7th out of 28 teams; the highest was 66%). The micro-averaged F-measure showed an average accuracy of 97% for textual and 96% for intuitive annotations. CONCLUSIONS: The performance achieved was in line with the agreement between human annotators, indicating the potential of text mining for accurate and efficient prediction of disease statuses from clinical discharge summaries.
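The gap between the macro-averaged and micro-averaged F-measures reported above follows from how the two averages are computed: macro-averaging weights every class equally, whereas micro-averaging pools the counts and is therefore dominated by frequent classes. A minimal sketch, assuming per-class counts of true positives, false positives, and false negatives; the class names and counts are hypothetical and are not taken from the challenge data.

```python
def f_measure(tp, fp, fn):
    """Harmonic mean of precision and recall computed from raw counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f(counts):
    """Average the per-class F-measures, weighting every class equally."""
    return sum(f_measure(*c) for c in counts.values()) / len(counts)


def micro_f(counts):
    """Pool the counts over all classes, then compute a single F-measure."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f_measure(tp, fp, fn)


# Hypothetical (tp, fp, fn) counts: one frequent class and one rare class.
counts = {"absent": (90, 5, 5), "present": (2, 3, 3)}
print(f"macro F = {macro_f(counts):.3f}, micro F = {micro_f(counts):.3f}")
```

With these counts the rare class is predicted poorly, which pulls the macro average well below the micro average even though most individual decisions are correct.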


Subjects
Information Storage and Retrieval/methods ; Medical Records Systems, Computerized ; Natural Language Processing ; Obesity ; Patient Discharge ; Comorbidity ; Humans ; Software
19.
JMIR Med Inform ; 7(4): e15980, 2019 Oct 31.
Article in English | MEDLINE | ID: mdl-31674914

ABSTRACT

BACKGROUND: Clinical trials are an important step in introducing new interventions into clinical practice by generating data on their safety and efficacy. Clinical trials need to ensure that participants are similar so that the findings can be attributed to the interventions studied and not to some other factors. Therefore, each clinical trial defines eligibility criteria, which describe characteristics that must be shared by the participants. Unfortunately, the complexities of eligibility criteria may not allow them to be translated directly into readily executable database queries. Instead, they may require careful analysis of the narrative sections of medical records. Manual screening of medical records is time consuming, thus negatively affecting the timeliness of the recruitment process. OBJECTIVE: Track 1 of the 2018 National Natural Language Processing Clinical Challenge focused on the task of cohort selection for clinical trials, aiming to answer the following question: Can natural language processing be applied to narrative medical records to identify patients who meet eligibility criteria for clinical trials? The task required the participating systems to analyze longitudinal patient records to determine if the corresponding patients met the given eligibility criteria. We aimed to describe a system developed to address this task. METHODS: Our system consisted of 13 classifiers, one for each eligibility criterion. All classifiers used a bag-of-words document representation model. To prevent the loss of relevant contextual information associated with such representation, a pattern-matching approach was used to extract context-sensitive features. They were embedded back into the text as lexically distinguishable tokens, which were consequently featured in the bag-of-words representation. Supervised machine learning was chosen wherever a sufficient number of both positive and negative instances was available to learn from. 
A rule-based approach focusing on a small set of relevant features was chosen for the remaining criteria. RESULTS: The system was evaluated using microaveraged F measure. Overall, 4 machine learning algorithms, including support vector machine, logistic regression, naïve Bayesian classifier, and gradient tree boosting (GTB), were evaluated on the training data using 10-fold cross-validation. Overall, GTB demonstrated the most consistent performance. Its performance peaked when oversampling was used to balance the training data. The final evaluation was performed on previously unseen test data. On average, the F measure of 89.04% was comparable to 3 of the top ranked performances in the shared task (91.11%, 90.28%, and 90.21%). With an F measure of 88.14%, we significantly outperformed these systems (81.03%, 78.50%, and 70.81%) in identifying patients with advanced coronary artery disease. CONCLUSIONS: The holdout evaluation provides evidence that our system was able to identify eligible patients for the given clinical trial with high accuracy. Our approach demonstrates how rule-based knowledge infusion can improve the performance of machine learning algorithms even when trained on a relatively small dataset.
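The idea of embedding context-sensitive features back into the text as lexically distinguishable tokens, so that they survive a bag-of-words representation, can be sketched as follows. This is an illustration only: the regular expression, the token names, and the 6.5 to 9.5 range are hypothetical choices and do not reproduce the authors' patterns.

```python
import re
from collections import Counter

# Hypothetical pattern: an HbA1c mention followed by a numeric value.
HBA1C = re.compile(r"\bhba1c\s*(?:of|:|=)?\s*(\d+(?:\.\d+)?)\s*%?", re.IGNORECASE)


def embed_features(text):
    """Replace each matched value with a token that encodes its interpretation."""
    def to_token(match):
        value = float(match.group(1))
        # Illustrative range; a bag of words alone would lose this context.
        return "HBA1C_IN_RANGE" if 6.5 <= value <= 9.5 else "HBA1C_OUT_OF_RANGE"
    return HBA1C.sub(to_token, text)


def bag_of_words(text):
    """Count lower-cased word tokens; underscores keep embedded tokens whole."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


note = "HbA1c of 7.2% recorded today. Previous HbA1c: 11%."
features = bag_of_words(embed_features(note))
print(features)
```

Because the numeric comparison happens before tokenization, a downstream classifier sees a single informative feature instead of unrelated number tokens.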

20.
J Biomed Semantics ; 10(Suppl 1): 24, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-31711536

ABSTRACT

BACKGROUND: Knee injury and Osteoarthritis Outcome Score (KOOS) is an instrument used to quantify patients' perceptions about their knee condition and associated problems. It is administered as a 42-item closed-ended questionnaire in which patients are asked to self-assess five outcomes: pain, other symptoms, activities of daily living, sport and recreation activities, and quality of life. We developed KLOG as a 10-item open-ended version of the KOOS questionnaire in an attempt to obtain deeper insight into patients' opinions including their unmet needs. However, the open-ended nature of the questionnaire incurs analytical overhead associated with the interpretation of responses. The goal of this study was to automate such analysis. We implemented KLOSURE as a system for mining free-text responses to the KLOG questionnaire. It consists of two subsystems, one concerned with feature extraction and the other one concerned with classification of feature vectors. Feature extraction is performed by a set of four modules whose main functionalities are linguistic pre-processing, sentiment analysis, named entity recognition and lexicon lookup respectively. Outputs produced by each module are combined into feature vectors. The structure of feature vectors will vary across the KLOG questions. Finally, Weka, a machine learning workbench, was used for classification of feature vectors. RESULTS: The precision of the system varied between 62.8 and 95.3%, whereas the recall varied from 58.3 to 87.6% across the 10 questions. The overall performance in terms of F-measure varied between 59.0 and 91.3% with an average of 74.4% and a standard deviation of 8.8. CONCLUSIONS: We demonstrated the feasibility of mining open-ended patient questionnaires. By automatically mapping free text answers onto a Likert scale, we can effectively measure the progress of rehabilitation over time. 
In comparison to traditional closed-ended questionnaires, our approach offers much richer information that can be utilised to support clinical decision making. In conclusion, we demonstrated how text mining can be used to combine the benefits of qualitative and quantitative analysis of patient experiences.


Subjects
Data Mining ; Surveys and Questionnaires ; Activities of Daily Living ; Humans ; Knee Injuries/psychology ; Osteoarthritis/psychology ; Quality of Life