ABSTRACT
Type 2 diabetes mellitus (T2DM) often results in high morbidity and mortality. In addition, T2DM presents a substantial financial burden for individuals and their families, health systems, and societies. According to studies and reports, the incidence and prevalence of T2DM are increasing rapidly worldwide. Several models have been built to predict future T2DM onset or to detect undiagnosed T2DM in patients. In addition to the performance of such models, their interpretability is crucial for health experts, especially in personalized clinical prediction models. Data collected over 42 months from health check-up examinations and prescribed-drugs data repositories of four primary healthcare providers were used in this study. We propose a framework consisting of Logic Regression-based feature extraction and Least Absolute Shrinkage and Selection Operator (LASSO)-based prediction modeling for undiagnosed T2DM prediction. Performance of the models was measured using the area under the ROC curve (AUC) with corresponding confidence intervals. Results show that Logic Regression-based feature extraction produced simpler models, which are easier for healthcare experts to interpret, especially in cases with many binary features. Models developed using the proposed framework achieved an AUC of 0.818 (95% Confidence Interval (CI): 0.812-0.823), comparable to more complex models (i.e., models with a larger number of features) in which all features were included in prediction model development, with an AUC of 0.816 (95% CI: 0.810-0.822). However, the difference in the number of features used was substantial. This study proposes a framework for building interpretable models in healthcare that can contribute to greater trust in prediction models among healthcare experts.
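The AUC-with-confidence-interval evaluation described above can be reproduced in principle with a percentile bootstrap. The sketch below is illustrative only (pure Python, hypothetical inputs), not the study's actual pipeline:

```python
import random

def auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_with_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate plus a percentile-bootstrap confidence interval."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:  # resample must contain both classes
            stats.append(auc(yt, [y_score[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return auc(y_true, y_score), (lo, hi)
```

With scores `[0.1, 0.4, 0.35, 0.8]` and labels `[0, 0, 1, 1]`, `auc` returns 0.75 (three of four positive-negative pairs are correctly ordered).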
ABSTRACT
BACKGROUND: Implementation research has delved into barriers to implementing change and interventions for the implementation of innovation in practice. There remains a gap, however, that fails to connect implementation barriers to the most effective implementation strategies and provide a more tailored approach during implementation. This study aimed to explore barriers to the implementation of professional services in community pharmacies and to predict the effectiveness of facilitation strategies to overcome implementation barriers using machine learning techniques. METHODS: Six change facilitators facilitated a 2-year change programme aimed at implementing professional services across community pharmacies in Australia. A mixed methods approach was used where barriers were identified by change facilitators during the implementation study. Change facilitators trialled and recorded tailored facilitation strategies delivered to overcome identified barriers. Barriers were coded according to implementation factors derived from the Consolidated Framework for Implementation Research and the Theoretical Domains Framework. Tailored facilitation strategies were coded into 16 facilitation categories. To predict the effectiveness of these strategies, data mining with random forest was used, as it provided the highest level of accuracy. A predictive resolution percentage was established for each implementation strategy in relation to the barriers that were resolved by that particular strategy. RESULTS: During the 2-year programme, 1131 barriers and facilitation strategies were recorded by change facilitators. The most frequently identified barriers were a 'lack of ability to plan for change', 'lack of internal supporters for the change', 'lack of knowledge and experience', 'lack of monitoring and feedback', 'lack of individual alignment with the change', 'undefined change objectives', 'lack of objective feedback' and 'lack of time'.
The random forest algorithm achieved 96.9% prediction accuracy. The strategy category with the highest predicted resolution rate across the largest number of implementation barriers was 'to empower stakeholders to develop objectives and solve problems'. CONCLUSIONS: Results from this study have provided a better understanding of implementation barriers in community pharmacy and how data-driven approaches can be used to predict the effectiveness of facilitation strategies to overcome implementation barriers. Tailored facilitation strategies such as these can increase the rate of real-time implementation of innovations in healthcare, leading to an industry that can confidently and efficiently adapt to continuous change.
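The resolution percentage described above — the share of barriers resolved by each strategy category — can be sketched as a simple aggregation over (barrier, strategy, resolved) records. This is illustrative only, with made-up category names; the study's actual prediction used a random forest, which is not reproduced here:

```python
from collections import defaultdict

def resolution_rates(records):
    """records: iterable of (barrier, strategy, resolved) triples.
    Returns the percentage of attempts resolved per strategy category."""
    tried = defaultdict(int)
    solved = defaultdict(int)
    for _barrier, strategy, resolved in records:
        tried[strategy] += 1
        solved[strategy] += int(resolved)
    return {s: 100.0 * solved[s] / tried[s] for s in tried}
```

For example, a strategy that resolved both of the barriers it was applied to scores 100.0, while one that resolved one of two scores 50.0.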
Subjects
Pharmacies, Australia, Delivery of Health Care, Health Facilities, Humans, Pharmacists
ABSTRACT
BACKGROUND: Multimorbidity presents an increasingly common problem in the older population and is closely related to polypharmacy, i.e., the concurrent use of multiple medications by one individual. Detecting polypharmacy from drug prescription records is not only related to multimorbidity, but can also point at incorrect use of medicines. In this work, we build models for predicting polypharmacy from drug prescription records for newly diagnosed chronic patients. We evaluate the models' performance with a strong focus on interpretability of the results. METHODS: A centrally collected nationwide dataset of prescription records was used to perform electronic phenotyping of patients for the following two chronic conditions: type 2 diabetes mellitus (T2D) and cardiovascular disease (CVD). In addition, a hospital discharge dataset was linked to the prescription records. A regularized regression model was built for 11 different experimental scenarios on two datasets, and complexity of the model was controlled with a maximum number of dimensions (MND) parameter. Performance and interpretability of the model were evaluated with AUC, AUPRC, calibration plots, and interpretation by a medical doctor. RESULTS: For the CVD model, AUC and AUPRC values of 0.900 (95% CI [0.898-0.901]) and 0.640 (0.635-0.645) were reached, respectively, while for the T2D model the values were 0.808 (0.803-0.812) and 0.732 (0.725-0.739). Reducing complexity of the model by 65% and 48% for CVD and T2D resulted in 3% and 4% lower AUC, and 4% and 5% lower AUPRC values, respectively. Calibration plots for our models showed that we can achieve moderate calibration by reducing the models' complexity without significant loss of predictive performance. DISCUSSION: In this study, we found that it is possible to use drug prescription data to build a model for polypharmacy prediction in the older population.
In addition, the study showed that it is possible to find a balance between good performance and interpretability of the model, and achieve acceptable calibration at the same time.
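The maximum-number-of-dimensions (MND) idea — capping model complexity at a fixed number of features — can be illustrated with a simple univariate filter that keeps the features most correlated with the outcome. This is a hedged stand-in for complexity control, not the paper's regularized regression:

```python
import statistics

def top_features(X, y, mnd):
    """Keep at most `mnd` feature columns of X (rows = samples), ranked by
    absolute correlation with the binary outcome y -- a toy analogue of a
    maximum-number-of-dimensions complexity constraint."""
    def corr(col):
        mean_x = statistics.fmean(col)
        mean_y = statistics.fmean(y)
        num = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, y))
        den = (sum((a - mean_x) ** 2 for a in col)
               * sum((b - mean_y) ** 2 for b in y)) ** 0.5
        return num / den if den else 0.0
    cols = list(zip(*X))
    scores = sorted(((abs(corr(c)), j) for j, c in enumerate(cols)),
                    reverse=True)
    return sorted(j for _, j in scores[:mnd])
```

Setting `mnd=1` on a dataset where column 0 perfectly tracks the outcome keeps only that column.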
ABSTRACT
PURPOSE: There is as yet no computer-processable resource to describe treatment end points in cancer, hindering our ability to systematically capture and share outcomes data to inform better patient care. To address these unmet needs, we have built an ontology, the Cancer Care Treatment Outcome Ontology (CCTOO), to organize high-level concepts of treatment end points with structured knowledge representation to facilitate standardized sharing of real-world data. METHODS: End points from oncology trials in ClinicalTrials.gov were extracted using the keyword cancer, followed by an expert appraisal. Synonyms and relevant terms were imported from the National Cancer Institute Thesaurus and Common Terminology Criteria for Adverse Events. Logical relationships among concepts were manually represented by production rules. The applicability of 1,847 rules was tested in an index case. RESULTS: After removing duplicated terms from 54,705 trial entries, an ontology holding 1,133 terms was built. CCTOO organized concepts into four domains (cancer treatment, health services, physical, and psychosocial health-related concepts), 13 subgroups (including efficacy, safety, and quality of life), and two (taxonomic and evaluative) concept hierarchies. This ontology has comprehensive term coverage in the cancer trial literature: at least one term was mentioned in 98% of MEDLINE abstracts of phase I to III trials, whereas concepts about efficacy were mentioned in 7,208 (79%) phase I, 15,051 (92%) phase II, and 3,884 (86%) phase III trials. The event sequence of the index case was readily convertible to a comprehensive profile incorporating response, treatment toxicity, and survival by applying the set of production rules curated in the CCTOO. CONCLUSION: CCTOO categorizes high-level treatment end points used in oncology and provides a mechanism for profiling individual patient data by outcomes to facilitate translational analysis.
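Production rules of the kind described above (condition concepts implying a higher-level outcome concept) can be applied with a minimal forward-chaining loop. The rule and concept names below are hypothetical examples, not actual CCTOO content:

```python
def apply_rules(events, rules):
    """Apply production rules (condition-set -> inferred concept) to a
    patient's recorded events until no new concepts can be derived."""
    facts = set(events)
    changed = True
    while changed:
        changed = False
        for condition, conclusion in rules:
            if condition <= facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts
```

Chained rules fire transitively: if shrinkage implies partial response and partial response implies objective response, both derived concepts end up in the profile.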
Subjects
Biological Ontologies/trends, Neoplasms/therapy, Quality of Life/psychology, Humans, Treatment Outcome
ABSTRACT
BACKGROUND: In the era of the semantic web, life science ontologies play an important role in tasks such as annotating biological objects, linking relevant data pieces, and verifying data consistency. Understanding ontology structures and overlapping ontologies is essential for tasks such as ontology reuse and development. We present an exploratory study where we examine structure and look for patterns in BioPortal, a comprehensive publicly available repository of life science ontologies. METHODS: We report an analysis of biomedical ontology mapping data over time. We apply graph theory methods such as Modularity Analysis and Betweenness Centrality to analyse data gathered at five different time points. We identify communities, i.e., sets of overlapping ontologies, and define similar and closest communities. We demonstrate evolution of identified communities over time and identify core ontologies of the closest communities. We use BioPortal project and category data to measure community coherence. We also validate identified communities with their mutual mentions in scientific literature. RESULTS: By comparing mapping data gathered at five different time points, we identified similar and closest communities of overlapping ontologies, and demonstrated evolution of communities over time. Results showed that anatomy and health ontologies tend to form more isolated communities compared to other categories. We also showed that communities contain all or the majority of ontologies being used in narrower projects. In addition, we identified major changes in mapping data after migration to BioPortal Version 4.
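Communities of overlapping ontologies can be sketched as connected components of the mapping graph (ontologies as nodes, mappings as edges). This is a deliberate simplification of the modularity-based analysis used in the study, with made-up ontology acronyms:

```python
def communities(mappings):
    """Group ontologies into communities, here approximated as connected
    components of the undirected mapping graph."""
    adj = {}
    for a, b in mappings:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Two mapping chains that never touch yield two separate communities, mirroring the isolated anatomy and health clusters reported above.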
ABSTRACT
OBJECTIVE: Text and data mining play an important role in obtaining insights from Health and Hospital Information Systems. This paper presents a text mining system for detecting admissions marked as positive for several diseases: Lung Cancer, Breast Cancer, Colon Cancer, Secondary Malignant Neoplasm of Respiratory and Digestive Organs, Multiple Myeloma and Malignant Plasma Cell Neoplasms, Pneumonia, and Pulmonary Embolism. We specifically examine the effect of linking multiple data sources on text classification performance. METHODS: Support Vector Machine classifiers are built for eight data source combinations, and evaluated using the metrics of Precision, Recall and F-Score. Sub-sampling techniques are used to address unbalanced datasets of medical records. We use radiology reports as an initial data source and add other sources, such as pathology reports and patient and hospital admission data, in order to assess the value of multiple data sources. Statistical significance is measured using the Wilcoxon signed-rank test. A second set of experiments explores aspects of the system in greater depth, focusing on Lung Cancer. We explore the impact of feature selection; analyse the learning curve; examine the effect of restricting admissions to only those containing reports from all data sources; and examine the impact of reducing the sub-sampling. These experiments provide better understanding of how to best apply text classification in the context of imbalanced data of variable completeness. RESULTS: Radiology reports plus patient and hospital admission data contribute valuable information for detecting most of the diseases, significantly improving performance when added to radiology reports alone or to the combination of radiology and pathology reports. CONCLUSION: Overall, linking data sources significantly improved classification performance for all the diseases examined.
However, there is no single approach that suits all scenarios; the choice of the most effective combination of data sources depends on the specific disease to be classified.
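The sub-sampling step used above to address class imbalance can be sketched as random under-sampling of the majority class. This is illustrative only; the study's SVM training and Wilcoxon testing are not reproduced:

```python
import random

def undersample(records, seed=0):
    """Random under-sampling: shrink the majority class down to the size of
    the minority class. records: iterable of (document, label) pairs with
    binary labels 0/1."""
    rng = random.Random(seed)
    pos = [r for r in records if r[1] == 1]
    neg = [r for r in records if r[1] == 0]
    minority, majority = sorted((pos, neg), key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced
```

A corpus of 90 negative and 10 positive admissions becomes a balanced set of 20, which keeps the classifier from defaulting to the majority class.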
Subjects
Data Mining, Disease/classification, Hospital Records, Natural Language Processing, Hospitalization, Humans, Patient Compliance, Support Vector Machine
ABSTRACT
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
ABSTRACT
PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of the reasoning behind the classification model are possible. METHODS: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. RESULTS: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy for less complex, visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains on bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. CONCLUSIONS: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from the usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
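Constraining a decision tree by its dimensions rather than by a performance measure can be illustrated with a tiny depth-limited tree learner on binary features. This toy implementation is a sketch under our own simplifying assumptions (binary features, misclassification-count splits), not the evaluated environment:

```python
from collections import Counter

def grow(rows, labels, max_depth):
    """Grow a small classification tree on binary feature vectors, limited
    only by a maximum depth (a stand-in for a visual size constraint)."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    # choose the feature whose split minimises misclassifications
    best, best_err = None, None
    for f in range(len(rows[0])):
        err = 0
        for v in (0, 1):
            side = [l for r, l in zip(rows, labels) if r[f] == v]
            if side:
                err += len(side) - Counter(side).most_common(1)[0][1]
        if best_err is None or err < best_err:
            best, best_err = f, err
    branches = {}
    for v in (0, 1):
        sub = [(r, l) for r, l in zip(rows, labels) if r[best] == v]
        if sub:
            srows, slabels = zip(*sub)
            branches[v] = grow(list(srows), list(slabels), max_depth - 1)
        else:
            branches[v] = majority
    return (best, branches)

def predict(tree, row):
    while isinstance(tree, tuple):
        f, branches = tree
        tree = branches[row[f]]
    return tree
```

A depth bound of 2 suffices for an XOR-style problem, while the same data under a tighter bound would force a simpler, less accurate tree — the trade-off visual tuning makes explicit.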
Subjects
Computational Biology/methods, Decision Trees, Models, Theoretical, Artificial Intelligence, Data Mining, Databases, Genetic, Gene Expression Profiling/methods, Humans, Proteins/chemistry, Proteins/classification, Reproducibility of Results, Solubility
ABSTRACT
UNLABELLED: Often, the most informative genes have to be selected from different gene sets, and several computer gene ranking algorithms have been developed to cope with this problem. To help researchers decide which algorithm to use, we developed the Analysis of Gene Ranking Algorithms (AGRA) system, which offers a novel technique for comparing ranked lists of genes. The most important feature of AGRA is that no previous knowledge of gene ranking algorithms is needed for their comparison. Using the text mining system FACTA (Finding Associated Concepts with Text Analysis), AGRA defines what we call a biomedical concept space (BCS) for each gene list and offers a comparison of the gene lists in six different BCS categories. The uploaded gene lists can be compared using two different methods. In the first method, the overlap between the BCSs of each pair of gene lists is calculated. The second method offers a text field where a specific biomedical concept can be entered. AGRA searches for this concept in each gene list's BCS, highlights the rank of the concept, and offers a visual representation of concepts ranked above and below it. AVAILABILITY AND IMPLEMENTATION: Available at http://agra.fzv.uni-mb.si/, implemented in Java and running on the Glassfish server. CONTACT: simon.kocbek@uni-mb.si.
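The two comparison methods above — overlap between ranked concept lists and rank lookup for a single concept — can be sketched as follows (hypothetical concept names; not AGRA's implementation):

```python
def overlap_at_k(list_a, list_b, k):
    """Fraction of shared concepts among the top-k entries of two ranked
    biomedical-concept lists."""
    return len(set(list_a[:k]) & set(list_b[:k])) / k

def concept_rank(ranked, concept):
    """1-based rank of a concept in a ranked list, or None if absent."""
    try:
        return ranked.index(concept) + 1
    except ValueError:
        return None
```

Two lists sharing two of their top three concepts overlap at 2/3, and a concept absent from a list simply has no rank to highlight.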