Results 1 - 20 of 24
1.
Eur Heart J ; 44(9): 713-725, 2023 03 01.
Article in English | MEDLINE | ID: mdl-36629285

ABSTRACT

Artificial intelligence (AI) is increasingly being utilized in healthcare. This article provides clinicians and researchers with a step-wise foundation for high-value AI that can be applied to a variety of different data modalities. The aim is to improve the transparency and application of AI methods, with the potential to benefit patients in routine cardiovascular care. Following a clear research hypothesis, an AI-based workflow begins with data selection and pre-processing prior to analysis, with the type of data (structured, semi-structured, or unstructured) determining what type of pre-processing steps and machine-learning algorithms are required. Algorithmic and data validation should be performed to ensure the robustness of the chosen methodology, followed by an objective evaluation of performance. Seven case studies are provided to highlight the wide variety of data modalities and clinical questions that can benefit from modern AI techniques, with a focus on applying them to cardiovascular disease management. Despite the growing use of AI, further education for healthcare workers, researchers, and the public is needed to aid understanding of how AI works and to close the existing gap in knowledge. In addition, issues regarding data access, sharing, and security must be addressed to ensure full engagement by patients and the public. The application of AI within healthcare provides an opportunity for clinicians to deliver a more personalized approach to medical care by accounting for confounders, interactions, and the rising prevalence of multi-morbidity.


Subject(s)
Artificial Intelligence , Cardiovascular System , Humans , Algorithms , Machine Learning , Delivery of Health Care
2.
Lancet ; 398(10309): 1427-1435, 2021 10 16.
Article in English | MEDLINE | ID: mdl-34474011

ABSTRACT

BACKGROUND: Mortality remains unacceptably high in patients with heart failure and reduced left ventricular ejection fraction (LVEF) despite advances in therapeutics. We hypothesised that a novel artificial intelligence approach could better assess multiple and higher-dimension interactions of comorbidities, and define clusters of β-blocker efficacy in patients with sinus rhythm and atrial fibrillation. METHODS: Neural network-based variational autoencoders and hierarchical clustering were applied to pooled individual patient data from nine double-blind, randomised, placebo-controlled trials of β blockers. All-cause mortality during median 1·3 years of follow-up was assessed by intention to treat, stratified by electrocardiographic heart rhythm. The number of clusters and dimensions was determined objectively, with results validated using a leave-one-trial-out approach. This study was prospectively registered with ClinicalTrials.gov (NCT00832442) and the PROSPERO database of systematic reviews (CRD42014010012). FINDINGS: 15 659 patients with heart failure and LVEF of less than 50% were included, with median age 65 years (IQR 56-72) and LVEF 27% (IQR 21-33). 3708 (24%) patients were women. In sinus rhythm (n=12 822), most clusters demonstrated a consistent overall mortality benefit from β blockers, with odds ratios (ORs) ranging from 0·54 to 0·74. One cluster in sinus rhythm of older patients with less severe symptoms showed no significant efficacy (OR 0·86, 95% CI 0·67-1·10; p=0·22). In atrial fibrillation (n=2837), four of five clusters were consistent with the overall neutral effect of β blockers versus placebo (OR 0·92, 0·77-1·10; p=0·37). One cluster of younger atrial fibrillation patients at lower mortality risk but similar LVEF to average had a statistically significant reduction in mortality with β blockers (OR 0·57, 0·35-0·93; p=0·023). 
The robustness and consistency of clustering was confirmed for all models (p<0·0001 vs random), and cluster membership was externally validated across the nine independent trials. INTERPRETATION: An artificial intelligence-based clustering approach was able to distinguish prognostic response from β blockers in patients with heart failure and reduced LVEF. This included patients in sinus rhythm with suboptimal efficacy, as well as a cluster of patients with atrial fibrillation where β blockers did reduce mortality. FUNDING: Medical Research Council, UK, and EU/EFPIA Innovative Medicines Initiative BigData@Heart.


Subject(s)
Adrenergic beta-Antagonists/therapeutic use , Atrial Fibrillation/drug therapy , Cluster Analysis , Heart Failure/drug therapy , Machine Learning , Aged , Comorbidity , Double-Blind Method , Female , Heart Failure/mortality , Humans , Male , Middle Aged , Stroke Volume , Ventricular Function, Left
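
The abstract of this record reports per-cluster odds ratios with 95% confidence intervals. As a minimal sketch of how such a figure can be obtained from a 2x2 table, the snippet below uses the standard Wald (Woolf) interval on the log odds ratio. The function name and the counts are hypothetical illustrations, not data from the trials, and the study itself may have used a different estimator (for example a stratified or model-based one).

```python
import math

def odds_ratio_ci(events_treated, n_treated, events_control, n_control, z=1.96):
    """Odds ratio with a Wald (Woolf) confidence interval from a 2x2 table.

    Hypothetical helper for illustration; z=1.96 gives an approximate 95% CI.
    """
    a = events_treated                 # deaths on treatment
    b = n_treated - events_treated     # survivors on treatment
    c = events_control                 # deaths on placebo
    d = n_control - events_control     # survivors on placebo
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Made-up counts: 30/200 deaths on treatment versus 50/200 on placebo.
or_value, ci_low, ci_high = odds_ratio_ci(30, 200, 50, 200)
print(f"OR {or_value:.2f}, 95% CI {ci_low:.2f}-{ci_high:.2f}")
```

An interval that lies entirely below 1 corresponds to a statistically significant reduction in odds at the chosen level, which is how the cluster-level results in the abstract are read.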
3.
Regul Toxicol Pharmacol ; 128: 105089, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34861320

ABSTRACT

Respiratory irritation is an important human health endpoint in chemical risk assessment. There are two established modes of action of respiratory irritation: 1) sensory irritation mediated by the interaction with sensory neurons, potentially stimulating the trigeminal nerve, and 2) direct tissue irritation. The aim of our research was to develop a QSAR method to predict human respiratory irritants, and to potentially reduce the reliance on animal testing for the identification of respiratory irritants. Compounds are classified as irritating based on combined evidence from different types of toxicological data, including inhalation studies with acute and repeated exposure. The curated project database comprised 1997 organic substances, 1553 being classified as irritating and 444 as non-irritating. A comparison of machine learning approaches, including Logistic Regression (LR), Random Forests (RFs), and Gradient Boosted Decision Trees (GBTs), showed that the best classification was obtained by GBTs. The LR model resulted in an area under the curve (AUC) of 0.65, while the optimal performance for both RFs and GBTs gave an AUC of 0.71. In addition to the classification and the information on the applicability domain, the web-based tool provides a list of structurally similar analogues together with their experimental data to facilitate expert review for read-across purposes.


Subject(s)
Irritants/chemistry , Machine Learning , Quantitative Structure-Activity Relationship , Respiratory System/drug effects , Administration, Inhalation , Animal Testing Alternatives/methods , Risk Assessment
4.
BMC Med Inform Decis Mak ; 22(1): 33, 2022 02 05.
Article in English | MEDLINE | ID: mdl-35123470

ABSTRACT

BACKGROUND: Semantic similarity is a valuable tool for analysis in biomedicine. When applied to phenotype profiles derived from clinical text, it has the capacity to enable and enhance 'patient-like me' analyses, automated coding, differential diagnosis, and outcome prediction. While a large body of work exists exploring the use of semantic similarity for multiple tasks, including protein interaction prediction, and rare disease differential diagnosis, there is less work exploring comparison of patient phenotype profiles for clinical tasks. Moreover, there are no experimental explorations of optimal parameters or better methods in the area. METHODS: We develop a platform for reproducible benchmarking and comparison of experimental conditions for patient phenotype similarity. Using the platform, we evaluate the task of ranking shared primary diagnosis from uncurated phenotype profiles derived from all text narrative associated with admissions in the medical information mart for intensive care (MIMIC-III). RESULTS: 300 semantic similarity configurations were evaluated, as well as one embedding-based approach. On average, measures that did not make use of an external information content measure performed slightly better; however, the best-performing configurations when measured by area under receiver operating characteristic curve and Top Ten Accuracy used term-specificity and annotation-frequency measures. CONCLUSION: We identified and interpreted the performance of a large number of semantic similarity configurations for the task of classifying diagnosis from text-derived phenotype profiles in one setting. We also provided a basis for further research on other settings and related tasks in the area.


Subject(s)
Rare Diseases , Semantics , Humans , Phenotype , ROC Curve
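
The abstract of this record contrasts similarity configurations that do and do not use an information content (IC) measure based on annotation frequency. The sketch below illustrates one such configuration on a toy ontology: IC from annotation frequency, Resnik pairwise similarity, and best-match-average aggregation over two profiles. The ontology, the counts, and all function names are invented for illustration; this is one of many possible configurations and is not claimed to be among those benchmarked in the paper.

```python
import math

# Toy ontology (child -> parents) and direct annotation counts; both hypothetical.
PARENTS = {
    "root": [],
    "cardiac": ["root"],
    "respiratory": ["root"],
    "arrhythmia": ["cardiac"],
    "heart_failure": ["cardiac"],
    "cough": ["respiratory"],
}
DIRECT_COUNTS = {"arrhythmia": 2, "heart_failure": 3, "cough": 5}

def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def information_content():
    """IC(t) = -log(p(t)), where p(t) counts annotations to t or any descendant."""
    totals = {term: 0 for term in PARENTS}
    for term, count in DIRECT_COUNTS.items():
        for anc in ancestors(term):
            totals[anc] += count
    root_total = totals["root"]
    return {t: -math.log(n / root_total) for t, n in totals.items() if n > 0}

IC = information_content()

def resnik(a, b):
    """Resnik similarity: IC of the most informative common ancestor."""
    return max(IC.get(t, 0.0) for t in ancestors(a) & ancestors(b))

def best_match_average(profile_a, profile_b):
    """Average each term's best match in the other profile, in both directions."""
    def directed(src, dst):
        return sum(max(resnik(s, d) for d in dst) for s in src) / len(src)
    return (directed(profile_a, profile_b) + directed(profile_b, profile_a)) / 2

print(best_match_average(["arrhythmia", "cough"], ["heart_failure"]))
```

Swapping `resnik` for a measure that ignores annotation counts, or `best_match_average` for another aggregation, is the kind of configuration change a benchmarking platform would vary.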
5.
BMC Med ; 19(1): 23, 2021 01 21.
Article in English | MEDLINE | ID: mdl-33472631

ABSTRACT

BACKGROUND: The National Early Warning Score (NEWS2) is currently recommended in the UK for the risk stratification of COVID-19 patients, but little is known about its ability to detect severe cases. We aimed to evaluate NEWS2 for the prediction of severe COVID-19 outcome and identify and validate a set of blood and physiological parameters routinely collected at hospital admission to improve upon the use of NEWS2 alone for medium-term risk stratification. METHODS: Training cohorts comprised 1276 patients admitted to King's College Hospital National Health Service (NHS) Foundation Trust with COVID-19 disease from 1 March to 30 April 2020. External validation cohorts included 6237 patients from five UK NHS Trusts (Guy's and St Thomas' Hospitals, University Hospitals Southampton, University Hospitals Bristol and Weston NHS Foundation Trust, University College London Hospitals, University Hospitals Birmingham), one hospital in Norway (Oslo University Hospital), and two hospitals in Wuhan, China (Wuhan Sixth Hospital and Taikang Tongji Hospital). The outcome was severe COVID-19 disease (transfer to intensive care unit (ICU) or death) at 14 days after hospital admission. Age, physiological measures, blood biomarkers, sex, ethnicity, and comorbidities (hypertension, diabetes, cardiovascular, respiratory and kidney diseases) measured at hospital admission were considered in the models. RESULTS: A baseline model of 'NEWS2 + age' had poor-to-moderate discrimination for severe COVID-19 infection at 14 days (area under receiver operating characteristic curve (AUC) in training cohort = 0.700, 95% confidence interval (CI) 0.680, 0.722; Brier score = 0.192, 95% CI 0.186, 0.197). 
A supplemented model adding eight routinely collected blood and physiological parameters (supplemental oxygen flow rate, urea, age, oxygen saturation, C-reactive protein, estimated glomerular filtration rate, neutrophil count, neutrophil/lymphocyte ratio) improved discrimination (AUC = 0.735; 95% CI 0.715, 0.757), and these improvements were replicated across seven UK and non-UK sites. However, there was evidence of miscalibration with the model tending to underestimate risks in most sites. CONCLUSIONS: NEWS2 score had poor-to-moderate discrimination for medium-term COVID-19 outcome which raises questions about its use as a screening tool at hospital admission. Risk stratification was improved by including readily available blood and physiological parameters measured at hospital admission, but there was evidence of miscalibration in external sites. This highlights the need for a better understanding of the use of early warning scores for COVID.


Subject(s)
COVID-19/diagnosis , Early Warning Score , Aged , COVID-19/epidemiology , COVID-19/virology , Cohort Studies , Electronic Health Records , Female , Humans , Male , Middle Aged , Pandemics , Prognosis , SARS-CoV-2/isolation & purification , State Medicine , United Kingdom/epidemiology
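
The abstract of this record summarises model performance with two metrics: discrimination (AUC) and the Brier score. The sketch below shows how both can be computed from predicted risks and observed binary outcomes using only the standard library. The function names and the four-patient example are hypothetical, and the pairwise AUC shown is the simple O(n²) definition rather than whatever implementation the authors used.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome (0 or 1)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def auc(y_true, y_score):
    """Probability that a random positive case is scored above a random negative one.

    Ties count as half a win. Quadratic in the number of cases, fine for a sketch.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))

# Made-up outcomes (1 = ICU transfer or death) and predicted risks.
outcomes = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
print(auc(outcomes, risks), brier_score(outcomes, risks))
```

The two metrics answer different questions: AUC depends only on the ranking of patients, whereas the Brier score also penalises risks that are systematically too high or too low, which is why a model can discriminate reasonably well and still be miscalibrated.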
6.
Front Med (Lausanne) ; 11: 1354070, 2024.
Article in English | MEDLINE | ID: mdl-38686369

ABSTRACT

Introduction: The echocardiographic measurement of left ventricular ejection fraction (LVEF) is fundamental to the diagnosis and classification of patients with heart failure (HF). Methods: This paper aimed to quantify LVEF automatically and accurately with the proposed pipeline method based on deep neural networks and ensemble learning. Within the pipeline, an Atrous Convolutional Neural Network (ACNN) was first trained to segment the left ventricle (LV), before employing the area-length formulation based on the ellipsoid single-plane model to calculate LVEF values. This formulation required inputs of LV area, derived from segmentation using an improved Jeffrey's method, as well as LV length, derived from a novel ensemble learning model. To further improve the pipeline's accuracy, an automated peak detection algorithm was used to identify end-diastolic and end-systolic frames, avoiding issues with human error. Subsequently, single-beat LVEF values were averaged across all cardiac cycles to obtain the final LVEF. Results: This method was developed and internally validated in an open-source dataset containing 10,030 echocardiograms. The Pearson's correlation coefficient was 0.83 for LVEF prediction compared to expert human analysis (p < 0.001), with a subsequent area under the receiver operating characteristic curve (AUROC) of 0.98 (95% confidence interval 0.97 to 0.99) for categorisation of HF with reduced ejection fraction (HFrEF; LVEF<40%). In an external dataset with 200 echocardiograms, this method achieved an AUROC of 0.90 (95% confidence interval 0.88 to 0.91) for HFrEF assessment. Conclusion: The automated neural network-based calculation of LVEF is comparable to expert clinicians performing time-consuming, frame-by-frame manual evaluations of cardiac systolic function.
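
The final two steps of the pipeline described above (area-length volume, then averaging single-beat LVEF) can be sketched as follows. This assumes the commonly cited single-plane ellipsoid form V = 8A²/(3πL); the abstract does not spell out the exact expression the authors used, and the function names and measurements below are hypothetical. Segmentation, length estimation, and frame detection are not shown.

```python
import math

def area_length_volume(area_cm2, length_cm):
    """Single-plane area-length (ellipsoid) volume in mL: V = 8 * A^2 / (3 * pi * L)."""
    if area_cm2 <= 0 or length_cm <= 0:
        raise ValueError("area and length must be positive")
    return 8.0 * area_cm2 ** 2 / (3.0 * math.pi * length_cm)

def beat_lvef(ed_area, ed_length, es_area, es_length):
    """LVEF (%) for one beat from end-diastolic (ED) and end-systolic (ES) measurements."""
    edv = area_length_volume(ed_area, ed_length)
    esv = area_length_volume(es_area, es_length)
    return 100.0 * (edv - esv) / edv

def mean_lvef(beats):
    """Average single-beat LVEF values across all cardiac cycles."""
    values = [beat_lvef(*beat) for beat in beats]
    return sum(values) / len(values)

# Made-up measurements per beat: (ED area cm^2, ED length cm, ES area cm^2, ES length cm).
beats = [(35.0, 8.5, 22.0, 7.0), (34.0, 8.4, 21.5, 7.1)]
final_lvef = mean_lvef(beats)
is_hfref = final_lvef < 40.0  # HFrEF threshold quoted in the abstract
print(f"LVEF {final_lvef:.1f}%, HFrEF: {is_hfref}")
```

Because volume scales with the square of the area, small segmentation errors in area are amplified in the volume estimate, which is one reason averaging over several beats is attractive.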

7.
JAMIA Open ; 7(2): ooae049, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38895652

ABSTRACT

Objective: To enable reproducible research at scale by creating a platform that enables health data users to find, access, curate, and re-use electronic health record phenotyping algorithms. Materials and Methods: We undertook a structured approach to identifying requirements for a phenotype algorithm platform by engaging with key stakeholders. User experience analysis was used to inform the design, which we implemented as a web application featuring a novel metadata standard for defining phenotyping algorithms, access via Application Programming Interface (API), support for computable data flows, and version control. The application has creation and editing functionality, enabling researchers to submit phenotypes directly. Results: We created and launched the Phenotype Library in October 2021. The platform currently hosts 1049 phenotype definitions defined against 40 health data sources and >200K terms across 16 medical ontologies. We present several case studies demonstrating its utility for supporting and enabling research: the library hosts curated phenotype collections for the BREATHE respiratory health research hub and the Adolescent Mental Health Data Platform, and it is supporting the development of an informatics tool to generate clinical evidence for clinical guideline development groups. Discussion: This platform makes an impact by being open to all health data users and accepting all appropriate content, as well as implementing key features that have not been widely available, including managing structured metadata, access via an API, and support for computable phenotypes. Conclusions: We have created the first openly available, programmatically accessible resource enabling the global health research community to store and manage phenotyping algorithms. Removing barriers to describing, sharing, and computing phenotypes will help unleash the potential benefit of health data for patients and the public.

8.
Comput Biol Med ; 153: 106425, 2023 02.
Article in English | MEDLINE | ID: mdl-36638616

ABSTRACT

Annotation of biomedical entities with ontology classes provides for formal semantic analysis and mobilisation of background knowledge in determining their relationships. To date, enrichment analysis has been routinely employed to identify classes that are over-represented in annotations across sets of groups, such as biosample gene expression profiles or patient phenotypes, and is useful for a range of tasks including differential diagnosis and causative variant prioritisation. These approaches, however, usually consider only univariate relationships, make limited use of the semantic features of ontologies, and provide limited information and evaluation of the explanatory power of both singular and grouped candidate classes. Moreover, they are not designed to solve the problem of deriving cohesive, characteristic, and discriminatory sets of classes for entity groups. We have developed a new tool, called Klarigi, which introduces multiple scoring heuristics for identification of classes that are both compositional and discriminatory for groups of entities annotated with ontology classes. The tool includes a novel algorithm for derivation of multivariable semantic explanations for entity groups, makes use of semantic inference through live use of an ontology reasoner, and includes a classification method for identifying the discriminatory power of candidate sets, in addition to significance testing apposite to traditional enrichment approaches. We describe the design and implementation of Klarigi, including its scoring and explanation determination methods, and evaluate its use in application to two test cases with clinical significance, comparing and contrasting methods and results with literature-based and enrichment analysis methods. We demonstrate that Klarigi produces characteristic and discriminatory explanations for groups of biomedical entities in two settings. 
We also show that these explanations recapitulate and extend the knowledge held in existing biomedical databases and literature for several diseases. We conclude that Klarigi provides a distinct and valuable perspective on biomedical datasets when compared with traditional enrichment methods, and therefore constitutes a new method by which biomedical datasets can be explored, contributing to improved insight into semantic data.


Subject(s)
Biological Ontologies , Semantics , Algorithms , Phenotype , Databases, Factual
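
For context on the "traditional enrichment approaches" that the abstract of this record compares against, the sketch below shows a univariate over-representation test using the hypergeometric distribution. It illustrates the baseline idea only; it is not Klarigi's scoring or explanation algorithm, and the function name and numbers are hypothetical.

```python
from math import comb

def enrichment_p_value(k, population, annotated, group_size):
    """One-sided hypergeometric p-value P(X >= k).

    population: total number of entities
    annotated:  entities in the population annotated with the class
    group_size: entities in the group of interest
    k:          entities in the group annotated with the class
    """
    total = comb(population, group_size)
    upper = min(annotated, group_size)
    # math.comb returns 0 when the lower index exceeds the upper, so impossible
    # combinations contribute nothing to the sum.
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, group_size - i)
        for i in range(k, upper + 1)
    )
    return favourable / total

# Made-up example: 10 patients, 4 annotated with a class, and a group of 5
# that contains all 4 annotated patients.
print(enrichment_p_value(4, 10, 4, 5))
```

A test of this kind scores one class at a time, which is the univariate limitation the abstract points to when motivating multivariable explanations.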
9.
Sci Rep ; 12(1): 13094, 2022 Jul 30.
Article in English | MEDLINE | ID: mdl-35908043

ABSTRACT

In the extensive search for new physics, the precise measurement of the Higgs boson continues to play an important role. To this end, machine learning techniques have been recently applied to processes like the Higgs production via vector-boson fusion. In this paper, we propose to use algorithms for learning to rank, i.e., to rank events into a sorting order, first signal, then background, instead of algorithms for the classification into two classes, for this task. The fact that training is then performed on pairwise comparisons of signal and background events can effectively increase the amount of training data due to the quadratic number of possible combinations. This makes it robust to unbalanced data set scenarios and can improve the overall performance compared to pointwise models like the state-of-the-art boosted decision tree approach. In this work we compare our pairwise neural network algorithm, which is a combination of a convolutional neural network and the DirectRanker, with convolutional neural networks, multilayer perceptrons or boosted decision trees, which are commonly used algorithms in multiple Higgs production channels. Furthermore, we use so-called transfer learning techniques to improve overall performance on different data types.
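
The claim above that pairwise training "can effectively increase the amount of training data" follows from simple counting: every signal event can be paired with every background event. The sketch below builds such pairs for a generic learning-to-rank setup. The function name, the label convention (+1 when the first element should rank higher), and the event stand-ins are assumptions for illustration and are not taken from the DirectRanker implementation.

```python
from itertools import product

def make_ranking_pairs(signal_events, background_events):
    """Build labelled pairs for pairwise ranking.

    Each (signal, background) combination yields two training examples:
    (signal, background, +1) and (background, signal, -1).
    """
    pairs = []
    for sig, bkg in product(signal_events, background_events):
        pairs.append((sig, bkg, 1))
        pairs.append((bkg, sig, -1))
    return pairs

# Hypothetical, heavily unbalanced sample: 3 signal events and 1000 background events.
signal = [f"sig_{i}" for i in range(3)]
background = [f"bkg_{i}" for i in range(1000)]
pairs = make_ranking_pairs(signal, background)
print(len(signal) + len(background), "events ->", len(pairs), "training pairs")
```

With n_s signal and n_b background events, a pointwise classifier sees n_s + n_b examples while the pairwise model can draw on up to n_s × n_b distinct combinations, and every pair contains exactly one event of each class regardless of how unbalanced the sample is.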

10.
JAMA Psychiatry ; 79(5): 498-507, 2022 05 01.
Article in English | MEDLINE | ID: mdl-35353173

ABSTRACT

Importance: Previous in vitro and postmortem research suggests that inflammation may lead to structural brain changes via activation of microglia and/or astrocytic dysfunction in a range of neuropsychiatric disorders. Objective: To investigate the relationship between inflammation and changes in brain structures in vivo and to explore a transcriptome-driven functional basis with relevance to mental illness. Design, Setting, and Participants: This study used multistage linked analyses, including mendelian randomization (MR), gene expression correlation, and connectivity analyses. A total of 20 688 participants in the UK Biobank, which includes clinical, genomic, and neuroimaging data, and 6 postmortem brains from neurotypical individuals in the Allen Human Brain Atlas (AHBA), including RNA microarray data. Data were extracted in February 2021 and analyzed between March and October 2021. Exposures: Genetic variants regulating levels and activity of circulating interleukin 1 (IL-1), IL-2, IL-6, C-reactive protein (CRP), and brain-derived neurotrophic factor (BDNF) were used as exposures in MR analyses. Main Outcomes and Measures: Brain imaging measures, including gray matter volume (GMV) and cortical thickness (CT), were used as outcomes. Associations were considered significant at a multiple testing-corrected threshold of P < 1.1 × 10⁻⁴. Differential gene expression in AHBA data was modeled in brain regions mapped to areas significant in MR analyses; genes were tested for biological and disease overrepresentation in annotation databases and for connectivity in protein-protein interaction networks. Results: Of 20 688 participants in the UK Biobank sample, 10 828 (52.3%) were female, and the mean (SD) age was 55.5 (7.5) years. 
In the UK Biobank sample, genetically predicted levels of IL-6 were associated with GMV in the middle temporal cortex (z score, 5.76; P = 8.39 × 10⁻⁹), inferior temporal (z score, 3.38; P = 7.20 × 10⁻⁵), fusiform (z score, 4.70; P = 2.60 × 10⁻⁷), and frontal (z score, -3.59; P = 3.30 × 10⁻⁵) cortex together with CT in the superior frontal region (z score, -5.11; P = 3.22 × 10⁻⁷). No significant associations were found for IL-1, IL-2, CRP, or BDNF after correction for multiple comparison. In the AHBA sample, 5 of 6 participants (83%) were male, and the mean (SD) age was 42.5 (13.4) years. Brain-wide coexpression analysis showed a highly interconnected network of genes preferentially expressed in the middle temporal gyrus (MTG), which further formed a highly connected protein-protein interaction network with IL-6 (enrichment test of expected vs observed network given the prevalence and degree of interactions in the STRING database: 43 nodes/30 edges observed vs 8 edges expected; mean node degree, 1.4; genome-wide significance, P = 4.54 × 10⁻⁹). MTG differentially expressed genes that were functionally enriched for biological processes in schizophrenia, autism spectrum disorder, and epilepsy. Conclusions and Relevance: In this study, genetically determined IL-6 was associated with brain structure and potentially affects areas implicated in developmental neuropsychiatric disorders, including schizophrenia and autism.


Subject(s)
Autism Spectrum Disorder , Schizophrenia , Adult , Brain/diagnostic imaging , Brain-Derived Neurotrophic Factor/genetics , C-Reactive Protein/genetics , Female , Genome-Wide Association Study , Humans , Inflammation/epidemiology , Inflammation/genetics , Interleukin-1/genetics , Interleukin-2/genetics , Interleukin-6/genetics , Magnetic Resonance Imaging , Male , Mendelian Randomization Analysis , Middle Aged , Schizophrenia/genetics
11.
NPJ Digit Med ; 5(1): 186, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36544046

ABSTRACT

Much of the knowledge and information needed for enabling high-quality clinical research is stored in free-text format. Natural language processing (NLP) has been used to extract information from these sources at scale for several decades. This paper aims to present a comprehensive review of clinical NLP for the past 15 years in the UK to identify the community, depict its evolution, analyse methodologies and applications, and identify the main barriers. We collect a dataset of clinical NLP projects (n = 94; £ = 41.97 m) funded by UK funders or the European Union's funding programmes. Additionally, we extract details on 9 funders, 137 organisations, 139 persons and 431 research papers. Networks are created from timestamped data interlinking all entities, and network analysis is subsequently applied to generate insights. 431 publications are identified as part of a literature review, of which 107 are eligible for final analysis. Results show that, not surprisingly, clinical NLP in the UK has increased substantially in the last 15 years: the total budget in the period of 2019-2022 was 80 times that of 2007-2010. However, effort is required to deepen areas such as disease (sub-)phenotyping and broaden application domains. There is also a need to improve links between academia and industry and enable deployments in real-world settings for the realisation of clinical NLP's great potential in care delivery. The major barriers include research and development access to hospital data, lack of capable computational resources in the right places, the scarcity of labelled data and barriers to sharing of pretrained models.

12.
Front Digit Health ; 3: 781227, 2021.
Article in English | MEDLINE | ID: mdl-34939069

ABSTRACT

Semantic similarity is a useful approach for comparing patient phenotypes, and holds the potential of an effective method for exploiting text-derived phenotypes for differential diagnosis, text and document classification, and outcome prediction. While approaches for context disambiguation are commonly used in text mining applications, forming a standard component of information extraction pipelines, their effects on semantic similarity calculations have not been widely explored. In this work, we evaluate how inclusion and exclusion of negated and uncertain mentions of concepts from text-derived phenotypes affect similarity of patients, and the use of those profiles to predict diagnosis. We report on the effectiveness of these approaches and report a very small, yet significant, improvement in performance when classifying primary diagnosis over MIMIC-III patient visits.

13.
Comput Biol Med ; 133: 104360, 2021 06.
Article in English | MEDLINE | ID: mdl-33836447

ABSTRACT

Ontology-based phenotype profiles have been utilised for the purpose of differential diagnosis of rare genetic diseases, and for decision support in specific disease domains. Particularly, semantic similarity facilitates diagnostic hypothesis generation through comparison with disease phenotype profiles. However, the approach has not been applied for differential diagnosis of common diseases, or generalised clinical diagnostics from uncurated text-derived phenotypes. In this work, we describe the development of an approach for deriving patient phenotype profiles from clinical narrative text, and apply this to text associated with MIMIC-III patient visits. We then explore the use of semantic similarity with those text-derived phenotypes to classify primary patient diagnosis, comparing the use of patient-patient similarity and patient-disease similarity using phenotype-disease profiles previously mined from literature. We also consider a combined approach, in which literature-derived phenotypes are extended with the content of text-derived phenotypes we mined from 500 patients. The results reveal a powerful approach, showing that in one setting, uncurated text phenotypes can be used for differential diagnosis of common diseases, making use of information both inside and outside the setting. While the methods themselves should be explored for further optimisation, they could be applied to a variety of clinical tasks, such as differential diagnosis, cohort discovery, document and text classification, and outcome prediction.


Subject(s)
Rare Diseases , Semantics , Diagnosis, Differential , Humans , Phenotype , Rare Diseases/diagnosis , Rare Diseases/genetics
14.
Comput Biol Med ; 138: 104904, 2021 11.
Article in English | MEDLINE | ID: mdl-34600327

ABSTRACT

Identification of ontology concepts in clinical narrative text enables the creation of phenotype profiles that can be associated with clinical entities, such as patients or drugs. Constructing patient phenotype profiles using formal ontologies enables their analysis via semantic similarity, in turn enabling the use of background knowledge in clustering or classification analyses. However, traditional semantic similarity approaches collapse complex relationships between patient phenotypes into a unitary similarity score for each pair of patients. Moreover, single scores may be based only on matching terms with the greatest information content (IC), ignoring other dimensions of patient similarity. This process necessarily leads to a loss of information in the resulting representation of patient similarity, and is especially apparent when using very large text-derived and highly multi-morbid phenotype profiles. Moreover, it renders finding a biological explanation for similarity very difficult: the black box problem. In this article, we explore the generation of multiple semantic similarity scores for patients based on different facets of their phenotypic manifestation, which we define through different sub-graphs in the Human Phenotype Ontology. We further present a new methodology for deriving sets of qualitative class descriptions for groups of entities described by ontology terms. Leveraging this strategy to obtain meaningful explanations for our semantic clusters alongside other evaluation techniques, we show that semantic clustering with ontology-derived facets enables the representation, and thus identification of, clinically relevant phenotype relationships not easily recoverable using overall clustering alone. In this way, we demonstrate the potential of faceted semantic clustering for gaining a deeper and more nuanced understanding of text-derived patient phenotypes.


Subject(s)
Semantics , Cluster Analysis , Humans , Phenotype
15.
Comput Biol Med ; 135: 104542, 2021 08.
Article in English | MEDLINE | ID: mdl-34139439

ABSTRACT

BACKGROUND: Unstructured text created by patients represents a rich, but relatively inaccessible resource for advancing patient-centred care. This study aimed to develop an ontology for ocular immune-mediated inflammatory diseases (OcIMIDo), as a tool to facilitate data extraction and analysis, illustrating its application to online patient support forum data. METHODS: We developed OcIMIDo using clinical guidelines, domain expertise, and cross-references to classes from other biomedical ontologies. We developed an approach to add patient-preferred synonyms text-mined from the oliviasvision.org online forum, using statistical ranking. We validated the approach with split-sampling and comparison to manual extraction. Using OcIMIDo, we then explored the frequency of OcIMIDo classes and synonyms, and their potential association with natural language sentiment expressed in each online forum post. FINDINGS: OcIMIDo (version 1.2) includes 661 classes, describing anatomy, clinical phenotype, disease activity status, complications, investigations, interventions and functional impacts. It contains 1661 relationships and axioms, 2851 annotations, including 1131 database cross-references, and 187 patient-preferred synonyms. To illustrate OcIMIDo's potential applications, we explored 9031 forum posts, revealing frequent mention of different clinical phenotypes, treatments, and complications. Language sentiment analysis of each post was generally positive (median 0.12, IQR 0.01-0.24). In multivariable logistic regression, the odds of a post expressing negative sentiment were significantly associated with first posts as compared to replies (OR 3.3, 95% CI 2.8 to 3.9, p < 0.001). CONCLUSION: We report the development and validation of a new ontology for inflammatory eye diseases, which includes patient-preferred synonyms, and can be used to explore unstructured patient or physician-reported text data, with many potential applications.


Subject(s)
Biological Ontologies , Factual Databases , Humans , Language , Phenotype
16.
Gigascience ; 10(9)2021 09 11.
Article in English | MEDLINE | ID: mdl-34508578

ABSTRACT

BACKGROUND: High-quality phenotype definitions are desirable to enable the extraction of patient cohorts from large electronic health record repositories and are characterized by properties such as portability, reproducibility, and validity. Phenotype libraries, where definitions are stored, have the potential to contribute significantly to the quality of the definitions they host. In this work, we present a set of desiderata for the design of a next-generation phenotype library that is able to ensure the quality of hosted definitions by combining the functionality currently offered by disparate tooling. METHODS: A group of researchers examined work to date on phenotype models, implementation, and validation, as well as contemporary phenotype libraries developed as a part of their own phenomics communities. Existing phenotype frameworks were also examined. This work was translated and refined by all the authors into a set of best practices. RESULTS: We present 14 library desiderata that promote high-quality phenotype definitions, in the areas of modelling, logging, validation, and sharing and warehousing. CONCLUSIONS: There are a number of choices to be made when constructing phenotype libraries. Our considerations distil the best practices in the field and include pointers towards their further development to support portable, reproducible, and clinically valid phenotype design. The provision of high-quality phenotype definitions enables electronic health record data to be more effectively used in medical domains.


Subject(s)
Electronic Health Records , Humans , Phenotype , Reproducibility of Results
17.
Heart ; 107(11): 902-908, 2021 06.
Article in English | MEDLINE | ID: mdl-33692093

ABSTRACT

OBJECTIVE: To improve the echocardiographic assessment of heart failure in patients with atrial fibrillation (AF) by comparing conventional averaging of consecutive beats with an index-beat approach, whereby measurements are taken after two cycles with similar R-R interval. METHODS: Transthoracic echocardiography was performed using a standardised and blinded protocol in patients enrolled in the RATE-AF (RAte control Therapy Evaluation in permanent Atrial Fibrillation) randomised trial. We compared reproducibility of the index-beat and conventional consecutive-beat methods to calculate left ventricular ejection fraction (LVEF), global longitudinal strain (GLS) and E/e' (mitral E wave max/average diastolic tissue Doppler velocity), and assessed intraoperator/interoperator variability, time efficiency and validity against natriuretic peptides. RESULTS: 160 patients were included, 46% of whom were women, with a median age of 75 years (IQR 69-82) and a median heart rate of 100 beats per minute (IQR 86-112). The index-beat had the lowest within-beat coefficient of variation for LVEF (32%, vs 51% for 5 consecutive beats and 53% for 10 consecutive beats), GLS (26%, vs 43% and 42%) and E/e' (25%, vs 41% and 41%). Intraoperator (n=50) and interoperator (n=18) reproducibility were both superior for index-beats and this method was quicker to perform (p<0.001): 35.4 s to measure E/e' (95% CI 33.1 to 37.8) compared with 44.7 s for 5-beat (95% CI 41.8 to 47.5) and 98.1 s for 10-beat (95% CI 91.7 to 104.4) analyses. Using a single index-beat did not compromise the association of LVEF, GLS or E/e' with natriuretic peptide levels. CONCLUSIONS: Compared with averaging of multiple beats in patients with AF, the index-beat approach improves reproducibility and saves time without a negative impact on validity, potentially improving the diagnosis and classification of heart failure in patients with AF.
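The reproducibility figures in this abstract are coefficients of variation. As a minimal sketch, assuming the usual definition (sample standard deviation divided by the mean, expressed as a percentage), the calculation looks like the following. The abstract does not spell out exactly how the study computed its within-beat values, and the function name and readings below are hypothetical illustrations, not study data.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample coefficient of variation (SD / mean) as a percentage."""
    if len(values) < 2:
        raise ValueError("need at least two measurements")
    m = mean(values)
    if m == 0:
        raise ValueError("mean is zero; coefficient of variation is undefined")
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    return 100.0 * stdev(values) / abs(m)


# Hypothetical repeated LVEF readings (%) for one patient; not data from the study.
lvef_readings = [52.0, 48.0, 55.0, 45.0, 50.0]
cv_percent = coefficient_of_variation(lvef_readings)
print(f"Coefficient of variation: {cv_percent:.1f}%")
```

A lower percentage means the repeated measurements agree more closely, which is the sense in which the index-beat method is reported as more reproducible.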


Subject(s)
Atrial Fibrillation/physiopathology , Pulsed Doppler Echocardiography , Heart Failure/diagnosis , Aged , Aged 80 and over , Biomarkers/blood , Diastole/physiology , Female , Humans , Male , Brain Natriuretic Peptide/blood , Peptide Fragments/blood , Reproducibility of Results , Stroke Volume/physiology , Systole/physiology , Left Ventricular Function/physiology
18.
J Am Med Inform Assoc ; 28(4): 791-800, 2021 03 18.
Article in English | MEDLINE | ID: mdl-33185672

ABSTRACT

OBJECTIVE: Risk prediction models are widely used to inform evidence-based clinical decision making. However, few models developed from single cohorts can perform consistently well at population level where diverse prognoses exist (such as the SARS-CoV-2 [severe acute respiratory syndrome coronavirus 2] pandemic). This study aims at tackling this challenge by synergizing prediction models from the literature using ensemble learning. MATERIALS AND METHODS: In this study, we selected and reimplemented 7 prediction models for COVID-19 (coronavirus disease 2019) that were derived from diverse cohorts and used different implementation techniques. A novel ensemble learning framework was proposed to synergize them for realizing personalized predictions for individual patients. Four diverse international cohorts (2 from the United Kingdom and 2 from China; N = 5394) were used to validate all 8 models on discrimination, calibration, and clinical usefulness. RESULTS: Results showed that individual prediction models could perform well on some cohorts while poorly on others. Conversely, the ensemble model achieved the best performances consistently on all metrics quantifying discrimination, calibration, and clinical usefulness. Performance disparities were observed in cohorts from the 2 countries: all models achieved better performances on the China cohorts. DISCUSSION: When individual models were learned from complementary cohorts, the synergized model had the potential to achieve better performances than any individual model. Results indicate that blood parameters and physiological measurements might have better predictive powers when collected early, which remains to be confirmed by further studies. CONCLUSIONS: Combining a diverse set of individual prediction models, the ensemble method can synergize a robust and well-performing model by choosing the most competent ones for individual patients.
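The abstract reports that models were compared on discrimination and calibration without naming the metrics. As a hedged illustration only, two common choices are the C-statistic (area under the ROC curve) for discrimination and calibration-in-the-large (mean predicted risk minus observed event rate) for calibration. The function names and toy data below are assumptions for this sketch and are not taken from the study.

```python
def c_statistic(y_true, y_prob):
    """Probability that a random event case is ranked above a random non-event case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(pos) * len(neg))


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate (0 means no systematic bias)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Hypothetical outcomes (1 = event) and predicted risks; not data from the study.
outcomes = [1, 0, 1, 0, 0, 1]
predicted = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]

auc = c_statistic(outcomes, predicted)
citl = calibration_in_the_large(outcomes, predicted)
print(f"C-statistic: {auc:.3f}, calibration-in-the-large: {citl:+.3f}")
```

The pairwise loop is quadratic in cohort size, which is acceptable for a small illustration; a rank-based formula would be preferable for cohorts of several thousand patients.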


Subject(s)
COVID-19/mortality , Statistical Models , Prognosis , Adult , Aged , Aged 80 and over , COVID-19/epidemiology , COVID-19/prevention & control , China/epidemiology , Female , Humans , Male , Middle Aged , Risk Assessment/methods , SARS-CoV-2 , United Kingdom/epidemiology
19.
Sci Rep ; 9(1): 17405, 2019 11 22.
Article in English | MEDLINE | ID: mdl-31757986

ABSTRACT

Identifying and distinguishing cancer driver genes among thousands of candidate mutations remains a major challenge. Accurate identification of driver genes and driver mutations is critical for advancing cancer research and personalizing treatment based on accurate stratification of patients. Due to inter-tumor genetic heterogeneity, many driver mutations within a gene occur at low frequencies, which makes it challenging to distinguish them from non-driver mutations. We have developed a novel method for identifying cancer driver genes. Our approach utilizes multiple complementary types of information, specifically cellular phenotypes, cellular locations, functions, and whole body physiological phenotypes as features. We demonstrate that our method can accurately identify known cancer driver genes and distinguish between their roles in different types of cancer. In addition to confirming known driver genes, we identify several novel candidate driver genes. We demonstrate the utility of our method by validating its predictions in nasopharyngeal cancer and colorectal cancer using whole exome and whole genome sequencing.


Subject(s)
Computational Biology/methods , Genetic Association Studies , Genetic Predisposition to Disease , Neoplasms/etiology , Oncogenes , Tumor Biomarkers , Exome , Gene Ontology , Genetic Association Studies/methods , Genomics/methods , High-Throughput Nucleotide Sequencing , Humans , Machine Learning , Molecular Sequence Annotation , Mutation , Neoplasms/diagnosis , ROC Curve
20.
BMC Bioinformatics ; 3: 11, 2002 Apr 23.
Article in English | MEDLINE | ID: mdl-11972320

ABSTRACT

BACKGROUND: The inference of homology between proteins is a key problem in molecular biology. The current best approaches only identify approximately 50% of homologies (with a false positive rate set at 1/1000). RESULTS: We present Homology Induction (HI), a new approach to inferring homology. HI uses machine learning to bootstrap from standard sequence similarity search methods. First a standard method is run, then HI learns rules which are true for sequences of high similarity to the target (assumed homologues) and not true for general sequences. These rules are then used to discriminate sequences in the twilight zone. To learn the rules, HI describes the sequences in a novel way based on a bioinformatic knowledge base and the machine learning method of inductive logic programming. To evaluate HI we used the PDB40D benchmark, which lists sequences of known homology but low sequence similarity. We compared the HI methodology with PSI-BLAST alone and found HI performed significantly better. In addition, Receiver Operating Characteristic (ROC) curve analysis showed that these improvements were robust for all reasonable error costs. The predictive homology rules learnt by HI can be interpreted biologically to provide insight into conserved features of homologous protein families. CONCLUSIONS: HI is a new technique for the detection of remote protein homology, a central bioinformatic problem. HI with PSI-BLAST is shown to outperform PSI-BLAST for all error costs. It is expected that similar improvements would be obtained using HI with any sequence similarity method.


Subject(s)
Artificial Intelligence , Sequence Alignment/methods , Amino Acid Sequence Homology , Algorithms , Animals , Computational Biology/methods , Protein Databases , Fungal Proteins/genetics , Internet , Mice , Oryza , Plant Proteins/genetics , Predictive Value of Tests , Retroviridae Proteins/genetics , Sensitivity and Specificity