1.
Patterns (N Y) ; 4(6): 100741, 2023 Jun 09.
Article in English | MEDLINE | ID: mdl-37409055

ABSTRACT

High-dimensional data analysis starts with projecting the data to low dimensions to visualize and understand the underlying data structure. Several methods have been developed for dimensionality reduction, but they are limited to cross-sectional datasets. The recently proposed Aligned-UMAP, an extension of the uniform manifold approximation and projection (UMAP) algorithm, can visualize high-dimensional longitudinal datasets. We demonstrated its utility for researchers to identify exciting patterns and trajectories within enormous datasets in biological sciences. We found that the algorithm parameters also play a crucial role and must be tuned carefully to utilize the algorithm's potential fully. We also discussed key points to remember and directions for future extensions of Aligned-UMAP. Further, we made our code open source to enhance the reproducibility and applicability of our work. We believe our benchmarking study becomes more important as more and more high-dimensional longitudinal data in biomedical research become available.

2.
NPJ Parkinsons Dis ; 8(1): 172, 2022 Dec 16.
Article in English | MEDLINE | ID: mdl-36526647

ABSTRACT

The clinical manifestations of Parkinson's disease (PD) are characterized by heterogeneity in age at onset, disease duration, rate of progression, and the constellation of motor versus non-motor features. There is an unmet need for the characterization of distinct disease subtypes as well as improved, individualized predictions of the disease course. We used unsupervised and supervised machine learning methods on comprehensive, longitudinal clinical data from the Parkinson's Disease Progression Marker Initiative (n = 294 cases) to identify patient subtypes and to predict disease progression. The resulting models were validated in an independent, clinically well-characterized cohort from the Parkinson's Disease Biomarker Program (n = 263 cases). Our analysis distinguished three distinct disease subtypes with highly predictable progression rates, corresponding to slow, moderate, and fast disease progression. We achieved highly accurate projections of disease progression 5 years after initial diagnosis with an average area under the curve (AUC) of 0.92 (95% CI: 0.95 ± 0.01) for the slower progressing group (PDvec1), 0.87 ± 0.03 for moderate progressors, and 0.95 ± 0.02 for the fast-progressing group (PDvec3). We identified serum neurofilament light as a significant indicator of fast disease progression among other key biomarkers of interest. We replicated these findings in an independent cohort, released the analytical code, and developed models in an open science manner. Our data-driven study provides insights to deconstruct PD heterogeneity. This approach could have immediate implications for clinical trials by improving the detection of significant clinical outcomes. We anticipate that machine learning models will improve patient counseling, clinical trial design, and ultimately individualized patient care.
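The area under the curve (AUC) reported above can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The following is a minimal sketch of that pairwise definition, not the study's released code; the function name `roc_auc` and the toy labels and scores are assumptions made up for illustration.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of model scores, higher meaning "more likely positive".
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only (not from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 3 of 4 pairs ranked correctly -> 0.75
```

The quadratic pairwise loop is chosen for readability; for real cohorts a rank-based implementation from an established library would be the more practical choice.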

3.
Stud Health Technol Inform ; 290: 694-698, 2022 Jun 06.
Article in English | MEDLINE | ID: mdl-35673106

ABSTRACT

The ongoing COVID-19 pandemic has caused millions of infections and deaths worldwide. With the passage of time, several variants of this virus have surfaced. Machine learning methods and algorithms have been very useful in understanding the virus and its implications so far. In this paper, we have studied a set of novelty detection algorithms and applied them to the problem of detecting COVID-19 variants. Our results show accuracies of 79.64% and 82.43% on the B.1.1.7 and B.1.351 variants, respectively, on ProtVec unaligned COVID-19 spike protein sequences using One Class SVM with fine-tuned parameters. We believe that a system for automated and timely detection of variants will help countries formulate mitigation measures and study remedies in terms of medicines and vaccines that can protect against the new variants.


Subjects
COVID-19 ; SARS-CoV-2 ; Humans ; Pandemics/prevention & control ; Spike Glycoprotein, Coronavirus/metabolism
4.
NPJ Parkinsons Dis ; 8(1): 35, 2022 Apr 01.
Article in English | MEDLINE | ID: mdl-35365675

ABSTRACT

Personalized medicine promises individualized disease prediction and treatment. The convergence of machine learning (ML) and available multimodal data is key moving forward. We build upon previous work to deliver multimodal predictions of Parkinson's disease (PD) risk and systematically develop a model using GenoML, an automated ML package, to make improved multi-omic predictions of PD, validated in an external cohort. We investigated top features, constructed hypothesis-free disease-relevant networks, and investigated drug-gene interactions. We performed automated ML on multimodal data from the Parkinson's Progression Markers Initiative (PPMI). After selecting the best performing algorithm, all PPMI data was used to tune the selected model. The model was validated in the Parkinson's Disease Biomarker Program (PDBP) dataset. Our initial model showed an area under the curve (AUC) of 89.72% for the diagnosis of PD. The tuned model was then tested for validation on external data (PDBP, AUC 85.03%). Optimizing thresholds for classification increased the diagnosis prediction accuracy and other metrics. Finally, networks were built to identify gene communities specific to PD. Combining data modalities outperforms the single biomarker paradigm. UPSIT and PRS contributed most to the predictive power of the model, but their accuracy is supplemented by many smaller effect transcripts and risk SNPs. Our model is best suited to identifying large groups of individuals to monitor within a health registry or biobank to prioritize for further testing. This approach allows complex predictive models to be reproducible and accessible to the community, with the package, code, and results publicly available.
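The abstract does not say how classification thresholds were optimized. As one common possibility, the sketch below picks the score cutoff that maximizes Youden's J statistic (sensitivity + specificity - 1). The choice of Youden's J, the function name `best_threshold`, and the toy data are all assumptions for illustration; none of this is taken from GenoML or the study's code.

```python
def best_threshold(labels, scores):
    """Return (threshold, J) maximizing Youden's J = sensitivity + specificity - 1.

    A case is predicted positive when its score is >= threshold.
    Candidate thresholds are the distinct observed scores; on ties in J,
    the lowest threshold wins.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")

    best_t, best_j = None, -1.0
    for t in sorted(set(scores)):
        true_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        true_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        j = true_pos / n_pos + true_neg / n_neg - 1.0
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Toy data, invented for illustration only (not from PPMI or PDBP).
toy_labels = [0, 0, 0, 1, 1, 1, 1]
toy_scores = [0.1, 0.2, 0.6, 0.5, 0.7, 0.8, 0.9]
threshold, j_stat = best_threshold(toy_labels, toy_scores)
print(threshold, j_stat)  # 0.7 0.75
```

In practice the cutoff would be chosen on training or tuning data and then applied unchanged to the external validation set, so that the reported metrics are not inflated by tuning on the test cohort.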

5.
Lancet Digit Health ; 4(5): e359-e369, 2022 May.
Article in English | MEDLINE | ID: mdl-35341712

ABSTRACT

BACKGROUND: Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care. METHODS: In this retrospective study, we applied unsupervised (Uniform Manifold Approximation and Projection [UMAP]) modelling, semi-supervised (neural network UMAP) modelling, and supervised (ensemble learning based on LightGBM) modelling to a population-based discovery cohort of patients who were diagnosed with ALS while living in the Piedmont and Valle d'Aosta regions of Italy, for whom detailed clinical data, such as age at symptom onset, were available. We excluded patients with missing Revised ALS Functional Rating Scale (ALSFRS-R) feature values from the unsupervised and semi-supervised steps. We replicated our findings in an independent population-based cohort of patients who were diagnosed with ALS while living in the Emilia Romagna region of Italy. FINDINGS: Between Jan 1, 1995, and Dec 31, 2015, 2858 patients were entered in the discovery cohort. After excluding 497 (17%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 2361 (83%) patients were available for the unsupervised and semi-supervised analysis. We found that semi-supervised machine learning produced the optimum clustering of the patients with ALS. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (ie, bulbar, respiratory, flail arm, classical, pyramidal, and flail leg ALS). Between Jan 1, 2009, and March 1, 2018, 1097 patients were entered in the replication cohort. 
After excluding 108 (10%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 989 patients were available for the unsupervised and semi-supervised analysis. All 1097 patients were included in the supervised analysis. The same clusters were identified in the replication cohort. By contrast, other ALS classification schemes, such as the El Escorial categories, Milano-Torino clinical staging, and King's clinical stages, did not adequately label the clusters. Supervised learning identified 11 clinical parameters that predicted ALS clinical subtypes with high accuracy (area under the curve 0·982 [95% CI 0·980-0·983]). INTERPRETATION: Our data-driven study provides insight into the ALS population substructure and confirms that the Chiò classification system successfully identifies ALS subtypes. Additional validation is required to determine the accuracy and clinical use of these algorithms in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design. FUNDING: US National Institute on Aging, US National Institutes of Health, Italian Ministry of Health, European Commission, University of Torino Rita Levi Montalcini Department of Neurosciences, Emilia Romagna Regional Health Authority, and Italian Ministry of Education, University, and Research. TRANSLATIONS: For the Italian and German translations of the abstract see Supplementary Materials section.


Subjects
Amyotrophic Lateral Sclerosis ; Amyotrophic Lateral Sclerosis/diagnosis ; Amyotrophic Lateral Sclerosis/epidemiology ; Cluster Analysis ; Cohort Studies ; Humans ; Machine Learning ; Retrospective Studies ; United States
6.
Chaos Solitons Fractals ; 138: 110140, 2020 Sep.
Article in English | MEDLINE | ID: mdl-32834585

ABSTRACT

The COrona VIrus Disease (COVID-19) pandemic caused by the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has resulted in a challenging number of infections and deaths worldwide. In order to combat the pandemic, several countries worldwide enforced mitigation measures in the forms of lockdowns, social distancing, and disinfection measures. In an effort to understand the dynamics of this disease, we propose a Long Short-Term Memory (LSTM) based model. We train our model on more than four months of cumulative COVID-19 cases and deaths. Our model can be adjusted based on the parameters in order to provide predictions as needed. We provide results at both the country and county levels. We also perform a quantitative comparison of mitigation measures in various counties in the United States based on the rate of difference of a short and long window parameter of the proposed LSTM model. The analyses provided by our model can provide valuable insights based on the trends in the rate of infections and deaths. This can also be of help for countries and counties deciding on mitigation and reopening strategies. We believe that the results obtained from the proposed method will contribute to societal benefits for a current global concern.

7.
PLoS One ; 14(7): e0218942, 2019.
Article in English | MEDLINE | ID: mdl-31283759

ABSTRACT

BACKGROUND: Unplanned readmission of a hospitalized patient is an indicator of patients' exposure to risk and an avoidable waste of medical resources. In addition to hospital readmission, intensive care unit (ICU) readmission brings further financial risk, along with morbidity and mortality risks. Identification of high-risk patients who are likely to be readmitted can provide significant benefits for both patients and medical providers. The emergence of machine learning solutions to detect hidden patterns in complex, multi-dimensional datasets provides unparalleled opportunities for developing an efficient discharge decision-making support system for physicians and ICU specialists. METHODS AND FINDINGS: We used supervised machine learning approaches for ICU readmission prediction. We used machine learning methods on comprehensive, longitudinal clinical data from the MIMIC-III database to predict the ICU readmission of patients within 30 days of their discharge. We incorporate multiple types of features, including chart events, demographics, and ICD-9 embeddings. We have utilized recent machine learning techniques such as Recurrent Neural Networks (RNN) with Long Short-Term Memory (LSTM); with these we have been able to incorporate the multivariate features of EHRs and capture sudden fluctuations in chart event features (e.g. glucose and heart rate). We show that our LSTM-based solution can better capture high volatility and unstable status in ICU patients, an important factor in ICU readmission. Our machine learning models identify ICU readmissions at a higher sensitivity rate of 0.742 (95% CI, 0.718-0.766) and an improved Area Under the Curve of 0.791 (95% CI, 0.782-0.800) compared with traditional methods. We perform in-depth deep learning performance analysis, as well as the analysis of each feature contribution to the predictive model. 
CONCLUSION: Our manuscript highlights the ability of machine learning models to improve our ICU decision-making accuracy and is a real-world example of precision medicine in hospitals. These data-driven solutions hold the potential for substantial clinical impact by augmenting clinical decision-making for physicians and ICU specialists. We anticipate that machine learning models will improve patient counseling, hospital administration, allocation of healthcare resources and ultimately individualized clinical care.


Subjects
Memory, Long-Term/physiology ; Memory, Short-Term/physiology ; Patient Readmission/statistics & numerical data ; Databases, Genetic ; Female ; Humans ; Intensive Care Units/statistics & numerical data ; Machine Learning ; Male ; Middle Aged ; Neural Networks, Computer ; Oxygen/metabolism ; Patient Discharge
8.
Neurobiol Aging ; 57: 247.e9-247.e13, 2017 Sep.
Article in English | MEDLINE | ID: mdl-28602509

ABSTRACT

Genetics has proven to be a powerful approach in neurodegenerative diseases research, resulting in the identification of numerous causal and risk variants. Previously, we introduced the NeuroX Illumina genotyping array, a fast and efficient genotyping platform designed for the investigation of genetic variation in neurodegenerative diseases. Here, we present its updated version, named NeuroChip. The NeuroChip is a low-cost, custom-designed array containing a tagging variant backbone of about 306,670 variants complemented with a manually curated custom content comprised of 179,467 variants implicated in diverse neurological diseases, including Alzheimer's disease, Parkinson's disease, Lewy body dementia, amyotrophic lateral sclerosis, frontotemporal dementia, progressive supranuclear palsy, corticobasal degeneration, and multiple system atrophy. The tagging backbone was chosen because of the low cost and good genome-wide resolution; the custom content can be combined with other backbones, like population or drug development arrays. Using the NeuroChip, we can accurately identify rare variants and impute over 5.3 million common SNPs from the latest release of the Haplotype Reference Consortium. In summary, we describe the design and usage of the NeuroChip array and show its capability for detecting rare pathogenic variants in numerous neurodegenerative diseases. The NeuroChip has a more comprehensive and improved content, which makes it a reliable, high-throughput, cost-effective screening tool for genetic research and molecular diagnostics in neurodegenerative diseases.


Subjects
Genetic Variation/genetics ; Genome-Wide Association Study/methods ; Genotyping Techniques/methods ; High-Throughput Screening Assays/methods ; Neurodegenerative Diseases/genetics ; Alleles ; Apolipoproteins E/genetics ; Humans ; Risk
9.
PLoS Biol ; 13(7): e1002195, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26151137

ABSTRACT

Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our estimates show that genomics is a "four-headed beast"--it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and analysis. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.


Subjects
Genomics/trends ; Astronomy/trends ; Information Storage and Retrieval ; Social Media/trends ; Statistics as Topic