Results 1 - 6 of 6
1.
NPJ Digit Med ; 7(1): 117, 2024 May 07.
Article in English | MEDLINE | ID: mdl-38714751

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients: Intensive care unit admission. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (cor(Xu1, Zv1) = 0.596, p value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.
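
The abstract above reports a first canonical correlation, cor(Xu1, Zv1), between two data views but does not say which sparse CCA algorithm was used. As a rough illustration only, the sketch below finds one sparse pair (u, v) by alternating soft-thresholded updates on the cross-covariance matrix, in the spirit of penalized matrix decomposition. The function names, the fixed thresholds `lam_u` and `lam_v`, and the synthetic data are all assumptions made for this example and are not taken from the study.

```python
import numpy as np

def soft_threshold(a, lam):
    """Shrink each entry of a toward zero by lam; small entries become exactly 0."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def unit_norm(a):
    """Rescale a to unit Euclidean length (left unchanged if it is all zeros)."""
    n = np.linalg.norm(a)
    return a / n if n > 0 else a

def sparse_cca_first_pair(X, Z, lam_u=0.3, lam_v=0.3, n_iter=100):
    """Return sparse weights u, v and cor(Xu, Zv) for two views X and Z.

    Hypothetical helper: lam_u and lam_v are fixed soft-threshold levels,
    an assumption of this sketch rather than a tuned L1 constraint.
    """
    # Standardize each column so the cross-product is a correlation matrix.
    X = (X - X.mean(axis=0)) / X.std(axis=0)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    C = X.T @ Z / X.shape[0]
    # Start from the leading right singular vector of the cross-correlation.
    v = np.linalg.svd(C, full_matrices=False)[2][0]
    for _ in range(n_iter):
        u = unit_norm(soft_threshold(C @ v, lam_u))
        v = unit_norm(soft_threshold(C.T @ u, lam_v))
    cor = np.corrcoef(X @ u, Z @ v)[0, 1]
    return u, v, cor

# Synthetic two-view data (assumption): only the first two columns of each
# view carry a shared latent signal t; the remaining columns are noise.
rng = np.random.default_rng(0)
n = 300
t = rng.normal(size=n)
X_demo = rng.normal(size=(n, 6))
Z_demo = rng.normal(size=(n, 5))
X_demo[:, :2] = t[:, None] + 0.5 * rng.normal(size=(n, 2))
Z_demo[:, :2] = t[:, None] + 0.5 * rng.normal(size=(n, 2))

u1, v1, cor1 = sparse_cca_first_pair(X_demo, Z_demo)
print("u1 =", np.round(u1, 3))
print("v1 =", np.round(v1, 3))
print("cor(Xu1, Zv1) =", round(float(cor1), 3))
```

Larger thresholds drive more weights to exactly zero, which is what makes the resulting coefficients interpretable as a short list of contributing features per view.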

2.
Res Sq ; 2023 Nov 20.
Article in English | MEDLINE | ID: mdl-38045288

ABSTRACT

Through technological innovations, patient cohorts can be examined from multiple views with high-dimensional, multiscale biomedical data to classify clinical phenotypes and predict outcomes. Here, we aim to present our approach for analyzing multimodal data using unsupervised and supervised sparse linear methods in a COVID-19 patient cohort. This prospective cohort study of 149 adult patients was conducted in a tertiary care academic center. First, we used sparse canonical correlation analysis (CCA) to identify and quantify relationships across different data modalities, including viral genome sequencing, imaging, clinical data, and laboratory results. Then, we used cooperative learning to predict the clinical outcome of COVID-19 patients. We show that serum biomarkers representing severe disease and acute phase response correlate with original and wavelet radiomics features in the LLL frequency channel (corr(Xu1, Zv1) = 0.596, p-value < 0.001). Among radiomics features, histogram-based first-order features reporting the skewness, kurtosis, and uniformity have the lowest negative, whereas entropy-related features have the highest positive coefficients. Moreover, unsupervised analysis of clinical data and laboratory results gives insights into distinct clinical phenotypes. Leveraging the availability of global viral genome databases, we demonstrate that the Word2Vec natural language processing model can be used for viral genome encoding. It not only separates major SARS-CoV-2 variants but also allows the preservation of phylogenetic relationships among them. Our quadruple model using Word2Vec encoding achieves better prediction results in the supervised task. The model yields area under the curve (AUC) and accuracy values of 0.87 and 0.77, respectively. 
Our study illustrates that sparse CCA and cooperative learning are powerful techniques for handling high-dimensional, multimodal data to investigate multivariate associations in unsupervised and supervised tasks.

4.
Cancer Cell ; 40(12): 1521-1536.e7, 2022 Dec 12.
Article in English | MEDLINE | ID: mdl-36400020

ABSTRACT

Ductal carcinoma in situ (DCIS) is the most common precursor of invasive breast cancer (IBC), with variable propensity for progression. We perform multiscale, integrated molecular profiling of DCIS with clinical outcomes by analyzing 774 DCIS samples from 542 patients with 7.3 years median follow-up from the Translational Breast Cancer Research Consortium 038 study and the Resource of Archival Breast Tissue cohorts. We identify 812 genes associated with ipsilateral recurrence within 5 years from treatment and develop a classifier that predicts DCIS or IBC recurrence in both cohorts. Pathways associated with recurrence include proliferation, immune response, and metabolism. Distinct stromal expression patterns and immune cell compositions are identified. Our multiscale approach employed in situ methods to generate a spatially resolved atlas of breast precancers, where complementary modalities can be directly compared and correlated with conventional pathology findings, disease states, and clinical outcome.


Subjects
Breast Neoplasms , Ductal Breast Carcinoma , Noninfiltrating Intraductal Carcinoma , Humans , Female , Noninfiltrating Intraductal Carcinoma/genetics , Noninfiltrating Intraductal Carcinoma/metabolism , Noninfiltrating Intraductal Carcinoma/pathology , Ductal Breast Carcinoma/genetics , Ductal Breast Carcinoma/metabolism , Ductal Breast Carcinoma/pathology , Disease Progression , Breast Neoplasms/pathology , Biomarkers , Tumor Biomarkers/genetics , Tumor Biomarkers/analysis
5.
Proc Natl Acad Sci U S A ; 119(38): e2202113119, 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-36095183

ABSTRACT

We propose a method for supervised learning with multiple sets of features ("views"). The multiview problem is especially important in biology and medicine, where "-omics" data, such as genomics, proteomics, and radiomics, are measured on a common set of samples. "Cooperative learning" combines the usual squared-error loss of predictions with an "agreement" penalty to encourage the predictions from different data views to agree. By varying the weight of the agreement penalty, we get a continuum of solutions that include the well-known early and late fusion approaches. Cooperative learning chooses the degree of agreement (or fusion) in an adaptive manner, using a validation set or cross-validation to estimate test set prediction error. One version of our fitting procedure is modular, where one can choose different fitting mechanisms (e.g., lasso, random forests, boosting, or neural networks) appropriate for different data views. In the setting of cooperative regularized linear regression, the method combines the lasso penalty with the agreement penalty, yielding feature sparsity. The method can be especially powerful when the different data views share some underlying relationship in their signals that can be exploited to boost the signals. We show that cooperative learning achieves higher predictive accuracy on simulated data and real multiomics examples of labor-onset prediction. By leveraging aligned signals and allowing flexible fitting mechanisms for different modalities, cooperative learning offers a powerful approach to multiomics data fusion.
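
The combination of squared-error loss and an agreement penalty described in this abstract can be illustrated for two linear views X and Z. The sketch below assumes the objective 0.5·||y − Xβx − Zβz||² + 0.5·ρ·||Xβx − Zβz||², with ρ as the agreement weight; this scaling is an assumption of the example. It omits the lasso penalty mentioned in the abstract and solves the problem as ordinary least squares on an augmented design matrix. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def cooperative_augment(X, Z, y, rho):
    """Stack the data so that plain least squares on (X_aug, y_aug) minimizes
    ||y - X bx - Z bz||^2 + rho * ||X bx - Z bz||^2."""
    s = np.sqrt(rho)
    top = np.hstack([X, Z])              # rows for the prediction error
    bottom = np.hstack([-s * X, s * Z])  # rows for the agreement penalty
    X_aug = np.vstack([top, bottom])
    y_aug = np.concatenate([y, np.zeros(X.shape[0])])
    return X_aug, y_aug

def cooperative_objective(X, Z, y, beta_x, beta_z, rho):
    """Squared-error loss plus the agreement penalty (lasso term omitted)."""
    fit = y - X @ beta_x - Z @ beta_z
    gap = X @ beta_x - Z @ beta_z
    return 0.5 * (fit @ fit) + 0.5 * rho * (gap @ gap)

def fit_cooperative_ls(X, Z, y, rho):
    """Unpenalized cooperative fit via least squares on the augmented data."""
    X_aug, y_aug = cooperative_augment(X, Z, y, rho)
    coef, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    p = X.shape[1]
    return coef[:p], coef[p:]

# Synthetic two-view data (assumption): both views observe a noisy copy of
# the same latent signal t, which also drives the response y.
rng = np.random.default_rng(0)
n = 100
t = rng.normal(size=n)
X = np.column_stack([t + 0.5 * rng.normal(size=n), rng.normal(size=n)])
Z = np.column_stack([t + 0.5 * rng.normal(size=n),
                     rng.normal(size=n), rng.normal(size=n)])
y = t + 0.3 * rng.normal(size=n)

for rho in (0.0, 0.5, 1.0, 5.0):
    bx, bz = fit_cooperative_ls(X, Z, y, rho)
    disagreement = np.linalg.norm(X @ bx - Z @ bz)
    print(f"rho={rho}: disagreement between view predictions = {disagreement:.3f}")
```

With ρ = 0 the agreement rows vanish and the fit is ordinary least squares on the concatenated features, which corresponds to early fusion. Increasing ρ pulls the two views' predictions toward each other. In practice ρ would be chosen by validation or cross-validation, as the abstract describes, and a lasso solver would replace the plain least-squares step.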


Subjects
Genomics , Computer Neural Networks , Supervised Machine Learning , Genomics/methods
6.
Pac Symp Biocomput ; 24: 18-29, 2019.
Article in English | MEDLINE | ID: mdl-30864307

ABSTRACT

Electronic phenotyping is the task of ascertaining whether an individual has a medical condition of interest by analyzing their medical record and is foundational in clinical informatics. Increasingly, electronic phenotyping is performed via supervised learning. We investigate the effectiveness of multitask learning for phenotyping using electronic health records (EHR) data. Multitask learning aims to improve model performance on a target task by jointly learning additional auxiliary tasks and has been used in disparate areas of machine learning. However, its utility when applied to EHR data has not been established, and prior work suggests that its benefits are inconsistent. We present experiments that elucidate when multitask learning with neural nets improves performance for phenotyping using EHR data relative to neural nets trained for a single phenotype and to well-tuned baselines. We find that multitask neural nets consistently outperform single-task neural nets for rare phenotypes but underperform for relatively more common phenotypes. The effect size increases as more auxiliary tasks are added. Moreover, multitask learning reduces the sensitivity of neural nets to hyperparameter settings for rare phenotypes. Last, we quantify phenotype complexity and find that neural nets trained with or without multitask learning do not improve on simple baselines unless the phenotypes are sufficiently complex.


Subjects
Electronic Health Records/statistics & numerical data , Machine Learning , Algorithms , Computational Biology , Factual Databases , Deep Learning , Humans , Logistic Models , Medical Informatics , Computer Neural Networks , Phenotype