Results 1 - 20 of 22
1.
BMC Bioinformatics ; 25(1): 69, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38350879

ABSTRACT

BACKGROUND: Technological advances have enabled the generation of unique and complementary types of data or views (e.g. genomics, proteomics, metabolomics) and opened up a new era in multiview learning research with the potential to lead to new biomedical discoveries. RESULTS: We propose iDeepViewLearn (Interpretable Deep Learning Method for Multiview Learning) to learn nonlinear relationships in data from multiple views while achieving feature selection. iDeepViewLearn combines deep learning flexibility with the statistical benefits of data and knowledge-driven feature selection, giving interpretable results. Deep neural networks are used to learn view-independent low-dimensional embedding through an optimization problem that minimizes the difference between observed and reconstructed data, while imposing a regularization penalty on the reconstructed data. The normalized Laplacian of a graph is used to model bilateral relationships between variables in each view, thereby encouraging selection of related variables. iDeepViewLearn is tested on simulated data and three real-world datasets for classification, clustering, and reconstruction tasks. For the classification tasks, iDeepViewLearn had competitive classification results with state-of-the-art methods in various settings. For the clustering task, we detected molecular clusters that differed in their 10-year survival rates for breast cancer. For the reconstruction task, we were able to reconstruct handwritten images using a few pixels while achieving competitive classification accuracy. The results of our real data application and simulations with small to moderate sample sizes suggest that iDeepViewLearn may be a useful method for small-sample-size problems compared to other deep learning methods for multiview learning. CONCLUSION: iDeepViewLearn is an innovative deep learning model capable of capturing nonlinear relationships between data from multiple views while achieving feature selection. It is fully open source and is freely available at https://github.com/lasandrall/iDeepViewLearn.


Subjects
Deep Learning, Cluster Analysis, Genomics, Knowledge, Metabolomics
2.
AIDS Behav ; 2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38326668

ABSTRACT

We investigate risk factors for severe COVID-19 in persons living with HIV (PWH), including among racialized PWH, using the U.S. population-sampled National COVID Cohort Collaborative (N3C) data released from January 1, 2020 to October 10, 2022. We defined severe COVID-19 as hospitalized with invasive mechanical ventilation, extracorporeal membrane oxygenation, discharge to hospice or death. We used machine learning methods to identify highly ranked, uncorrelated factors predicting severe COVID-19, and used multivariable logistic regression models to assess the associations of these variables with severe COVID-19 in several models, including race-stratified models. There were 3 241 627 individuals with incident COVID-19 cases and 81 549 (2.5%) with severe COVID-19, of which 17 445 incident COVID-19 and 1 020 (5.8%) severe cases were among PWH. The top highly ranked factors of severe COVID-19 were age, congestive heart failure (CHF), dementia, renal disease, sodium concentration, smoking status, and sex. Among PWH, age and sodium concentration were important predictors of COVID-19 severity, and the effect of sodium concentration was more pronounced in Hispanics (aOR 4.11 compared to aOR range: 1.47-1.88 for Black, White, and Other non-Hispanics). Dementia, CHF, and renal disease were associated with higher odds of severe COVID-19 among Black, Hispanic, and Other non-Hispanic PWH, respectively. Our findings suggest that the impact of factors, especially clinical comorbidities, predictive of severe COVID-19 among PWH varies by racialized groups, highlighting a need to account for race and comorbidity burden when assessing the risk of PWH developing severe COVID-19.

3.
J Infect Dis ; 227(8): 951-960, 2023 04 18.
Article in English | MEDLINE | ID: mdl-36580481

ABSTRACT

BACKGROUND: There is an incompletely understood increased risk for cardiovascular disease (CVD) among people with HIV (PWH). We investigated whether a collection of biomarkers was associated with CVD among PWH. Mendelian randomization (MR) was used to identify potentially causal associations. METHODS: Data from follow-up in 4 large trials among PWH were used to identify 131 incident CVD cases, which were matched to 259 participants without incident CVD (controls). Tests of associations between 460 baseline protein levels and case status were conducted. RESULTS: Univariate analysis found CLEC6A, HGF, IL-6, IL-10RB, and IGFBP7 as being associated with case status and a multivariate model identified 3 of these: CLEC6A (odds ratio [OR] = 1.48, P = .037), HGF (OR = 1.83, P = .012), and IL-6 (OR = 1.45, P = .016). MR methods identified 5 significantly associated proteins: AXL, CHI3L1, GAS6, IL-6RA, and SCGB3A2. CONCLUSIONS: These results implicate inflammatory and fibrotic processes as contributing to CVD. While some of these biomarkers are well established in the general population and in PWH (IL-6 and its receptor), some are novel to PWH (HGF, AXL, and GAS6) and some are novel overall (CLEC6A). Further investigation into the uniqueness of these biomarkers in PWH and the role of these biomarkers as targets among PWH is warranted.


Subjects
Cardiovascular Diseases, HIV Infections, Humans, Cardiovascular Diseases/epidemiology, Risk Factors, Interleukin-6, Biomarkers, HIV Infections/complications
4.
BMC Genomics ; 24(1): 319, 2023 Jun 12.
Article in English | MEDLINE | ID: mdl-37308820

ABSTRACT

BACKGROUND: There is still more to learn about the pathobiology of COVID-19. A multi-omic approach offers a holistic view to better understand the mechanisms of COVID-19. We used state-of-the-art statistical learning methods to integrate genomics, metabolomics, proteomics, and lipidomics data obtained from 123 patients experiencing COVID-19 or COVID-19-like symptoms for the purpose of identifying molecular signatures and corresponding pathways associated with the disease. RESULTS: We constructed and validated molecular scores and evaluated their utility beyond clinical factors known to impact disease status and severity. We identified inflammation- and immune response-related pathways, and other pathways, providing insights into possible consequences of the disease. CONCLUSIONS: The molecular scores we derived were strongly associated with disease status and severity and can be used to identify individuals at a higher risk for developing severe disease. These findings have the potential to provide further, and needed, insights into why certain individuals develop worse outcomes.


Subjects
COVID-19, Multiomics, Humans, Metabolomics, Genomics, Inflammation
5.
Biostatistics ; 24(1): 124-139, 2022 12 12.
Article in English | MEDLINE | ID: mdl-33969382

ABSTRACT

The problem of associating data from multiple sources and predicting an outcome simultaneously is an important one in modern biomedical research. It has the potential to identify a multidimensional array of variables predictive of a clinical outcome and to enhance our understanding of the pathobiology of complex diseases. Incorporating functional knowledge in association and prediction models can reveal pathways contributing to disease risk. We propose Bayesian hierarchical integrative analysis models that associate multiple omics data, predict a clinical outcome, allow for prior functional information, and can accommodate clinical covariates. The models, motivated by available data and the need for exploring other risk factors of atherosclerotic cardiovascular disease (ASCVD), are used for integrative analysis of clinical, demographic, and genomics data to identify genetic variants, genes, and gene pathways likely contributing to 10-year ASCVD risk in healthy adults. Our findings revealed several genetic variants, genes, and gene pathways that are highly associated with ASCVD risk, with some already implicated in cardiovascular disease (CVD) risk. Extensive simulations demonstrate the merit of joint association and prediction models over two-stage methods: association followed by prediction.


Subjects
Atherosclerosis, Cardiovascular Diseases, Adult, Humans, Bayes Theorem, Cardiovascular Diseases/etiology, Cardiovascular Diseases/genetics, Atherosclerosis/etiology, Atherosclerosis/genetics, Risk Factors, Genomics/methods, Risk Assessment
6.
BMC Bioinformatics ; 23(1): 168, 2022 May 07.
Article in English | MEDLINE | ID: mdl-35525975

ABSTRACT

BACKGROUND: Dimension reduction and variable selection play a critical role in the analysis of contemporary high-dimensional data. The semi-parametric multi-index model often serves as a reasonable model for analysis of such high-dimensional data. The sliced inverse regression (SIR) method, which can be formulated as a generalized eigenvalue decomposition problem, offers a model-free estimation approach for the indices in the semi-parametric multi-index model. Obtaining sparse estimates of the eigenvectors that constitute the basis matrix that is used to construct the indices is desirable to facilitate variable selection, which in turn facilitates interpretability and model parsimony. RESULTS: To this end, we propose a group-Dantzig selector type formulation that induces row-sparsity in the sliced inverse regression dimension reduction vectors. Extensive simulation studies are carried out to assess the performance of the proposed method and compare it with other state-of-the-art methods in the literature. CONCLUSION: The proposed method is shown to yield competitive estimation, prediction, and variable selection performance. Three real data applications, including a metabolomics depression study, are presented to demonstrate the method's effectiveness in practice.

7.
Biometrics ; 78(2): 612-623, 2022 06.
Article in English | MEDLINE | ID: mdl-33739448

ABSTRACT

Classification methods that leverage the strengths of data from multiple sources (multiview data) simultaneously have enormous potential to yield more powerful findings than two-step methods: association followed by classification. We propose two methods, sparse integrative discriminant analysis (SIDA), and SIDA with incorporation of network information (SIDANet), for joint association and classification studies. The methods consider the overall association between multiview data, and the separation within each view in choosing discriminant vectors that are associated and optimally separate subjects into different classes. SIDANet is among the first methods to incorporate prior structural information in joint association and classification studies. It uses the normalized Laplacian of a graph to smooth coefficients of predictor variables, thus encouraging selection of predictors that are connected. We demonstrate the effectiveness of our methods on a set of synthetic datasets and explore their use in identifying potential nontraditional risk factors that discriminate healthy patients at low versus high risk for developing atherosclerotic cardiovascular disease in 10 years. Our findings underscore the benefit of joint association and classification methods if the goal is to correlate multiview data and to perform classification.


Subjects
Discriminant Analysis, Humans
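
The abstract of record 7 says that SIDANet uses the normalized Laplacian of a graph to smooth coefficients so that connected predictors tend to be selected together. The following is a minimal sketch of that general idea only, not the authors' implementation: the function names, the toy path graph, and the coefficient vectors are all hypothetical. It builds L = I - D^{-1/2} A D^{-1/2} and evaluates the quadratic penalty b^T L b, which is small when connected variables carry similar degree-scaled coefficients.

```python
import numpy as np


def normalized_laplacian(adjacency):
    """Return L = I - D^{-1/2} A D^{-1/2} for a symmetric adjacency matrix.

    Isolated nodes (degree 0) get a zero row and column, a common convention.
    """
    adjacency = np.asarray(adjacency, dtype=float)
    degree = adjacency.sum(axis=1)
    connected = degree > 0
    inv_sqrt = np.zeros_like(degree)
    inv_sqrt[connected] = 1.0 / np.sqrt(degree[connected])
    scaled = adjacency * inv_sqrt[:, None] * inv_sqrt[None, :]
    return np.diag(connected.astype(float)) - scaled


def laplacian_penalty(coef, laplacian):
    """Quadratic smoothness penalty coef^T L coef."""
    coef = np.asarray(coef, dtype=float)
    return float(coef @ laplacian @ coef)


# Hypothetical network on three variables forming a path: 0 - 1 - 2.
path_graph = np.array([[0, 1, 0],
                       [1, 0, 1],
                       [0, 1, 0]])
lap = normalized_laplacian(path_graph)

# Coefficients proportional to sqrt(degree) are perfectly "smooth" on the graph.
smooth_coef = np.array([1.0, np.sqrt(2.0), 1.0])
# Flipping the sign of the middle variable makes neighbours disagree.
rough_coef = np.array([1.0, -np.sqrt(2.0), 1.0])

smooth_penalty = laplacian_penalty(smooth_coef, lap)
rough_penalty = laplacian_penalty(rough_coef, lap)
print(smooth_penalty, rough_penalty)
```

Adding such a penalty to a fitting criterion discourages solutions in which neighbouring variables receive very different coefficients; how the penalty is weighted and combined with sparsity in SIDANet is described in the article itself.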
8.
Article in English | MEDLINE | ID: mdl-36119152

ABSTRACT

Analyzing multi-source data, which are multiple views of data on the same subjects, has become increasingly common in molecular biomedical research. Recent methods have sought to uncover underlying structure and relationships within and/or between the data sources, and other methods have sought to build a predictive model for an outcome using all sources. However, existing methods that do both are presently limited because they either (1) only consider data structure shared by all datasets while ignoring structures unique to each source, or (2) extract underlying structures first without consideration of the outcome. The proposed method, supervised joint and individual variation explained (sJIVE), can simultaneously (1) identify shared (joint) and source-specific (individual) underlying structure and (2) build a linear prediction model for an outcome using these structures. These two components are weighted to compromise between explaining variation in the multi-source data and in the outcome. Simulations show sJIVE to outperform existing methods when large amounts of noise are present in the multi-source data. An application to data from the COPDGene study explores gene expression and proteomic patterns associated with lung function.

9.
BMC Bioinformatics ; 21(1): 283, 2020 Jul 03.
Article in English | MEDLINE | ID: mdl-32620072

ABSTRACT

BACKGROUND: The problem of assessing associations between multiple omics data including genomics and metabolomics data to identify biomarkers potentially predictive of complex diseases has garnered considerable research interest. A popular epidemiology approach is to consider an association of each of the predictors with each of the responses using a univariate linear regression model, and to select predictors that meet an a priori specified significance level. Although this approach is simple and intuitive, it tends to require a larger sample size, which is costly. It also assumes variables for each data type are independent, and thus ignores correlations that exist between variables both within each data type and across the data types. RESULTS: We consider a multivariate linear regression model that relates multiple predictors with multiple responses, and identify multiple relevant predictors that are simultaneously associated with the responses. We assume the coefficient matrix of the responses on the predictors is both row-sparse and of low-rank, and propose a group Dantzig type formulation to estimate the coefficient matrix. CONCLUSION: Extensive simulations demonstrate the competitive performance of our proposed method when compared to existing methods in terms of estimation, prediction, and variable selection. We use the proposed method to integrate genomics and metabolomics data to identify genetic variants that are potentially predictive of atherosclerotic cardiovascular disease (ASCVD) beyond well-established risk factors. Our analysis shows some genetic variants that increase prediction of ASCVD beyond some well-established factors of ASCVD, and also suggests a potential utility of the identified genetic variants in explaining possible association between certain metabolites and ASCVD.


Subjects
Genomics/methods, Metabolomics/methods, Atherosclerosis/genetics, Genetic Variation, Humans, Linear Models, Multivariate Analysis
10.
Bioinformatics ; 35(6): 1018-1025, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30165424

ABSTRACT

MOTIVATION: Co-inertia analysis (CIA) is a multivariate statistical analysis method that can assess relationships and trends in two sets of data. Recently CIA has been used for an integrative analysis of multiple high-dimensional omics data. However, for classical CIA, all elements in the loading vectors are nonzero, presenting a challenge for the interpretation when analyzing omics data. For other multivariate statistical methods such as canonical correlation analysis (CCA), penalized least squares (PLS), various approaches have been proposed to produce sparse loading vectors via l1-penalization/constraint. We propose a novel CIA method that uses l1-penalization to induce sparsity in estimators of loading vectors. Our method simultaneously conducts model fitting and variable selection. Also, we propose another CIA method that incorporates structure/network information such as those from functional genomics, besides using sparsity penalty so that one can get biologically meaningful and interpretable results. RESULTS: Extensive simulations demonstrate that our proposed penalized CIA methods achieve the best or close to the best performance compared to the existing CIA method in terms of feature selection and recovery of true loading vectors. Also, we apply our methods to the integrative analysis of gene expression data and protein abundance data from the NCI-60 cancer cell lines. Our analysis of the NCI-60 cancer cell line data reveals meaningful variables for cancer diseases and biologically meaningful results that are consistent with previous studies. AVAILABILITY AND IMPLEMENTATION: Our algorithms are implemented as an R package which is freely available at: https://www.med.upenn.edu/long-lab/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms, Computational Biology, Biometry, Least-Squares Analysis, Multivariate Analysis
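
The abstract of record 10 refers to producing sparse loading vectors via l1-penalization. As a hedged illustration of the basic mechanism, and not the penalized CIA algorithm from the article, the sketch below applies the soft-thresholding operator (the standard closed-form solution of an l1-penalized least-squares step) to a dense loading vector and rescales the result to unit length. The vector values and the threshold `lam` are hypothetical.

```python
import numpy as np


def soft_threshold(values, lam):
    """Shrink each entry toward zero by lam; entries with |v| <= lam become 0."""
    values = np.asarray(values, dtype=float)
    return np.sign(values) * np.maximum(np.abs(values) - lam, 0.0)


def sparse_unit_vector(values, lam):
    """Soft-threshold a loading vector and rescale it to unit Euclidean norm."""
    shrunk = soft_threshold(values, lam)
    norm = np.linalg.norm(shrunk)
    # If everything was thresholded away, return the all-zero vector unchanged.
    return shrunk if norm == 0 else shrunk / norm


# Hypothetical dense loading vector: two large entries, two near zero.
dense_loading = np.array([0.9, -0.05, 0.4, 0.02])
sparse_loading = sparse_unit_vector(dense_loading, lam=0.1)
print(sparse_loading)
```

Larger values of `lam` zero out more entries, which is what makes the resulting loadings easier to interpret as a variable selection.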
11.
Biometrics ; 74(1): 300-312, 2018 03.
Article in English | MEDLINE | ID: mdl-28482123

ABSTRACT

Integrative analysis of high dimensional omics data is becoming increasingly popular. At the same time, incorporating known functional relationships among variables in analysis of omics data has been shown to help elucidate underlying mechanisms for complex diseases. In this article, our goal is to assess association between transcriptomic and metabolomic data from a Predictive Health Institute (PHI) study that includes healthy adults at a high risk of developing cardiovascular diseases. Adopting a strategy that is both data-driven and knowledge-based, we develop statistical methods for sparse canonical correlation analysis (CCA) with incorporation of known biological information. Our proposed methods use prior network structural information among genes and among metabolites to guide selection of relevant genes and metabolites in sparse CCA, providing insight on the molecular underpinning of cardiovascular disease. Our simulations demonstrate that the structured sparse CCA methods outperform several existing sparse CCA methods in selecting relevant genes and metabolites when structural information is informative and are robust to mis-specified structural information. Our analysis of the PHI study reveals that a number of gene and metabolic pathways including some known to be associated with cardiovascular diseases are enriched in the set of genes and metabolites selected by our proposed approach.


Subjects
Biometry/methods, Correlation of Data, Metabolome, Statistical Models, Transcriptome, Adult, Cardiovascular Diseases/genetics, Cardiovascular Diseases/metabolism, Computer Simulation, Humans
12.
Biometrics ; 74(4): 1362-1371, 2018 12.
Article in English | MEDLINE | ID: mdl-29750830

ABSTRACT

We present a method for individual and integrative analysis of high dimension, low sample size data that capitalizes on the recurring theme in multivariate analysis of projecting higher dimensional data onto a few meaningful directions that are solutions to a generalized eigenvalue problem. We propose a general framework, called SELP (Sparse Estimation with Linear Programming), with which one can obtain a sparse estimate for a solution vector of a generalized eigenvalue problem. We demonstrate the utility of SELP on canonical correlation analysis for an integrative analysis of methylation and gene expression profiles from a breast cancer study, and we identify some genes known to be associated with breast carcinogenesis, which indicates that the proposed method is capable of generating biologically meaningful insights. Simulation studies suggest that the proposed method performs competitively in comparison with some existing methods in identifying true signals in various underlying covariance structures.


Subjects
Biometry/methods, Breast Neoplasms/genetics, Carcinogenesis/genetics, Computer Simulation/statistics & numerical data, DNA Methylation, Humans, Multivariate Analysis, Sample Size, Transcriptome
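
The abstract of record 12 describes canonical correlation analysis (CCA) as one instance of a generalized eigenvalue problem. The sketch below shows the classical, dense (non-sparse) version of that problem only; it is not SELP, and the simulated data and function name are hypothetical. It solves Sxy Syy^{-1} Syx a = rho^2 Sxx a by whitening both views with Cholesky factors and taking a singular value decomposition, and it assumes both sample covariance matrices are invertible (more samples than variables).

```python
import numpy as np


def first_canonical_pair(x, y):
    """Return (rho, a, b): the first canonical correlation and its vectors."""
    x = x - x.mean(axis=0)
    y = y - y.mean(axis=0)
    n = x.shape[0]
    sxx = x.T @ x / (n - 1)
    syy = y.T @ y / (n - 1)
    sxy = x.T @ y / (n - 1)
    lx = np.linalg.cholesky(sxx)  # sxx = lx @ lx.T
    ly = np.linalg.cholesky(syy)  # syy = ly @ ly.T
    # Whitened cross-covariance K = lx^{-1} sxy ly^{-T}.
    k = np.linalg.solve(lx, sxy)
    k = np.linalg.solve(ly, k.T).T
    u, s, vt = np.linalg.svd(k)
    # Map the leading singular vectors back to the original coordinates.
    a = np.linalg.solve(lx.T, u[:, 0])
    b = np.linalg.solve(ly.T, vt[0])
    return s[0], a, b


# Simulated views: the first column of y is an exact linear function of x,
# so the first canonical correlation should be essentially 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 3))
y = np.column_stack([x[:, 0] + x[:, 1], rng.normal(size=200)])

rho, a, b = first_canonical_pair(x, y)
print(rho)
```

In high dimension, low sample size settings the covariance matrices are singular and the vectors `a` and `b` are dense, which is the gap that sparse formulations such as the one in the article aim to address.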
13.
BMC Bioinformatics ; 18(1): 332, 2017 Jul 11.
Article in English | MEDLINE | ID: mdl-28697740

ABSTRACT

BACKGROUND: Sparse principal component analysis (PCA) is a popular tool for dimensionality reduction, pattern recognition, and visualization of high dimensional data. It has been recognized that complex biological mechanisms occur through concerted relationships of multiple genes working in networks that are often represented by graphs. Recent work has shown that incorporating such biological information improves feature selection and prediction performance in regression analysis, but there has been limited work on extending this approach to PCA. In this article, we propose two new sparse PCA methods called Fused and Grouped sparse PCA that enable incorporation of prior biological information in variable selection. RESULTS: Our simulation studies suggest that, compared to existing sparse PCA methods, the proposed methods achieve higher sensitivity and specificity when the graph structure is correctly specified, and are fairly robust to misspecified graph structures. Application to a glioblastoma gene expression dataset identified pathways that are suggested in the literature to be related to glioblastoma. CONCLUSIONS: The proposed sparse PCA methods Fused and Grouped sparse PCA can effectively incorporate prior biological information in variable selection, leading to improved feature selection and more interpretable principal component loadings and potentially providing insights on molecular underpinnings of complex diseases.


Subjects
Genomics/methods, Principal Component Analysis, Algorithms, Humans
14.
Bioinform Adv ; 4(1): vbae005, 2024.
Article in English | MEDLINE | ID: mdl-38304121

ABSTRACT

Summary: The package mvlearnR and accompanying Shiny App are intended for integrating data from multiple sources or views or modalities (e.g. genomics, proteomics, clinical, and demographic data). Most existing software packages for multiview learning are decentralized and offer limited capabilities, making it difficult for users to perform comprehensive integrative analysis. The new package wraps statistical and machine learning methods and graphical tools, providing a convenient and easy data integration workflow. For users with limited programming experience, we provide a Shiny Application to facilitate data integration anywhere and on any device. The methods have potential to offer deeper insights into complex disease mechanisms. Availability and implementation: mvlearnR is available from the following GitHub repository: https://github.com/lasandrall/mvlearnR. The web application is hosted on shinyapps.io and available at: https://multi-viewlearn.shinyapps.io/MultiView_Modeling/.

15.
J Am Heart Assoc ; 12(13): e027273, 2023 07 04.
Article in English | MEDLINE | ID: mdl-37345752

ABSTRACT

BACKGROUND: Cardiovascular disease (CVD) risk prediction models underestimate CVD risk in people living with HIV (PLWH). Our goal is to derive a risk score based on protein biomarkers that could be used to predict CVD in PLWH. METHODS AND RESULTS: In a matched case-control study, we analyzed normalized protein expression data for participants enrolled in 1 of 4 trials conducted by INSIGHT (International Network for Strategic Initiatives in Global HIV Trials). We used dimension reduction, variable selection and resampling methods, and multivariable conditional logistic regression models to determine candidate protein biomarkers and to generate a protein score for predicting CVD in PLWH. We internally validated our findings using bootstrap. A protein score that was derived from 8 proteins (including HGF [hepatocyte growth factor] and interleukin-6) was found to be associated with an increased risk of CVD after adjustment for CVD and HIV factors (odds ratio: 2.17 [95% CI: 1.58-2.99]). The protein score improved CVD prediction when compared with predicting CVD risk using the individual proteins that comprised the protein score. Individuals with a protein score above the median score were 3.10 (95% CI, 1.83-5.41) times more likely to develop CVD than those with a protein score below the median score. CONCLUSIONS: A panel of blood biomarkers may help identify PLWH at a high risk for developing CVD. If validated, such a score could be used in conjunction with established factors to identify CVD at-risk individuals who might benefit from aggressive risk reduction, ultimately shedding light on CVD pathogenesis in PLWH.


Subjects
Cardiovascular Diseases, HIV Infections, Humans, Cardiovascular Diseases/diagnosis, Cardiovascular Diseases/epidemiology, Cardiovascular Diseases/complications, Case-Control Studies, HIV Infections/diagnosis, HIV Infections/epidemiology, HIV Infections/complications, Risk Factors, Biomarkers
16.
Metabolites ; 13(11), 2023 Oct 24.
Article in English | MEDLINE | ID: mdl-37999202

ABSTRACT

Metabolic disease is a significant risk factor for severe COVID-19 infection, but the contributing pathways are not yet fully elucidated. Using data from two randomized controlled trials across 13 U.S. academic centers, our goal was to characterize metabolic features that predict severe COVID-19 and define a novel baseline metabolomic signature. Individuals (n = 133) were dichotomized as having mild or moderate/severe COVID-19 disease based on the WHO ordinal scale. Blood samples were analyzed using the Biocrates platform, providing 630 targeted metabolites for analysis. Resampling techniques and machine learning models were used to determine metabolomic features associated with severe disease. Ingenuity Pathway Analysis (IPA) was used for functional enrichment analysis. To aid in clinical decision making, we created baseline metabolomics signatures of low-correlated molecules. Multivariable logistic regression models were fit to associate these signatures with severe disease on training data. A three-metabolite signature, lysophosphatidylcholine a C17:0, dihydroceramide (d18:0/24:1), and triacylglyceride (20:4_36:4), resulted in the best discrimination performance with an average test AUROC of 0.978 and F1 score of 0.942. Pathways related to amino acids were significantly enriched from the IPA analyses, and the mitogen-activated protein kinase kinase 5 (MAP2K5) was differentially activated between groups. In conclusion, metabolites related to lipid metabolism efficiently discriminated between mild vs. moderate/severe disease. SDMA and GABA demonstrated the potential to discriminate between these two groups as well. The mitogen-activated protein kinase kinase 5 (MAP2K5) regulator is differentially activated between groups, suggesting further investigation as a potential therapeutic pathway.

17.
Sci Rep ; 13(1): 20315, 2023 11 20.
Article in English | MEDLINE | ID: mdl-37985892

ABSTRACT

Significant progress has been made in preventing severe COVID-19 disease through the development of vaccines. However, we still lack a validated baseline predictive biologic signature for the development of more severe disease in both outpatients and inpatients infected with SARS-CoV-2. The objective of this study was to develop and externally validate, via 5 international outpatient and inpatient trials and/or prospective cohort studies, a novel baseline proteomic signature, which predicts the development of moderate or severe (vs mild) disease in patients with COVID-19 from a proteomic analysis of 7000+ proteins. The secondary objective was exploratory, to identify (1) individual baseline protein levels and/or (2) protein level changes within the first 2 weeks of acute infection that are associated with the development of moderate/severe (vs mild) disease. For model development, samples collected from 2 randomized controlled trials were used. Plasma was isolated and the SomaLogic SomaScan platform was used to characterize protein levels for 7301 proteins of interest for all studies. We dichotomized 113 patients as having mild or moderate/severe COVID-19 disease. An elastic net approach was used to develop a predictive proteomic signature. For validation, we applied our signature to data from three independent prospective biomarker studies. We found 4110 proteins measured at baseline that significantly differed between patients with mild COVID-19 and those with moderate/severe COVID-19 after adjusting for multiple hypothesis testing. Baseline protein expression was associated with predicted disease severity with an error rate of 4.7% (AUC = 0.964). We also found that five proteins (Afamin, I-309, NKG2A, PRS57, LIPK) and patient age serve as a signature that separates patients with mild COVID-19 and patients with moderate/severe COVID-19 with an error rate of 1.77% (AUC = 0.9804). This panel was validated using data from 3 external studies with AUCs of 0.764 (Harvard University), 0.696 (University of Colorado), and 0.893 (Karolinska Institutet). In this study we developed and externally validated a baseline COVID-19 proteomic signature associated with disease severity for potential use in both outpatients and inpatients with COVID-19.


Subjects
COVID-19, Humans, Prospective Studies, SARS-CoV-2, Proteomics, Biomarkers
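
The abstract of record 17 summarizes signature performance with an error rate and an AUC. For readers unfamiliar with these two metrics, the sketch below shows how they are commonly computed from true labels and predicted scores. The labels, scores, and the 0.5 classification threshold are hypothetical and are not taken from the study; the AUC is computed as the fraction of (positive, negative) pairs ranked correctly, with ties counted as one half.

```python
import numpy as np


def auc_from_scores(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    positives = scores[labels]
    negatives = scores[~labels]
    # Compare every positive score with every negative score.
    diff = positives[:, None] - negatives[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(wins / (len(positives) * len(negatives)))


def error_rate(labels, scores, threshold=0.5):
    """Fraction of samples misclassified when scores >= threshold means positive."""
    labels = np.asarray(labels, dtype=bool)
    predicted = np.asarray(scores, dtype=float) >= threshold
    return float(np.mean(predicted != labels))


# Hypothetical example: 1 = moderate/severe, 0 = mild.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

example_auc = auc_from_scores(labels, scores)
example_error = error_rate(labels, scores)
print(example_auc, example_error)
```

The AUC does not depend on a threshold, whereas the error rate does, which is one reason the two numbers can rank models differently.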
18.
PLoS One ; 17(4): e0267047, 2022.
Article in English | MEDLINE | ID: mdl-35468151

ABSTRACT

COVID-19 is a disease characterized by its seemingly unpredictable clinical outcomes. In order to better understand the molecular signature of the disease, a recent multi-omics study was done which looked at correlations between biomolecules and used a tree-based machine learning approach to predict clinical outcomes. This study specifically looked at patients admitted to the hospital experiencing COVID-19 or COVID-19-like symptoms. In this paper we examine the same multi-omics data; however, we take a different approach, and we identify stable molecules of interest for further pathway analysis. We used stability selection, regularized regression models, enrichment analysis, and principal components analysis on proteomics, metabolomics, lipidomics, and RNA sequencing data, and we determined key molecules and biological pathways in disease severity and disease status. In addition to the individual omics analyses, we perform the integrative method Sparse Multiple Canonical Correlation Analysis to analyse relationships among the different views of data. Our findings suggest that COVID-19 status is associated with the cell cycle and death, as well as the inflammatory response. This relationship is reflected in all four sets of molecules analyzed. We further observe that metabolic processes, particularly those related to vitamin absorption and cholesterol, are implicated in COVID-19 status and severity.


Subjects
COVID-19, Humans, Machine Learning, Metabolomics/methods, Proteomics/methods
19.
Stat Methods Med Res ; 31(11): 2201-2216, 2022 11.
Article in English | MEDLINE | ID: mdl-36113157

ABSTRACT

In many biomedical studies, multiple views of data (e.g. genomics, proteomics) are available, and a particular interest might be the detection of sample subgroups characterized by specific groups of variables. Biclustering methods are well-suited for this problem as they assume that specific groups of variables might be relevant only to specific groups of samples. Many biclustering methods exist for detecting row-column clusters in a view but few methods exist for data from multiple views. The few existing algorithms are heavily dependent on regularization parameters for getting row-column clusters, and they impose an unnecessary burden on users, thus limiting their use in practice. We extend an existing biclustering method based on sparse singular value decomposition for single-view data to data from multiple views. Our method, integrative sparse singular value decomposition (iSSVD), incorporates stability selection to control Type I error rates, estimates the probability of samples and variables to belong to a bicluster, finds stable biclusters, and results in interpretable row-column associations. Simulations and real data analyses show that integrative sparse singular value decomposition outperforms several other single- and multi-view biclustering methods and is able to detect meaningful biclusters. iSSVD is a user-friendly, computationally efficient algorithm that will be useful in many disease subtyping applications.


Subjects
Algorithms, Gene Expression Profiling, Cluster Analysis, Oligonucleotide Array Sequence Analysis/methods, Gene Expression Profiling/methods
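
The abstract of record 19 mentions stability selection as the device used to control false selections. The sketch below illustrates the generic resampling idea only and is not iSSVD: it repeatedly draws half of the samples, runs a base selector on each subsample, and records how often each variable is chosen. The base selector (ranking variables by absolute correlation with an outcome), the simulated data, and the 0.8 frequency cutoff are all hypothetical stand-ins chosen to keep the example short.

```python
import numpy as np


def top_k_by_correlation(x, y, k):
    """Stand-in base selector: the k variables most correlated with y."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean()
    corr = (xc.T @ yc) / (np.linalg.norm(xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(-np.abs(corr))[:k]


def selection_frequencies(x, y, k, n_subsamples=100, seed=0):
    """Fraction of half-sample draws in which each variable is selected."""
    rng = np.random.default_rng(seed)
    n, p = x.shape
    counts = np.zeros(p)
    for _ in range(n_subsamples):
        rows = rng.choice(n, size=n // 2, replace=False)
        counts[top_k_by_correlation(x[rows], y[rows], k)] += 1
    return counts / n_subsamples


# Simulated data: only variables 0 and 1 drive the outcome.
rng = np.random.default_rng(1)
x = rng.normal(size=(100, 20))
y = 3 * x[:, 0] + 3 * x[:, 1] + rng.normal(size=100)

freq = selection_frequencies(x, y, k=2)
stable = np.flatnonzero(freq >= 0.8)  # hypothetical cutoff
print(stable)
```

Variables that are selected in most subsamples are reported as stable; variables that appear only occasionally are treated as likely noise, which is how the procedure limits false selections.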
20.
ArXiv ; 2021 Nov 18.
Article in English | MEDLINE | ID: mdl-34815984

ABSTRACT

COVID-19 severity is due to complications from SARS-CoV-2 but the clinical course of the infection varies for individuals, emphasizing the need to better understand the disease at the molecular level. We use clinical and multiple molecular data (or views) obtained from patients with and without COVID-19 who were (or were not) admitted to the intensive care unit to shed light on COVID-19 severity. Methods for jointly associating the views and separating the COVID-19 groups (i.e., one-step methods) have focused on linear relationships. The relationships between the views and COVID-19 patient groups, however, are too complex to be understood solely by linear methods. Existing nonlinear one-step methods cannot be used to identify signatures to aid in our understanding of the complexity of the disease. We propose Deep IDA (Integrative Discriminant Analysis) to address analytical challenges in our problem of interest. Deep IDA learns nonlinear projections of two or more views that maximally associate the views and separate the classes in each view, and permits feature ranking for interpretable findings. Our applications demonstrate that Deep IDA has competitive classification rates compared to other state-of-the-art methods and is able to identify molecular signatures that facilitate an understanding of COVID-19 severity.
