
Variable-selection ANOVA Simultaneous Component Analysis (VASCA).

Camacho, José; Vitale, Raffaele; Morales-Jiménez, David; Gómez-Llorente, Carolina.

Bioinformatics ; 39(1), 2023 01 01.

Article in English | MEDLINE | ID: mdl-36495189

ABSTRACT

MOTIVATION: ANOVA Simultaneous Component Analysis (ASCA) is a popular method for the analysis of multivariate data yielded by designed experiments. Meaningful associations between factors/interactions of the experimental design and measured variables in the dataset are typically identified via significance testing, with permutation tests being the standard go-to choice. However, in settings with large numbers of variables, like omics (genomics, transcriptomics, proteomics and metabolomics) experiments, the 'holistic' testing approach of ASCA (all variables considered) often overlooks statistically significant effects encoded by only a few variables (biomarkers). RESULTS: We hereby propose Variable-selection ASCA (VASCA), a method that generalizes ASCA through variable selection, augmenting its statistical power without inflating the Type-I error risk. The method is evaluated with simulations and with a real dataset from a multi-omic clinical experiment. We show that VASCA is more powerful than both ASCA and the widely adopted false discovery rate controlling procedure; the latter is used as a benchmark for variable selection based on multiple significance testing. We further illustrate the usefulness of VASCA for exploratory data analysis in comparison to the popular partial least squares discriminant analysis method and its sparse counterpart. AVAILABILITY AND IMPLEMENTATION: The code for VASCA is available in the MEDA Toolbox at https://github.com/josecamachop/MEDA-Toolbox (release v1.3). The simulation results and motivating example can be reproduced using the repository at https://github.com/josecamachop/VASCA/tree/v1.0.0 (DOI 10.5281/zenodo.7410623). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Genomics; Proteomics; Genomics/methods; Computer Simulation; Metabolomics; Analysis of Variance

RocaSec: a standalone GUI-based package for robust co-evolutionary analysis of proteins.

Quadeer, Ahmed A; Morales-Jimenez, David; McKay, Matthew R.

Bioinformatics ; 36(7): 2262-2263, 2020 04 01.

Article in English | MEDLINE | ID: mdl-31800008

ABSTRACT

SUMMARY: Patterns of mutational correlations, learnt from protein sequences, have been shown to be informative of co-evolutionary sectors that are tightly linked to functional and/or structural properties of proteins. Previously, we developed a statistical inference method, robust co-evolutionary analysis (RoCA), to reliably predict co-evolutionary sectors of proteins, while controlling for statistical errors caused by limited data. RoCA was demonstrated on multiple viral proteins, with the inferred sectors showing close correspondences with experimentally-known biochemical domains. To facilitate seamless use of RoCA and promote more widespread application to protein data, here we present a standalone cross-platform package 'RocaSec' which features an easy-to-use GUI. The package only requires the multiple sequence alignment of a protein for inferring the co-evolutionary sectors. In addition, when information on the protein biochemical domains is provided, RocaSec returns the corresponding statistical association between the inferred sectors and biochemical domains. AVAILABILITY AND IMPLEMENTATION: The RocaSec software is publicly available under the MIT License at https://github.com/ahmedaq/RocaSec. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Biological Evolution; Software; Protein Domains; Sequence Alignment; Viral Proteins

Asymptotics of eigenstructure of sample correlation matrices for high-dimensional spiked models.

Morales-Jimenez, David; Johnstone, Iain M; McKay, Matthew R; Yang, Jeha.

Stat Sin ; 31(2): 571-601, 2021 Apr.

Article in English | MEDLINE | ID: mdl-33833489

ABSTRACT

Sample correlation matrices are widely used, but for high-dimensional data little is known about their spectral properties beyond "null models", which assume the data have independent coordinates. In the class of spiked models, we apply random matrix theory to derive asymptotic first-order and distributional results for both leading eigenvalues and eigenvectors of sample correlation matrices, assuming a high-dimensional regime in which the ratio p/n, of number of variables p to sample size n, converges to a positive constant. While the first-order spectral properties of sample correlation matrices match those of sample covariance matrices, their asymptotic distributions can differ significantly. Indeed, the correlation-based fluctuations of both sample eigenvalues and eigenvectors are often remarkably smaller than those of their sample covariance counterparts.
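The first-order claim can be explored numerically with a small simulation. The sketch below draws Gaussian data from a rank-one spiked model and compares the leading eigenvalue of the sample covariance and sample correlation matrices. The values of n, p and the spike size ell, and the equal-entry spike direction v, are assumptions chosen here for illustration. The reference value ell * (1 + gamma / (ell - 1)) is the classical first-order limit for sample covariance matrices above the 1 + sqrt(gamma) threshold; it is not quoted from the abstract.

```python
import numpy as np

n, p = 2000, 500            # sample size and number of variables (toy values)
gamma = p / n               # aspect ratio p/n
ell = 5.0                   # population spike, above the 1 + sqrt(gamma) threshold

rng = np.random.default_rng(0)
# Equal-entry unit vector: all population variances are equal, so the
# covariance and correlation spikes nearly coincide in this toy setup.
v = np.ones(p) / np.sqrt(p)

# Rows have covariance I + (ell - 1) v v^T: isotropic noise plus one latent factor.
Z = rng.standard_normal((n, p))
f = rng.standard_normal((n, 1))
X = Z + np.sqrt(ell - 1.0) * f * v

S = np.cov(X, rowvar=False)         # sample covariance matrix
R = np.corrcoef(X, rowvar=False)    # sample correlation matrix

top_cov = np.linalg.eigvalsh(S)[-1]     # eigvalsh returns ascending eigenvalues
top_corr = np.linalg.eigvalsh(R)[-1]
first_order_limit = ell * (1.0 + gamma / (ell - 1.0))

print(f"covariance: {top_cov:.3f}, correlation: {top_corr:.3f}, "
      f"limit: {first_order_limit:.3f}")
```

A single draw says nothing about the distributional results; comparing the fluctuations of the two leading eigenvalues would require repeating the simulation over many seeds.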

Sub-dominant principal components inform new vaccine targets for HIV Gag.

Ahmed, Syed Faraz; Quadeer, Ahmed A; Morales-Jimenez, David; McKay, Matthew R.

Bioinformatics ; 35(20): 3884-3889, 2019 10 15.

Article in English | MEDLINE | ID: mdl-31250884

ABSTRACT

MOTIVATION: Patterns of mutational correlations, learnt from patient-derived sequences of human immunodeficiency virus (HIV) proteins, are informative of biochemically linked networks of interacting sites that may enable viral escape from the host immune system. Accurate identification of these networks is important for rationally designing vaccines which can effectively block immune escape pathways. Previous computational methods have partly identified such networks by examining the principal components (PCs) of the mutational correlation matrix of HIV Gag proteins. However, driven by a conservative approach, these methods analyze the few dominant (strongest) PCs, potentially missing information embedded within the sub-dominant (relatively weaker) ones that may be important for vaccine design. RESULTS: By using sequence data for HIV Gag, complemented by model-based simulations, we revealed that certain networks of interacting sites that appear important for vaccine design purposes are not accurately reflected by the dominant PCs. Rather, these networks are encoded jointly by both dominant and sub-dominant PCs. By incorporating information from the sub-dominant PCs, we identified a network of interacting sites of HIV Gag that associated very strongly with viral control. Based on this network, we propose several new candidates for a potent T-cell-based HIV vaccine. AVAILABILITY AND IMPLEMENTATION: Accession numbers of all sequences used and the source code scripts for all analysis and figures reported in this work are available online at https://github.com/faraz107/HIV-Gag-Immunogens. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
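To make the idea of dominant and sub-dominant PCs concrete, the sketch below builds a toy binary mutation matrix, computes its correlation matrix and inspects the two leading eigenvectors. It is not the authors' pipeline (their scripts are in the repository cited above). The simulated data, the group sizes, the mutation probabilities and the helper `top_sites` are all hypothetical. In this toy, a strongly co-mutating group of sites dominates the first PC, while a weaker group only appears in the second.

```python
import numpy as np

rng = np.random.default_rng(0)
n_seq, n_sites = 1000, 30
strong = np.arange(0, 5)    # hypothetical strongly co-mutating sites
weak = np.arange(5, 10)     # hypothetical weakly co-mutating sites

# Background: each site mutates independently with probability 0.3.
M = (rng.random((n_seq, n_sites)) < 0.3).astype(float)

# Strong group: sites follow a shared latent state, with 5% random flips.
a = rng.random(n_seq) < 0.3
flips = rng.random((n_seq, strong.size)) < 0.05
M[:, strong] = np.logical_xor(a[:, None], flips)

# Weak group: each site copies a shared latent state only 60% of the time.
b = rng.random(n_seq) < 0.3
copy = rng.random((n_seq, weak.size)) < 0.6
M[:, weak] = np.where(copy, b[:, None], M[:, weak])

# PCA of the mutational correlation matrix, eigenvalues in descending order.
C = np.corrcoef(M, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

def top_sites(pc, k=5):
    """Indices of the k sites with the largest absolute loading on a PC."""
    return sorted(np.argsort(np.abs(eigvecs[:, pc]))[-k:].tolist())

dominant_sites = top_sites(0)
subdominant_sites = top_sites(1)
print("PC1 sites:", dominant_sites)
print("PC2 sites:", subdominant_sites)
```

The abstract describes networks encoded jointly by dominant and sub-dominant PCs, which is a subtler situation than this clean two-group toy.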

Subjects

AIDS Vaccines; HIV Infections; Amino Acid Sequence; Humans; gag Gene Products, Human Immunodeficiency Virus

Co-evolution networks of HIV/HCV are modular with direct association to structure and function.

Quadeer, Ahmed Abdul; Morales-Jimenez, David; McKay, Matthew R.

PLoS Comput Biol ; 14(9): e1006409, 2018 09.

Article in English | MEDLINE | ID: mdl-30192744

ABSTRACT

Mutational correlation patterns found in population-level sequence data for the Human Immunodeficiency Virus (HIV) and the Hepatitis C Virus (HCV) have been demonstrated to be informative of viral fitness. Such patterns can be seen as footprints of the intrinsic functional constraints placed on viral evolution under diverse selective pressures. Here, considering multiple HIV and HCV proteins, we demonstrate that these mutational correlations encode a modular co-evolutionary structure that is tightly linked to the structural and functional properties of the respective proteins. Specifically, by introducing a robust statistical method based on sparse principal component analysis, we identify near-disjoint sets of collectively-correlated residues (sectors) having mostly a one-to-one association to largely distinct structural or functional domains. This suggests that the distinct phenotypic properties of HIV/HCV proteins often give rise to quasi-independent modes of evolution, with each mode involving a sparse and localized network of mutational interactions. Moreover, individual inferred sectors of HIV are shown to carry immunological significance, providing insight for guiding targeted vaccine strategies.

Subjects

HIV Infections/virology; HIV-1; Hepacivirus; Hepatitis C/virology; Algorithms; Alleles; Computational Biology; Computer Simulation; DNA Mutational Analysis; DNA, Viral; Disease Progression; Evolution, Molecular; HIV Core Protein p24/physiology; HLA Antigens/chemistry; Humans; Immune System; Normal Distribution; Phenotype; Principal Component Analysis; Protein Domains; Structure-Activity Relationship; nef Gene Products, Human Immunodeficiency Virus/physiology
