1.
Brief Bioinform; 22(6), 2021 Nov 05.
Article in English | MEDLINE | ID: mdl-34254998

ABSTRACT

Statistical analysis of ultrahigh-dimensional omics-scale data has long depended on univariate hypothesis testing. As the numbers of features and samples grow, the obvious next step is to establish multivariable association analysis as a routine method for describing genotype-phenotype associations. Here we present ParProx, a state-of-the-art implementation for optimizing overlapping and non-overlapping group lasso regression models for time-to-event and classification analysis, with selection of variables grouped by biological priors. ParProx enables multivariable model fitting for ultrahigh-dimensional data within an architecture for parallel or distributed computing via a latent variable group representation. It thereby aims to produce interpretable regression models consistent with known biological relationships among independent variables, a property often explored post hoc rather than during model estimation. Simulation studies clearly demonstrate the scalability of ParProx with graphics processing units in comparison to existing implementations. We illustrate the tool using three different omics data sets featuring moderate to large numbers of variables, where we use genomic regions and biological pathways as variable groups, rendering the selected independent variables directly interpretable with respect to those groups. ParProx is applicable to a wide range of studies using ultrahigh-dimensional omics data, from genome-wide association analysis to multi-omics studies where model estimation is computationally intractable with existing implementations.
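The two ideas named in the abstract, group-wise shrinkage and a latent representation for overlapping groups, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not ParProx's code or API: the function names `group_soft_threshold` and `expand_overlapping_groups`, the toy data, and the threshold value are all hypothetical. The first function is the standard proximal operator of the group lasso penalty; the second shows one common way to handle overlapping groups, namely duplicating shared columns so that the groups become disjoint.

```python
import numpy as np


def group_soft_threshold(beta, groups, threshold):
    """Proximal operator of threshold * sum_g ||beta_g||_2 (non-overlapping groups)."""
    beta = np.asarray(beta, dtype=float)
    out = np.zeros_like(beta)
    for idx in groups:
        idx = np.asarray(idx)
        norm = np.linalg.norm(beta[idx])
        # A group survives only if its Euclidean norm exceeds the threshold;
        # otherwise every coefficient in the group is set to zero together.
        if norm > threshold:
            out[idx] = (1.0 - threshold / norm) * beta[idx]
    return out


def expand_overlapping_groups(X, groups):
    """Duplicate columns so that overlapping groups become disjoint."""
    columns = [j for group in groups for j in group]
    X_expanded = X[:, columns]
    new_groups, start = [], 0
    for group in groups:
        # Each original group maps to a contiguous block of duplicated columns.
        new_groups.append(list(range(start, start + len(group))))
        start += len(group)
    return X_expanded, new_groups


# Hypothetical toy data: 2 samples, 3 variables, variable 1 shared by both groups.
X = np.arange(6, dtype=float).reshape(2, 3)
X_expanded, new_groups = expand_overlapping_groups(X, [[0, 1], [1, 2]])

# Hypothetical coefficients on the expanded (latent) variables.
beta = np.array([3.0, 4.0, 0.1, -0.2])
shrunk = group_soft_threshold(beta, new_groups, threshold=1.0)
print(X_expanded.shape, new_groups, shrunk)
```

Because each group is shrunk independently once the groups are disjoint, the per-group updates can run in parallel, which is consistent with the parallel and distributed architecture the abstract describes.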


Subjects
Algorithms , Computational Biology/methods , Genomics/methods , Regression Analysis , Software , Biomarkers , Disease Susceptibility , Gene Expression Profiling , Humans , Mutation , Prognosis , Proportional Hazards Models , Protein Interaction Mapping
2.
Hum Mutat; 41(5): 934-945, 2020 May.
Article in English | MEDLINE | ID: mdl-31930623

ABSTRACT

Somatic mutations are early drivers of tumorigenesis and tumor progression. However, the mutations typically occur at variable positions across different individuals, leaving the data too sparse to test meaningful associations between variants and phenotypes. To overcome this challenge, we devised a novel approach called Gene-to-Protein-to-Disease (GPD), which accumulates variants into new sequence units as a measure of the degree of genetic assault on structural or functional units of each protein. The variant frequencies in the sequence units were highly reproducible between two large cancer cohorts. Survival analysis identified 232 sequence units in which somatic mutations had deleterious effects on overall survival, including consensus driver mutations obtained from multiple calling algorithms. By contrast, around 76% of the survival-predictive units had been undetected by conventional gene-level analysis. We demonstrate the ability of these signatures to separate patient groups according to overall survival, thereby providing novel prognostic tools for various cancers. GPD also identified sequence units with somatic mutations whose impact on survival was modified by the occupancy of germline variants in the surrounding regions. The findings indicate that a patient's genetic predisposition interacts with the effect of somatic mutations on survival outcomes in some cancers.
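The aggregation step described above, pooling sparse per-position variants into counts per protein sequence unit, can be sketched as follows. This is an illustrative sketch only and not the GPD implementation: the function name `count_variants_per_unit`, the tuple layouts, and every identifier in the toy data (patients, proteins, unit names, coordinates) are hypothetical.

```python
from collections import Counter


def count_variants_per_unit(variants, units):
    """Count variants per (patient, sequence unit).

    variants: iterable of (patient_id, protein_id, residue_position)
    units:    iterable of (protein_id, start, end, unit_id), inclusive coordinates
    """
    counts = Counter()
    for patient, protein, position in variants:
        for unit_protein, start, end, unit_id in units:
            # A variant contributes to a unit when it falls inside the unit's
            # residue range on the same protein.
            if protein == unit_protein and start <= position <= end:
                counts[(patient, unit_id)] += 1
    return counts


# Hypothetical sequence units on one protein.
units = [
    ("P1", 1, 100, "P1:domainA"),
    ("P1", 101, 250, "P1:linker"),
]

# Hypothetical variants; the last one maps to no unit and is ignored.
variants = [
    ("pt1", "P1", 42),
    ("pt1", "P1", 77),
    ("pt2", "P1", 150),
    ("pt2", "P2", 10),
]

counts = count_variants_per_unit(variants, units)
print(dict(counts))
```

Two different positions in the same unit (42 and 77 here) add to a single count, which is how unit-level aggregation reduces the sparsity that defeats position-level association tests.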


Subjects
Exome Sequencing , Exome , Genetic Association Studies , Genetic Predisposition to Disease , Genetic Variation , Proteomics , Algorithms , Chromosome Mapping , Computational Biology/methods , Databases, Genetic , Genetic Association Studies/methods , Genetic Testing , Genomics/methods , Humans , Kaplan-Meier Estimate , Mutation , Neoplasms/genetics , Neoplasms/mortality , Neoplasms/pathology , Phenotype , Prognosis , Proteomics/methods , Reproducibility of Results
3.
Mol Omics; 14(3): 197-209, 2018 Jun 12.
Article in English | MEDLINE | ID: mdl-29876573

ABSTRACT

While tandem mass spectrometry can detect post-translational modifications (PTMs) at the proteome scale, reported PTM sites are often incomplete and include false positives. Computational approaches can complement these datasets with additional predictions, but most available tools use prediction models pre-trained by the developers for a single PTM type, and it remains difficult to perform large-scale batch prediction for multiple PTMs with flexible user control, including the choice of training data. We developed an R package called PTMscape, which predicts PTM sites across the proteome based on a unified and comprehensive set of descriptors of the physico-chemical microenvironment of modified sites, with additional downstream analysis modules to test enrichment of individual PTMs or pairs of PTMs in protein domains. PTMscape can flexibly process any major modification, such as phosphorylation and ubiquitination, while achieving sensitivity and specificity comparable to single-PTM methods and outperforming other multi-PTM tools. Applying this framework, we expanded proteome-wide coverage of five major PTMs affecting different residues by prediction, especially for lysine and arginine modifications. Using a combination of experimentally acquired sites (PSP) and newly predicted sites, we discovered that crosstalk among multiple PTMs occurs more frequently than expected by random chance in key protein domains such as histone, protein kinase, and RNA recognition motifs, spanning various biological processes such as RNA processing, DNA damage response, signal transduction, and regulation of the cell cycle. These results provide a proteome-scale analysis of crosstalk among major PTMs and can be easily extended to other types of PTM.
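The abstract does not say which statistic PTMscape uses for its domain enrichment modules, so the sketch below substitutes a one-sided hypergeometric test, a standard choice for asking whether modified sites are over-represented inside a domain. The function name `hypergeom_upper_tail` and all counts are hypothetical, and the code is written in Python for illustration even though PTMscape itself is an R package.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n).

    N: candidate residues in the proteome (hypothetical count)
    K: candidate residues carrying the PTM
    n: candidate residues located inside the domain of interest
    k: modified residues observed inside the domain
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the probability of every outcome at least as extreme as k.
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / total


# Hypothetical toy counts: 10 candidate sites, 4 modified, 5 inside the domain,
# 3 of the in-domain sites modified.
p_value = hypergeom_upper_tail(k=3, K=4, n=5, N=10)
print(p_value)
```

A small p-value would suggest that the domain carries more modified sites than chance placement predicts. The same test could be applied to pairs of PTMs by counting co-occurring sites, although that extension is also an assumption rather than a description of the package.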
