Search | BVS IEC

Gene expression variability and the analysis of large-scale RNA-seq studies with the MDSeq.

Ran, Di; Daye, Z John.

Nucleic Acids Res ; 45(13): e127, 2017 Jul 27.

Article in English | MEDLINE | ID: mdl-28535263

ABSTRACT

The rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, which empowers the analysis of gene expression variability in addition to gene expression means. In this paper, we present the MDSeq, based on the coefficient of dispersion, to provide robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. The MDSeq utilizes a novel reparametrization of the negative binomial to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expression at biologically interesting levels. We evaluated the performance of the MDSeq using simulated data for which the ground truth is known. Results suggest that the MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, the MDSeq was applied to two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with the MDSeq on the GTEx human brain tissue data identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.
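As a rough illustration of the mean-dispersion idea described in this abstract, the Python sketch below writes the negative binomial in terms of a mean mu and a coefficient of dispersion phi, assumed here to be the variance-to-mean ratio (variance = phi * mu, with phi > 1). The function names and the mapping to the usual size/probability parameters are illustrative assumptions, not taken from the MDSeq software, whose exact parametrization and GLM fitting may differ.

```python
import math


def nb_logpmf(y, mu, phi):
    """Log-probability of count y under a negative binomial with
    mean mu and variance phi * mu (assumed parametrization, phi > 1)."""
    if mu <= 0 or phi <= 1:
        raise ValueError("need mu > 0 and phi > 1")
    size = mu / (phi - 1.0)  # classical 'r' parameter
    prob = 1.0 / phi         # classical success probability 'p'
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(prob) + y * math.log1p(-prob))


def nb_moments(mu, phi, upper=2000):
    """Numerically recover mean and variance by summing the pmf over 0..upper-1."""
    probs = [math.exp(nb_logpmf(y, mu, phi)) for y in range(upper)]
    mean = sum(y * q for y, q in enumerate(probs))
    var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
    return mean, var


# Check that the reparametrization gives the intended moments.
mean, var = nb_moments(mu=10.0, phi=3.0)
print(mean, var, var / mean)
```

Under this assumed parametrization, a GLM on both components could place a linear predictor on log(mu) and another on a transform of phi, so that covariates can shift expression variability independently of the mean.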

Subjects

Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Software; Cerebellum/metabolism; Cerebral Cortex/metabolism; Gene Expression Profiling/statistics & numerical data; High-Throughput Nucleotide Sequencing/statistics & numerical data; Humans; Linear Models; RNA/genetics; Sequence Analysis, RNA/statistics & numerical data; Skin/metabolism; Skin/radiation effects; Sunlight/adverse effects

Ridle for sparse regression with mandatory covariates with application to the genetic assessment of histologic grades of breast cancer.

Zhai, Jing; Hsu, Chiu-Hsieh; Daye, Z John.

BMC Med Res Methodol ; 17(1): 12, 2017 Jan 25.

Article in English | MEDLINE | ID: mdl-28122498

ABSTRACT

BACKGROUND: Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographic or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates. METHODS: In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software . RESULTS: The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer identified 24 genes, of which 2 were selected only by the ridle among current methods and were found to be associated with tumor grade. CONCLUSIONS: In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.
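The ridge-lasso idea in this abstract can be sketched in Python as follows, assuming an objective of the form (1/2n)||y - X b||^2 + lam_lasso * sum|b_j| over optional covariates + (lam_ridge/2) * sum b_j^2 over mandatory covariates, minimized by plain coordinate descent. This objective, the function names, and the algorithm are illustrative assumptions; the published ridle penalty and its Fortran implementation may differ.

```python
def soft_threshold(z, t):
    """Lasso shrinkage operator: move z toward zero by t, clipping at zero."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def ridge_lasso_fit(X, y, mandatory, lam_lasso, lam_ridge, n_iter=200):
    """Coordinate descent for a ridge penalty on mandatory columns
    (never zeroed out) and a lasso penalty on the optional columns."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual (beta_j excluded).
            z = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            if j in mandatory:
                new = z / (col_sq[j] + lam_ridge)              # ridge update
            else:
                new = soft_threshold(z, lam_lasso) / col_sq[j]  # lasso update
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
    return beta


# Toy orthogonal design: column 0 is mandatory, columns 1 and 2 are optional.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1]]
y = [5, -1, 5, -1]  # equals 2 * column 0 + 3 * column 1
print(ridge_lasso_fit(X, y, mandatory={0}, lam_lasso=0.5, lam_ridge=1.0))
```

In this toy example the mandatory coefficient is shrunk but stays nonzero, the relevant optional coefficient is shrunk by the lasso threshold, and the irrelevant optional coefficient is set exactly to zero.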

Subjects

Algorithms; Breast Neoplasms/genetics; Computational Biology/methods; Genomics/methods; Linear Models; Biomarkers, Tumor/genetics; Breast Neoplasms/pathology; Computer Simulation; Gene Expression Regulation, Neoplastic; Humans; Neoplasm Grading; Reproducibility of Results; Risk Assessment; Risk Factors; Software

Accuracy of Unenhanced MR Imaging in the Detection of Acute Appendicitis: Single-Institution Clinical Performance Review.

Petkovska, Iva; Martin, Diego R; Covington, Matthew F; Urbina, Shannon; Duke, Eugene; Daye, Z John; Stolz, Lori A; Keim, Samuel M; Costello, James R; Chundru, Surya; Arif-Tiwari, Hina; Gilbertson-Dahdal, Dorothy; Gries, Lynn; Kalb, Bobby.

Radiology ; 279(2): 451-60, 2016 May.

Article in English | MEDLINE | ID: mdl-26807893

ABSTRACT

PURPOSE: To determine the accuracy of unenhanced magnetic resonance (MR) imaging in the detection of acute appendicitis in patients younger than 50 years who present to the emergency department with right lower quadrant (RLQ) pain. MATERIALS AND METHODS: The institutional review board approved this retrospective study of 403 patients from August 1, 2012, to July 30, 2014, and waived the informed consent requirement. A cross-department strategy was instituted to use MR imaging as the primary diagnostic modality in patients aged 3-49 years who presented to the emergency department with RLQ pain. All MR examinations were performed with a 1.5- or 3.0-T system. Images were acquired without breath holding by using multiplanar half-Fourier single-shot T2-weighted imaging, without and with spectral adiabatic inversion recovery fat suppression, and without oral or intravenous contrast material. MR imaging room time was measured for each patient. Prospective image interpretations from clinical records were reviewed to document acute appendicitis or other causes of abdominal pain. Final clinical outcomes were determined by using (a) surgical results (n = 77), (b) telephone follow-up combined with review of the patient's medical records (n = 291), or (c) consensus expert panel assessment if no follow-up data were available (n = 35). Logistic regression analysis was performed to evaluate the sensitivity and specificity of MR imaging in the detection of acute appendicitis, and corresponding 95% confidence intervals were determined. RESULTS: Of the 403 patients, 67 had MR imaging findings that were positive for acute appendicitis, and 336 had negative findings. MR imaging had a sensitivity of 97.0% (65 of 67) and a specificity of 99.4% (334 of 336). The mean total room time was 14 minutes (range, 8-62 minutes). An alternate diagnosis was offered in 173 (51.5%) of 336 patients. CONCLUSION: MR imaging is a highly sensitive and specific test in the evaluation of patients younger than 50 years with acute RLQ pain that uses a rapid imaging protocol performed without intravenous or oral contrast material.
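The percentages reported in this abstract can be reproduced from the stated counts with a few lines of Python. The Wilson score interval shown alongside is only one common way to attach a 95% confidence interval to a proportion; it is an assumption made for illustration and is not the logistic-regression approach the authors describe, so its bounds need not match the published intervals.

```python
import math


def percent(numerator, denominator):
    """Proportion expressed as a percentage rounded to one decimal place."""
    return round(100.0 * numerator / denominator, 1)


def wilson_interval(successes, total, z=1.959963984540054):
    """Wilson score interval for a binomial proportion (illustrative choice)."""
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return center - half, center + half


print(percent(65, 67))     # sensitivity as reported: 97.0
print(percent(334, 336))   # specificity as reported: 99.4
print(percent(173, 336))   # alternate diagnosis offered: 51.5
print(wilson_interval(65, 67))
print(wilson_interval(334, 336))
```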

Subjects

Appendicitis/diagnosis; Magnetic Resonance Imaging/methods; Adolescent; Adult; Child; Child, Preschool; Female; Humans; Image Interpretation, Computer-Assisted; Male; Middle Aged; Retrospective Studies; Sensitivity and Specificity

Distinct gene-expression profiles associated with the susceptibility of pathogen-specific CD4 T cells to HIV-1 infection.

Hu, Haitao; Nau, Martin; Ehrenberg, Phil; Chenine, Agnes-Laurence; Macedo, Camila; Zhou, Yu; Daye, Z John; Wei, Zhi; Vahey, Maryanne; Michael, Nelson L; Kim, Jerome H; Marovich, Mary; Ratto-Kim, Silvia.

Blood ; 121(7): 1136-44, 2013 Feb 14.

Article in English | MEDLINE | ID: mdl-23258923

ABSTRACT

In HIV infection, CD4 responses to opportunistic pathogens such as Candida albicans are lost early, but CMV-specific CD4 response persists. Little is currently known about HIV infection of CD4 T cells of different pathogen/antigen specificities. CFSE-labeled PBMCs were stimulated with CMV, tetanus toxoid (TT), and C albicans antigens and subsequently exposed to HIV. HIV infection was monitored by intracellular p24 in CFSE(low) population. We found that although TT- and C albicans-specific CD4 T cells were permissive, CMV-specific CD4 T cells were highly resistant to both R5 and X4 HIV. Quantification of HIV DNA in CFSE(low) cells showed a reduction of strong-stop and full-length DNA in CMV-specific cells compared with TT- and C albicans-specific cells. β-Chemokine neutralization enhanced HIV infection in TT- and C albicans-specific cells, whereas HIV infection in CMV-specific cells remained low despite increased entry by β-chemokine neutralization, suggesting postentry HIV restriction by CMV-specific cells. Microarray analysis (Gene Expression Omnibus accession number: GSE42853) revealed distinct transcriptional profiles that involved selective up-regulation of comprehensive innate antiviral genes in CMV-specific cells, whereas TT- and C albicans-specific cells mainly up-regulated Th17 inflammatory response. Our data suggest a mechanism for the persistence of CMV-specific CD4 response and earlier loss of mucosal Th17-associated TT- and C albicans-specific CD4 response in AIDS.

Subjects

CD4-Positive T-Lymphocytes/immunology; CD4-Positive T-Lymphocytes/virology; HIV Infections/genetics; HIV Infections/immunology; HIV-1; Candida albicans/immunology; Candida albicans/pathogenicity; Cytomegalovirus/immunology; Cytomegalovirus/pathogenicity; HIV Infections/virology; HIV-1/immunology; HIV-1/pathogenicity; HIV-1/physiology; Host-Pathogen Interactions/genetics; Host-Pathogen Interactions/immunology; Humans; Immunity, Innate/genetics; Immunity, Mucosal/genetics; Tetanus Toxoid/immunology; Th17 Cells/immunology; Th17 Cells/virology; Transcriptome; Virus Internalization; Virus Replication

A powerful test for multiple rare variants association studies that incorporates sequencing qualities.

Daye, Z John; Li, Hongzhe; Wei, Zhi.

Nucleic Acids Res ; 40(8): e60, 2012 Apr.

Article in English | MEDLINE | ID: mdl-22262732

ABSTRACT

Next-generation sequencing data will soon become routinely available for association studies between complex traits and rare variants. Sequencing data, however, are characterized by the presence of sequencing errors at each individual genotype. This makes it especially challenging to perform association studies of rare variants, which, due to their low minor allele frequencies, can be easily perturbed by genotype errors. In this article, we develop the quality-weighted multivariate score association test (qMSAT), a new procedure that allows powerful association tests between complex traits and multiple rare variants in the presence of sequencing errors. Simulation results based on quality scores from real data show that the qMSAT often dominates current methods, which do not utilize quality information. In particular, the qMSAT can dramatically increase power over existing methods under moderate sample sizes and relatively low coverage. Moreover, in an obesity data study, using the qMSAT we identified two functional regions (MGLL promoter and MGLL 3'-untranslated region) where rare variants are associated with extreme obesity. Due to the high cost of sequencing data, the qMSAT is especially valuable for large-scale studies involving rare variants, as it can potentially increase power without additional experimental cost. qMSAT is freely available at http://qmsat.sourceforge.net/.

Subjects

Genetic Association Studies; Genetic Variation; Female; Gene Frequency; Humans; Linkage Disequilibrium; Male; Monoacylglycerol Lipases/genetics; Obesity/genetics; Odds Ratio; Sample Size; Sequence Analysis, DNA

High-Dimensional Heteroscedastic Regression with an Application to eQTL Data Analysis.

Daye, Z John; Chen, Jinbo; Li, Hongzhe.

Biometrics ; 68(1): 316-326, 2012 Mar.

Article in English | MEDLINE | ID: mdl-22547833

ABSTRACT

We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in, and apply our method to, an expression quantitative trait loci (eQTL) study of 112 yeast segregants. The new procedure can automatically account for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and can lead to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
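One way to picture the doubly regularized idea in this abstract is a penalized Gaussian negative log-likelihood in which the mean is x'beta, the log-variance is x'alpha, and each coefficient vector carries its own L1 penalty. The Python sketch below only evaluates such an objective; the log-linear variance model, the L1 penalties, and all names are assumptions made for illustration, and the authors' actual formulation and fitting algorithm may differ.

```python
import math


def penalized_nll(X, y, beta, alpha, lam_mean, lam_var):
    """Average Gaussian negative log-likelihood (constants dropped) with
    mean x'beta and log-variance x'alpha, plus separate L1 penalties."""
    total = 0.0
    for xi, yi in zip(X, y):
        mean = sum(b * x for b, x in zip(beta, xi))
        log_var = sum(a * x for a, x in zip(alpha, xi))
        total += 0.5 * (log_var + (yi - mean) ** 2 * math.exp(-log_var))
    penalty = (lam_mean * sum(abs(b) for b in beta)
               + lam_var * sum(abs(a) for a in alpha))
    return total / len(y) + penalty


# Toy data: the second covariate drives both the mean and the noise level.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [1.0, 5.0, 1.2, 1.0]
homoscedastic = penalized_nll(X, y, [1.0, 2.0], [0.0, 0.0], 0.1, 0.1)
heteroscedastic = penalized_nll(X, y, [1.0, 2.0], [0.0, 1.0], 0.1, 0.1)
print(homoscedastic, heteroscedastic)
```

Setting all variance coefficients to zero reduces the objective to an ordinary lasso-type criterion with unit variance, which is the homoscedastic special case.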

A Sparse Structured Shrinkage Estimator for Nonparametric Varying-Coefficient Model with an Application in Genomics.

Daye, Z John; Xie, Jichun; Li, Hongzhe.

J Comput Graph Stat ; 21(1): 110-133, 2012.

Article in English | MEDLINE | ID: mdl-22904608

ABSTRACT

Many problems in genomics are related to variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions depending upon some biological state, such as time. High-dimensional variable selection where covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider the problem of variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, and the effects of high dimensionality and structural information of the covariates are especially highlighted. We apply our method to a motif-finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can result in better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and covariates are structured. Supplemental materials for the article are available online.
