1.
Nucleic Acids Res ; 45(13): e127, 2017 Jul 27.
Article in English | MEDLINE | ID: mdl-28535263

ABSTRACT

The rapidly decreasing cost of next-generation sequencing has led to the recent availability of large-scale RNA-seq data, which empowers the analysis of gene expression variability in addition to gene expression means. In this paper, we present MDSeq, a method based on the coefficient of dispersion that provides robust and computationally efficient analysis of both gene expression means and variability on RNA-seq counts. MDSeq utilizes a novel reparametrization of the negative binomial distribution to provide flexible generalized linear models (GLMs) on both the mean and dispersion. We address the challenges of analyzing large-scale RNA-seq data via several new developments to provide a comprehensive toolset that models technical excess zeros, identifies outliers efficiently, and evaluates differential expression at biologically interesting levels. We evaluated the performance of MDSeq using simulated data for which the ground truth is known. Results suggest that MDSeq often outperforms current methods for the analysis of gene expression mean and variability. Moreover, MDSeq is applied to two real RNA-seq studies, in which we identified functionally relevant genes and gene pathways. Specifically, the analysis of gene expression variability with MDSeq on the GTEx human brain tissue data identified pathways associated with common neurodegenerative disorders when gene expression means were conserved.
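
The mean-and-dispersion view of the negative binomial described above can be illustrated with a minimal sketch. It assumes that the coefficient of dispersion is the variance-to-mean ratio (phi = variance / mean) and maps a (mean, phi) pair onto the classical (r, p) form of the distribution. The function names are hypothetical and this is not necessarily the exact parametrization implemented in MDSeq.

```python
def nb_from_mean_dispersion(mu, phi):
    """Map (mean, variance-to-mean ratio) to the classical negative binomial (r, p).

    Assumption: phi = variance / mean, with phi > 1 (overdispersion).
    In the classical form, mean = r(1 - p)/p and variance = r(1 - p)/p^2.
    """
    if mu <= 0 or phi <= 1:
        raise ValueError("requires mu > 0 and phi > 1")
    p = 1.0 / phi          # success probability
    r = mu / (phi - 1.0)   # size parameter
    return r, p


def nb_mean_var(r, p):
    """Mean and variance of the classical negative binomial (r, p)."""
    mean = r * (1.0 - p) / p
    var = r * (1.0 - p) / p ** 2
    return mean, var


# Round trip: a gene with mean count 100 and variance four times the mean.
r, p = nb_from_mean_dispersion(mu=100.0, phi=4.0)
mean, var = nb_mean_var(r, p)
print(mean, var, var / mean)
```

Under this parametrization, two groups can share the same mean while differing in phi, which is the situation the abstract describes for the GTEx brain tissue analysis.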


Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, RNA/methods , Software , Cerebellum/metabolism , Cerebral Cortex/metabolism , Gene Expression Profiling/statistics & numerical data , High-Throughput Nucleotide Sequencing/statistics & numerical data , Humans , Linear Models , RNA/genetics , Sequence Analysis, RNA/statistics & numerical data , Skin/metabolism , Skin/radiation effects , Sunlight/adverse effects
2.
BMC Med Res Methodol ; 17(1): 12, 2017 Jan 25.
Article in English | MEDLINE | ID: mdl-28122498

ABSTRACT

BACKGROUND: Many questions in statistical genomics can be formulated in terms of variable selection of candidate biological factors for modeling a trait or quantity of interest. Often, in these applications, additional covariates describing clinical, demographic or experimental effects must be included a priori as mandatory covariates while allowing the selection of a large number of candidate or optional variables. As genomic studies routinely require mandatory covariates, it is of interest to propose principled methods of variable selection that can incorporate mandatory covariates. METHODS: In this article, we propose the ridge-lasso hybrid estimator (ridle), a new penalized regression method that simultaneously estimates coefficients of mandatory covariates while allowing selection for others. The ridle provides a principled approach to mitigate the effects of multicollinearity among the mandatory covariates and possible dependency between mandatory and optional variables. We provide detailed empirical and theoretical studies to evaluate our method. In addition, we develop an efficient algorithm for the ridle. Software, based on efficient Fortran code with R-language wrappers, is publicly and freely available at https://sites.google.com/site/zhongyindaye/software . RESULTS: The ridle is useful when mandatory predictors are known to be significant due to prior knowledge or must be kept for additional analysis. Both theoretical and comprehensive simulation studies have shown the ridle to be advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves. A microarray gene expression analysis of the histologic grades of breast cancer identified 24 genes, of which 2 were selected only by the ridle among current methods and were found to be associated with tumor grade.
CONCLUSIONS: In this article, we proposed the ridle as a principled sparse regression method for the selection of optional variables while incorporating mandatory ones. Results suggest that the ridle is advantageous when mandatory covariates are correlated with the irrelevant optional predictors or are highly correlated among themselves.
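
The idea of penalizing mandatory and optional covariates differently can be sketched as an objective function. The form below is an assumption made for illustration: a squared-error loss with a ridge (squared) penalty on the mandatory coefficients, which shrinks but never removes them, and a lasso (absolute-value) penalty on the optional ones, which can set them exactly to zero. The function name, the scaling of the loss, and the two tuning parameters are hypothetical; the published estimator may differ in its details.

```python
def ridle_style_objective(X, y, beta, mandatory, lam_ridge, lam_lasso):
    """Penalized least-squares objective with a ridge/lasso split.

    X         : list of rows (each a list of covariate values)
    y         : list of responses
    beta      : list of coefficients, one per column of X
    mandatory : set of column indices that must stay in the model
    """
    n = len(y)
    rss = 0.0
    for row, yi in zip(X, y):
        pred = sum(xij * bj for xij, bj in zip(row, beta))
        rss += (yi - pred) ** 2
    # Ridge penalty on mandatory columns: shrinkage without selection.
    ridge = sum(beta[j] ** 2 for j in range(len(beta)) if j in mandatory)
    # Lasso penalty on optional columns: shrinkage with selection.
    lasso = sum(abs(beta[j]) for j in range(len(beta)) if j not in mandatory)
    return rss / (2 * n) + lam_ridge * ridge + lam_lasso * lasso


# Toy data: column 0 is mandatory, column 1 is optional.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
value = ridle_style_objective(X, y, beta=[1.0, 1.0], mandatory={0},
                              lam_ridge=0.5, lam_lasso=0.1)
print(value)
```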


Subject(s)
Algorithms , Breast Neoplasms/genetics , Computational Biology/methods , Genomics/methods , Linear Models , Biomarkers, Tumor/genetics , Breast Neoplasms/pathology , Computer Simulation , Gene Expression Regulation, Neoplastic , Humans , Neoplasm Grading , Reproducibility of Results , Risk Assessment , Risk Factors , Software
3.
Radiology ; 279(2): 451-60, 2016 May.
Article in English | MEDLINE | ID: mdl-26807893

ABSTRACT

PURPOSE: To determine the accuracy of unenhanced magnetic resonance (MR) imaging in the detection of acute appendicitis in patients younger than 50 years who present to the emergency department with right lower quadrant (RLQ) pain. MATERIALS AND METHODS: The institutional review board approved this retrospective study of 403 patients from August 1, 2012, to July 30, 2014, and waived the informed consent requirement. A cross-department strategy was instituted to use MR imaging as the primary diagnostic modality in patients aged 3-49 years who presented to the emergency department with RLQ pain. All MR examinations were performed with a 1.5- or 3.0-T system. Images were acquired without breath holding by using multiplanar half-Fourier single-shot T2-weighted imaging without and with spectral adiabatic inversion recovery fat suppression without oral or intravenous contrast material. MR imaging room time was measured for each patient. Prospective image interpretations from clinical records were reviewed to document acute appendicitis or other causes of abdominal pain. Final clinical outcomes were determined by using (a) surgical results (n = 77), (b) telephone follow-up combined with review of the patient's medical records (n = 291), or (c) consensus expert panel assessment if no follow-up data were available (n = 35). Logistic regression analysis was performed to evaluate the sensitivity and specificity of MR imaging in the detection of acute appendicitis, and corresponding 95% confidence intervals were determined. RESULTS: Of the 403 patients, 67 had MR imaging findings that were positive for acute appendicitis, and 336 had negative findings. MR imaging had a sensitivity of 97.0% (65 of 67) and a specificity of 99.4% (334 of 336). The mean total room time was 14 minutes (range, 8-62 minutes). An alternate diagnosis was offered in 173 (51.5%) of 336 patients. 
CONCLUSION: MR imaging, performed with a rapid protocol without intravenous or oral contrast material, is a highly sensitive and specific test in the evaluation of patients younger than 50 years with acute RLQ pain.
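
The reported sensitivity and specificity follow directly from the counts given in the results (65 of 67 and 334 of 336). The sketch below reproduces the two point estimates and attaches a 95% Wilson score interval to each. The Wilson interval is a substitute chosen for illustration, since the study derived its intervals from logistic regression, so the bounds computed here need not match the published ones. The helper name is hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for about 95%)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    center = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return center - half, center + half


# Counts taken from the abstract.
sensitivity = 65 / 67      # correctly detected cases of appendicitis
specificity = 334 / 336    # correctly excluded cases

sens_ci = wilson_interval(65, 67)
spec_ci = wilson_interval(334, 336)

print(f"sensitivity {sensitivity:.1%}, Wilson 95% CI {sens_ci[0]:.1%} to {sens_ci[1]:.1%}")
print(f"specificity {specificity:.1%}, Wilson 95% CI {spec_ci[0]:.1%} to {spec_ci[1]:.1%}")
```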


Subject(s)
Appendicitis/diagnosis , Magnetic Resonance Imaging/methods , Adolescent , Adult , Child , Child, Preschool , Female , Humans , Image Interpretation, Computer-Assisted , Male , Middle Aged , Retrospective Studies , Sensitivity and Specificity
4.
Blood ; 121(7): 1136-44, 2013 Feb 14.
Article in English | MEDLINE | ID: mdl-23258923

ABSTRACT

In HIV infection, CD4 responses to opportunistic pathogens such as Candida albicans are lost early, but CMV-specific CD4 response persists. Little is currently known about HIV infection of CD4 T cells of different pathogen/antigen specificities. CFSE-labeled PBMCs were stimulated with CMV, tetanus toxoid (TT), and C albicans antigens and subsequently exposed to HIV. HIV infection was monitored by intracellular p24 in CFSE(low) population. We found that although TT- and C albicans-specific CD4 T cells were permissive, CMV-specific CD4 T cells were highly resistant to both R5 and X4 HIV. Quantification of HIV DNA in CFSE(low) cells showed a reduction of strong-stop and full-length DNA in CMV-specific cells compared with TT- and C albicans-specific cells. β-Chemokine neutralization enhanced HIV infection in TT- and C albicans-specific cells, whereas HIV infection in CMV-specific cells remained low despite increased entry by β-chemokine neutralization, suggesting postentry HIV restriction by CMV-specific cells. Microarray analysis (Gene Expression Omnibus accession number: GSE42853) revealed distinct transcriptional profiles that involved selective up-regulation of comprehensive innate antiviral genes in CMV-specific cells, whereas TT- and C albicans-specific cells mainly up-regulated Th17 inflammatory response. Our data suggest a mechanism for the persistence of CMV-specific CD4 response and earlier loss of mucosal Th17-associated TT- and C albicans-specific CD4 response in AIDS.


Subject(s)
CD4-Positive T-Lymphocytes/immunology , CD4-Positive T-Lymphocytes/virology , HIV Infections/genetics , HIV Infections/immunology , HIV-1 , Candida albicans/immunology , Candida albicans/pathogenicity , Cytomegalovirus/immunology , Cytomegalovirus/pathogenicity , HIV Infections/virology , HIV-1/immunology , HIV-1/pathogenicity , HIV-1/physiology , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Humans , Immunity, Innate/genetics , Immunity, Mucosal/genetics , Tetanus Toxoid/immunology , Th17 Cells/immunology , Th17 Cells/virology , Transcriptome , Virus Internalization , Virus Replication
5.
Nucleic Acids Res ; 40(8): e60, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22262732

ABSTRACT

Next-generation sequencing data will soon become routinely available for association studies between complex traits and rare variants. Sequencing data, however, are characterized by the presence of sequencing errors at each individual genotype. This makes it especially challenging to perform association studies of rare variants, which, due to their low minor allele frequencies, can be easily perturbed by genotype errors. In this article, we develop the quality-weighted multivariate score association test (qMSAT), a new procedure that allows powerful association tests between complex traits and multiple rare variants in the presence of sequencing errors. Simulation results based on quality scores from real data show that the qMSAT often outperforms current methods, which do not utilize quality information. In particular, the qMSAT can dramatically increase power over existing methods under moderate sample sizes and relatively low coverage. Moreover, in an obesity study, we used the qMSAT to identify two functional regions (MGLL promoter and MGLL 3'-untranslated region) where rare variants are associated with extreme obesity. Due to the high cost of sequencing data, the qMSAT is especially valuable for large-scale studies involving rare variants, as it can potentially increase power without additional experimental cost. qMSAT is freely available at http://qmsat.sourceforge.net/.


Subject(s)
Genetic Association Studies , Genetic Variation , Female , Gene Frequency , Humans , Linkage Disequilibrium , Male , Monoacylglycerol Lipases/genetics , Obesity/genetics , Odds Ratio , Sample Size , Sequence Analysis, DNA
6.
Biometrics ; 68(1): 316-326, 2012 Mar.
Article in English | MEDLINE | ID: mdl-22547833

ABSTRACT

We consider the problem of high-dimensional regression under non-constant error variances. Despite being a common phenomenon in biological applications, heteroscedasticity has, so far, been largely ignored in high-dimensional analysis of genomic data sets. We propose a new methodology that allows non-constant error variances for high-dimensional estimation and model selection. Our method incorporates heteroscedasticity by simultaneously modeling both the mean and variance components via a novel doubly regularized approach. Extensive Monte Carlo simulations indicate that our proposed procedure can result in better estimation and variable selection than existing methods when heteroscedasticity arises from the presence of predictors explaining error variances and outliers. Further, we demonstrate the presence of heteroscedasticity in an expression quantitative trait loci (eQTL) study of 112 yeast segregants and apply our method to it. The new procedure automatically accounts for heteroscedasticity in identifying the eQTLs that are associated with gene expression variations and leads to smaller prediction errors. These results demonstrate the importance of considering heteroscedasticity in eQTL data analysis.
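
Modeling the mean and the variance jointly, with a penalty on each, can be sketched as a penalized Gaussian negative log-likelihood. The specific choices below are assumptions made for illustration and are not taken from the abstract: the variance is modeled log-linearly in the predictors, both coefficient vectors receive a lasso penalty, and the additive constant of the likelihood is dropped. All names are hypothetical.

```python
import math


def hetero_penalized_nll(X, y, beta, alpha, lam_mean, lam_var):
    """Average Gaussian negative log-likelihood with two lasso penalties.

    Mean model:     mu_i       = x_i . beta
    Variance model: log(var_i) = x_i . alpha   (assumed log-linear form)
    """
    n = len(y)
    nll = 0.0
    for row, yi in zip(X, y):
        mu = sum(x * b for x, b in zip(row, beta))
        log_var = sum(x * a for x, a in zip(row, alpha))
        # 0.5 * [log(var) + residual^2 / var]; the log(2*pi) constant is omitted.
        nll += 0.5 * (log_var + (yi - mu) ** 2 / math.exp(log_var))
    penalty = (lam_mean * sum(abs(b) for b in beta)
               + lam_var * sum(abs(a) for a in alpha))
    return nll / n + penalty


# Toy data: with alpha = 0 every observation has unit variance.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 3.0]
value = hetero_penalized_nll(X, y, beta=[1.0, 1.0], alpha=[0.0, 0.0],
                             lam_mean=0.1, lam_var=0.2)
print(value)
```

Setting alpha to zero recovers an ordinary lasso objective, so a nonzero entry of alpha marks a predictor that explains the error variance.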

7.
J Comput Graph Stat ; 21(1): 110-133, 2012.
Article in English | MEDLINE | ID: mdl-22904608

ABSTRACT

Many problems in genomics are related to variable selection where high-dimensional genomic data are treated as covariates. Such genomic covariates often have certain structures and can be represented as vertices of an undirected graph. Biological processes also vary as functions of some biological state, such as time. High-dimensional variable selection where the covariates are graph-structured and the underlying model is nonparametric presents an important but largely unaddressed statistical challenge. Motivated by the problem of regression-based motif discovery, we consider the problem of variable selection for high-dimensional nonparametric varying-coefficient models and introduce a sparse structured shrinkage (SSS) estimator based on basis function expansions and a novel smoothed penalty function. We present an efficient algorithm for computing the SSS estimator. Results on model selection consistency and estimation bounds are derived. Moreover, finite-sample performance is studied via simulations, and the effects of high dimensionality and structural information of the covariates are especially highlighted. We apply our method to the motif-finding problem using a yeast cell-cycle gene expression dataset and word counts in genes' promoter sequences. Our results demonstrate that the proposed method can result in better variable selection and prediction for high-dimensional regression when the underlying model is nonparametric and the covariates are structured. Supplemental materials for the article are available online.
