Results 1 - 6 of 6
1.
PLoS One ; 10(3): e0119254, 2015.
Article in English | MEDLINE | ID: mdl-25787144

ABSTRACT

This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
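The abstract does not spell out how its simulation-based tests work. The following is only a minimal, generic parametric-bootstrap sketch under stated assumptions, not the authors' procedure: a single group of counts (no regression or dispersion model across genes), an NB parameterized so that Var = mu + phi * mu**2, a method-of-moments fit, and the fraction of zeros as the discrepancy statistic. All function names (`rpois`, `rnbinom`, `fit_nb_moments`, `nb_zero_test`) are hypothetical.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw via Knuth's multiplication method.

    Means above 500 are split into chunks (a sum of independent Poisson
    variables is Poisson), which keeps exp(-lam) from underflowing.
    """
    k = 0
    while lam > 500:
        k += rpois(500.0, rng)
        lam -= 500.0
    threshold = math.exp(-lam)
    p = rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def rnbinom(mu, phi, rng):
    """NB draw with mean mu and Var = mu + phi * mu**2 (gamma-Poisson mixture)."""
    if mu == 0:
        return 0
    lam = rng.gammavariate(1.0 / phi, mu * phi)  # shape 1/phi, scale mu*phi
    return rpois(lam, rng)


def fit_nb_moments(y):
    """Method-of-moments NB fit; phi is floored at a small positive value."""
    n = len(y)
    m = sum(y) / n
    s2 = sum((v - m) ** 2 for v in y) / (n - 1)
    phi = (s2 - m) / (m * m) if m > 0 else 1e-6
    return m, max(phi, 1e-6)


def zero_fraction(y):
    return sum(1 for v in y if v == 0) / len(y)


def nb_zero_test(y, n_sim=199, seed=0):
    """Two-sided Monte Carlo p-value for the observed zero fraction under a fitted NB."""
    rng = random.Random(seed)
    mu, phi = fit_nb_moments(y)
    observed = zero_fraction(y)
    # Simulate datasets of the same size from the fitted NB and recompute the statistic.
    sims = [zero_fraction([rnbinom(mu, phi, rng) for _ in y]) for _ in range(n_sim)]
    upper = (1 + sum(s >= observed for s in sims)) / (n_sim + 1)
    lower = (1 + sum(s <= observed for s in sims)) / (n_sim + 1)
    return min(1.0, 2 * min(upper, lower))


rng = random.Random(1)
counts = [rnbinom(10.0, 0.5, rng) for _ in range(50)]
print(nb_zero_test(counts))
```

A small p-value would suggest that the fitted NB does not reproduce the observed share of zeros; any other discrepancy statistic can be swapped in for `zero_fraction` without changing the bootstrap loop.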


Subject(s)
Models, Statistical , Sequence Analysis, RNA/methods , Computer Simulation , Regression Analysis
2.
Stat Appl Genet Mol Biol ; 12(1): 49-70, 2013 Mar 26.
Article in English | MEDLINE | ID: mdl-23502340

ABSTRACT

RNA sequencing (RNA-Seq) is the current method of choice for characterizing transcriptomes and quantifying gene expression changes. This next generation sequencing-based method provides unprecedented depth and resolution. The negative binomial (NB) probability distribution has been shown to be a useful model for frequencies of mapped RNA-Seq reads and consequently provides a basis for statistical analysis of gene expression. Negative binomial exact tests are available for two-group comparisons but do not extend to negative binomial regression analysis, which is important for examining gene expression as a function of explanatory variables and for adjusted group comparisons accounting for other factors. We address the adequacy of available large-sample tests for the small sample sizes typically available from RNA-Seq studies and consider a higher-order asymptotic (HOA) adjustment to likelihood ratio tests. We demonstrate that 1) the HOA-adjusted likelihood ratio test is practically indistinguishable from the exact test in situations where the exact test is available, 2) the type I error of the HOA test matches the nominal specification in regression settings we examined via simulation, and 3) the power of the likelihood ratio test does not appear to be affected by the HOA adjustment. This work helps clarify the accuracy of the unadjusted likelihood ratio test and the degree of improvement available with the HOA adjustment. Furthermore, the HOA test may be preferable even when the exact test is available because it does not require ad hoc library size adjustments.
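To make the baseline concrete, here is a minimal sketch of the unadjusted likelihood ratio test that the HOA adjustment modifies. It covers only the simplest case and rests on assumptions not taken from the paper: two groups, a common dispersion phi treated as known, equal library sizes (no offsets), and Var = mu + phi * mu**2. The HOA adjustment itself is not implemented, and the names `nb_loglik` and `nb_lrt_two_groups` are hypothetical.

```python
import math


def nb_loglik(y, mu, phi):
    """NB log-likelihood with mean mu and Var = mu + phi * mu**2."""
    r = 1.0 / phi
    total = 0.0
    for v in y:
        total += math.lgamma(v + r) - math.lgamma(r) - math.lgamma(v + 1)
        total += r * math.log(r / (r + mu))
        if v > 0:  # skip 0 * log(...) so that mu == 0 is allowed when all counts are 0
            total += v * math.log(mu / (r + mu))
    return total


def nb_lrt_two_groups(y1, y2, phi):
    """Unadjusted LRT of equal NB means in two groups, dispersion phi treated as known.

    With phi fixed and no library-size offsets, the MLE of each mean is the
    corresponding sample mean, so no numerical optimisation is needed.
    """
    mu1 = sum(y1) / len(y1)
    mu2 = sum(y2) / len(y2)
    mu0 = (sum(y1) + sum(y2)) / (len(y1) + len(y2))  # common mean under H0
    stat = 2.0 * (nb_loglik(y1, mu1, phi) + nb_loglik(y2, mu2, phi)
                  - nb_loglik(y1, mu0, phi) - nb_loglik(y2, mu0, phi))
    stat = max(stat, 0.0)  # guard against tiny negative rounding error
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square with 1 df, upper tail
    return stat, p_value


print(nb_lrt_two_groups([0, 1, 2, 1], [20, 25, 30, 22], phi=0.1))
```

The chi-square reference distribution is the large-sample approximation whose accuracy at small sample sizes the abstract questions; a regression version would replace the closed-form group means with an iterative fit of the mean model.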


Subject(s)
Gene Expression Profiling/methods , Models, Genetic , Sequence Analysis, RNA , Algorithms , Arabidopsis/genetics , Base Sequence , Computer Simulation , High-Throughput Nucleotide Sequencing , Likelihood Functions , Models, Statistical , Poisson Distribution , Pseudomonas syringae/genetics , RNA, Bacterial/genetics , RNA, Plant/genetics , Regression Analysis
3.
PLoS One ; 6(10): e25279, 2011.
Article in English | MEDLINE | ID: mdl-21998647

ABSTRACT

GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts.


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA , Arabidopsis/genetics , Arabidopsis/immunology , Benchmarking , Conserved Sequence , Data Interpretation, Statistical , Databases, Genetic , Genomics , Oligonucleotide Array Sequence Analysis
4.
Radiat Res ; 166(1 Pt 2): 303-12, 2006 Jul.
Article in English | MEDLINE | ID: mdl-16808615

ABSTRACT

Statistical dose-response analyses in radiation epidemiology can produce misleading results if they fail to account for radiation dose uncertainties. While dosimetries may differ substantially depending on the ways in which the subjects were exposed, the statistical problems typically involve a predominantly linear dose-response curve, multiple sources of uncertainty, and uncertainty magnitudes that are best characterized as proportional rather than additive. We discuss some basic statistical issues in this setting, including the bias and shape distortion induced by classical and Berkson uncertainties, the effect of uncertain dose-prediction model parameters on estimated dose-response curves, and some notes on statistical methods for dose-response estimation in the presence of radiation dose uncertainties.
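The contrast between classical and Berkson uncertainty can be illustrated with a small simulation. This sketch assumes a purely linear dose-response, ordinary least squares, uniformly distributed doses, and proportional (multiplicative) errors modeled as mean-one lognormal factors. These are illustrative choices, not the authors' dosimetry models, and the names `ols_slope` and `simulate_slopes` are hypothetical.

```python
import math
import random


def ols_slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_slopes(beta=2.0, sigma_u=0.8, n=20000, seed=0):
    """Fitted slopes under multiplicative classical and Berkson dose errors.

    Both error factors are lognormal with mean one: exp(U - sigma_u**2 / 2).
    """
    rng = random.Random(seed)
    shift = sigma_u ** 2 / 2.0

    # Classical error: true dose X, recorded dose W = X * error.
    x = [rng.uniform(0.0, 2.0) for _ in range(n)]
    w = [xi * math.exp(rng.gauss(0.0, sigma_u) - shift) for xi in x]
    y = [1.0 + beta * xi + rng.gauss(0.0, 0.5) for xi in x]
    classical = ols_slope(w, y)

    # Berkson error: assigned dose W, true dose X = W * error, so E[X | W] = W.
    w_b = [rng.uniform(0.0, 2.0) for _ in range(n)]
    x_b = [wi * math.exp(rng.gauss(0.0, sigma_u) - shift) for wi in w_b]
    y_b = [1.0 + beta * xi + rng.gauss(0.0, 0.5) for xi in x_b]
    berkson = ols_slope(w_b, y_b)
    return classical, berkson


print(simulate_slopes())
```

Under these assumptions, the classical-error slope is attenuated toward zero by roughly Var(X)/Var(W), while the Berkson slope stays close to `beta` because E[X | W] = W in a linear model. The shape distortion and the effect of uncertain dose-prediction parameters discussed in the abstract are not covered by this sketch.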


Subject(s)
Artifacts , Data Interpretation, Statistical , Dose-Response Relationship, Radiation , Models, Biological , Models, Statistical , Neoplasms, Radiation-Induced/epidemiology , Radiometry/methods , Risk Assessment/methods , Bias , Body Burden , Computer Simulation , Humans , Radiation Dosage , Relative Biological Effectiveness , Risk Factors
5.
Radiat Res ; 161(3): 359-68, 2004 Mar.
Article in English | MEDLINE | ID: mdl-14982478

ABSTRACT

In the 1940s and 1950s, children in Israel were treated for tinea capitis by irradiation to the scalp to induce epilation. Follow-up studies of these patients and of other radiation-exposed populations show an increased risk of malignant and benign thyroid tumors. Those analyses, however, assume that thyroid dose for individuals is estimated without error. Failure to account for uncertainties in dosimetry may affect standard errors and bias dose-response estimates. For the Israeli tinea capitis study, we discuss sources of uncertainties and adjust dosimetry for uncertainties in the prediction of true dose from X-ray treatment parameters. We also account for missing ages at exposure for patients with multiple X-ray treatments, since only ages at first treatment are known, and for missing data on treatment center, which investigators use to define exposure. Our reanalysis of the dose response for thyroid cancer and benign thyroid tumors indicates that uncertainties in dosimetry have minimal effects on dose-response estimation and on inference about the modifying effects of age at first exposure, time since exposure, and other factors. Since the components of the dose uncertainties we describe are likely to be present in other epidemiological studies of patients treated with radiation, our analysis may provide a model for considering the potential role of these uncertainties.


Subject(s)
Data Interpretation, Statistical , Neoplasms, Radiation-Induced/epidemiology , Radiometry/methods , Radiotherapy/statistics & numerical data , Risk Assessment/methods , Thyroid Neoplasms/epidemiology , Tinea Capitis/epidemiology , Tinea Capitis/radiotherapy , Adolescent , Body Burden , Child , Child, Preschool , Dose-Response Relationship, Radiation , Female , Humans , Incidence , Infant , Infant, Newborn , Male , Models, Biological , Models, Statistical , Radiotherapy Dosage , Reproducibility of Results , Sensitivity and Specificity , Thyroid Gland/radiation effects
6.
Biometrics ; 58(2): 448-53, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12071420

ABSTRACT

This article demonstrates semiparametric maximum likelihood estimation of a nonlinear growth model for fish lengths using imprecisely measured ages. Data on the species corvina reina, found in the Gulf of Nicoya, Costa Rica, consist of lengths and imprecise ages for 168 fish and precise ages for a subset of 16 fish. The statistical problem may therefore be classified as nonlinear errors-in-variables regression with internal validation data. Inferential techniques are based on ideas extracted from several previous works on semiparametric maximum likelihood for errors-in-variables problems. The illustration of the example clarifies practical aspects of the associated computational, inferential, and data analytic techniques.


Subject(s)
Likelihood Functions , Nonlinear Dynamics , Regression Analysis , Algorithms , Animals , Biometry , Data Interpretation, Statistical , Fisheries/statistics & numerical data , Fishes/growth & development