1.
Genome Biol ; 23(1): 166, 2022 08 01.
Article in English | MEDLINE | ID: mdl-35915508

ABSTRACT

BACKGROUND: Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. Thus, deriving health indicators based on microbiome diversity measures is essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa. RESULTS: When this artificial inflation in the observed number of taxa is ignored, we find that changes in the abundance of detected taxa confound current methods for inferring differences in richness. Experimental evidence, theory-guided exploratory data analyses, and existing literature support the conclusion that most sub-genus discoveries are spurious artifacts of clustering 16S sequencing reads. We proceed to model a 16S survey's systematic patterns of sub-genus taxa generation as a function of genus abundance to derive a robust control for false taxa accumulation. These controls unlock classical regression approaches for highly flexible differential richness inference at various levels of the surveyed microbial assemblage: from sample groups to specific taxa collections. The proposed methodology for differential richness inference is available through an R package, Prokounter. CONCLUSIONS: False species discoveries bias richness estimation and confound differential richness inference. In the case of 16S microbiome surveys, supporting evidence indicates that most sub-genus taxa are spurious. Based on this finding, a flexible method is proposed and is shown to overcome the confounding problem noted with current approaches for differential richness inference. Package availability: https://github.com/mskb01/prokounter.


Subjects
Bacteria , Microbiota , Artifacts , Bacteria/genetics , Cluster Analysis , Microbiota/genetics , RNA, Ribosomal, 16S/genetics
2.
PLoS Negl Trop Dis ; 14(7): e0008434, 2020 07.
Article in English | MEDLINE | ID: mdl-32716983

ABSTRACT

Dengue fever is a viral disease transmitted by mosquitoes. In recent decades, dengue fever has spread throughout the world. In 2014 and 2015, southern Taiwan experienced its most serious dengue outbreak in recent years. Some statistical models have been established in the past; however, these models may not be suitable for predicting the huge outbreaks of 2014 and 2015. The control of dengue fever has become the primary task of local health agencies. This study attempts to predict the occurrence of dengue fever in order to provide timely warning. We applied a newly developed autoregressive model (AR model) to assess the association between daily weather variability and daily dengue case number in 2014 and 2015 in Kaohsiung, the largest city in southern Taiwan. The model also included additional lagged weather predictors, and we developed 5-day-ahead and 15-day-ahead predictive models. Our results indicate that numbers of dengue cases in Kaohsiung are associated with humidity and the biting rate (BR). Our model is simple, intuitive and easy to use. The developed model can be embedded in a "real-time" schedule, and the data (at present) can be updated daily or weekly based on the needs of public health workers. In this study, a simple model using only meteorological factors performed well. The proposed real-time forecast model can help health agencies take public health actions to mitigate the influences of the epidemic.


Subjects
Dengue/epidemiology , Disease Outbreaks , Forecasting , Humans , Humidity , Models, Statistical , Taiwan/epidemiology , Temperature , Weather
3.
J Surv Stat Methodol ; 7(3): 334-364, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31428658

ABSTRACT

The most widespread method of computing confidence intervals (CIs) in complex surveys is to add and subtract the margin of error (MOE) from the point estimate, where the MOE is the estimated standard error multiplied by the suitable Gaussian quantile. This Wald-type interval is used by the American Community Survey (ACS), the largest US household sample survey. For inferences on small proportions with moderate sample sizes, this method often results in marked under-coverage and lower CI endpoint less than 0. We assess via simulation the coverage and width, in complex sample surveys, of seven alternatives to the Wald interval for a binomial proportion with sample size replaced by the 'effective sample size,' that is, the sample size divided by the design effect. Building on previous work by the present authors, our simulations address the impact of clustering, stratification, different stratum sampling fractions, and stratum-specific proportions. We show that all intervals undercover when there is clustering and design effects are computed from a simple design-based estimator of sampling variance. Coverage can be better calibrated for the alternatives to Wald by improving estimation of the effective sample size through superpopulation modeling. This approach is more effective in our simulations than previously proposed modifications of effective sample size. We recommend intervals of the Wilson or Bayes uniform prior form, with the Jeffreys prior interval not far behind.
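
The contrast between the Wald interval and one of the recommended alternatives can be sketched as follows. This is a minimal illustration, not the authors' simulation code: it assumes the design effect is already known, and the function names and example numbers (n = 400, design effect 2.0, estimated proportion 0.01) are hypothetical.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def effective_sample_size(n, design_effect):
    """Sample size divided by the design effect, as described in the abstract."""
    return n / design_effect


def wald_interval(p_hat, n_eff, z=Z_95):
    """Point estimate plus or minus the margin of error (z times the standard error)."""
    moe = z * math.sqrt(p_hat * (1 - p_hat) / n_eff)
    return p_hat - moe, p_hat + moe


def wilson_interval(p_hat, n_eff, z=Z_95):
    """Wilson score interval with n replaced by the effective sample size."""
    denom = 1 + z**2 / n_eff
    center = (p_hat + z**2 / (2 * n_eff)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n_eff + z**2 / (4 * n_eff**2)) / denom
    return center - half, center + half


# Hypothetical example: a small proportion with a moderate sample size.
n_eff = effective_sample_size(n=400, design_effect=2.0)
wald = wald_interval(0.01, n_eff)
wilson = wilson_interval(0.01, n_eff)
print("Wald:  ", wald)    # lower endpoint falls below 0
print("Wilson:", wilson)  # lower endpoint stays inside (0, 1)
```

With these inputs the Wald lower endpoint is negative, which is the failure mode the abstract describes for small proportions, while the Wilson interval stays within the unit interval.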

4.
BMC Genomics ; 19(1): 799, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400812

ABSTRACT

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.


Subjects
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota , RNA, Ribosomal, 16S/genetics , Bayes Theorem
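
The compositional bias described in the abstract of record 4 can be illustrated with a toy calculation. This sketch does not implement the authors' empirical Bayes normalization; it only shows why relative abundances mislead about absolute change. The feature counts are hypothetical.

```python
def relative_abundance(counts):
    """Convert absolute counts to proportions, which is all a sequencer observes."""
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical absolute abundances in two conditions: only feature 0 changes.
absolute_a = [100, 100, 100]
absolute_b = [1000, 100, 100]

rel_a = relative_abundance(absolute_a)
rel_b = relative_abundance(absolute_b)

# True fold changes versus the fold changes visible in relative data.
true_fold = [b / a for a, b in zip(absolute_a, absolute_b)]
apparent_fold = [rb / ra for ra, rb in zip(rel_a, rel_b)]

print("true fold changes:    ", true_fold)
print("apparent fold changes:", apparent_fold)
```

Features 1 and 2 are unchanged in absolute terms, yet their proportions fall because feature 0 grew. Scaling each sample to a common library size leaves these proportions untouched, so it cannot remove the distortion.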
5.
PLoS One ; 12(11): e0187132, 2017.
Article in English | MEDLINE | ID: mdl-29145425

ABSTRACT

Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (<10% prevalence), explaining why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed, 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can result in artificial inflation of positive taxon relationships and how this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can yield improvements over Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed, and thus introduces its own set of spurious conclusions. We thus identify a better null model based on a hypergeometric distribution, which appropriately corrects for species prevalence. This model is available from recent statistics literature, and can be used for evaluating the significance of any value of an empirically observed Jaccard's index. The resulting simple, yet effective method for handling correlation analysis of microbial presence-absence datasets provides a robust means of testing and finding relationships and/or shared environmental responses among microbial taxa.


Subjects
Datasets as Topic , Microbiota
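
One way to read the hypergeometric null model mentioned in the abstract of record 5 is sketched below. This is an assumption about its form, not the authors' published procedure: with both species' prevalences held fixed, the number of samples in which they co-occur follows a hypergeometric distribution, and Jaccard's index increases with that overlap, so an upper-tail probability for the overlap is also one for J. Function names and example numbers are hypothetical.

```python
from math import comb


def jaccard(a, b, k):
    """Jaccard's index for two species present in a and b samples, co-occurring in k."""
    union = a + b - k
    return k / union if union else 0.0


def hypergeom_pmf(k, n_samples, a, b):
    """P(overlap = k) when b presences are placed at random among n_samples, a of them marked."""
    return comb(a, k) * comb(n_samples - a, b - k) / comb(n_samples, b)


def upper_tail_p(k_obs, n_samples, a, b):
    """P(overlap >= k_obs) under the prevalence-preserving null (assumed form)."""
    k_max = min(a, b)
    return sum(hypergeom_pmf(k, n_samples, a, b) for k in range(k_obs, k_max + 1))


# Hypothetical example: 20 samples, each species present in 5, co-occurring in 4.
j_obs = jaccard(5, 5, 4)
p_value = upper_tail_p(4, 20, 5, 5)
print("Jaccard index:", j_obs)
print("upper-tail p: ", p_value)
```

Because the null conditions on each species' prevalence, a rare pair and a common pair with the same J can receive very different p-values, which is the prevalence correction the abstract refers to.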
6.
J Biopharm Stat ; 27(5): 756-772, 2017.
Article in English | MEDLINE | ID: mdl-27669105

ABSTRACT

Bioequivalence (BE) studies are an essential part of the evaluation of generic drugs. The most common in vivo BE study design is the two-period two-treatment crossover design. AUC (area under the concentration-time curve) and Cmax (maximum concentration) are obtained from the observed concentration-time profiles for each subject from each treatment under each sequence. In the BE evaluation of pharmacokinetic crossover studies, the normality of the univariate response variable, e.g. log(AUC) or log(Cmax), is often assumed in the literature without much evidence. Therefore, we investigate the distributional assumption of the normality of response variables, log(AUC) and log(Cmax), by simulating concentration-time profiles from two-stage pharmacokinetic models (commonly used in pharmacokinetic research) for a wide range of pharmacokinetic parameters and measurement error structures. Our simulations show that, under reasonable distributional assumptions on the pharmacokinetic parameters, log(AUC) has heavy tails and log(Cmax) is skewed. Sensitivity analyses are conducted to investigate how the distribution of the standardized log(AUC) (or the standardized log(Cmax)) for a large number of simulated subjects deviates from normality if distributions of errors in the pharmacokinetic model for plasma concentrations deviate from normality and if the plasma concentration can be described by different compartmental models.


Subjects
Computer Simulation/statistics & numerical data , Drugs, Generic/pharmacokinetics , Statistical Distributions , Area Under Curve , Humans , Pharmacokinetics , Therapeutic Equivalency
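
How AUC and Cmax are read off a concentration-time profile (record 6) can be sketched for a single noise-free subject. The one-compartment oral-absorption model, its parameter values, and the sampling grid are assumptions chosen for illustration; the study itself simulates two-stage models with random parameters and measurement error, which this sketch omits.

```python
import math


def concentration(t, dose=100.0, ka=1.0, ke=0.1, volume=10.0):
    """One-compartment model with first-order absorption (hypothetical parameters)."""
    scale = dose * ka / (volume * (ka - ke))
    return scale * (math.exp(-ke * t) - math.exp(-ka * t))


def auc_trapezoid(times, concs):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for t0, t1, c0, c1 in zip(times, times[1:], concs, concs[1:])
    )


# Sample every 15 minutes from 0 to 24 hours.
times = [0.25 * i for i in range(97)]
concs = [concentration(t) for t in times]

auc = auc_trapezoid(times, concs)  # AUC over the sampled window
cmax = max(concs)                  # highest observed concentration
log_auc, log_cmax = math.log(auc), math.log(cmax)
print("AUC(0-24h):", auc, " Cmax:", cmax)
print("log(AUC):", log_auc, " log(Cmax):", log_cmax)
```

Repeating this over many simulated subjects, with parameters drawn from a population distribution, would give the empirical distributions of log(AUC) and log(Cmax) whose normality the abstract questions.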
7.
Pharm Stat ; 14(3): 272, 2015.
Article in English | MEDLINE | ID: mdl-25807931

ABSTRACT

This article reflects the views of the authors and should not be construed to be those of the US Food and Drug Administration.


Subjects
Models, Statistical , Pharmaceutical Preparations , Sample Size , Humans
8.
Pharm Stat ; 14(2): 95-101, 2015.
Article in English | MEDLINE | ID: mdl-25477145

ABSTRACT

The number of subjects in a pharmacokinetic two-period two-treatment crossover bioequivalence study is typically small, most often less than 60. The most common approach to testing for bioequivalence is the two one-sided tests procedure. No explicit mathematical formula for the power function in the context of the two one-sided tests procedure exists in the statistical literature, although the exact power based on Owen's special case of bivariate noncentral t-distribution has been tabulated and graphed. Several approximations have previously been published for the probability of rejection in the two one-sided tests procedure for crossover bioequivalence studies. These approximations and associated sample size formulas are reviewed in this article and compared for various parameter combinations with exact power formulas derived here, which are computed analytically as univariate integrals and which have been validated by Monte Carlo simulations. The exact formulas for power and sample size are shown to improve markedly in realistic parameter settings over the previous approximations.


Subjects
Models, Statistical , Pharmaceutical Preparations , Sample Size , Cross-Over Studies , Humans , Pharmaceutical Preparations/metabolism , Therapeutic Equivalency
9.
Lifetime Data Anal ; 20(3): 459-80, 2014 Jul.
Article in English | MEDLINE | ID: mdl-23963960

ABSTRACT

After a brief historical survey of parametric survival models, from actuarial, biomedical, demographical and engineering sources, this paper discusses the persistent reasons why parametric models still play an important role in exploratory statistical research. The phase-type models are advanced as a flexible family of latent-class models with interpretable components. These models are now supported by computational statistical methods that make numerical calculation of likelihoods and statistical estimation of parameters feasible in theory for quite complicated settings. However, consideration of Fisher Information and likelihood-ratio type tests to discriminate between model families indicates that only the simplest phase-type model topologies can be stably estimated in practice, even on rather large datasets. An example of a parametric model with features of mixtures, multiple stages or 'hits', and a trapping-state is given to illustrate simple computational tools in R, both on simulated data and on a large SEER 1992-2002 breast-cancer dataset.


Subjects
Data Interpretation, Statistical , Likelihood Functions , Survival Analysis , Breast Neoplasms/mortality , Computer Simulation , Female , Humans , Markov Chains , Models, Statistical
10.
J Multivar Anal ; 130: 176-193, 2014 Sep.
Article in English | MEDLINE | ID: mdl-28503001

ABSTRACT

Linear mixed models (LMMs) are widely used for regression analysis of data that are assumed to be clustered or correlated. Assessing model fit is important for valid inference but to date no confirmatory tests are available to assess the adequacy of the fixed effects part of LMMs against general alternatives. We therefore propose a class of goodness-of-fit tests for the mean structure of LMMs. Our test statistic is a quadratic form of the difference between observed values and the values expected under the estimated model in cells defined by a partition of the covariate space. We show that this test statistic has an asymptotic chi-squared distribution when model parameters are estimated by maximum likelihood or by least squares and method of moments, and study its power under local alternatives both analytically and in simulations. Data on repeated measurements of thyroglobulin from individuals exposed to the accident at the Chernobyl power plant in 1986 are used to illustrate the proposed test.
