Results 1 - 13 of 13
1.
Commun Biol ; 7(1): 383, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38553628

ABSTRACT

Hepatocellular carcinoma (HCC) is a molecularly heterogeneous solid malignancy, and its fitness may be shaped by how its tumor cells evolve. However, the ability to monitor tumor cell evolution is hampered by the presence of numerous passenger mutations that have no biological consequences. Here we develop a strategy to determine the tumor clonality of three independent HCC cohorts of 524 patients with diverse etiologies and race/ethnicity by utilizing somatic mutations in cancer driver genes. We identify two main types of tumor evolution, linear and non-linear, where the non-linear type can be further divided into two classes, which we call shallow branching and deep branching. We find that linearly evolving HCC is less aggressive than the other types. GTF2IRD2B mutations are enriched in HCC with linear evolution, while TP53 mutations are the most frequent genetic alterations in HCC with non-linear models. Furthermore, we observe significant B cell enrichment in linear trees compared to non-linear trees, suggesting the need for further research to uncover potential variations in immune cell types within genomically determined phylogeny types. These results hint at the possibility that tumor cells and their microenvironment may collectively influence the tumor evolution process.


Subject(s)
Hepatocellular Carcinoma , Liver Neoplasms , Humans , Hepatocellular Carcinoma/pathology , Liver Neoplasms/pathology , Phylogeny , Oncogenes , Mutation , Tumor Microenvironment/genetics
2.
Genome Biol ; 23(1): 166, 2022 Aug 01.
Article in English | MEDLINE | ID: mdl-35915508

ABSTRACT

BACKGROUND: Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. Thus, deriving health indicators based on microbiome diversity measures is essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa. RESULTS: When this artificial inflation in the observed number of taxa is ignored, we find that changes in the abundance of detected taxa confound current methods for inferring differences in richness. Experimental evidence, theory-guided exploratory data analyses, and existing literature support the conclusion that most sub-genus discoveries are spurious artifacts of clustering 16S sequencing reads. We proceed to model a 16S survey's systematic patterns of sub-genus taxa generation as a function of genus abundance to derive a robust control for false taxa accumulation. These controls unlock classical regression approaches for highly flexible differential richness inference at various levels of the surveyed microbial assemblage: from sample groups to specific taxa collections. The proposed methodology for differential richness inference is available through an R package, Prokounter. CONCLUSIONS: False species discoveries bias richness estimation and confound differential richness inference. In the case of 16S microbiome surveys, supporting evidence indicates that most sub-genus taxa are spurious. Based on this finding, a flexible method is proposed and is shown to overcome the confounding problem noted with current approaches for differential richness inference. Package availability: https://github.com/mskb01/prokounter.


Subject(s)
Bacteria , Microbiota , Artifacts , Bacteria/genetics , Cluster Analysis , Microbiota/genetics , 16S Ribosomal RNA/genetics
3.
Sci Adv ; 8(4): eabj9204, 2022 Jan 28.
Article in English | MEDLINE | ID: mdl-35080967

ABSTRACT

Scientists often need to know whether pairs of entities tend to occur together or independently. Standard approaches to this issue use co-occurrence indices such as Jaccard, Sørensen-Dice, and Simpson. We show that these indices are sensitive to the prevalences of the entities they describe and that this invalidates their interpretability. We propose an index, α, that is insensitive to prevalences. Published datasets reanalyzed with both α and Jaccard's index (J) yield profoundly different biological inferences. For example, a published analysis using J contradicted predictions of the island biogeography theory, finding that community stability increased with increasing physical isolation. Reanalysis of the same dataset with the estimator [Formula: see text] reversed that result and supported theoretical predictions. We found similarly marked effects in reanalyses of antibiotic cross-resistance and human disease biomarkers. Our index α is not merely an improvement; its use changes data interpretation in fundamental ways.
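
The prevalence sensitivity described in this abstract can be illustrated with a minimal sketch. The function name and the example counts below are assumptions made for illustration; the proposed index α is not implemented here because the abstract does not define it. The three classical indices are computed from presence/absence counts using their standard textbook definitions.

```python
def cooccurrence_indices(n_both, n_a, n_b):
    """Classical co-occurrence indices from presence/absence counts.

    n_both: number of sites where both entities occur
    n_a, n_b: number of sites where entity A (resp. B) occurs (both > 0)
    """
    union = n_a + n_b - n_both
    jaccard = n_both / union
    sorensen_dice = 2 * n_both / (n_a + n_b)
    simpson = n_both / min(n_a, n_b)
    return jaccard, sorensen_dice, simpson


# Hypothetical example: 100 sites, two pairs of entities that each co-occur
# exactly as often as independence predicts (n_a * n_b / 100).
common_pair = cooccurrence_indices(25, 50, 50)  # 50% prevalence each
rare_pair = cooccurrence_indices(1, 10, 10)     # 10% prevalence each
```

Both hypothetical pairs are independent by construction, yet every index is lower for the rare pair than for the common pair, which is the kind of prevalence dependence the abstract criticizes.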

4.
PLoS Negl Trop Dis ; 14(7): e0008434, 2020 Jul.
Article in English | MEDLINE | ID: mdl-32716983

ABSTRACT

Dengue fever is a viral disease transmitted by mosquitoes. In recent decades, dengue fever has spread throughout the world. In 2014 and 2015, southern Taiwan experienced its most serious dengue outbreak in recent years. Some statistical models have been established in the past; however, these models may not be suitable for predicting the huge outbreaks of 2014 and 2015. The control of dengue fever has become the primary task of local health agencies. This study attempts to predict the occurrence of dengue fever in order to provide timely warning. We applied a newly developed autoregressive model (AR model) to assess the association between daily weather variability and daily dengue case number in 2014 and 2015 in Kaohsiung, the largest city in southern Taiwan. The model also contained additional lagged weather predictors, and we developed 5-day-ahead and 15-day-ahead predictive models. Our results indicate that numbers of dengue cases in Kaohsiung are associated with humidity and the biting rate (BR). Our model is simple, intuitive and easy to use. The developed model can be embedded in a "real-time" schedule, and the data (at present) can be updated daily or weekly based on the needs of public health workers. In this study, a simple model using only meteorological factors performed well. The proposed real-time forecast model can help health agencies take public health actions to mitigate the influences of the epidemic.
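
As a rough illustration of how lagged weather predictors can feed an h-day-ahead autoregressive forecast, the sketch below builds the regression rows only. The function name, the choice of a single weather series, and the lag values are all assumptions; the abstract does not give the model specification, and no fitting step is shown.

```python
def lagged_design(cases, weather, horizon, weather_lags):
    """Build (features, target) rows for a direct h-step-ahead AR-type regression.

    Each row pairs an intercept, the case count on day t, and weather values
    at the requested lags with the case count `horizon` days later.
    """
    rows = []
    start = max(weather_lags)  # earliest day with every lag available
    for t in range(start, len(cases) - horizon):
        features = [1.0, cases[t]] + [weather[t - lag] for lag in weather_lags]
        rows.append((features, cases[t + horizon]))
    return rows


# Hypothetical toy series: 10 days of case counts and a humidity-like variable.
toy_cases = list(range(10))
toy_humidity = [10 * day for day in range(10)]
toy_rows = lagged_design(toy_cases, toy_humidity, horizon=2, weather_lags=[0, 3])
```

The resulting rows could be passed to any least-squares routine; changing `horizon` to 5 or 15 would give the two forecast horizons mentioned in the abstract.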


Subject(s)
Dengue/epidemiology , Disease Outbreaks , Forecasting , Humans , Humidity , Statistical Models , Taiwan/epidemiology , Temperature , Weather
5.
J Surv Stat Methodol ; 7(3): 334-364, 2019 Sep.
Article in English | MEDLINE | ID: mdl-31428658

ABSTRACT

The most widespread method of computing confidence intervals (CIs) in complex surveys is to add and subtract the margin of error (MOE) from the point estimate, where the MOE is the estimated standard error multiplied by the suitable Gaussian quantile. This Wald-type interval is used by the American Community Survey (ACS), the largest US household sample survey. For inferences on small proportions with moderate sample sizes, this method often results in marked under-coverage and a lower CI endpoint below 0. We assess via simulation the coverage and width, in complex sample surveys, of seven alternatives to the Wald interval for a binomial proportion with sample size replaced by the 'effective sample size,' that is, the sample size divided by the design effect. Building on previous work by the present authors, our simulations address the impact of clustering, stratification, different stratum sampling fractions, and stratum-specific proportions. We show that all intervals undercover when there is clustering and design effects are computed from a simple design-based estimator of sampling variance. Coverage can be better calibrated for the alternatives to Wald by improving estimation of the effective sample size through superpopulation modeling. This approach is more effective in our simulations than previously proposed modifications of effective sample size. We recommend intervals of the Wilson or Bayes uniform prior form, with the Jeffreys prior interval not far behind.
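
The contrast between the Wald interval and one of the recommended alternatives can be sketched as follows. This is a minimal illustration, assuming the standard textbook Wilson score formula with the sample size replaced by the effective sample size as the abstract describes; the function names and the example numbers are hypothetical, and the superpopulation-based estimate of the design effect is not reproduced.

```python
from math import sqrt
from statistics import NormalDist


def effective_sample_size(n, design_effect):
    """Sample size divided by the design effect."""
    return n / design_effect


def wald_interval(p_hat, n_eff, level=0.95):
    """Point estimate plus or minus the margin of error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    moe = z * sqrt(p_hat * (1 - p_hat) / n_eff)
    return p_hat - moe, p_hat + moe


def wilson_interval(p_hat, n_eff, level=0.95):
    """Wilson score interval evaluated at the effective sample size."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    denom = 1 + z * z / n_eff
    center = (p_hat + z * z / (2 * n_eff)) / denom
    half = (z / denom) * sqrt(
        p_hat * (1 - p_hat) / n_eff + z * z / (4 * n_eff * n_eff)
    )
    return center - half, center + half


# Hypothetical survey: 1% estimated proportion, n = 400, design effect 2.
n_eff = effective_sample_size(400, 2.0)
wald = wald_interval(0.01, n_eff)
wilson = wilson_interval(0.01, n_eff)
```

With these hypothetical inputs the Wald lower endpoint falls below 0, while the Wilson interval stays inside the unit interval, matching the failure mode the abstract describes for small proportions.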

6.
BMC Genomics ; 19(1): 799, 2018 Nov 06.
Article in English | MEDLINE | ID: mdl-30400812

ABSTRACT

BACKGROUND: Count data derived from high-throughput deoxyribonucleic acid (DNA) sequencing is frequently used in quantitative molecular assays. Due to properties inherent to the sequencing process, unnormalized count data is compositional, measuring relative and not absolute abundances of the assayed features. This compositional bias confounds inference of absolute abundances. Commonly used count data normalization approaches like library size scaling/rarefaction/subsampling cannot correct for compositional or any other relevant technical bias that is uncorrelated with library size. RESULTS: We demonstrate that existing techniques for estimating compositional bias fail with sparse metagenomic 16S count data and propose an empirical Bayes normalization approach to overcome this problem. In addition, we clarify the assumptions underlying frequently used scaling normalization methods in light of compositional bias, including scaling methods that were not designed directly to address it. CONCLUSIONS: Compositional bias, induced by the sequencing machine, confounds inferences of absolute abundances. We present a normalization technique for compositional bias correction in sparse sequencing count data, and demonstrate its improved performance in metagenomic 16S survey data. Based on the distribution of technical bias estimates arising from several publicly available large-scale 16S count datasets, we argue that detailed experiments specifically addressing the influence of compositional bias in metagenomics are needed.
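
A small worked example may help show why library size scaling alone does not remove compositional bias. The abundances below are invented for illustration, and the function is a generic total-sum scaling, not the empirical Bayes method the abstract proposes.

```python
def library_size_scale(counts):
    """Total-sum scaling: convert counts to proportions of the library size."""
    total = sum(counts)
    return [count / total for count in counts]


# Hypothetical absolute abundances of three features in two conditions.
# Only the third feature truly changes (a fourfold increase).
absolute_a = [100, 100, 100]
absolute_b = [100, 100, 400]

scaled_a = library_size_scale(absolute_a)
scaled_b = library_size_scale(absolute_b)

# Apparent fold change of each feature after scaling.
apparent_fold_change = [b / a for a, b in zip(scaled_a, scaled_b)]
```

After scaling, the two unchanged features appear to have halved and the changed feature appears to have only doubled, so inferences about absolute abundance are distorted even though the library sizes were "corrected".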


Subject(s)
Algorithms , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Metagenomics/methods , Microbiota , 16S Ribosomal RNA/genetics , Bayes Theorem
7.
PLoS One ; 12(11): e0187132, 2017.
Article in English | MEDLINE | ID: mdl-29145425

ABSTRACT

Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (<10% prevalence), explaining why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed, 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can result in artificial inflation of positive taxon relationships and how this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can yield improvements over Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed, and thus introduces its own set of spurious conclusions. We thus identify a better null model based on a hypergeometric distribution, which appropriately corrects for species prevalence. This model is available from recent statistics literature, and can be used for evaluating the significance of any value of an empirically observed Jaccard's index. The resulting simple, yet effective method for handling correlation analysis of microbial presence-absence datasets provides a robust means of testing and finding relationships and/or shared environmental responses among microbial taxa.
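
The idea of a hypergeometric null for Jaccard's index can be sketched as below. This is an illustrative reading of the abstract, not the authors' code: it assumes that, with both prevalences held fixed, the number of co-occurrences under independence follows a hypergeometric distribution, and it uses the fact that J increases with the co-occurrence count when the prevalences are fixed. Function names and the one-sided test direction are assumptions.

```python
from math import comb


def hypergeom_pmf(k, n_total, n_a, n_b):
    """P(K = k) co-occurrences among n_total samples with fixed prevalences."""
    return comb(n_a, k) * comb(n_total - n_a, n_b - k) / comb(n_total, n_b)


def jaccard_index(k, n_a, n_b):
    """Jaccard's index from the co-occurrence count and the two prevalences."""
    return k / (n_a + n_b - k)


def jaccard_upper_pvalue(k_obs, n_total, n_a, n_b):
    """P(J >= J_obs) under the null, which equals P(K >= k_obs)
    because J is increasing in k for fixed n_a and n_b."""
    k_max = min(n_a, n_b)
    return sum(
        hypergeom_pmf(k, n_total, n_a, n_b) for k in range(k_obs, k_max + 1)
    )


# Hypothetical example: 10 samples, each species present in 5 of them,
# and the two species always found together.
p_perfect_overlap = jaccard_upper_pvalue(5, 10, 5, 5)
```

Because the null distribution depends explicitly on `n_a` and `n_b`, the same observed J can be significant for one pair of prevalences and unremarkable for another.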


Subject(s)
Datasets as Topic , Microbiota
8.
J Biopharm Stat ; 27(5): 756-772, 2017.
Article in English | MEDLINE | ID: mdl-27669105

ABSTRACT

Bioequivalence (BE) studies are an essential part of the evaluation of generic drugs. The most common in vivo BE study design is the two-period two-treatment crossover design. AUC (area under the concentration-time curve) and Cmax (maximum concentration) are obtained from the observed concentration-time profiles for each subject from each treatment under each sequence. In the BE evaluation of pharmacokinetic crossover studies, the normality of the univariate response variable, e.g. log(AUC) or log(Cmax), is often assumed in the literature without much evidence. Therefore, we investigate the distributional assumption of the normality of response variables, log(AUC) and log(Cmax), by simulating concentration-time profiles from two-stage pharmacokinetic models (commonly used in pharmacokinetic research) for a wide range of pharmacokinetic parameters and measurement error structures. Our simulations show that, under reasonable distributional assumptions on the pharmacokinetic parameters, log(AUC) has heavy tails and log(Cmax) is skewed. Sensitivity analyses are conducted to investigate how the distribution of the standardized log(AUC) (or the standardized log(Cmax)) for a large number of simulated subjects deviates from normality if distributions of errors in the pharmacokinetic model for plasma concentrations deviate from normality and if the plasma concentration can be described by different compartmental models.
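
For readers unfamiliar with the two response variables, the sketch below shows one common way to obtain them from a single concentration-time profile. The linear trapezoidal rule is an assumption here, since the abstract does not state how AUC was computed, and the function names and sample profile are hypothetical.

```python
from math import log


def auc_trapezoid(times, concentrations):
    """Area under the concentration-time curve by the linear trapezoidal rule."""
    points = list(zip(times, concentrations))
    return sum(
        (t1 - t0) * (c0 + c1) / 2
        for (t0, c0), (t1, c1) in zip(points, points[1:])
    )


def be_endpoints(times, concentrations):
    """Return (log(AUC), log(Cmax)) for one subject's observed profile."""
    auc = auc_trapezoid(times, concentrations)
    cmax = max(concentrations)
    return log(auc), log(cmax)


# Hypothetical profile: sampling times in hours, concentrations in ng/mL.
sample_times = [0, 1, 2, 4]
sample_conc = [0, 10, 8, 4]
sample_auc = auc_trapezoid(sample_times, sample_conc)
sample_log_auc, sample_log_cmax = be_endpoints(sample_times, sample_conc)
```

Repeating this calculation over many simulated subjects yields the empirical distributions of log(AUC) and log(Cmax) whose normality the study examines.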


Subject(s)
Computer Simulation/statistics & numerical data , Generic Drugs/pharmacokinetics , Statistical Distributions , Area Under Curve , Humans , Pharmacokinetics , Therapeutic Equivalency
9.
Pharm Stat ; 14(3): 272, 2015.
Article in English | MEDLINE | ID: mdl-25807931

ABSTRACT

This article reflects the views of the authors and should not be construed to be those of the US Food and Drug Administration.


Subject(s)
Statistical Models , Pharmaceutical Preparations , Sample Size , Humans
10.
Pharm Stat ; 14(2): 95-101, 2015.
Article in English | MEDLINE | ID: mdl-25477145

ABSTRACT

The number of subjects in a pharmacokinetic two-period two-treatment crossover bioequivalence study is typically small, most often fewer than 60. The most common approach to testing for bioequivalence is the two one-sided tests procedure. No explicit mathematical formula for the power function in the context of the two one-sided tests procedure exists in the statistical literature, although the exact power based on Owen's special case of bivariate noncentral t-distribution has been tabulated and graphed. Several approximations have previously been published for the probability of rejection in the two one-sided tests procedure for crossover bioequivalence studies. These approximations and associated sample size formulas are reviewed in this article and compared for various parameter combinations with exact power formulas derived here, which are computed analytically as univariate integrals and which have been validated by Monte Carlo simulations. The exact formulas for power and sample size are shown to improve markedly in realistic parameter settings over the previous approximations.
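
For orientation, the two one-sided tests procedure and the power it refers to can be written out as follows. The notation is an assumption introduced here (it does not appear in the abstract): δ is the true treatment difference on the log scale, s_w the estimated within-subject standard deviation, n_1 and n_2 the sequence sizes, and the equivalence limits are taken to be the conventional ±log 1.25. The exact univariate-integral formulas derived in the article are not reproduced.

```latex
% Two one-sided tests (TOST) for average bioequivalence on the log scale.
% Assumed notation: \delta = \mu_T - \mu_R, limits \theta_U = -\theta_L = \log 1.25.
\begin{align*}
H_{01}&: \delta \le \theta_L \quad\text{vs.}\quad H_{11}: \delta > \theta_L,\\
H_{02}&: \delta \ge \theta_U \quad\text{vs.}\quad H_{12}: \delta < \theta_U.
\end{align*}
% Bioequivalence is concluded at level \alpha when both nulls are rejected:
\[
T_1 = \frac{\hat\delta - \theta_L}{\mathrm{SE}} > t_{1-\alpha,\nu}
\quad\text{and}\quad
T_2 = \frac{\hat\delta - \theta_U}{\mathrm{SE}} < -t_{1-\alpha,\nu},
\]
\[
\mathrm{SE} = s_w\sqrt{\tfrac{1}{2}\left(\tfrac{1}{n_1}+\tfrac{1}{n_2}\right)},
\qquad \nu = n_1 + n_2 - 2 .
\]
% The power is the probability of this joint rejection event:
\[
\mathrm{Power}(\delta,\sigma_w)
= \Pr\left(T_1 > t_{1-\alpha,\nu},\; T_2 < -t_{1-\alpha,\nu}\right).
\]
```

Because T_1 and T_2 share the same estimate and standard error, the joint probability does not factor into two univariate t probabilities, which is why approximations and exact integral formulas are both of interest.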


Subject(s)
Statistical Models , Pharmaceutical Preparations , Sample Size , Cross-Over Studies , Humans , Pharmaceutical Preparations/metabolism , Therapeutic Equivalency
11.
Lifetime Data Anal ; 20(3): 459-80, 2014 Jul.
Article in English | MEDLINE | ID: mdl-23963960

ABSTRACT

After a brief historical survey of parametric survival models, from actuarial, biomedical, demographical and engineering sources, this paper discusses the persistent reasons why parametric models still play an important role in exploratory statistical research. The phase-type models are advanced as a flexible family of latent-class models with interpretable components. These models are now supported by computational statistical methods that make numerical calculation of likelihoods and statistical estimation of parameters feasible in theory for quite complicated settings. However, consideration of Fisher Information and likelihood-ratio type tests to discriminate between model families indicates that only the simplest phase-type model topologies can be stably estimated in practice, even on rather large datasets. An example of a parametric model with features of mixtures, multiple stages or 'hits', and a trapping-state is given to illustrate simple computational tools in R, both on simulated data and on a large SEER 1992-2002 breast-cancer dataset.


Subject(s)
Statistical Data Interpretation , Likelihood Functions , Survival Analysis , Breast Neoplasms/mortality , Computer Simulation , Female , Humans , Markov Chains , Statistical Models
12.
J Multivar Anal ; 130: 176-193, 2014 Sep.
Article in English | MEDLINE | ID: mdl-28503001

ABSTRACT

Linear mixed models (LMMs) are widely used for regression analysis of data that are assumed to be clustered or correlated. Assessing model fit is important for valid inference but to date no confirmatory tests are available to assess the adequacy of the fixed effects part of LMMs against general alternatives. We therefore propose a class of goodness-of-fit tests for the mean structure of LMMs. Our test statistic is a quadratic form of the difference between observed values and the values expected under the estimated model in cells defined by a partition of the covariate space. We show that this test statistic has an asymptotic chi-squared distribution when model parameters are estimated by maximum likelihood or by least squares and method of moments, and study its power under local alternatives both analytically and in simulations. Data on repeated measurements of thyroglobulin from individuals exposed to the accident at the Chernobyl power plant in 1986 are used to illustrate the proposed test.

13.
Phonetica ; 59(2-3): 108-33, 2002.
Article in English | MEDLINE | ID: mdl-12232463

ABSTRACT

This paper uses principal components (PC) analysis to represent coronal tongue contours for the 11 vowels of English in two consonant contexts (/s/, /l/), based upon five replicated measurements in three sessions for each of 6 subjects. Curves from multiple sessions and speakers were overlaid before analysis onto a common (x, y) coordinate system by extensive preprocessing of the curves including: extension (padding) or truncation within session, translation, and truncation to a common x range. Four PCs plus a mean level allow accurate representation of coronal tongue curves, but PC shapes depend strongly on the degree of padding or truncation. The PCs successfully reduced the dimensionality of the curves and reflected vowel height, consonant context, and physiological features.


Subject(s)
Posture , Tongue/anatomy & histology , Adult , Ethnicity , Female , Humans , Male , Phonetics , Speech/physiology , Speech Production Measurement