Results 1 - 7 of 7
1.
Biometrics ; 76(3): 811-820, 2020 Sep.
Article in English | MEDLINE | ID: mdl-31863595

ABSTRACT

In biomedical studies, testing for homogeneity between two groups, where one group is modeled by a mixture model, is often of great interest. This paper considers the semiparametric exponential family mixture model proposed by Hong et al. (2017) and studies the score test for homogeneity under this model. The score test is nonregular in the sense that nuisance parameters disappear under the null hypothesis. To address this difficulty, we propose a modification of the score test, so that the resulting test enjoys the Wilks phenomenon. In finite samples, we show that with fixed nuisance parameters the score test is locally most powerful. In large samples, we establish the asymptotic power functions under two types of local alternative hypotheses. Our simulation studies illustrate that the proposed score test is powerful and computationally fast. We apply the proposed score test to a UK ovarian cancer DNA methylation data set to identify differentially methylated CpG sites.


Subject(s)
Statistical Models , Computer Simulation
2.
J Nonparametr Stat ; 24(3): 647-663, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-23049230

ABSTRACT

A local modal estimation procedure is proposed for the regression function in a nonparametric regression model. A distinguishing characteristic of the proposed procedure is that it introduces an additional tuning parameter, selected automatically from the observed data, so that the resulting estimate is both robust and efficient. We demonstrate both theoretically and empirically that the resulting estimator is more efficient than the ordinary local polynomial regression estimator in the presence of outliers or a heavy-tailed error distribution (such as a t-distribution). Furthermore, we show that the proposed procedure is asymptotically as efficient as the local polynomial regression estimator when there are no outliers and the error distribution is Gaussian. We propose an EM-type algorithm for the proposed estimation procedure. A Monte Carlo simulation study is conducted to examine the finite-sample performance of the proposed method. The simulation results confirm the theoretical findings. The proposed methodology is further illustrated via an analysis of a real data example.
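
As an illustration of the kind of EM-type iteration the abstract mentions, the following is a minimal sketch of a local-constant (degree-zero) modal regression fit at a single point. It is not the authors' implementation: the function names, the Gaussian kernels, the local-mean starting value, and the fixed bandwidths `h1` and `h2` are all assumptions made here for illustration, whereas the paper selects the additional tuning parameter from the data.

```python
import math


def gaussian(u):
    """Standard normal density, used here as the kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def local_modal_fit(x, y, x0, h1, h2, n_iter=200, tol=1e-10):
    """Local-constant modal regression estimate at x0 (illustrative sketch).

    h1 is the bandwidth in x; h2 is the extra bandwidth in y that controls
    how strongly observations far from the current fit are down-weighted.
    """
    # Kernel weights in the x direction stay fixed across iterations.
    kx = [gaussian((xi - x0) / h1) for xi in x]
    # Start from the ordinary local-constant (kernel-weighted mean) estimate.
    beta = sum(k * yi for k, yi in zip(kx, y)) / sum(kx)
    for _ in range(n_iter):
        # E-step: weight each point by its closeness to the current fit.
        w = [k * gaussian((yi - beta) / h2) for k, yi in zip(kx, y)]
        total = sum(w)
        if total == 0.0:
            break  # all weights underflowed; keep the current estimate
        # M-step: with a Gaussian kernel in y this is a weighted mean.
        new_beta = sum(wi * yi for wi, yi in zip(w, y)) / total
        if abs(new_beta - beta) < tol:
            beta = new_beta
            break
        beta = new_beta
    return beta


# Four well-behaved responses near 1.0 and one gross outlier.
x = [0.0, 0.1, -0.1, 0.05, 0.0]
y = [1.0, 1.1, 0.9, 1.0, 50.0]
estimate = local_modal_fit(x, y, x0=0.0, h1=0.5, h2=1.0)
```

In this toy data set the kernel-weighted mean is pulled far toward the outlier, while the iteration settles near the bulk of the responses, which is the robustness property the abstract describes.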

3.
BMC Bioinformatics ; 12: 111, 2011 Apr 21.
Article in English | MEDLINE | ID: mdl-21615972

ABSTRACT

BACKGROUND: MixtureTree v1.0 is a Linux-based program (written in C++) which implements an algorithm based on mixture models for reconstructing phylogeny from binary sequence data, such as single-nucleotide polymorphisms (SNPs). In addition to the mixture algorithm with three different optimization options, the program also implements a bootstrap procedure with majority-rule consensus. RESULTS: The MixtureTree program, written in C++, is a Linux-based package. The User's Guide and source code will be available at http://math.asu.edu/~scchen/MixtureTree.html CONCLUSIONS: The efficiency of the mixture algorithm is relatively higher than that of some classical methods, such as the Neighbor-Joining, Maximum Parsimony and Maximum Likelihood methods. The shortcomings of the mixture tree algorithm, for example that it is time-consuming, can be addressed by implementing other revised Expectation-Maximization (EM) algorithms instead of the traditional EM algorithm.


Subject(s)
Algorithms , Phylogeny , Software , Likelihood Functions , Single Nucleotide Polymorphism , Time Factors
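
The bootstrap procedure with majority-rule consensus mentioned in the MixtureTree abstract can be sketched generically as follows. This is not MixtureTree's code (which is written in C++): the representation of each bootstrap tree as a collection of clades (sets of taxon labels) and the function name `majority_rule_clades` are hypothetical choices made for illustration.

```python
from collections import Counter


def majority_rule_clades(trees, threshold=0.5):
    """Return clades found in more than `threshold` of the trees.

    Each tree is given as an iterable of clades, and each clade is a
    frozenset of taxon labels. The result maps clade -> support fraction.
    """
    counts = Counter()
    for clades in trees:
        # Count each clade at most once per bootstrap tree.
        counts.update(set(clades))
    n = len(trees)
    return {clade: c / n for clade, c in counts.items() if c / n > threshold}


# Three hypothetical bootstrap replicates over taxa A, B, C, D.
replicates = [
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AB"), frozenset("ABC")],
    [frozenset("AC"), frozenset("ABC")],
]
consensus = majority_rule_clades(replicates)
```

Here the clade {A, B} is kept with support 2/3 and {A, B, C} with support 1, while {A, C} appears in only one of three replicates and is dropped.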
4.
BMC Bioinformatics ; 6: 300, 2005 Dec 13.
Article in English | MEDLINE | ID: mdl-16351717

ABSTRACT

BACKGROUND: In expressed sequence tag (EST) sequencing, we are often interested in how many genes we can capture in an EST sample of a targeted size. This information provides insight into sequencing efficiency in experimental design, as well as clues to the diversity of expressed genes in the tissue from which the library was constructed. RESULTS: We propose a compound Poisson process model that can accurately predict the gene capture in a future EST sample based on an initial EST sample. It also allows estimation of the number of expressed genes in one cDNA library or co-expressed in two cDNA libraries. The superior performance of the new prediction method over an existing approach is established by a simulation study. Our analysis of four Arabidopsis thaliana EST sets suggests that the number of expressed genes present in four different cDNA libraries of Arabidopsis thaliana varies from 9155 (root) to 12005 (silique). An observed fraction of co-expressed genes in two different EST sets as low as 25% can correspond to an actual overlap fraction greater than 65%. CONCLUSION: The proposed method provides a convenient tool for gene capture prediction and cDNA library property diagnosis in EST sequencing.


Subject(s)
Arabidopsis Proteins/genetics , Plant DNA/genetics , Expressed Sequence Tags , Gene Expression Profiling/methods , Genetic Models , Sequence Alignment/methods , DNA Sequence Analysis/methods , Algorithms , Base Sequence , Computer Simulation , Gene Library , Statistical Models , Molecular Sequence Data , Poisson Distribution
5.
J Am Stat Assoc ; 109(505): 395-410, 2014 Mar.
Article in English | MEDLINE | ID: mdl-24764609

ABSTRACT

In this article, we study the power properties of quadratic-distance-based goodness-of-fit tests. First, we introduce the concept of a root kernel and discuss the considerations that enter the selection of this kernel. We derive an easy-to-use normal approximation to the power of quadratic distance goodness-of-fit tests and base the construction of a noncentrality index, an analogue of the traditional noncentrality parameter, on it. This leads to a method akin to the Neyman-Pearson lemma for constructing optimal kernels for specific alternatives. We then introduce a midpower analysis as a device for choosing optimal degrees of freedom for a family of alternatives of interest. Finally, we introduce a new diffusion kernel, called the Pearson-normal kernel, and study the extent to which the normal approximation to the power of tests based on this kernel is valid. Supplementary materials for this article are available online.

6.
Genome Res ; 16(6): 738-49, 2006 Jun.
Article in English | MEDLINE | ID: mdl-16702410

ABSTRACT

Genomic comparisons provide evidence for ancient genome-wide duplications in a diverse array of animals and plants. We developed a birth-death model to identify evidence for genome duplication in EST data, and applied a mixture model to estimate the age distribution of paralogous pairs identified in EST sets for species representing the basal-most extant flowering plant lineages. We found evidence for episodes of ancient genome-wide duplications in the basal angiosperm lineages including Nuphar advena (yellow water lily: Nymphaeaceae) and the magnoliids Persea americana (avocado: Lauraceae), Liriodendron tulipifera (tulip poplar: Magnoliaceae), and Saruma henryi (Aristolochiaceae). In addition, we detected independent genome duplications in the basal eudicot Eschscholzia californica (California poppy: Papaveraceae) and the basal monocot Acorus americanus (Acoraceae), both of which were distinct from duplications documented for ancestral grass (Poaceae) and core eudicot lineages. Among gymnosperms, we found equivocal evidence for ancient polyploidy in Welwitschia mirabilis (Gnetales) and no evidence for polyploidy in pine, although gymnosperms generally have much larger genomes than the angiosperms investigated. Cross-species sequence divergence estimates suggest that synonymous substitution rates in the basal angiosperms are less than half those previously reported for core eudicots and members of Poaceae. These lower substitution rates permit inference of older duplication events. We hypothesize that evidence of an ancient duplication observed in the Nuphar data may represent a genome duplication in the common ancestor of all or most extant angiosperms, except Amborella.


Subject(s)
Gene Duplication , Plant Genome , Magnoliopsida/genetics , Cycadopsida/genetics , Molecular Evolution , Genetic Models , Statistical Models , Phylogeny , Polyploidy
7.
Bioinformatics ; 20(17): 2973-84, 2004 Nov 22.
Article in English | MEDLINE | ID: mdl-15189818

ABSTRACT

MOTIVATION: The gene expression intensity information conveyed by Expressed Sequence Tag (EST) data can be used to infer important cDNA library properties, such as gene number and expression patterns. However, EST clustering errors, which often lead to greatly inflated estimates of the number of unique genes obtained, have become a major obstacle in the analyses. The EST clustering error structure, the relationship between clustering error and clustering criteria, and possible error correction methods need to be systematically investigated. RESULTS: We identify and quantify two types of EST clustering error, namely Type I and Type II, in EST clustering using the CAP3 assembly program. A Type I error occurs when ESTs from the same gene do not form a cluster, whereas a Type II error occurs when ESTs from distinct genes are falsely clustered together. While the Type II error rate is <1.5% for both 5' and 3' EST clustering, the Type I error in the 5' EST case is approximately 10 times higher than in the 3' EST case (30% versus 3%). An over-stringent identity rule, e.g., P ≥ 95%, may even inflate the Type I error in both cases. We demonstrate that approximately 80% of the Type I error is due to insufficient overlap among sibling ESTs (ISO error) in 5' EST clustering. A novel statistical approach is proposed to correct ISO error to provide more accurate estimates of the true gene cluster profile.


Subject(s)
Algorithms , Arabidopsis/genetics , Chromosome Mapping/methods , Expressed Sequence Tags , Gene Expression Profiling/methods , Sequence Alignment/methods , DNA Sequence Analysis/methods , Cluster Analysis , Statistical Data Interpretation , Genetic Variation , Plant Genome , Software
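
The two error types defined in the abstract above can be made concrete with a small sketch. It assumes that the true gene of origin is known for every EST, which is only possible on a benchmark set, and it flags errors per gene and per cluster; the paper's exact rate definitions may differ. The function name `clustering_errors` and the `(est_id, true_gene, cluster_id)` record layout are hypothetical.

```python
from collections import defaultdict


def clustering_errors(assignments):
    """Flag Type I and Type II clustering errors.

    assignments: iterable of (est_id, true_gene, cluster_id) tuples.
    Returns (type1_genes, type2_clusters):
      type1_genes    - genes whose ESTs are split over several clusters
      type2_clusters - clusters that mix ESTs from several genes
    """
    clusters_by_gene = defaultdict(set)
    genes_by_cluster = defaultdict(set)
    for _est_id, gene, cluster in assignments:
        clusters_by_gene[gene].add(cluster)
        genes_by_cluster[cluster].add(gene)
    type1_genes = sorted(g for g, cs in clusters_by_gene.items() if len(cs) > 1)
    type2_clusters = sorted(c for c, gs in genes_by_cluster.items() if len(gs) > 1)
    return type1_genes, type2_clusters


# Hypothetical benchmark: gene g1 is split (Type I), cluster c3 is mixed (Type II).
benchmark = [
    ("e1", "g1", "c1"),
    ("e2", "g1", "c2"),
    ("e3", "g2", "c3"),
    ("e4", "g3", "c3"),
    ("e5", "g4", "c4"),
]
type1, type2 = clustering_errors(benchmark)
```

Splitting a gene across clusters inflates the apparent number of unique genes, which is why the abstract singles out the Type I error as the main source of overestimation.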