Results 1 - 18 of 18
1.
Biostatistics ; 24(1): 193-208, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34269373

ABSTRACT

Medical research institutions have generated massive amounts of biological data by genetically profiling hundreds of cancer cell lines. In parallel, academic biology labs have conducted genetic screens on small numbers of cancer cell lines under custom experimental conditions. In order to share information between these two approaches to scientific discovery, this article proposes a "frequentist assisted by Bayes" (FAB) procedure for hypothesis testing that allows auxiliary information from massive genomics datasets to increase the power of hypothesis tests in specialized studies. The exchange of information takes place through a novel probability model for multimodal genomics data, which distills auxiliary information pertaining to cancer cell lines and genes across a wide variety of experimental contexts. If the relevance of the auxiliary information to a given study is high, then the resulting FAB tests can be more powerful than the corresponding classical tests. If the relevance is low, then the FAB tests yield as many discoveries as the classical tests. Simulations and practical investigations demonstrate that the FAB testing procedure can increase the number of effects discovered in genomics studies while still maintaining strict control of type I error and false discovery rate.


Subject(s)
Genetic Testing , Genomics , Humans , Bayes Theorem , Genomics/methods , Probability
2.
Ann Appl Stat ; 13(1): 321-339, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31428218

ABSTRACT

Health exams determine a patient's health status by comparing the patient's measurement with a population reference range, a 95% interval derived from a homogeneous reference population. Similarly, most of the established relations among health problems are assumed to hold for the entire population. We use data from the 2009-2010 National Health and Nutrition Examination Survey (NHANES) on four major health problems in the U.S. and apply a joint mean and covariance model to study how the reference ranges and associations of those health outcomes could vary among subpopulations. We discuss guidelines for model selection and evaluation, using standard criteria such as AIC in conjunction with posterior predictive checks. The results from the proposed model can help identify subpopulations in which more data need to be collected to refine the reference range and to study the specific associations among those health problems.
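
The idea of a population reference range mentioned in this abstract can be sketched as follows. This is only an illustration of a plain empirical 95% interval, not the joint mean and covariance model the article proposes; the simulated values, the function names `reference_range` and `within_range`, and the mean and spread used are all assumptions made for the example.

```python
import random
import statistics

def reference_range(values):
    """Return the central 95% interval (2.5th and 97.5th percentiles)."""
    # n=40 yields 39 cut points spaced 2.5% apart, so the first and last
    # cut points sit at the 2.5% and 97.5% levels.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]

def within_range(measurement, interval):
    """Check whether a single measurement falls inside the interval."""
    low, high = interval
    return low <= measurement <= high

rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Hypothetical reference population: simulated values, not NHANES data.
reference = [rng.gauss(120.0, 12.0) for _ in range(5000)]
interval = reference_range(reference)
```

Calling `within_range(value, interval)` then flags a measurement as inside or outside the range. Repeating the calculation within subgroups gives one range per subgroup, which is the crude version of the question the article addresses with a model.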

3.
J R Stat Soc Series B Stat Methodol ; 77(1): 35-58, 2015 Jan 01.
Article in English | MEDLINE | ID: mdl-25663813

ABSTRACT

Prior specification for non-parametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. A statistician is unlikely to have informed opinions about all aspects of such a parameter but will have real information about functionals of the parameter, such as the population mean or variance. The paper proposes a new framework for non-parametric Bayes inference in which the prior distribution for a possibly infinite dimensional parameter is decomposed into two parts: an informative prior on a finite set of functionals, and a non-parametric conditional prior for the parameter given the functionals. Such priors can be easily constructed from standard non-parametric prior distributions in common use and inherit the large support of the standard priors on which they are based. Additionally, posterior approximations under these informative priors can generally be made via minor adjustments to existing Markov chain approximation algorithms for standard non-parametric prior distributions. We illustrate the use of such priors in the context of multivariate density estimation using Dirichlet process mixture models, and in the modelling of high dimensional sparse contingency tables.

4.
J Am Stat Assoc ; 110(511): 1037-1046, 2015.
Article in English | MEDLINE | ID: mdl-27087713

ABSTRACT

Relational data are often represented as a square matrix, the entries of which record the relationships between pairs of objects. Many statistical methods for the analysis of such data assume some degree of similarity or dependence between objects in terms of the way they relate to each other. However, formal tests for such dependence have not been developed. We provide a test for such dependence using the framework of the matrix normal model, a type of multivariate normal distribution parameterized in terms of row- and column-specific covariance matrices. We develop a likelihood ratio test (LRT) for row and column dependence based on the observation of a single relational data matrix. We obtain a reference distribution for the LRT statistic, thereby providing an exact test for the presence of row or column correlations in a square relational data matrix. Additionally, we provide extensions of the test to accommodate common features of such data, such as undefined diagonal entries, a non-zero mean, multiple observations, and deviations from normality. Supplementary materials for this article are available online.

5.
Ann Appl Stat ; 9(3): 1169-1193, 2015 Sep.
Article in English | MEDLINE | ID: mdl-27458495

ABSTRACT

A fundamental aspect of relational data, such as from a social network, is the possibility of dependence among the relations. In particular, the relations between members of one pair of nodes may have an effect on the relations between members of another pair. This article develops a type of regression model to estimate such effects in the context of longitudinal and multivariate relational data, or other data that can be represented in the form of a tensor. The model is based on a general multilinear tensor regression model, a special case of which is a tensor autoregression model in which the tensor of relations at one time point is parsimoniously regressed on relations from previous time points. This is done via a separable, or Kronecker-structured, regression parameter along with a separable covariance model. In the context of an analysis of longitudinal multivariate relational data, it is shown how the multilinear tensor regression model can represent patterns that often appear in relational and network data, such as reciprocity and transitivity.
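
What "separable, or Kronecker-structured" means for a regression parameter can be illustrated in the matrix (two-way) case. The sketch below checks the standard identity vec(A X B^T) = (B ⊗ A) vec(X) on small integer matrices and compares parameter counts. The matrices `A`, `B`, `X` and all helper names are hypothetical, and the article's model for general tensors is richer than this two-way example.

```python
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def kron(a, b):
    # Kronecker product: block (i, j) of the result is a[i][j] * b.
    return [[a[i][j] * b[k][l]
             for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def vec(m):
    # Stack the columns of m into one long vector.
    return [m[i][j] for j in range(len(m[0])) for i in range(len(m))]

# Hypothetical small example: A acts on rows, B acts on columns.
A = [[1, 2], [0, 1]]          # 2 x 2
B = [[2, 0, 1], [1, 1, 0]]    # 2 x 3
X = [[1, 0, 2], [3, 1, 0]]    # 2 x 3 predictor matrix

# Bilinear form: regress each mode separately, then vectorize.
bilinear = vec(matmul(matmul(A, X), transpose(B)))

# Equivalent vectorized form with one large Kronecker-structured coefficient.
full = kron(B, A)             # 4 x 6
stacked = [sum(c * x for c, x in zip(row, vec(X))) for row in full]

separable_params = sum(len(r) for r in A) + sum(len(r) for r in B)
unrestricted_params = len(full) * len(full[0])
```

Here `bilinear` and `stacked` agree, while the separable form uses 10 free entries where an unrestricted 4 x 6 coefficient matrix would use 24. That gap is the sense in which the regression is parsimonious.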

6.
J Am Stat Assoc ; 110(511): 1047-1056, 2015.
Article in English | MEDLINE | ID: mdl-26848204

ABSTRACT

Network analysis is often focused on characterizing the dependencies between network relations and node-level attributes. Potential relationships are typically explored by modeling the network as a function of the nodal attributes or by modeling the attributes as a function of the network. These methods require specification of the exact nature of the association between the network and attributes, reduce the network data to a small number of summary statistics, and are unable to provide predictions simultaneously for missing attribute and network information. Existing methods that model the attributes and network jointly also assume the data are fully observed. In this article we introduce a unified approach to analysis that addresses these shortcomings. We use a previously developed latent variable model to obtain a low dimensional representation of the network in terms of node-specific network factors. We introduce a novel testing procedure to determine if dependencies exist between the network factors and attributes as a surrogate for a test of dependence between the network and attributes. We also present a joint model for the network relations and attributes, for use if the hypothesis of independence is rejected, that can capture a variety of dependence patterns and be used to make inference and predictions for missing observations.

7.
Ann Appl Stat ; 8(1): 120-147, 2014.
Article in English | MEDLINE | ID: mdl-25489353

ABSTRACT

Human mortality data sets can be expressed as multiway data arrays, the dimensions of which correspond to categories by which mortality rates are reported, such as age, sex, country and year. Regression models for such data typically assume an independent error distribution or an error model that allows for dependence along at most one or two dimensions of the data array. However, failing to account for other dependencies can lead to inefficient estimates of regression parameters, inaccurate standard errors and poor predictions. An alternative to assuming independent errors is to allow for dependence along each dimension of the array using a separable covariance model. However, the number of parameters in this model increases rapidly with the dimensions of the array and, for many arrays, maximum likelihood estimates of the covariance parameters do not exist. In this paper, we propose a submodel of the separable covariance model that estimates the covariance matrix for each dimension as having factor analytic structure. This model can be viewed as an extension of factor analysis to array-valued data, as it uses a factor model to estimate the covariance along each dimension of the array. We discuss properties of this model as they relate to ordinary factor analysis, describe maximum likelihood and Bayesian estimation methods, and provide a likelihood ratio testing procedure for selecting the factor model ranks. We apply this methodology to the analysis of data from the Human Mortality Database, and show in a cross-validation experiment how it outperforms simpler methods. Additionally, we use this model to impute mortality rates for countries that have no mortality data for several years. Unlike other approaches, our methodology is able to estimate similarities between the mortality rates of countries, time periods and sexes, and use this information to assist with the imputations.

8.
Ann Appl Stat ; 8(1): 19-47, 2014 Mar 01.
Article in English | MEDLINE | ID: mdl-25309641

ABSTRACT

ANOVA decompositions are a standard method for describing and estimating heterogeneity among the means of a response variable across levels of multiple categorical factors. In such a decomposition, the complete set of main effects and interaction terms can be viewed as a collection of vectors, matrices and arrays that share various index sets defined by the factor levels. For many types of categorical factors, it is plausible that an ANOVA decomposition exhibits some consistency across orders of effects, in that the levels of a factor that have similar main-effect coefficients may also have similar coefficients in higher-order interaction terms. In such a case, estimation of the higher-order interactions should be improved by borrowing information from the main effects and lower-order interactions. To take advantage of such patterns, this article introduces a class of hierarchical prior distributions for collections of interaction arrays that can adapt to the presence of such interactions. These prior distributions are based on a type of array-variate normal distribution, for which a covariance matrix for each factor is estimated. This prior is able to adapt to potential similarities among the levels of a factor, and incorporate any such information into the estimation of the effects in which the factor appears. In the presence of such similarities, this prior is able to borrow information from well-estimated main effects and lower-order interactions to assist in the estimation of higher-order terms for which data information is limited.

9.
Bernoulli (Andover) ; 20(2): 604-622, 2014.
Article in English | MEDLINE | ID: mdl-25313292

ABSTRACT

Often of primary interest in the analysis of multivariate data are the copula parameters describing the dependence among the variables, rather than the univariate marginal distributions. Since the ranks of a multivariate dataset are invariant to changes in the univariate marginal distributions, rank-based estimators are natural candidates for semiparametric copula estimation. Asymptotic information bounds for such estimators can be obtained from an asymptotic analysis of the rank likelihood, i.e. the probability of the multivariate ranks. In this article, we obtain limiting normal distributions of the rank likelihood for Gaussian copula models. Our results cover models with structured correlation matrices, such as exchangeable or circular correlation models, as well as unstructured correlation matrices. For all Gaussian copula models, the limiting distribution of the rank likelihood ratio is shown to be equal to that of a parametric likelihood ratio for an appropriately chosen multivariate normal model. This implies that the semiparametric information bounds for rank-based estimators are the same as the information bounds for estimators based on the full data, and that the multivariate normal distributions are least favorable.
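
The invariance property that motivates rank-based estimation in this abstract can be checked directly. The sketch below uses a small made-up bivariate sample and two strictly increasing transformations chosen only for illustration; the helper `ranks` assumes there are no ties.

```python
import math

def ranks(values):
    # Rank 1 goes to the smallest value; assumes no ties.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical bivariate sample (two margins observed on the same four units).
x = [0.3, -1.2, 2.5, 0.9]
y = [10.0, 7.5, 8.1, 12.4]

# Strictly increasing transformations change the margins but not the ordering.
x_transformed = [math.exp(v) for v in x]
y_transformed = [v ** 3 for v in y]
```

Because `ranks(x)` equals `ranks(x_transformed)` and likewise for `y`, any estimator that depends on the data only through the ranks is unaffected by the univariate marginal distributions.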

10.
Clin Trials ; 10(1): 63-80, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23345304

ABSTRACT

BACKGROUND: Novel dose-finding designs for Phase I cancer clinical trials, using estimation to assign the best estimated Maximum Tolerated Dose (MTD) at each point in the experiment, most prominently via Bayesian techniques, have been widely discussed and promoted since 1990. PURPOSE: To examine the small-sample behavior of these 'Bayesian Phase I' designs, and also of non-Bayesian designs sharing the same main 'Long-Memory' traits of using likelihood estimation and assigning the estimated MTD to the next patient. METHODS: Data from several recently published experiments are presented and discussed, and Long-Memory designs' operating principles are explained. Simulation studies compare the small-sample behavior of Long-Memory designs with short-memory 'Up-and-Down' designs. RESULTS: In simulation, Long-Memory and Up-and-Down designs achieved similar success rates in finding the MTD. However, for all Long-Memory designs examined, the number n* of cohorts treated at the true MTD was highly variable between simulated experiments drawn from the same toxicity-threshold distribution. Further investigation using the same set of thresholds in permuted order indicates that this Long-Memory behavior is driven by sensitivity to the order in which participants enter the experiment. This sensitivity is related to Long-Memory designs' 'winner-takes-all' dose-assignment rule, which grants the early cohorts a disproportionately large influence, and causes many experiments to settle early on a specific dose. Additionally, for the Bayesian Long-Memory designs, the prior-predictive distribution over the dose levels has a substantial impact upon MTD-finding performance, long into the experiment. LIMITATIONS: While the numerical evidence for Long-Memory designs' order sensitivity is broad, and plausible explanations for it are provided, we do not present a theoretical proof of the phenomenon. CONCLUSIONS: Method developers, analysts, and practitioners should be aware of Long-Memory designs' order sensitivity and related phenomena. In particular, they should be informed that settling on a single dose does not guarantee that this dose is the MTD. Presently, Up-and-Down designs offer a simpler and more robust alternative for the sample sizes of 10-40 patients used in most Phase I trials. Future designs might benefit from combining the two approaches. We also suggest that the field's paradigm change from dose-selection to dose-estimation.


Subject(s)
Antineoplastic Agents/administration & dosage , Clinical Trials, Phase I as Topic/methods , Neoplasms/drug therapy , Bayes Theorem , Computer Simulation , Decision Support Techniques , Dose-Response Relationship, Drug , Humans , Maximum Tolerated Dose , Research Design , Sample Size
11.
J Comput Graph Stat ; 21(4): 901-919, 2012.
Article in English | MEDLINE | ID: mdl-27570438

ABSTRACT

Network models are widely used in social sciences and genome sciences. The latent space model proposed by Hoff et al. (2002), and extended by Handcock et al. (2007) to incorporate clustering, provides a visually interpretable model-based spatial representation of relational data and takes account of several intrinsic network properties. Due to the structure of the likelihood function of the latent space model, the computational cost is of order O(N^2), where N is the number of nodes. This makes it infeasible for large networks. In this paper, we propose an approximation of the log likelihood function. We adopt the case-control idea from epidemiology and construct a case-control likelihood which is an unbiased estimator of the full likelihood. Replacing the full likelihood by the case-control likelihood in the MCMC estimation of the latent space model reduces the computational time from O(N^2) to O(N), making it feasible for large networks. We evaluate its performance using simulated and real data. We fit the model to a large protein-protein interaction dataset using the case-control likelihood and use the model-fitted link probabilities to identify false positive links.
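
The case-control idea in this abstract can be sketched for one node's contribution to the log likelihood: keep every observed link (the cases), draw a small random sample of non-links (the controls), and re-weight the sampled terms. This is a simplified illustration under stated assumptions, not the authors' exact sampling scheme. It assumes a logistic latent space model with linear predictor `alpha` minus Euclidean distance, uses simple random sampling of controls, and all function names and the simulated network are made up for the example.

```python
import math
import random

def log_term(y, eta):
    # Bernoulli log-likelihood on the logit scale: y*eta - log(1 + exp(eta)).
    return y * eta - math.log1p(math.exp(eta))

def linear_predictor(alpha, z, i, j):
    # Latent space form: intercept minus distance between latent positions.
    return alpha - math.dist(z[i], z[j])

def full_loglik(i, adj, z, alpha):
    # Exact contribution of node i: one term for every other node.
    return sum(
        log_term(1 if j in adj[i] else 0, linear_predictor(alpha, z, i, j))
        for j in range(len(z)) if j != i
    )

def case_control_loglik(i, adj, z, alpha, n_controls, rng):
    # Cases: every observed link of node i is kept.
    total = sum(log_term(1, linear_predictor(alpha, z, i, j)) for j in adj[i])
    # Controls: a random subset of non-links, re-weighted to stand in for all.
    # (Enumerating non-links is itself O(N); it is done here only for clarity.)
    non_links = [j for j in range(len(z)) if j != i and j not in adj[i]]
    if not non_links:
        return total
    sample = rng.sample(non_links, min(n_controls, len(non_links)))
    weight = len(non_links) / len(sample)
    return total + weight * sum(
        log_term(0, linear_predictor(alpha, z, i, j)) for j in sample
    )

def simulate_network(n, alpha, rng):
    # Hypothetical data: random 2-D positions and links drawn from the model.
    z = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p = 1.0 / (1.0 + math.exp(-linear_predictor(alpha, z, i, j)))
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return z, adj

rng = random.Random(1)
z, adj = simulate_network(30, 0.5, rng)
exact = full_loglik(0, adj, z, 0.5)
approx = case_control_loglik(0, adj, z, 0.5, n_controls=5, rng=rng)
```

With a fixed number of controls per node, the work per node depends on its number of links plus the control sample size instead of on N, which is where the saving comes from in sparse networks. Averaging `approx` over many independent control samples recovers `exact`, which is the unbiasedness property on the log scale.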

12.
Int J Biostat ; 7: Article 39, 2011 Oct 27.
Article in English | MEDLINE | ID: mdl-22718676

ABSTRACT

It is common for novel dose-finding designs to be presented without a study of their convergence properties. In this article we suggest that examination of convergence is a necessary quality check for dose-finding designs. We present a new convergence proof for a nonparametric family of methods called "interval designs," under certain conditions on the toxicity-frequency function F. We compare these conditions with the convergence conditions for the popular CRM one-parameter Phase I cancer design, via an innovative numerical sensitivity study generating a diverse sample of dose-toxicity scenarios. Only a small fraction of scenarios meet the Shen-O'Quigley convergence conditions for CRM. Conditions for "interval design" convergence are met more often, but still less than half the time. In the discussion, we illustrate how convergence properties and limitations help provide insight about small-sample behavior.


Subject(s)
Antineoplastic Agents/administration & dosage , Drug Dosage Calculations , Antineoplastic Agents/adverse effects , Clinical Trials, Phase I as Topic/statistics & numerical data , Dose-Response Relationship, Drug , Humans , Models, Statistical , Neoplasms/drug therapy
13.
Mod Pathol ; 23(12): 1624-33, 2010 Dec.
Article in English | MEDLINE | ID: mdl-20802465

ABSTRACT

Approximately 10% of ulcerative colitis patients develop colorectal neoplasia. At present, identification of this subset is markedly limited and necessitates lifelong colonoscopic surveillance for the entire ulcerative colitis population. Better risk markers are needed to focus surveillance onto the patients who are most likely to benefit. Using array-based comparative genomic hybridization, we analyzed single, non-dysplastic biopsies from three patient groups: ulcerative colitis progressors (n=9) with cancer or high-grade dysplasia at a mean distance of 18 cm from the analyzed site; ulcerative colitis non-progressors (n=8) without dysplasia during long-term surveillance; and non-ulcerative colitis normal controls (n=2). Genomic DNA from fresh colonic epithelium purified from stroma was hybridized to 287 (low-density) and 4342 (higher-density) feature bacterial artificial chromosome arrays. Sample-to-reference fluorescence ratios were calculated for individual chromosomal targets and globally across the genome. The low-density arrays yielded pronounced genomic gains and losses in 3 of 9 (33%) ulcerative colitis progressors but in none of the 10 control patients. Identical DNA samples analyzed on the higher-density arrays, using a combination of global and individual high variance assessments, distinguished all nine progressors from all 10 controls. These data confirm that genomic alterations in ulcerative colitis progressors are widespread, even involving single non-dysplastic biopsies that are far distant from neoplasia. They therefore show promise toward eliminating full colonoscopic surveillance with extensive biopsy sampling in the majority of ulcerative colitis patients.


Subject(s)
Biomarkers, Tumor/genetics , Colitis, Ulcerative/genetics , Colitis, Ulcerative/pathology , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Adult , Age of Onset , Biopsy , Child , Child, Preschool , Chromosomes, Artificial, Bacterial , Colitis, Ulcerative/complications , Comparative Genomic Hybridization , Disease Progression , Humans , In Situ Hybridization, Fluorescence , Oligonucleotide Array Sequence Analysis , Young Adult
14.
Stat Med ; 28(13): 1805-20, 2009 Jun 15.
Article in English | MEDLINE | ID: mdl-19378270

ABSTRACT

The percentile-finding experimental design known variously as 'forced-choice fixed-staircase', 'geometric up-and-down' or 'k-in-a-row' (KR) was introduced by Wetherill four decades ago. To date, KR has been by far the most widely used up-and-down (U&D) design for estimating non-median percentiles; it is implemented most commonly in sensory studies. However, its statistical properties have not been fully documented, and the existence of a unique mode in its asymptotic treatment distribution has been recently disputed. Here we revisit the KR design and its basic properties. We find that KR does generate a unique stationary mode near its target percentile, and also displays better operational characteristics than two other U&D designs that have been studied more extensively. Supporting proofs and numerical calculations are presented. A recent experimental example from anesthesiology serves to highlight some of the 'up-and-down' design family's properties and advantages.


Subject(s)
Biometry/methods , Anesthesiology/statistics & numerical data , Anesthetics, Intravenous/administration & dosage , Anesthetics, Intravenous/adverse effects , Clinical Trials, Phase I as Topic/statistics & numerical data , Humans , Markov Chains , Models, Statistical , Pain/chemically induced , Pain/prevention & control , Propofol/administration & dosage , Propofol/adverse effects , Thiopental/administration & dosage
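
The k-in-a-row rule discussed in the abstract above can be simulated in a few lines. The sketch assumes one common convention, which the abstract itself does not spell out: move one level up after k consecutive non-responses at the current level, and one level down after any single response. Under that convention the balance point satisfies (1 - p)^k = 0.5. The mirrored convention used in many sensory studies targets the complementary percentile. The response probabilities, function names and seed below are hypothetical.

```python
import random

def kr_target(k):
    """Response probability p at which up and down moves balance: (1 - p)**k = 0.5."""
    return 1.0 - 0.5 ** (1.0 / k)

def simulate_kr(response_probs, k, n_trials, start, rng):
    """Run a k-in-a-row experiment and count how often each level is used."""
    level, streak = start, 0
    visits = [0] * len(response_probs)
    for _ in range(n_trials):
        visits[level] += 1
        if rng.random() < response_probs[level]:
            # A single response moves the experiment one level down.
            level, streak = max(level - 1, 0), 0
        else:
            streak += 1
            if streak == k:
                # k consecutive non-responses move it one level up.
                level, streak = min(level + 1, len(response_probs) - 1), 0
    return visits

# Hypothetical increasing response curve over five levels.
probs = [0.05, 0.12, 0.29, 0.50, 0.70]
visits = simulate_kr(probs, k=2, n_trials=20000, start=0, rng=random.Random(3))
mode_level = visits.index(max(visits))
```

For k = 2 the balance point is about 0.293, and in this simulated run the most frequently used level is the one whose response probability (0.29) lies closest to it, in line with the abstract's finding of a single mode near the target percentile.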
15.
Soc Networks ; 31(3): 204-213, 2009 Jul 01.
Article in English | MEDLINE | ID: mdl-20191087

ABSTRACT

Social network data often involve transitivity, homophily on observed attributes, clustering, and heterogeneity of actor degrees. We propose a latent cluster random effects model to represent all of these features, and we describe a Bayesian estimation method for it. The model is applicable to both binary and non-binary network data. We illustrate the model using two real datasets. We also apply it to two simulated network datasets with the same, highly skewed, degree distribution, but very different network behavior: one unstructured and the other with transitivity and clustering. Models based on degree distributions, such as scale-free, preferential attachment and power-law models, cannot distinguish between these very different situations, but our model does.

16.
J Acquir Immune Defic Syndr ; 46(2): 238-44, 2007 Oct 01.
Article in English | MEDLINE | ID: mdl-17693890

ABSTRACT

OBJECTIVE: To assess the efficacy of a peer-delivered intervention to promote short-term (6-month) and long-term (12-month) adherence to HAART in a Mozambican clinic population. DESIGN: A 2-arm randomized controlled trial was conducted between October 2004 and June 2006. PARTICIPANTS: Of 350 men and women (≥18 years) initiating HAART, 53.7% were female, and 97% were on 1 fixed-dose combination pill twice a day. INTERVENTION: Participants were randomly assigned to receive 6 weeks (Monday through Friday; 30 daily visits) of peer-delivered, modified directly observed therapy (mDOT) or standard care. Peers provided education about treatment and adherence and sought to identify and mitigate adherence barriers. OUTCOME: Participants' self-reported medication adherence was assessed 6 months and 12 months after starting HAART. Adherence was defined as the proportion of prescribed doses taken over the previous 7 days. Statistical analyses were performed using intention-to-treat (missing = failure). RESULTS: Intervention participants, compared to those in standard care, showed significantly higher mean medication adherence at 6 months (92.7% vs. 84.9%, difference 7.8, 95% confidence interval [CI]: 0.0.02, 13.0) and 12 months (94.4% vs. 87.7%, difference 6.8, 95% CI: 0.9, 12.9). There were no between-arm differences in chart-abstracted CD4 counts. CONCLUSIONS: A peer-delivered mDOT program may be an effective strategy to promote long-term adherence among persons initiating HAART in resource-poor settings.


Subject(s)
Anti-Retroviral Agents/therapeutic use , Directly Observed Therapy , HIV Infections/drug therapy , HIV Infections/epidemiology , HIV-1 , Adult , Anti-Retroviral Agents/administration & dosage , Antiretroviral Therapy, Highly Active , Drug Administration Schedule , Female , Humans , Male , Mozambique/epidemiology , Patient Compliance , Treatment Outcome
17.
Biometrics ; 61(4): 1027-36, 2005 Dec.
Article in English | MEDLINE | ID: mdl-16401276

ABSTRACT

This article develops a model-based approach to clustering multivariate binary data, in which the attributes that distinguish a cluster from the rest of the population may depend on the cluster being considered. The clustering approach is based on a multivariate Dirichlet process mixture model, which allows for the estimation of the number of clusters, the cluster memberships, and the cluster-specific parameters in a unified way. Such a clustering approach has applications in the analysis of genomic abnormality data, in which the development of different types of tumors may depend on the presence of certain abnormalities at subsets of locations along the genome. Additionally, such a mixture model provides a nonparametric estimation scheme for dependent sequences of binary data.


Subject(s)
Chromosome Aberrations , Cluster Analysis , Models, Genetic , Models, Statistical , Carcinoma, Renal Cell/genetics , Genome, Human/genetics , Humans , Markov Chains , Monte Carlo Method
18.
Proc Natl Acad Sci U S A ; 101(26): 9769-73, 2004 Jun 29.
Article in English | MEDLINE | ID: mdl-15210940

ABSTRACT

Inherited colorectal cancer syndromes in humans exhibit regional specificity for tumor formation. By using mice with germline mutations in the adenomatous polyposis coli gene (Apc) and/or DNA mismatch repair genes, we have analyzed the genetic control of tumor regionality in the mouse small intestine. In C57BL/6 mice heterozygous for the Apc multiple intestinal neoplasia mutation (Apc(Min)), in which tumors are initiated by loss of heterozygosity by means of somatic recombination, tumors form preferentially in the distal region of the small intestine. By contrast, the formation of tumors initiated by allelic silencing on the AKR Apc(Min) genetic background is strongly skewed toward the ileocecal junction. A third tumor regionality is displayed by tumors that develop in MMR-deficient Apc(Min/+) mice, in which mutation of the Apc gene is responsible for tumor initiation. Thus, tumor regionality in the small intestine of Apc(Min/+) reflects the mechanism by which the wild-type allele of Apc is inactivated. We have reexamined the mechanism of Apc loss in tumors from Apc(1638N/+) mice, in which tumors of the small intestine develop in a regional pattern overlapping that of mismatch repair-deficient mice. In contrast to previous reports, we find that tumors from Apc(1638N/+) mice on a congenic C57BL/6 background maintain the wild-type allele of Apc. Our studies demonstrate a pathway-specific regionality for tumor development in mouse models for inherited intestinal cancer, an observation that is reminiscent of the regional preference for tumor development in the human colon. Perhaps, the power of mouse genetics and biology can be harnessed to identify genetic and other factors that contribute to tumor regionality.


Subject(s)
Adenomatous Polyposis Coli Protein/deficiency , Adenomatous Polyposis Coli Protein/metabolism , Colorectal Neoplasms/genetics , Colorectal Neoplasms/pathology , Intestinal Mucosa/metabolism , Intestines/pathology , Adaptor Proteins, Signal Transducing , Adenomatous Polyposis Coli Protein/genetics , Alleles , Animals , Animals, Congenic , Carrier Proteins , Genes, APC , Germ-Line Mutation/genetics , Heterozygote , Intestines/anatomy & histology , Mice , Mice, Inbred C57BL , MutL Protein Homolog 1 , Neoplasm Proteins/deficiency , Neoplasm Proteins/genetics , Nuclear Proteins , Organ Specificity , Phenotype