Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 5 de 5
Filtrar
Más filtros

Bases de datos
País/Región como asunto
Tipo del documento
Intervalo de año de publicación
1.
Stat Med ; 40(30): 6792-6817, 2021 12 30.
Artículo en Inglés | MEDLINE | ID: mdl-34596256

RESUMEN

Post-GWAS analysis, in many cases, focuses on fine-mapping targeted genetic regions discovered at GWAS-stage; that is, the aim is to pinpoint potential causal variants and susceptibility genes for complex traits and disease outcomes using next-generation sequencing (NGS) technologies. Large-scale GWAS cohorts are necessary to identify target regions given the typically modest genetic effect sizes. In this context, two-phase sampling design and analysis is a cost-reduction technique that utilizes data collected during phase 1 GWAS to select an informative subsample for phase 2 sequencing. The main goal is to make inference for genetic variants measured via NGS by efficiently combining data from phases 1 and 2. We propose two approaches for selecting a phase 2 design under a budget constraint. The first method identifies sampling fractions that select a phase 2 design yielding an asymptotic variance covariance matrix with certain optimal characteristics, for example, smallest trace, via Lagrange multipliers (LM). The second relies on a genetic algorithm (GA) with a defined fitness function to identify exactly a phase 2 subsample. We perform comprehensive simulation studies to evaluate the empirical properties of the proposed designs for a genetic association study of a quantitative trait. We compare our methods against two ranked designs: residual-dependent sampling and a recently identified optimal design. Our findings demonstrate that the proposed designs, GA in particular, can render competitive power in combined phase 1 and 2 analysis compared with alternative designs while preserving type 1 error control. These results are especially evident under the more practical scenario where design values need to be defined a priori and are subject to misspecification. We illustrate the proposed methods in a study of triglyceride levels in the North Finland Birth Cohort of 1966. R code to reproduce our results is available at github.com/egosv/TwoPhase_postGWAS.


Asunto(s)
Estudio de Asociación del Genoma Completo , Polimorfismo de Nucleótido Simple , Estudios de Asociación Genética , Genotipo , Humanos , Fenotipo
2.
Comput Electron Agric ; 187: None, 2021 Aug.
Artículo en Inglés | MEDLINE | ID: mdl-34381288

RESUMEN

Collection of accurate and representative data from agricultural fields is required for efficient crop management. Since growers have limited available resources, there is a need for advanced methods to select representative points within a field in order to best satisfy sampling or sensing objectives. The main purpose of this work was to develop a data-driven method for selecting locations across an agricultural field given observations of some covariates at every point in the field. These chosen locations should be representative of the distribution of the covariates in the entire population and represent the spatial variability in the field. They can then be used to sample an unknown target feature whose sampling is expensive and cannot be realistically done at the population scale. An algorithm for determining these optimal sampling locations, namely the multifunctional matching (MFM) criterion, was based on matching of moments (functionals) between sample and population. The selected functionals in this study were standard deviation, mean, and Kendall's tau. An additional algorithm defined the minimal number of observations that could represent the population according to a desired level of accuracy. The MFM was applied to datasets from two agricultural plots: a vineyard and a peach orchard. The data from the plots included measured values of slope, topographic wetness index, normalized difference vegetation index, and apparent soil electrical conductivity. The MFM algorithm selected the number of sampling points according to a representation accuracy of 90% and determined the optimal location of these points. The algorithm was validated against values of vine or tree water status measured as crop water stress index (CWSI). Algorithm performance was then compared to two other sampling methods: the conditioned Latin hypercube sampling (cLHS) model and a uniform random sample with spatial constraints. Comparison among sampling methods was based on measures of similarity between the target variable population distribution and the distribution of the selected sample. MFM represented CWSI distribution better than the cLHS and the uniform random sampling, and the selected locations showed smaller deviations from the mean and standard deviation of the entire population. The MFM functioned better in the vineyard, where spatial variability was larger than in the orchard. In both plots, the spatial pattern of the selected samples captured the spatial variability of CWSI. MFM can be adjusted and applied using other moments/functionals and may be adopted by other disciplines, particularly in cases where small sample sizes are desired.

3.
Stat Med ; 38(23): 4611-4624, 2019 10 15.
Artículo en Inglés | MEDLINE | ID: mdl-31359448

RESUMEN

In public health research, information that is readily available may be insufficient to address the primary question(s) of interest. One cost-efficient way forward, especially in resource-limited settings, is to conduct a two-phase study in which the population is initially stratified, at phase I, by the outcome and/or some categorical risk factor(s). At phase II detailed covariate data is ascertained on a subsample within each phase I strata. While analysis methods for two-phase designs are well established, they have focused exclusively on settings in which participants are assumed to be independent. As such, when participants are naturally clustered (eg, patients within clinics) these methods may yield invalid inference. To address this, we develop a novel analysis approach based on inverse-probability weighting that permits researchers to specify some working covariance structure and appropriately accounts for the sampling design and ensures valid inference via a robust sandwich estimator for which a closed-form expression is provided. To enhance statistical efficiency, we propose a calibrated inverse-probability weighting estimator that makes use of information available at phase I but not used in the design. In addition to describing the technique, practical guidance is provided for the cluster-correlated data settings that we consider. A comprehensive simulation study is conducted to evaluate small-sample operating characteristics, including the impact of using naïve methods that ignore correlation due to clustering, as well as to investigate design considerations. Finally, the methods are illustrated using data from a one-time survey of the national antiretroviral treatment program in Malawi.


Asunto(s)
Análisis por Conglomerados , Modelos Estadísticos , Proyectos de Investigación , Antirretrovirales/uso terapéutico , Ensayos Clínicos como Asunto , Simulación por Computador , Infecciones por VIH/tratamiento farmacológico , Humanos , Malaui , Programas Nacionales de Salud , Factores de Riesgo
4.
J Am Stat Assoc ; 113(522): 882-892, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-30555194

RESUMEN

Large prospective cohort studies of rare chronic diseases require thoughtful planning of study designs, especially for biomarker studies when measurements are based on stored tissue or blood specimens. Two-phase designs, including nested case-control (Thomas, 1977) and case-cohort (Prentice, 1986) sampling designs, provide cost-effective strategies for conducting biomarker evaluation studies. Existing literature for biomarker assessment under two-phase designs largely focuses on simple inverse probability weighting (IPW) estimators (Cai and Zheng, 2011; Liu et al., 2012). Drawing on recent theoretical development on the maximum likelihood estimators for relative risk parameters in two-phase studies (Scheike and Martinussen, 2004; Zeng et al., 2006), we propose nonparametric maximum likelihood based estimators to evaluate the accuracy and predictiveness of a risk prediction biomarker under both types of two-phase designs. In addition, hybrid estimators that combine IPW estimators and maximum likelihood estimation procedure are proposed to improve efficiency and alleviate computational burden. We derive large sample properties of proposed estimators and evaluate their finite sample performance using numerical studies. We illustrate new procedures using a two-phase biomarker study aiming to evaluate the accuracy of a novel biomarker, des-γ-carboxy prothrombin, for early detection of hepatocellular carcinoma (Lok et al., 2010).

5.
Ann Appl Stat ; 11(2): 638-654, 2017 Jun.
Artículo en Inglés | MEDLINE | ID: mdl-28943991

RESUMEN

Cost-effective yet efficient designs are critical to the success of biomarker evaluation research. Two-phase sampling designs, under which expensive markers are only measured on a subsample of cases and non-cases within a prospective cohort, are useful in novel biomarker studies for preserving study samples and minimizing cost of biomarker assaying. Statistical methods for quantifying the predictiveness of biomarkers under two-phase studies have been proposed (Cai and Zheng, 2012; Liu, Cai and Zheng, 2012). These methods are based on a class of inverse probability weighted (IPW) estimators where weights are 'true' sampling weights that simply reflect the sampling strategy of the study. While simple to implement, existing IPW estimators are limited by lack of practicality and efficiency. In this manuscript, we investigate a variety of two-phase design options and provide statistical approaches aimed at improving the efficiency of simple IPW estimators by incorporating auxiliary information available for the entire cohort. We consider accuracy summary estimators that accommodate auxiliary information in the context of evaluating the incremental values of novel biomarkers over existing prediction tools. In addition, we evaluate the relative efficiency of a variety of sampling and estimation options under two-phase studies, shedding light on issues pertaining to both the design and analysis of biomarker validation studies. We apply our methods to the evaluation of a novel biomarker for liver cancer risk conducted with a two-phase nested case control design (Lok et al., 2010).

SELECCIÓN DE REFERENCIAS
DETALLE DE LA BÚSQUEDA