Results 1 - 20 of 20
1.
Stat Med ; 37(13): 2174-2186, 2018 06 15.
Article in English | MEDLINE | ID: mdl-29579785

ABSTRACT

Originally, 2-stage group testing was developed for efficiently screening individuals for a disease. In response to the HIV/AIDS epidemic, 1-stage group testing was adopted for estimating prevalences of a single or multiple traits from testing groups of size q, so individuals were not tested. This paper extends the methodology of 1-stage group testing to surveys with sample weighted complex multistage-cluster designs. Sample weighted-generalized estimating equations are used to estimate the prevalences of categorical traits while accounting for the error rates inherent in the tests. Two difficulties arise when using group testing in complex samples: (1) How does one weight the results of the test on each group, given that the sample weights will differ among observations in the same group? Furthermore, if the sample weights are related to positivity of the diagnostic test, then group-level weighting is needed to reduce bias in the prevalence estimation; (2) How does one form groups that will allow accurate estimation of the standard errors of prevalence estimates under multistage-cluster sampling, allowing for intracluster correlation of the test results? We study 5 different grouping methods to address the weighting and cluster sampling aspects of complex designed samples. Finite sample properties of the estimators of prevalences, variances, and confidence interval coverage for these grouping methods are studied using simulations. National Health and Nutrition Examination Survey data are used to illustrate the methods.


Subject(s)
Confidentiality , Epidemiologic Methods , Prevalence , Statistics as Topic , Cluster Analysis , Health Surveys/methods , Humans , Models, Statistical , Sampling Studies
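As a rough illustration of how a prevalence can be recovered from 1-stage group tests while allowing for test error, the sketch below inverts the usual relation between the group-positive rate and the individual prevalence. It assumes simple random sampling, equal group sizes, and known sensitivity and specificity; it does not implement the sample-weighted generalized estimating equations of the paper. The function name `pooled_prevalence` and its parameters are hypothetical.

```python
def pooled_prevalence(n_positive, n_groups, group_size,
                      sensitivity=1.0, specificity=1.0):
    """Estimate individual prevalence p from pooled (group) test results.

    Assumes an unweighted simple random sample and a test with known
    error rates (illustrative assumptions, not the paper's weighted method).
    """
    if n_groups <= 0 or group_size <= 0:
        raise ValueError("n_groups and group_size must be positive")
    if sensitivity + specificity <= 1.0:
        raise ValueError("test must be better than chance (Se + Sp > 1)")

    # Observed proportion of groups that tested positive.
    pi_hat = n_positive / n_groups

    # P(group positive) = Se * (1 - (1-p)^q) + (1 - Sp) * (1-p)^q,
    # so (1-p)^q = (Se - pi) / (Se + Sp - 1).
    frac_all_negative = (sensitivity - pi_hat) / (sensitivity + specificity - 1.0)

    # Clamp to [0, 1] so sampling noise cannot produce an invalid estimate.
    frac_all_negative = min(1.0, max(0.0, frac_all_negative))

    return 1.0 - frac_all_negative ** (1.0 / group_size)
```

For example, `pooled_prevalence(19, 100, 2)` treats the test as error-free and solves (1 - p)^2 = 0.81 for p.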
2.
J Theor Biol ; 403: 68-74, 2016 08 21.
Article in English | MEDLINE | ID: mdl-27181372

ABSTRACT

Genetic risks and genetic models are often used in design and analysis of genetic epidemiology studies. A genetic model is defined in terms of two genetic risk measures: genotype relative risk and odds ratio. The impacts of choosing a risk measure on the resulting genetic models are studied in the power to detect association and deviation from Hardy-Weinberg equilibrium in cases using genetic relative risk. Extensive simulations demonstrate that the power of a study to detect associations using odds ratio is lower than that using relative risk with the same value when other parameters are fixed. When the Hardy-Weinberg equilibrium holds in the general population, the genetic model can be inferred by the deviation from Hardy-Weinberg equilibrium in only cases. Furthermore, it is more efficient than that based on the deviation from Hardy-Weinberg equilibrium in all cases and controls.


Subject(s)
Genetic Predisposition to Disease , Models, Genetic , Computer Simulation , Genetic Loci , Genetic Markers , Humans , Odds Ratio , Risk Factors
3.
Stat Appl Genet Mol Biol ; 14(4): 333-45, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26087068

ABSTRACT

Copy number alteration (CNA) data have been collected to study disease related chromosomal amplifications and deletions. The CUSUM procedure and related plots have been used to explore CNA data. In practice, it is possible to observe outliers. Then, modifications of the CUSUM procedure may be required. An outlier reset modification of the CUSUM (ORCUSUM) procedure is developed in this paper. The threshold value for detecting outliers or significant CUSUMs can be derived using results for sums of independent truncated normal random variables. Bartels' non-parametric test for autocorrelation is also introduced to the analysis of copy number variation data. Our simulation results indicate that the ORCUSUM procedure can still be used even in the situation where the degree of autocorrelation is low. Furthermore, the results show the outlier's impact on the traditional CUSUM's performance and illustrate the advantage of the ORCUSUM's outlier reset feature. Additionally, we discuss how the ORCUSUM can be applied to examine CNA data with a simulated data set. To illustrate the procedure, recently collected single nucleotide polymorphism (SNP) based CNA data from The Cancer Genome Atlas (TCGA) Research Network are analyzed. The method is applied to a data set collected in an ovarian cancer study. Three cytogenetic bands (cytobands) are considered to illustrate the method. The cytobands 11q13 and 9p21 have been shown to be related to ovarian cancer. They are presented as positive examples. The cytoband 3q22, which is less likely to be disease related, is presented as a negative example. These results illustrate the usefulness of the ORCUSUM procedure as an exploratory tool for the analysis of SNP based CNA data.


Subject(s)
Computational Biology/methods , DNA Copy Number Variations , Genomics/methods , Models, Genetic , Algorithms , Computer Simulation , Humans , Neoplasms/genetics
4.
Biometrics ; 66(1): 196-204, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19432788

ABSTRACT

Hidden population substructure in case-control data has the potential to distort the performance of Cochran-Armitage trend tests (CATTs) for genetic associations. Three possible scenarios that may arise are investigated here: (i) heterogeneity of genotype frequencies across unidentified subpopulations (PSI), (ii) heterogeneity of genotype frequencies and disease risk across unidentified subpopulations (PSII), and (iii) cryptic correlations within unidentified subpopulations. A unified approach is presented for deriving the bias and variance distortion under the three scenarios for any CATT in a general family. Using these analytical formulas, we evaluate the excess type I errors of the CATTs numerically in the presence of population substructure. Our results provide insight into the properties of some proposed corrections for bias and variance distortion and show why they may not fully correct for the effects of population substructure.


Subject(s)
Algorithms , Biometry/methods , Case-Control Studies , Data Interpretation, Statistical , Genetic Linkage/genetics , Genetic Predisposition to Disease/genetics , Genetics, Population , Computer Simulation , Epidemiologic Methods , Humans
5.
Stat Sin ; 20(2): 837-852, 2010 Apr.
Article in English | MEDLINE | ID: mdl-22190840

ABSTRACT

Art auction catalogs provide a pre-sale prediction interval for the price each item is expected to fetch. When the owner consigns art work to the auction house, a reserve price is agreed upon, which is not announced to the bidders. If the highest bid does not reach it, the item is brought in. Since only the prices of the sold items are published, analysts only have a biased sample to examine due to the selective sale process. Relying on the published data leads to underestimating the forecast error of the pre-sale estimates. However, we were able to obtain several art auction catalogs with the highest bids for the unsold items as well as those of the sold items. With these data we were able to evaluate the accuracy of the predictions of the sale prices or highest bids for all items obtained from the original Heckman selection model that assumed normal error distributions as well as those derived from an alternative model using the t(2) distribution, which yielded a noticeably better fit to several sets of auction data. The measures of prediction accuracy are of more than academic interest as they are used by auction participants to guide their bidding or selling strategy, and similar appraisals are accepted by the US Internal Revenue Service to justify the deductions for charitable contributions donors make on their tax returns.

6.
Methods Mol Biol ; 1666: 375-389, 2017.
Article in English | MEDLINE | ID: mdl-28980255

ABSTRACT

Methods for single marker association analysis are presented for binary and quantitative traits. For a binary trait, we focus on the analysis of retrospective case-control data using Pearson's chi-squared test, the trend test and a robust test. For a continuous trait, typical methods are based on a linear regression model or the analysis of variance. We illustrate how these tests can be applied using a publicly available R package "Rassoc" and some existing R functions. Guidelines for single-marker analysis are provided.


Subject(s)
Genetic Association Studies/methods , Polymorphism, Single Nucleotide , Quantitative Trait Loci , Analysis of Variance , Case-Control Studies , Genotype , Humans , Linear Models , Models, Genetic , Penetrance , Phenotype , Software
7.
Genet Test ; 8(4): 437-40, 2004.
Article in English | MEDLINE | ID: mdl-15684877

ABSTRACT

Recently it was found that the frequency of familial dysautonomia (FD) carriers in Ashkenazi Jews (AJ) was higher in AJ of Polish descent compared to AJ of non-Polish descent. The study population was classified into groups ranging from no to full Polish origin. The statistical procedure used to compare the frequencies of FD carriers did not incorporate this intrinsic ordering of individuals by degree of Polish ancestry. In this paper we describe a test designed to utilize this information and show that it is more powerful than the standard test of equality of proportions. In particular, the p value of the trend test on their data is noticeably lower (0.003) than 0.012 found by the standard test, providing stronger evidence for a relationship between allele frequency and Polish descent.


Subject(s)
Data Interpretation, Statistical , Mutation , Dysautonomia, Familial/genetics , Genetic Predisposition to Disease , Humans , Jews/genetics , Poland , Reproducibility of Results , Risk Factors
8.
Methods Mol Biol ; 850: 347-58, 2012.
Article in English | MEDLINE | ID: mdl-22307707

ABSTRACT

Methods for single marker association analysis are presented for binary and quantitative traits. For a binary trait, we focus on the analysis of retrospective case-control data using Pearson's chi-squared test, the trend test, and a robust test. For a continuous trait, typical methods are based on a linear regression model or the analysis of variance. We illustrate how these tests can be applied using a publicly available R package "Rassoc" and some existing R functions. Guidelines for choosing these test statistics are provided.


Subject(s)
Genetic Association Studies , Models, Genetic , Polymorphism, Single Nucleotide , Case-Control Studies , Humans , Retrospective Studies
9.
J Data Sci ; 8(3): 413-427, 2010 Jul 01.
Article in English | MEDLINE | ID: mdl-20664754

ABSTRACT

It is important to examine the symmetry of an underlying distribution before applying some statistical procedures to a data set. For example, in the Zuni School District case, a formula originally developed by the Department of Education trimmed 5% of the data symmetrically from each end. The validity of this procedure was questioned at the hearing by Chief Justice Roberts. Most tests of symmetry (even nonparametric ones) are not distribution free in finite sample sizes. Hence, using the asymptotic distribution may yield an inaccurate type I error rate or a loss of power in small samples. Bootstrap resampling from a symmetric empirical distribution function fitted to the data is proposed to improve the accuracy of the calculated p-value of several tests of symmetry. The results show that the bootstrap method is superior to previously used approaches relying on the asymptotic distribution of the tests that assumed the data come from a normal distribution. Incorporating the bootstrap estimate in a recently proposed test due to Miao, Gel and Gastwirth (2006) preserved its level and showed that it has reasonable power properties on the family of distributions evaluated.
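The idea of resampling from a symmetrized empirical distribution can be sketched as follows. This is only an illustration under stated assumptions: the sample is reflected about its median to build a symmetric pool, and the statistic is a simple standardized mean-minus-median difference chosen for brevity. Neither choice is taken from the paper, and the names `skew_statistic` and `bootstrap_symmetry_pvalue` are hypothetical.

```python
import random
from statistics import mean, median, pstdev


def skew_statistic(sample):
    """Illustrative asymmetry statistic: (mean - median) / standard deviation."""
    spread = pstdev(sample)
    if spread == 0:
        return 0.0  # degenerate sample (all values equal) carries no asymmetry
    return (mean(sample) - median(sample)) / spread


def bootstrap_symmetry_pvalue(data, n_boot=2000, seed=0):
    """Bootstrap p-value for symmetry using a symmetrized empirical distribution.

    Assumption: the center of symmetry is estimated by the sample median.
    """
    if len(data) < 3:
        raise ValueError("need at least 3 observations")
    rng = random.Random(seed)
    center = median(data)

    # Reflect every observation about the center so the pool is exactly symmetric.
    symmetric_pool = [x - center for x in data] + [center - x for x in data]

    observed = abs(skew_statistic(data))
    hits = 0
    for _ in range(n_boot):
        resample = rng.choices(symmetric_pool, k=len(data))
        if abs(skew_statistic(resample)) >= observed:
            hits += 1

    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_boot + 1)
```

Because the resamples are drawn under an exactly symmetric null, the reference distribution reflects the finite sample size instead of relying on an asymptotic approximation.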

10.
Philos Trans A Math Phys Eng Sci ; 366(1874): 2377-88, 2008 Jul 13.
Article in English | MEDLINE | ID: mdl-18407900

ABSTRACT

Observational studies, including the case-control design frequently used in epidemiology, are subject to a number of biases and possible confounding factors. Failure to adjust for them may lead to an erroneous conclusion about the existence of a causal relationship between exposure and disease. The Cochran-Mantel-Haenszel (CMH) test is widely used to measure the strength of the association between an exposure and disease or response, after stratifying on the observed covariates. Thus, observed confounders are accounted for in the analysis. In practice, there may be causal variables that are unknown or difficult to obtain. Hence, they are not incorporated into the analysis. Sensitivity analysis enables investigators to assess the robustness of the findings. A method for assessing the sensitivity of the CMH test to an omitted confounder is presented here. The technique is illustrated by re-examining two datasets: one concerns the effect of maternal hypertension as a risk factor for low birth weight infants and the other focuses on the effect of allopurinol on the risk of having a rash. The computer code performing the sensitivity analysis is provided in appendix A.


Subject(s)
Biometry/methods , Allopurinol/adverse effects , Databases, Factual , Drug Eruptions/etiology , Female , Humans , Hypertension/complications , Infant, Low Birth Weight , Infant, Newborn , Maternal-Fetal Exchange , Models, Statistical , Pregnancy , Pregnancy Complications, Cardiovascular , Risk Factors , Sensitivity and Specificity
11.
Stat Med ; 25(18): 3150-9, 2006 Sep 30.
Article in English | MEDLINE | ID: mdl-16397860

ABSTRACT

The Cochran-Armitage trend test has been used in case-control studies for testing genetic association. As the variance of the test statistic is a function of unknown parameters, e.g. disease prevalence and allele frequency, it must be estimated. The usual estimator combining data for cases and controls assumes they follow the same distribution under the null hypothesis. Under the alternative hypothesis, however, the cases and controls follow different distributions. Thus, the power of the trend tests may be affected by the variance estimator used. In particular, the usual method combining both cases and controls is not an asymptotically unbiased estimator of the null variance when the alternative is true. Two different estimates of the null variance are available which are consistent under both the null and alternative hypotheses. In this paper, we examine sample size and small sample power performance of trend tests, which are optimal for three common genetic models as well as a robust trend test based on the three estimates of the variance and provide guidelines for choosing an appropriate test.


Subject(s)
Case-Control Studies , Data Interpretation, Statistical , Genetic Predisposition to Disease , Models, Genetic , Computer Simulation , Humans , Lupus Erythematosus, Systemic/genetics , Polymorphism, Genetic , Sample Size
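For readers unfamiliar with the statistic, the sketch below computes the Cochran-Armitage trend test Z value from genotype counts using the usual pooled variance estimator (the one the abstract describes as combining cases and controls). The two alternative variance estimators studied in the paper are not implemented. The function name `catt_z` is hypothetical; the default scores (0, 0.5, 1) correspond to the additive model commonly used in this literature, with (0, 0, 1) for recessive and (0, 1, 1) for dominant.

```python
import math


def catt_z(cases, controls, scores=(0.0, 0.5, 1.0)):
    """Cochran-Armitage trend test Z statistic with the pooled variance estimator.

    cases, controls: genotype counts in the same order as `scores`.
    """
    if not (len(cases) == len(controls) == len(scores)):
        raise ValueError("cases, controls and scores must have equal length")

    r_total = sum(cases)             # number of cases
    s_total = sum(controls)          # number of controls
    n_total = r_total + s_total
    totals = [r + s for r, s in zip(cases, controls)]

    sum_xr = sum(x * r for x, r in zip(scores, cases))
    sum_xn = sum(x * n for x, n in zip(scores, totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, totals))

    # T = sum_i x_i (S r_i - R s_i) = N * sum(x r) - R * sum(x n)
    t_stat = n_total * sum_xr - r_total * sum_xn

    # Null variance of T estimated from the combined case-control sample.
    variance = (r_total * s_total / n_total) * (n_total * sum_x2n - sum_xn ** 2)
    if variance <= 0:
        raise ValueError("variance is zero; scores or counts are degenerate")

    return t_stat / math.sqrt(variance)
```

With only two categories and scores (0, 1), the squared statistic reduces to Pearson's chi-squared for the 2 x 2 table, which gives a convenient sanity check.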
12.
Am J Hum Genet ; 78(2): 350-6, 2006 Feb.
Article in English | MEDLINE | ID: mdl-16400614

ABSTRACT

Population-based case-control studies are a useful method to test for a genetic association between a trait and a marker. However, the analysis of the resulting data can be affected by population stratification or cryptic relatedness, which may inflate the variance of the usual statistics, resulting in a higher-than-nominal rate of false-positive results. One approach to preserving the nominal type I error is to apply genomic control, which adjusts the variance of the Cochran-Armitage trend test by calculating the statistic on data from null loci. This enables one to estimate any additional variance in the null distribution of statistics. When the underlying genetic model (e.g., recessive, additive, or dominant) is known, genomic control can be applied to the corresponding optimal trend tests. In practice, however, the mode of inheritance is unknown. The genotype-based chi-squared test for a general association between the trait and the marker does not depend on the underlying genetic model. Since this general association test has 2 degrees of freedom (df), the existing formulas for estimating the variance factor by use of genomic control are not directly applicable. By expressing the general association test in terms of two Cochran-Armitage trend tests, one can apply genomic control to each of the two trend tests separately, thereby adjusting the chi-squared statistic. The properties of this robust genomic control test with 2 df are examined by simulation. This genomic control-adjusted 2-df test has control of type I error and achieves reasonable power, relative to the optimal tests for each model.


Subject(s)
Genetic Predisposition to Disease , Genetics, Population/methods , Genome, Human , Models, Genetic , Case-Control Studies , Chromosome Mapping , Data Interpretation, Statistical , Humans , Linkage Disequilibrium , Population/genetics
13.
Biostatistics ; 7(1): 41-57, 2006 Jan.
Article in English | MEDLINE | ID: mdl-15947009

ABSTRACT

We propose a useful protocol for the problem of screening populations for low-prevalence characteristics such as HIV or drugs. Current HIV screening of blood that has been donated for transfusion involves the testing of individual blood units with an inexpensive enzyme-linked immunosorbent assay test and follow-up with a more accurate and more expensive western blot test for only those units that tested positive. Our cost-effective pooling strategy would enhance current methods by making it possible to accurately estimate the sensitivity and specificity of the initial screening test, and the proportion of defective units that have passed through the system. We also provide a method of estimating the distribution of prevalences for the characteristic throughout the population or subpopulations of interest.


Subject(s)
Bayes Theorem , Predictive Value of Tests , Prevalence , Sampling Studies , Seroepidemiologic Studies , AIDS Serodiagnosis/economics , Acquired Immunodeficiency Syndrome/epidemiology , Blood Donors/statistics & numerical data , Humans , Midwestern United States/epidemiology , Monte Carlo Method , Sensitivity and Specificity , Southeastern United States/epidemiology , Southwestern United States/epidemiology
14.
Biostatistics ; 6(2): 201-9, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15772100

ABSTRACT

Trend tests are used to assess the relationship between multiple level treatment X and binary response R. In observational studies, however, there may be a confounder U that is associated with treatment X and causally related to response R. When the data for the confounder U are not observed, an approach for assessing the sensitivity of test results to U is provided. Its use is illustrated by examining data from a study of mutation rate after the Chernobyl accident.


Subject(s)
Chernobyl Nuclear Accident , Confounding Factors, Epidemiologic , Data Interpretation, Statistical , Dose-Response Relationship, Radiation , Germ-Line Mutation/radiation effects , Microsatellite Repeats/radiation effects , Environmental Exposure , Humans , Radiation, Ionizing , Radioactive Fallout , Republic of Belarus , Sensitivity and Specificity
15.
Biometrics ; 61(1): 186-92, 2005 Mar.
Article in English | MEDLINE | ID: mdl-15737092

ABSTRACT

Case-control studies are commonly used to study whether a candidate allele and a disease are associated. However, spurious association can arise due to population substructure or cryptic relatedness, which cause the variance of the trend test to increase. Devlin and Roeder derived the appropriate variance inflation factor (VIF) for the trend test and proposed a novel genomic control (GC) approach to estimate VIF and adjust the test statistic. Their results were derived assuming an additive genetic model and the corresponding VIF is independent of the candidate allele frequency. We determine the appropriate VIFs for recessive and dominant models. Unlike the additive test, the VIFs for the optimal tests for these two models depend on the candidate allele frequency. Simulation results show that, when the null loci used to estimate the VIF have allele frequencies similar to that of the candidate gene, the GC tests derived for recessive and dominant models remain optimal. When the underlying genetic model is unknown or the null loci and candidate gene have quite different allele frequencies, the GC tests derived for the recessive or dominant models cannot be used while the GC test derived for the additive model can be.


Subject(s)
Case-Control Studies , Models, Genetic , Biometry , Gene Frequency , Genes, Dominant , Genes, Recessive , Genomics , Genotype , Humans
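The genomic control adjustment mentioned above can be sketched in a few lines. This minimal version assumes 1-df chi-squared trend statistics computed at null loci and estimates the variance inflation factor as their median divided by the median of a chi-squared distribution with 1 df (about 0.4549). Flooring the factor at 1 is a common practical convention and is an assumption here, as is the function name `genomic_control_adjust`. The model-specific VIFs for recessive and dominant tests derived in the paper are not reproduced.

```python
import statistics

# Median of a chi-squared distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def genomic_control_adjust(candidate_chi2, null_chi2, floor_at_one=True):
    """Return (adjusted statistic, estimated VIF) for a 1-df trend test.

    candidate_chi2: squared trend statistic at the candidate locus.
    null_chi2: squared trend statistics at loci assumed to be unassociated.
    """
    if not null_chi2:
        raise ValueError("at least one null-locus statistic is required")

    # Robust estimate of the variance inflation factor from the null loci.
    vif = statistics.median(null_chi2) / CHI2_1DF_MEDIAN

    if floor_at_one:
        # Assumed convention: never make the test more liberal than unadjusted.
        vif = max(1.0, vif)

    return candidate_chi2 / vif, vif
```

For instance, `genomic_control_adjust(8.0, [0.2, 0.9098, 3.0])` yields a VIF close to 2 and halves the candidate statistic.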
16.
Stat Med ; 24(17): 2659-68, 2005 Sep 15.
Article in English | MEDLINE | ID: mdl-16118808

ABSTRACT

The Peters-Belson (PB) method uses regression to assess wage discrimination and can also be used to analyse disparities for a variety of health care issues, e.g. cancer screening. The PB method estimates the proportion of an overall disparity that is not explained by the covariates in the regression, e.g. education, which may be due to discrimination. This method first fits a regression model with individual-level covariates to the majority/advantaged group and then uses the fitted model to estimate the expected values for minority-group members had they been members of the majority group. The data on disparities in health care available to biomedical researchers differ from data used in legal cases as it is often obtained from large-scale studies or surveys with complex sample designs involving stratified multi-stage cluster sampling. Sample surveys with a large representative sample of various racial/ethnic groups and the extensive collection of important social-demographic variables provide excellent sources of data for assessing disparity for a wide range of health behaviours. We extend the PB method for multiple logistic and linear regressions of simple random samples to weighted data from complex designed survey samples. Because of the weighting and complex sample designs, we show how to apply the Taylor linearization method and delete-one-group jackknife methods to obtain estimates of standard errors for the estimated disparity. Data from the 1998 National Health Interview Survey on racial differences in cancer screening among women is used to illustrate the PB method.


Subject(s)
Data Collection , Data Interpretation, Statistical , Delivery of Health Care , Logistic Models , Adult , Black People , Diagnostic Techniques and Procedures , Female , Hispanic or Latino , Humans , Middle Aged , Neoplasms/diagnosis , White People
17.
Stat Med ; 22(21): 3383-401, 2003 Nov 15.
Article in English | MEDLINE | ID: mdl-14566922

ABSTRACT

Unlike randomized experimental studies, investigators do not have control over the treatment assignment in observational studies. Hence, the treated and control (non-treated) groups may have widely different distributions of unobserved covariates. Thus, if observational data are analysed as if they had arisen from a controlled study, the analyses are subject to potential bias. Sensitivity analysis is a technique for assessing whether the inference drawn from a study could be altered by a moderate 'imbalance' between the distribution of the covariates in different groups. In this paper, we examine the sensitivity analysis of the test of proportions in 2 x 2 tables from a new perspective: 'could a non-significant result have occurred because the treated group has a higher prevalence of an unobserved risk factor?'. The study was motivated by an analysis of the studies concerning the possible effect of spermicide use on birth defects that were cited in a legal decision.


Subject(s)
Models, Statistical , Observer Variation , Risk Assessment/statistics & numerical data , Sensitivity and Specificity , Analysis of Variance , Case-Control Studies , Humans , Limb Deformities, Congenital/etiology , Lung Neoplasms/etiology , Odds Ratio , Risk Factors , Smoking/adverse effects , Spermatocidal Agents/adverse effects , Tranquilizing Agents/adverse effects
18.
Hum Hered ; 55(2-3): 117-24, 2003.
Article in English | MEDLINE | ID: mdl-12931050

ABSTRACT

In 1972, Haseman and Elston proposed a pioneering regression method for mapping quantitative trait loci using randomly selected sib pairs. Recently, the statistical power of their method was shown to be increased when extremely discordant sib pairs are ascertained. While the precise genetic model may not be known, prior information that constrains IBD probabilities is often available. We investigate properties of tests that are robust against model uncertainty and show that the power gain from further constraining IBD probabilities is marginal. The additional linkage information contained in the trait values can be incorporated by combining the Haseman-Elston regression method and a robust allele sharing test.


Subject(s)
Chromosome Mapping/statistics & numerical data , Data Interpretation, Statistical , Quantitative Trait Loci , Siblings , Humans
19.
Med Care ; 42(8): 789-800, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15258481

ABSTRACT

BACKGROUND: Cancer screening rates vary substantially by race and ethnicity. We applied the Peters-Belson approach, often used in wage discrimination studies, to analyze disparities in cancer screening rates between different groups using the 1998 National Health Interview Survey. METHODS: A regression model predicting the probability of getting screened is fit to the majority group and then used to estimate the expected values for minority group members had they been members of the majority group. The average difference between the observed and expected values for a minority group is the part of the disparity that is not explained by the covariates. RESULTS: The observed disparities in colorectal cancer screening (5.88%) and digital rectal screening (8.54%) between white and black men were explained fully by the difference in their covariate distributions. Only half of the disparity in the observed screening rates (13.54% for colorectal and 17.47% for digital rectal) between white and Hispanic men was explained by the difference in covariates between the groups. The entire disparity observed in mammography screening rates for black and Hispanic women (2.71% and 6.53%, respectively) compared with white women was explained by the difference in covariate distributions. CONCLUSIONS: We found that the covariates that explain the disparity in screening rates between the white and the black population do not explain the disparity between the white and the Hispanic population. Knowing how much of a health disparity is explained by measured covariates can be used to develop more effective interventions and policies to eliminate disparity.


Subject(s)
Black People/statistics & numerical data , Breast Neoplasms/diagnosis , Colorectal Neoplasms/diagnosis , Diagnostic Tests, Routine/statistics & numerical data , Health Services Accessibility/statistics & numerical data , Hispanic or Latino/statistics & numerical data , Mass Screening/statistics & numerical data , Patient Acceptance of Health Care/ethnology , White People/statistics & numerical data , Adult , Black People/psychology , Breast Neoplasms/ethnology , Colorectal Neoplasms/ethnology , Endoscopy, Gastrointestinal/statistics & numerical data , Female , Health Care Surveys , Hispanic or Latino/psychology , Humans , Male , Mammography/statistics & numerical data , Middle Aged , National Center for Health Statistics, U.S. , Poverty/ethnology , Prejudice , Probability , Regression Analysis , Socioeconomic Factors , United States , White People/psychology
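The Peters-Belson steps described in the METHODS portion above can be sketched as follows. To stay self-contained, this version uses an unweighted simple linear regression on a single covariate; the study itself fits a model predicting screening probability to complex survey data, so the regression form, the lack of sample weights, and the names `fit_simple_linear` and `peters_belson` are all illustrative assumptions.

```python
from statistics import mean


def fit_simple_linear(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("covariate has no variation")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def peters_belson(x_major, y_major, x_minor, y_minor):
    """Split the majority-minority disparity into explained and unexplained parts."""
    # Step 1: fit the model to the majority group only.
    intercept, slope = fit_simple_linear(x_major, y_major)

    # Step 2: expected outcomes for minority members had they been in the majority.
    expected_minor = [intercept + slope * xi for xi in x_minor]

    # Step 3: decompose the overall disparity.
    overall = mean(y_major) - mean(y_minor)
    unexplained = mean(expected_minor) - mean(y_minor)
    explained = overall - unexplained

    return {"overall": overall, "explained": explained, "unexplained": unexplained}
```

The `unexplained` component is the average gap between expected and observed minority outcomes, that is, the part of the disparity not accounted for by the covariate.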
20.
Genet Epidemiol ; 22(1): 94-102, 2002 Jan.
Article in English | MEDLINE | ID: mdl-11754476

ABSTRACT

Pooling DNA samples can yield efficient estimates of the prevalence of genetic variants. We extend methods of analyzing pooled DNA samples to estimate the joint prevalence of variants at two or more loci. If one has a sample from the general population, one can adapt the method for joint prevalence estimation to estimate allele frequencies and D, the measure of linkage disequilibrium. The parameter D is fundamental in population genetics and in determining the power of association studies. In addition, joint allelic prevalences can be used in case-control studies to estimate the relative risks of disease from joint exposures to the genetic variants. Our methods allow for imperfect assay sensitivity and specificity. The expected savings in numbers of assays required when pooling is utilized compared to individual testing are quantified.


Subject(s)
DNA/genetics , Gene Frequency , Linkage Disequilibrium , Models, Genetic , Alleles , Carrier State , Genetic Variation , Humans , Likelihood Functions , Probability