ABSTRACT
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide-resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole-genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validation. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysis of their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high-frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Subjects
DNA Copy Number Variations/genetics , Genetics, Population , Genome, Human/genetics , Genomics , Gene Duplication/genetics , Genetic Predisposition to Disease/genetics , Genotype , Humans , Mutagenesis, Insertional/genetics , Reproducibility of Results , Sequence Analysis, DNA , Sequence Deletion/genetics
ABSTRACT
Copy number variations (CNVs) are important in disease association studies and are usually targeted by the most recent microarray platforms developed for GWAS. However, probes targeting the same CNV region can vary greatly in performance, with some carrying little more information than pure noise. In this paper, we investigate how best to combine measurements from multiple probes to estimate individuals' copy numbers within the framework of a Gaussian mixture model (GMM). First, we show that under two regularity conditions, and assuming that all parameters except the mixing proportions are known, optimal weights can be obtained such that a univariate GMM based on the weighted average gives exactly the same classification as the multivariate GMM. We then develop an algorithm that iteratively estimates the parameters, obtains the optimal weights and uses them for classification. The algorithm performs well on simulated data and on two real data sets, showing a clear advantage over classification based on the equally weighted average.
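The weighting idea can be illustrated with a minimal Python sketch. The abstract does not state the two regularity conditions, so this sketch assumes (our assumption, not necessarily the paper's) that all copy-number classes share one covariance matrix Σ and that the class means lie on a line, μ_k = a + c_k·b. With all parameters known, the class posteriors then depend on the probe vector x only through bᵀΣ⁻¹x, so weights w ∝ Σ⁻¹b (the Fisher linear discriminant direction) make a univariate GMM on w·x reproduce the multivariate posteriors exactly. The probe values, classes and names below are hypothetical, and this is not the paper's iterative algorithm.

```python
import numpy as np

def mvn_logpdf(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)

def norm_logpdf(s, mean, var):
    """Log density of a univariate N(mean, var) evaluated at s."""
    return -0.5 * (np.log(2 * np.pi * var) + (s - mean) ** 2 / var)

def posterior(loglik, props):
    """Posterior class probabilities from per-class log-likelihoods (n x K)."""
    logp = loglik + np.log(props)
    logp -= logp.max(axis=1, keepdims=True)  # stabilise before exponentiating
    p = np.exp(logp)
    return p / p.sum(axis=1, keepdims=True)

# Hypothetical CNV region with 3 probes and 3 copy-number classes.
# Assumed conditions: shared covariance and collinear means mu_k = a + c_k * b.
a = np.zeros(3)
b = np.array([1.0, 0.6, 0.2])   # per-probe signal; probe 3 is mostly noise
c = np.array([-1.0, 0.0, 1.0])  # class offsets (e.g. 1, 2, 3 copies)
cov = np.array([[0.20, 0.05, 0.00],
                [0.05, 0.30, 0.00],
                [0.00, 0.00, 0.50]])
props = np.array([0.1, 0.8, 0.1])
means = a + np.outer(c, b)      # row k is the mean vector of class k

rng = np.random.default_rng(0)
labels = rng.choice(3, size=500, p=props)
x = means[labels] + rng.multivariate_normal(np.zeros(3), cov, size=500)

# Multivariate GMM posteriors.
post_mv = posterior(np.column_stack([mvn_logpdf(x, m, cov) for m in means]), props)

# Univariate GMM on the weighted average s = w @ x, with w proportional to cov^{-1} b.
w = np.linalg.solve(cov, b)
w /= w.sum()                    # rescale to a weighted average (the sum is positive here)
s = x @ w
var_s = w @ cov @ w
post_1d = posterior(np.column_stack([norm_logpdf(s, m @ w, var_s) for m in means]), props)

print(np.allclose(post_mv, post_1d))  # True under these assumptions
```

Rescaling w by any nonzero constant leaves the classification unchanged, because the univariate means and variance are recomputed on the same scale.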
Subjects
DNA Copy Number Variations/genetics , Gene Dosage , Models, Genetic , Models, Statistical , Algorithms , Cluster Analysis , Genome , Genome, Human , Genome-Wide Association Study , Humans , Normal Distribution , Oligonucleotide Array Sequence Analysis
ABSTRACT
Infection with Plasmodium falciparum can cause symptoms ranging from minimal to severe, occasionally resulting in death in young children or nonimmune adults. In areas of high transmission, older children and adults generally suffer only mild or asymptomatic malaria infections and rarely develop severe disease. The immune features underlying this apparent immunity to severe disease remain elusive. To gain insight into host responses associated with severe and mild malaria, we conducted a longitudinal study of five children who first presented with severe malaria and, 1 month later, with mild malaria. Using peripheral blood whole-genome profiling, we identified 68 genes whose expression was associated with mild malaria relative to the severe malaria episode (paired Student's t test, P < 0.05). These genes reflect the interferon (IFN) pathway and T cell biology and include IFN-induced protein transcripts 1 to 3, oligoadenylate synthetases 1 and 3, and the T cell markers cathepsin W and perforin. Gene set enrichment analysis identified Gene Ontology (GO) pathways associated with mild malaria, including the type I interferon-mediated signaling pathway (GO:0060337), T cell activation (GO:0042110), and other GO pathways representing many aspects of immune activation. In contrast, only six genes were associated with severe malaria, including thymidine kinase 1, which was recently found to be a biomarker of cerebral malaria susceptibility in the murine model, and carbonic anhydrase, reflecting the abnormal acid-base environment of the blood during severe disease. These data may inform pathogenesis models and the development of therapeutics to reduce severe disease outcomes due to P. falciparum infection.
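The per-gene paired comparison can be sketched in Python as follows. Each child serves as their own control, so the test is applied to the within-child differences (mild minus severe). The expression values and gene names are hypothetical toy numbers, not data from the study, and the fixed critical value 2.776 (two-sided 5% level of Student's t with 4 degrees of freedom, for five pairs) stands in for an exact P-value calculation.

```python
import math
from statistics import mean, stdev

# Two-sided 5% critical value of Student's t with 4 degrees of freedom (5 pairs).
T_CRIT_DF4 = 2.776

def paired_t(severe, mild):
    """Paired Student's t statistic for the within-child change (mild - severe)."""
    diffs = [m - s for s, m in zip(severe, mild)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

# Hypothetical log-expression values for five children (not data from the study).
expression = {
    "gene_A": ([5.0, 5.2, 4.8, 5.1, 4.9], [6.0, 6.1, 5.9, 6.3, 5.7]),
    "gene_B": ([5.0, 5.2, 4.8, 5.1, 4.9], [5.3, 5.0, 4.9, 4.7, 5.1]),
}

for gene, (severe, mild) in expression.items():
    t = paired_t(severe, mild)
    print(gene, round(t, 2), "P < 0.05" if abs(t) > T_CRIT_DF4 else "n.s.")
```

Gene A rises consistently in every child and passes the threshold; gene B changes in both directions and does not.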
Subjects
Interferons/immunology , Interferons/metabolism , Malaria, Falciparum/immunology , Malaria, Falciparum/pathology , Plasmodium falciparum/immunology , Plasmodium falciparum/pathogenicity , Child , Child, Preschool , Female , Gene Expression Profiling , Humans , Infant , Leukocytes, Mononuclear/immunology , Longitudinal Studies , Lymphocyte Activation , Malawi , Male , Signal Transduction
ABSTRACT
Four tests are commonly employed in genetic association studies: the likelihood ratio (LR) test, Wald's test, the score test and the exact test. On comparing the four tests, we found that Wald's test, popular in genome-wide screens because of its low computational demands, exhibited paradoxical behaviour: the test statistic decreased as the effect size of the variant increased, resulting in a loss of power. The LR test always achieved the most significant P-values, followed by the exact test. We further examined the results in a real data set composed of high- and low-cholesterol subjects from the Dallas Heart Study (DHS). We also used simulated sequencing data to compare the single-variant LR test with two multi-variant analysis approaches, the burden test and the C-alpha test. Our results call for caution in using Wald's test in genome-wide case-control association studies and suggest that the LR test is a better alternative despite its computational demands.
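The paradox can be reproduced with a small Python sketch on a 2x2 allele-count table (rows: cases, controls; columns: minor, major alleles). For logistic regression with a single binary allele indicator, the Wald chi-square reduces to (log OR)² / (1/a + 1/b + 1/c + 1/d), and the LR statistic is the G statistic 2·ΣO·ln(O/E). The allelic coding and all counts are hypothetical assumptions; the paper's exact models may differ. This behaviour of the Wald test is known in the statistics literature as the Hauck-Donner effect.

```python
import math

def wald_stat(a, b, c, d):
    """Wald chi-square for the log odds ratio of the 2x2 table [[a, b], [c, d]]."""
    log_or = math.log((a * d) / (b * c))
    return log_or ** 2 / (1 / a + 1 / b + 1 / c + 1 / d)

def lr_stat(a, b, c, d):
    """Likelihood-ratio (G) statistic, 2 * sum(O * ln(O / E)), for the same table."""
    table = [[a, b], [c, d]]
    n = a + b + c + d
    rows, cols = [a + b, c + d], [a + c, b + d]
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = table[i][j]
            if obs > 0:  # 0 * ln(0) is taken as 0
                g += obs * math.log(obs / (rows[i] * cols[j] / n))
    return 2 * g

# Hypothetical counts: controls fixed at 2 minor / 998 major alleles, while the
# minor-allele count among 1000 case alleles grows, so the odds ratio keeps growing.
for a in (900, 990, 995, 999):
    b = 1000 - a
    print(a, round(wald_stat(a, b, 2, 998), 1), round(lr_stat(a, b, 2, 998), 1))
```

In this toy table the Wald statistic rises from a = 900 to a = 990 and then falls by a = 999, even though the effect keeps growing, whereas the LR statistic increases throughout.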
Subjects
Genetic Association Studies/methods , Rare Diseases/genetics , Case-Control Studies , Genome-Wide Association Study/methods , Genotype , Humans , Phenotype , Polymorphism, Single Nucleotide , Validation Studies as Topic
ABSTRACT
A common technique to control for confounding factors in practice is regression adjustment. Various versions of regression modeling exist in the literature; in this paper we considered four approaches often seen in genetic association studies. We carried out both analytical and simulation studies comparing the bias of effect size estimates and examining the test sizes under the null hypothesis of no association between an outcome and an exposure. Further, we compared the methods in a genome-wide scan of nonsynonymous variants for plasma lipoprotein(a) levels using a dataset from the Dallas Heart Study. We found that a widely employed approach, which models the covariate-adjusted outcome against the unadjusted exposure, leads to a test size below the nominal level (infranominal) and underestimation of the exposure effect size. In conclusion, we recommend either using multiple regression models or modeling the covariate-adjusted outcome against the covariate-adjusted exposure to control for confounding factors.
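Three of the adjustment strategies named above can be compared in a short Python sketch on simulated data. By the Frisch-Waugh-Lovell theorem, regressing the covariate-adjusted outcome on the covariate-adjusted exposure reproduces the multiple-regression coefficient exactly, whereas regressing the adjusted outcome on the unadjusted exposure shrinks the estimate by the factor (residual variance of the exposure) / (total variance of the exposure). The data-generating model, effect sizes and helper names are hypothetical, and the paper's fourth approach is not specified in the abstract, so it is omitted.

```python
import numpy as np

def ols_slope(y, x, covariates=None):
    """Coefficient on x from an OLS fit of y on [1, x, covariates]."""
    cols = [np.ones_like(x), x] + ([] if covariates is None else [covariates])
    beta, *_ = np.linalg.lstsq(np.column_stack(cols), y, rcond=None)
    return beta[1]

def residualize(v, z):
    """Residuals of v after an OLS fit on [1, z] (i.e. covariate-adjusted v)."""
    Z = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    return v - Z @ coef

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                       # confounder (hypothetical, e.g. age)
x = 0.7 * z + rng.normal(size=n)             # exposure correlated with z
y = 0.5 * x + 1.0 * z + rng.normal(size=n)   # true exposure effect is 0.5

beta_multiple = ols_slope(y, x, covariates=z)                 # multiple regression
beta_both = ols_slope(residualize(y, z), residualize(x, z))   # adjust outcome and exposure
beta_outcome_only = ols_slope(residualize(y, z), x)           # adjust outcome only

# The first two agree; the third is attenuated toward zero.
print(round(beta_multiple, 3), round(beta_both, 3), round(beta_outcome_only, 3))
```

The attenuation in the third estimate grows with the correlation between exposure and covariate, which is consistent with the underestimation described above.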
Subjects
Genome-Wide Association Study/methods , Models, Genetic , Confounding Factors, Epidemiologic , Databases, Genetic , Heart , Humans , Regression Analysis , Texas
ABSTRACT
A combination of common and rare variants is thought to contribute to genetic susceptibility to complex diseases. Recently, next-generation sequencers have greatly lowered sequencing costs, providing an opportunity to identify rare disease variants in large genetic epidemiology studies. At present, it is still expensive and time-consuming to resequence large numbers of individual genomes. However, given that next-generation sequencing technology can provide accurate estimates of allele frequencies from pooled DNA samples, it is possible to detect associations with rare variants using pooled DNA sequencing. Current statistical approaches to the analysis of associations with rare variants are not designed for pooled next-generation sequencing data; hence, they may not be optimal in terms of either validity or power. We therefore propose a new statistical procedure to analyze pooled sequencing data. The test statistic can be computed rapidly, making it feasible to test the association of a large number of variants with disease. By simulation, we compare this approach with Fisher's exact test based on either pooled or individual genotypic data. Our results demonstrate that the proposed method provides good control of the Type I error rate while yielding substantially higher power than Fisher's exact test using pooled genotypic data for testing rare variants, and similar or higher power than Fisher's exact test using individual genotypic data. Our results also provide guidelines on how various parameters of the pooled sequencing design affect the efficiency of detecting associations.
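The comparison baseline, Fisher's exact test applied to allele counts estimated from pooled reads, can be sketched in Python. This is not the proposed statistic, which the abstract does not specify. The helper `pooled_allele_count` is hypothetical and assumes diploid individuals contributing equal amounts of DNA and error-free reads, so that the fraction of alternate reads estimates the pool's allele frequency. Treating that estimate as an observed count ignores read-sampling variance, which is one reason such a naive analysis may not be optimal.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    r1, r2, c1 = a + b, c + d, a + c
    denom = comb(r1 + r2, c1)

    def prob(k):  # hypergeometric probability that the top-left cell equals k
        return comb(r1, k) * comb(r2, c1 - k) / denom

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum all tables at least as extreme (no more probable) than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-7))

def pooled_allele_count(alt_reads, depth, n_individuals):
    """Hypothetical point estimate of the minor-allele count in a diploid pool."""
    return round(alt_reads / depth * 2 * n_individuals)

# Hypothetical pools of 500 cases and 500 controls, each sequenced to depth 2000.
case_minor = pooled_allele_count(40, 2000, 500)
ctrl_minor = pooled_allele_count(10, 2000, 500)
p = fisher_exact_two_sided(case_minor, 1000 - case_minor, ctrl_minor, 1000 - ctrl_minor)
print(case_minor, ctrl_minor, p)
```

With individual genotypes, the same test would use directly counted alleles instead of these read-based estimates.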
Subjects
Genetic Predisposition to Disease , Genetic Variation , Models, Genetic , Models, Statistical , Sequence Analysis, DNA/methods , Alleles , Case-Control Studies , Gene Frequency , Genetics, Population/methods , Genotype , Humans , Phenotype
ABSTRACT
In genetic association studies, a conventional test statistic is proportional to the correlation coefficient between the trait and the variant; as a result, it lacks power to detect association with low-frequency variants. Considering the link between conventional association test statistics and the linkage disequilibrium measure r², we propose a test statistic analogous to the standardized linkage disequilibrium measure D' to increase the power of detecting association with low-frequency variants. Through both simulation and real data analysis, we show that the proposed D' test is more powerful than conventional methods for detecting association with low-frequency variants in a genome-wide setting. The optimal coding strategy for the D' test and its asymptotic properties are also investigated. In summary, we advocate using the D' test under a dominant model as a complementary approach to enhance the power of detecting association with low-frequency variants of moderate to large effect size in case-control genome-wide association studies.
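The contrast between r² and D' can be made concrete in Python by treating case status and variant carriage (dominant coding) as two binary "loci" in a 2x2 table. The Pearson chi-square for such a table equals N·r², which is the link to conventional tests described above. The sketch only computes the two LD-style measures; the paper's D' test statistic and its null distribution are not given in the abstract, and the counts are hypothetical.

```python
def ld_measures(a, b, c, d):
    """r^2 and signed D' between case status and variant carriage.

    Table layout: rows are cases and controls; columns are carriers and non-carriers.
    """
    n = a + b + c + d
    p_case = (a + b) / n
    p_carrier = (a + c) / n
    d_stat = a / n - p_case * p_carrier
    if d_stat >= 0:
        d_max = min(p_case * (1 - p_carrier), (1 - p_case) * p_carrier)
    else:
        d_max = min(p_case * p_carrier, (1 - p_case) * (1 - p_carrier))
    r2 = d_stat ** 2 / (p_case * (1 - p_case) * p_carrier * (1 - p_carrier))
    return r2, d_stat / d_max

def pearson_chi2(a, b, c, d):
    """Pearson chi-square for the same 2x2 table."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical rare variant: 20 carriers, all among 1000 cases; 1000 controls.
r2, d_prime = ld_measures(20, 980, 0, 1000)
print(round(r2, 4), d_prime, round(pearson_chi2(20, 980, 0, 1000), 2))
```

Here D' reaches its maximum of 1 because every carrier is a case, while r² stays near 0.01 because the variant is rare; the r²-based chi-square therefore understates a perfectly case-specific variant.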
Subjects
Computational Biology/methods , Genome-Wide Association Study/methods , Algorithms , Gene Frequency , Genetic Variation , Genome , Linkage Disequilibrium , Models, Genetic , Models, Statistical , Polymorphism, Single Nucleotide
ABSTRACT
BACKGROUND: Better techniques are needed to help consumers make lower-calorie food choices. This pilot study examined the effect of menu labeling with caloric information and exercise equivalents (EE) on food selection. Participants (62 females aged 18-34) recruited for this study ordered a fast food meal from menus that listed the names of the food items (Lunch 1 (L1), control meal). One week later (Lunch 2 (L2), experimental meal), participants ordered a meal from one of three menus with the same items as the previous week: no calorie information, calorie information only, or calorie information and EE. RESULTS: There were no absolute differences between groups in calories ordered from L1 to L2. However, it is noteworthy that the calorie-only and calorie-plus-EE groups ordered about 16% (206 kcal) and 14% (162 kcal) fewer calories, respectively, from L1 to L2, whereas the no-information group ordered only 2% (25 kcal) fewer. CONCLUSIONS: Menu labeling alone may be insufficient to reduce calories; further research is needed to find the most effective ways of presenting menu labels to the general public.
ABSTRACT
Endometrial cancer is the most common malignancy of the female genital tract, and the incidence and mortality rates from this disease are increasing. Although endometrial carcinoma has been regarded as a tissue-specific disease mediated by female sex steroid pathways, considerable evidence implicates an inflammatory response in the development and propagation of endometrial cancer. We hypothesized that if specific patterns of cytokine expression were found to be predictive of adverse outcome, then selective receptor targeting might be a therapeutic option. This study was therefore undertaken to determine the relationship between cytokine production in primary cell culture and clinical outcome in endometrial adenocarcinoma. Fresh endometrial tissues were fractionated into epithelial and stromal fractions and cultured. After 6-7 days, supernatants were collected and cells were enumerated. Batched aliquots were assayed using ELISA kits specific for CSF-1, GM-CSF, G-CSF, TNF-α, IL-6, IL-8, and VEGF. Data were compared using ANOVA, Fisher's exact, and log-rank tests. Increased epithelial VEGF production was observed more often in tumors with Type 2 variants (p = 0.039) and when GPR30 receptor expression was high (p = 0.038). Although increased stromal VEGF production was detected more often in grade 3 endometrioid tumors (p = 0.050), when EGFR expression was high (p = 0.003), and/or when ER/PR expression was low (p = 0.048), VEGF production did not correlate with overall survival (OS). Increased epithelial CSF-1 and TNF-α production were observed more often in tumors with deep myometrial invasion (p = 0.014) and advanced stage (p = 0.018), respectively. Increased CSF-1 (89.5% vs. 42.9%, p = 0.032), TNF-α (88.9% vs. 42.9%, p = 0.032), and IL-6 (92.3% vs. 61.5%, p = 0.052) production also correlated with low OS.
In Cox multivariate models, CSF-1 was an independent predictor of low survival when stratified by grade (p = 0.046) and histology (p = 0.050), as was TNF-α when stratified by histology (p = 0.037). In this study, high CSF-1, TNF-α, and IL-6 production rates identified patients at greatest risk of death and may signify patients likely to benefit from receptor-specific therapy.