Results 1 - 20 of 44
1.
Am J Respir Crit Care Med ; 207(10): 1324-1333, 2023 05 15.
Article in English | MEDLINE | ID: mdl-36921087

ABSTRACT

Rationale: Lung disease is the major cause of morbidity and mortality in persons with cystic fibrosis (pwCF). Variability in CF lung disease has substantial non-CFTR (CF transmembrane conductance regulator) genetic influence. Identification of genetic modifiers has prognostic and therapeutic importance. Objectives: Identify genetic modifier loci and genes/pathways associated with pulmonary disease severity. Methods: Whole-genome sequencing data on 4,248 unique pwCF with pancreatic insufficiency and lung function measures were combined with imputed genotypes from an additional 3,592 patients with pancreatic insufficiency from the United States, Canada, and France. This report describes association of approximately 15.9 million SNPs using the quantitative Kulich normal residual mortality-adjusted (KNoRMA) lung disease phenotype in 7,840 pwCF using premodulator lung function data. Measurements and Main Results: Testing included common and rare SNPs, transcriptome-wide association, gene-level, and pathway analyses. Pathway analyses identified novel associations with genes that have key roles in organ development, and we hypothesize that these genes may relate to dysanapsis and/or variability in lung repair. Results confirmed and extended previous genome-wide association study findings. These whole-genome sequencing data provide finely mapped genetic information to support mechanistic studies. No novel primary associations with common single variants or rare variants were found. Multilocus effects at chr5p13 (SLC9A3/CEP72) and chr11p13 (EHF/APIP) were identified. Variant effect size estimates at associated loci were consistently ordered across the cohorts, indicating possible age or birth cohort effects. Conclusions: This premodulator genomic, transcriptomic, and pathway association study of 7,840 pwCF will facilitate mechanistic and postmodulator genetic studies and the development of novel therapeutics for CF lung disease.


Subject(s)
Cystic Fibrosis , Humans , Cystic Fibrosis/genetics , Genome-Wide Association Study/methods , Cystic Fibrosis Transmembrane Conductance Regulator/genetics , Patient Acuity , Lung , Microtubule-Associated Proteins/genetics
2.
HGG Adv ; 3(2): 100090, 2022 Apr 14.
Article in English | MEDLINE | ID: mdl-35128485

ABSTRACT

Cystic fibrosis (CF) is a severe genetic disorder that can cause multiple comorbidities affecting the lungs, the pancreas, the luminal digestive system and beyond. In our previous genome-wide association studies (GWAS), we genotyped approximately 8,000 CF samples using a mixture of different genotyping platforms. More recently, the Cystic Fibrosis Genome Project (CFGP) performed deep (approximately 30×) whole genome sequencing (WGS) of 5,095 samples to better understand the genetic mechanisms underlying clinical heterogeneity among patients with CF. For mixtures of GWAS array and WGS data, genotype imputation has proven effective in increasing effective sample size. Therefore, we first performed imputation for the approximately 8,000 CF samples with GWAS array genotype using the Trans-Omics for Precision Medicine (TOPMed) freeze 8 reference panel. Our results demonstrate that TOPMed can provide high-quality imputation for patients with CF, boosting genomic coverage from approximately 0.3-4.2 million genotyped markers to approximately 11-43 million well-imputed markers, and significantly improving polygenic risk score (PRS) prediction accuracy. Furthermore, we built a CF-specific CFGP reference panel based on WGS data of patients with CF. We demonstrate that despite having approximately 3% the sample size of TOPMed, our CFGP reference panel can still outperform TOPMed when imputing some CF disease-causing variants, likely owing to allele and haplotype differences between patients with CF and general populations. We anticipate our imputed data for 4,656 samples without WGS data will benefit our subsequent genetic association studies, and the CFGP reference panel built from CF WGS samples will benefit other investigators studying CF.

3.
Genes Dev ; 33(19-20): 1381-1396, 2019 10 01.
Article in English | MEDLINE | ID: mdl-31488579

ABSTRACT

Short telomere syndromes manifest as familial idiopathic pulmonary fibrosis; they are the most common premature aging disorders. We used genome-wide linkage to identify heterozygous loss of function of ZCCHC8, a zinc-knuckle containing protein, as a cause of autosomal dominant pulmonary fibrosis. ZCCHC8 associated with TR and was required for telomerase function. In ZCCHC8 knockout cells and in mutation carriers, genomically extended telomerase RNA (TR) accumulated at the expense of mature TR, consistent with a role for ZCCHC8 in mediating TR 3' end targeting to the nuclear RNA exosome. We generated Zcchc8-null mice and found that heterozygotes, similar to human mutation carriers, had TR insufficiency but an otherwise preserved transcriptome. In contrast, Zcchc8-/- mice developed progressive and fatal neurodevelopmental pathology with features of a ciliopathy. The Zcchc8-/- brain transcriptome was highly dysregulated, showing accumulation and 3' end misprocessing of other low-abundance RNAs, including those encoding cilia components as well as the intronless replication-dependent histones. Our data identify a novel cause of human short telomere syndromes-familial pulmonary fibrosis and uncover nuclear exosome targeting as an essential 3' end maturation mechanism that vertebrate TR shares with replication-dependent histones.


Subject(s)
Carrier Proteins/genetics , Idiopathic Pulmonary Fibrosis/genetics , Loss of Function Mutation , Nuclear Proteins/genetics , RNA/metabolism , Telomerase/metabolism , Animals , Brain/enzymology , Brain/physiopathology , Cell Line , Cilia/genetics , Female , Genetic Linkage , HCT116 Cells , Humans , Idiopathic Pulmonary Fibrosis/enzymology , Idiopathic Pulmonary Fibrosis/physiopathology , Male , Mice , Mice, Knockout , Neurodevelopmental Disorders/genetics , Pedigree , RNA Processing, Post-Transcriptional/genetics , Telomere Shortening/genetics
4.
BMC Proc ; 10(Suppl 7): 147-152, 2016.
Article in English | MEDLINE | ID: mdl-27980627

ABSTRACT

Current findings from genetic studies of complex human traits often do not explain a large proportion of the estimated variation of these traits due to genetic factors. This could be, in part, due to overly stringent significance thresholds in traditional statistical methods, such as linear and logistic regression. Machine learning methods, such as Random Forests (RF), are an alternative approach to identify potentially interesting variants. One major issue with these methods is that there is no clear way to distinguish between probable true hits and noise variables based on the importance metric calculated. To this end, we are developing a method called the Relative Recurrency Variable Importance Metric (r2VIM), a RF-based variable selection method. Here, we apply r2VIM to the unrelated Genetic Analysis Workshop 19 data with simulated systolic blood pressure as the phenotype. We compare the number of "true" functional variants identified by r2VIM with those identified by linear regression analyses that use a Bonferroni correction to calculate a significance threshold. Our results show that r2VIM performed comparably to linear regression. Our findings are proof-of-concept for r2VIM, as it identifies a similar number of functional and nonfunctional variants as a more commonly used technique when the optimal importance score threshold is used.
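The Bonferroni comparison mentioned in the abstract above can be illustrated with a minimal Python sketch. The function names and the toy p-values are hypothetical and are not taken from the paper; the sketch only shows how a Bonferroni-corrected threshold is derived and applied.

```python
def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def bonferroni_hits(p_values: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return the variant IDs whose p-value passes the Bonferroni threshold."""
    threshold = bonferroni_threshold(len(p_values), alpha)
    return sorted(v for v, p in p_values.items() if p <= threshold)


if __name__ == "__main__":
    # Made-up p-values for four hypothetical variants (illustration only).
    toy = {"var1": 1e-9, "var2": 0.004, "var3": 0.02, "var4": 0.6}
    print(bonferroni_threshold(len(toy)))
    print(bonferroni_hits(toy))
```

With millions of tested variants this threshold becomes very small, which is the stringency the abstract contrasts with importance-based selection.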

5.
BMC Genet ; 17 Suppl 2: 8, 2016 Feb 03.
Article in English | MEDLINE | ID: mdl-26866982

ABSTRACT

High-density genetic marker data, especially sequence data, imply an immense multiple testing burden. This can be ameliorated by filtering genetic variants, exploiting or accounting for correlations between variants, jointly testing variants, and by incorporating informative priors. Priors can be based on biological knowledge or predicted variant function, or even be used to integrate gene expression or other omics data. Based on Genetic Analysis Workshop (GAW) 19 data, this article discusses diversity and usefulness of functional variant scores provided, for example, by PolyPhen2, SIFT, or RegulomeDB annotations. Incorporating functional scores into variant filters or weights and adjusting the significance level for correlations between variants yielded significant associations with blood pressure traits in a large family study of Mexican Americans (GAW19 data set). Marker rs218966 in gene PHF14 and rs9836027 in MAP4 significantly associated with hypertension; additionally, rare variants in SNUPN significantly associated with systolic blood pressure. Variant weights strongly influenced the power of kernel methods and burden tests. Apart from variant weights in test statistics, prior weights may also be used when combining test statistics or to informatively weight p values while controlling false discovery rate (FDR). Indeed, power improved when gene expression data for FDR-controlled informative weighting of association test p values of genes was used. Finally, approaches exploiting variant correlations included identity-by-descent mapping and the optimal strategy for joint testing rare and common variants, which was observed to depend on linkage disequilibrium structure.


Subject(s)
Genetic Variation , Mexican Americans/genetics , Microtubule-Associated Proteins/genetics , RNA Cap-Binding Proteins/genetics , Receptors, Cytoplasmic and Nuclear/genetics , Transcription Factors/genetics , Blood Pressure/genetics , Genetic Markers/genetics , Humans , Hypertension/genetics , Software
6.
Front Public Health ; 2: 112, 2014.
Article in English | MEDLINE | ID: mdl-25147783

ABSTRACT

BACKGROUND: B vitamins play an important role in homocysteine metabolism, with vitamin deficiencies resulting in increased levels of homocysteine and increased risk for stroke. We performed a genome-wide association study (GWAS) in 2,100 stroke patients from the Vitamin Intervention for Stroke Prevention (VISP) trial, a clinical trial designed to determine whether the daily intake of high-dose folic acid, vitamin B6, and vitamin B12 reduces recurrent cerebral infarction. METHODS: Extensive quality control (QC) measures resulted in a total of 737,081 SNPs for analysis. Genome-wide association analyses for baseline quantitative measures of folate, vitamin B12, and vitamin B6 were completed using linear regression approaches, implemented in PLINK. RESULTS: Six associations met or exceeded genome-wide significance (P ≤ 5 × 10^-8). For baseline vitamin B12, the strongest association was observed with a non-synonymous SNP (nsSNP) located in the CUBN gene (P = 1.76 × 10^-13). Two additional CUBN intronic SNPs demonstrated strong associations with B12 (P = 2.92 × 10^-10 and 4.11 × 10^-10), while a second nsSNP, located in the TCN1 gene, also reached genome-wide significance (P = 5.14 × 10^-11). For baseline measures of vitamin B6, we identified genome-wide significant associations for SNPs at the ALPL locus (rs1697421; P = 7.06 × 10^-10 and rs1780316; P = 2.25 × 10^-8). In addition to the six genome-wide significant associations, nine SNPs (two for vitamin B6, six for vitamin B12, and one for folate measures) provided suggestive evidence for association (P ≤ 10^-7). CONCLUSION: Our GWAS has identified six genome-wide significant associations and nine suggestive associations, and successfully replicated 5 of 16 SNPs previously reported to be associated with measures of B vitamins. The six genome-wide significant associations are located in gene regions that have shown previous associations with measures of B vitamins; however, four of the nine suggestive associations represent novel findings and warrant further investigation in additional populations.

7.
G3 (Bethesda) ; 3(10): 1795-807, 2013 Oct 03.
Article in English | MEDLINE | ID: mdl-23979933

ABSTRACT

Microarray single-nucleotide polymorphism genotyping, combined with imputation of untyped variants, has been widely adopted as an efficient means to interrogate variation across the human genome. "Genomic coverage" is the total proportion of genomic variation captured by an array, either by direct observation or through an indirect means such as linkage disequilibrium or imputation. We have performed imputation-based genomic coverage assessments of eight current genotyping arrays that assay from ~0.3 to ~5 million variants. Coverage was determined separately in each of the four continental ancestry groups in the 1000 Genomes Project phase 1 release. We used the subset of 1000 Genomes variants present on each array to impute the remaining variants and assessed coverage based on correlation between imputed and observed allelic dosages. More than 75% of common variants (minor allele frequency > 0.05) are covered by all arrays in all groups except for African ancestry, and up to ~90% in all ancestries for the highest density arrays. In contrast, less than 40% of less common variants (0.01 < minor allele frequency < 0.05) are covered by low density arrays in all ancestries and 50-80% in high density arrays, depending on ancestry. We also calculated genome-wide power to detect variant-trait association in a case-control design, across varying sample sizes, effect sizes, and minor allele frequency ranges, and compare these array-based power estimates with a hypothetical array that would type all variants in 1000 Genomes. These imputation-based genomic coverage and power analyses are intended as a practical guide to researchers planning genetic studies.
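The coverage measure described above (correlation between imputed and observed allelic dosages) can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the toy dosages, and the r² cutoff of 0.8 are choices made here for the example and are not taken from the abstract.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no information about the other
    return sxy * sxy / (sxx * syy)


def genomic_coverage(observed, imputed, r2_threshold=0.8):
    """Fraction of variants whose imputed dosages reach the r2 cutoff.

    observed, imputed: dicts mapping variant ID -> list of per-sample dosages.
    The 0.8 default is an assumed cutoff for this sketch.
    """
    covered = sum(
        1 for variant in observed
        if pearson_r2(observed[variant], imputed[variant]) >= r2_threshold
    )
    return covered / len(observed)


if __name__ == "__main__":
    # Toy dosages for two hypothetical variants in four samples.
    observed = {"v1": [0, 1, 2, 1], "v2": [0, 1, 2, 1]}
    imputed = {"v1": [0.1, 0.9, 1.9, 1.2], "v2": [1.0, 1.0, 1.0, 1.0]}
    print(genomic_coverage(observed, imputed))
```

In this toy input the first variant is imputed well and the second is not, so half of the variants count as covered.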


Subject(s)
Genome, Human , Genotyping Techniques/methods , Oligonucleotide Array Sequence Analysis , Gene Frequency , Genome-Wide Association Study , Humans , Sensitivity and Specificity
8.
Nat Genet ; 45(2): 197-201, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23263489

ABSTRACT

Insulin secretion has a crucial role in glucose homeostasis, and failure to secrete sufficient insulin is a hallmark of type 2 diabetes. Genome-wide association studies (GWAS) have identified loci contributing to insulin processing and secretion; however, a substantial fraction of the genetic contribution remains undefined. To examine low-frequency (minor allele frequency (MAF) 0.5-5%) and rare (MAF < 0.5%) nonsynonymous variants, we analyzed exome array data in 8,229 nondiabetic Finnish males using the Illumina HumanExome Beadchip. We identified low-frequency coding variants associated with fasting proinsulin concentrations at the SGSM2 and MADD GWAS loci and three new genes with low-frequency variants associated with fasting proinsulin or insulinogenic index: TBC1D30, KANK1 and PAM. We also show that the interpretation of single-variant and gene-based tests needs to consider the effects of noncoding SNPs both nearby and megabases away. This study demonstrates that exome array genotyping is a valuable approach to identify low-frequency variants that contribute to complex traits.


Subject(s)
Exome/genetics , Genetic Variation , Insulin/genetics , Insulin/metabolism , Adaptor Proteins, Signal Transducing , Amidine-Lyases/genetics , Cytoskeletal Proteins , Death Domain Receptor Signaling Adaptor Proteins/genetics , Fasting/blood , Finland , Gene Frequency , Genetics, Population , Genotype , Guanine Nucleotide Exchange Factors/genetics , Humans , Insulin Secretion , Intracellular Signaling Peptides and Proteins/genetics , Male , Mixed Function Oxygenases/genetics , Molecular Sequence Annotation , Proinsulin/blood , Tumor Suppressor Proteins/genetics
9.
BMC Oral Health ; 12: 57, 2012 Dec 21.
Article in English | MEDLINE | ID: mdl-23259602

ABSTRACT

BACKGROUND: Over 90% of adults aged 20 years or older with permanent teeth have suffered from dental caries leading to pain, infection, or even tooth loss. Although caries prevalence has decreased over the past decade, there are still about 23% of dentate adults who have untreated carious lesions in the US. Dental caries is a complex disorder affected by both individual susceptibility and environmental factors. Approximately 35-55% of caries phenotypic variation in the permanent dentition is attributable to genes, though few specific caries genes have been identified. Therefore, we conducted the first genome-wide association study (GWAS) to identify genes affecting susceptibility to caries in adults. METHODS: Five independent cohorts were included in this study, totaling more than 7000 participants. For each participant, dental caries was assessed and genetic markers (single nucleotide polymorphisms, SNPs) were genotyped or imputed across the entire genome. Due to the heterogeneity among the five cohorts regarding age, genotyping platform, quality of dental caries assessment, and study design, we first conducted genome-wide association (GWA) analyses on each of the five independent cohorts separately. We then performed three meta-analyses to combine results for: (i) the comparatively younger, Appalachian cohorts (N = 1483) with well-assessed caries phenotype, (ii) the comparatively older, non-Appalachian cohorts (N = 5960) with inferior caries phenotypes, and (iii) all five cohorts (N = 7443). Top ranking genetic loci within and across meta-analyses were scrutinized for biologically plausible roles on caries. RESULTS: Different sets of genes were nominated across the three meta-analyses, especially between the younger and older age cohorts. 
In general, we identified several suggestive loci (P-value ≤ 10E-05) within or near genes with plausible biological roles for dental caries, including RPS6KA2 and PTK2B, involved in p38-dependent MAPK signaling, and RHOU and FZD1, involved in the Wnt signaling cascade. Both of these pathways have been implicated in dental caries. ADAMTS3 and ISL1 are involved in tooth development, and TLR2 is involved in immune response to oral pathogens. CONCLUSIONS: As the first GWAS for dental caries in adults, this study nominated several novel caries genes for future study, which may lead to better understanding of cariogenesis, and ultimately, to improved disease predictions, prevention, and/or treatment.


Subject(s)
Dental Caries Susceptibility/genetics , Dental Caries/genetics , Genome-Wide Association Study , MAP Kinase Signaling System/genetics , Wnt Signaling Pathway/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Chromosomes, Human/genetics , DMF Index , Dentition, Permanent , Humans , Middle Aged , Young Adult
10.
Nat Genet ; 44(6): 642-50, 2012 May 06.
Article in English | MEDLINE | ID: mdl-22561516

ABSTRACT

We detected clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells with the same abnormal karyotype (>5-10%; presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rapidly rises to 2-3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions with genes previously associated with these cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer before DNA sampling, those without a previous diagnosis have an estimated tenfold higher risk of a subsequent hematological cancer (95% confidence interval = 6-18).


Subject(s)
Aging/genetics , Chromosome Aberrations , Mosaicism , Neoplasms/genetics , Adolescent , Adult , Aged , Aged, 80 and over , Child , Child, Preschool , Chromosome Mapping , DNA Copy Number Variations , Female , Genome-Wide Association Study , Humans , Infant , Infant, Newborn , Male , Middle Aged
11.
Genet Epidemiol ; 35(8): 887-98, 2011 Dec.
Article in English | MEDLINE | ID: mdl-22125226

ABSTRACT

Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient reuse of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of 14 phenotypes for extraction of study samples from each site's DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample and marker quality and various batch effects. Upon completion of the genotyping and QC analyses for each site's primary study, eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset reentered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here, we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II, and also serve as a starting point for investigators merging multiple genotype datasets accessible through the National Center for Biotechnology Information in the database of Genotypes and Phenotypes. Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.


Subject(s)
Electronic Health Records , Genome-Wide Association Study/standards , Quality Control , Algorithms , Genotype , Humans , National Human Genome Research Institute (U.S.) , Phenotype , United States
12.
Hum Mol Genet ; 20(24): 5012-23, 2011 Dec 15.
Article in English | MEDLINE | ID: mdl-21926416

ABSTRACT

We performed a multistage genome-wide association study of melanoma. In a discovery cohort of 1804 melanoma cases and 1026 controls, we identified loci at chromosomes 15q13.1 (HERC2/OCA2 region) and 16q24.3 (MC1R region) that reached genome-wide significance within this study and also found strong evidence for genetic effects on susceptibility to melanoma from markers on chromosome 9p21.3 in the p16/ARF region and on chromosome 1q21.3 (ARNT/LASS2/ANXA9 region). The most significant single-nucleotide polymorphisms (SNPs) in the 15q13.1 locus (rs1129038 and rs12913832) lie within a genomic region that has profound effects on eye and skin color; notably, 50% of variability in eye color is associated with variation in the SNP rs12913832. Because eye and skin colors vary across European populations, we further evaluated the associations of the significant SNPs after carefully adjusting for European substructure. We also evaluated the top 10 most significant SNPs by using data from three other genome-wide scans. Additional in silico data provided replication of the findings from the most significant region on chromosome 1q21.3 (rs7412746; P = 6 × 10^-10). Together, these data identified several candidate genes for additional studies to identify causal variants predisposing to increased risk for developing melanoma.


Subject(s)
Genetic Loci/genetics , Genetic Predisposition to Disease , Genome-Wide Association Study , Melanoma/genetics , Skin Neoplasms/genetics , Case-Control Studies , Chromosomes, Human, Pair 1/genetics , Genetic Markers , Guanine Nucleotide Exchange Factors/genetics , Humans , Meta-Analysis as Topic , Pigmentation/genetics , Polymorphism, Single Nucleotide/genetics , Ubiquitin-Protein Ligases
13.
Genet Epidemiol ; 35(6): 469-78, 2011 Sep.
Article in English | MEDLINE | ID: mdl-21618603

ABSTRACT

Nonsyndromic cleft palate (CP) is a common birth defect with a complex and heterogeneous etiology involving both genetic and environmental risk factors. We conducted a genome-wide association study (GWAS) using 550 case-parent trios, ascertained through a CP case collected in an international consortium. Family-based association tests of single nucleotide polymorphisms (SNP) and three common maternal exposures (maternal smoking, alcohol consumption, and multivitamin supplementation) were used in a combined 2 df test for gene (G) and gene-environment (G × E) interaction simultaneously, plus a separate 1 df test for G × E interaction alone. Conditional logistic regression models were used to estimate effects on risk to exposed and unexposed children. While no SNP achieved genome-wide significance when considered alone, markers in several genes attained or approached genome-wide significance when G × E interaction was included. Among these, MLLT3 and SMC2 on chromosome 9 showed multiple SNPs resulting in an increased risk if the mother consumed alcohol during the peri-conceptual period (3 months prior to conception through the first trimester). TBK1 on chr. 12 and ZNF236 on chr. 18 showed multiple SNPs associated with higher risk of CP in the presence of maternal smoking. Additional evidence of reduced risk due to G × E interaction in the presence of multivitamin supplementation was observed for SNPs in BAALC on chr. 8. These results emphasize the need to consider G × E interaction when searching for genes influencing risk to complex and heterogeneous disorders, such as nonsyndromic CP.


Subject(s)
Cleft Palate/genetics , Alcohol Drinking , Chromosome Mapping , Cleft Palate/chemically induced , Cleft Palate/etiology , Female , Gene-Environment Interaction , Genome-Wide Association Study , Genotype , Humans , Male , Maternal Exposure , Models, Genetic , Parents , Polymorphism, Single Nucleotide , Pregnancy , Risk , Vitamins/therapeutic use
14.
Curr Protoc Hum Genet ; Chapter 1: Unit1.19, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21234875

ABSTRACT

Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the electronic MEdical Records and Genomics (eMERGE) network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. We discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.


Subject(s)
Genome-Wide Association Study/standards , Software , Electronic Health Records , Genome-Wide Association Study/methods , Genomics , Genotype , Humans , Phenotype , Quality Control
15.
G3 (Bethesda) ; 1(6): 505-14, 2011 Nov.
Article in English | MEDLINE | ID: mdl-22384361

ABSTRACT

Ischemic stroke (IS) is among the leading causes of death in Western countries. There is a significant genetic component to IS susceptibility, especially among young adults. To date, research to identify genetic loci predisposing to stroke has met only with limited success. We performed a genome-wide association (GWA) analysis of early-onset IS to identify potential stroke susceptibility loci. The GWA analysis was conducted by genotyping 1 million SNPs in a biracial population of 889 IS cases and 927 controls, ages 15-49 years. Genotypes were imputed using the HapMap3 reference panel to provide 1.4 million SNPs for analysis. Logistic regression models adjusting for age, recruitment stages, and population structure were used to determine the association of IS with individual SNPs. Although no single SNP reached genome-wide significance (P < 5 × 10^-8), we identified two SNPs in chromosome 2q23.3, rs2304556 (in FMNL2; P = 1.2 × 10^-7) and rs1986743 (in ARL6IP6; P = 2.7 × 10^-7), strongly associated with early-onset stroke. These data suggest that a novel locus on human chromosome 2q23.3 may be associated with IS susceptibility among young adults.

17.
Genet Epidemiol ; 34(6): 591-602, 2010 Sep.
Article in English | MEDLINE | ID: mdl-20718045

ABSTRACT

Genome-wide scans of nucleotide variation in human subjects are providing an increasing number of replicated associations with complex disease traits. Most of the variants detected have small effects and, collectively, they account for a small fraction of the total genetic variance. Very large sample sizes are required to identify and validate findings. In this situation, even small sources of systematic or random error can cause spurious results or obscure real effects. The need for careful attention to data quality has been appreciated for some time in this field, and a number of strategies for quality control and quality assurance (QC/QA) have been developed. Here we extend these methods and describe a system of QC/QA for genotypic data in genome-wide association studies (GWAS). This system includes some new approaches that (1) combine analysis of allelic probe intensities and called genotypes to distinguish gender misidentification from sex chromosome aberrations, (2) detect autosomal chromosome aberrations that may affect genotype calling accuracy, (3) infer DNA sample quality from relatedness and allelic intensities, (4) use duplicate concordance to infer SNP quality, (5) detect genotyping artifacts from dependence of Hardy-Weinberg equilibrium test P-values on allelic frequency, and (6) demonstrate sensitivity of principal components analysis to SNP selection. The methods are illustrated with examples from the "Gene Environment Association Studies" (GENEVA) program. The results suggest several recommendations for QC/QA in the design and execution of GWAS.
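Item (5) above refers to Hardy-Weinberg equilibrium (HWE) test P-values. As a rough illustration of what such a test computes for one SNP, here is a minimal Python sketch using the textbook one-degree-of-freedom chi-square goodness-of-fit test. This is an assumption made for the example: the abstract does not say which HWE test the authors used, and the function name and genotype counts are hypothetical.

```python
from math import erfc, sqrt


def hwe_chi2_test(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square HWE test from genotype counts (AA, AB, BB).

    Returns (frequency of allele A, chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes supplied")
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    freq_b = 1.0 - freq_a
    if freq_a in (0.0, 1.0):
        return freq_a, 0.0, 1.0  # monomorphic SNP: nothing to test
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return freq_a, chi2, p_value


if __name__ == "__main__":
    # Counts in perfect HWE proportions, then counts with no heterozygotes.
    print(hwe_chi2_test(25, 50, 25))
    print(hwe_chi2_test(50, 0, 50))
```

Plotting such p-values against allele frequency across all SNPs is one way to look for the frequency-dependent genotyping artifacts the abstract mentions.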


Subject(s)
Genome-Wide Association Study/standards , Genotype , Aneuploidy , Artifacts , Case-Control Studies , Chromosome Aberrations , Female , Gene Frequency , Genetic Variation , Genetics, Population , Genome-Wide Association Study/methods , Humans , Lung Neoplasms/genetics , Male , Polymorphism, Single Nucleotide , Quality Control , Sex Chromosome Aberrations/statistics & numerical data , Substance-Related Disorders/genetics
18.
Nat Genet ; 42(6): 525-9, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20436469

ABSTRACT

Case-parent trios were used in a genome-wide association study of cleft lip with and without cleft palate. SNPs near two genes not previously associated with cleft lip with and without cleft palate (MAFB, most significant SNP rs13041247, with odds ratio (OR) per minor allele = 0.704, 95% CI 0.635-0.778, P = 1.44 × 10^-11; and ABCA4, most significant SNP rs560426, with OR = 1.432, 95% CI 1.292-1.587, P = 5.01 × 10^-12) and two previously identified regions (at chromosome 8q24 and IRF6) attained genome-wide significance. Stratifying trios into European and Asian ancestry groups revealed differences in statistical significance, although estimated effect sizes remained similar. Replication studies from several populations showed confirming evidence, with families of European ancestry giving stronger evidence for markers in 8q24, whereas Asian families showed stronger evidence for association with MAFB and ABCA4. Expression studies support a role for MAFB in palatal development.


Subject(s)
ATP-Binding Cassette Transporters/genetics; Cleft Lip/genetics; Cleft Palate/genetics; Genetic Predisposition to Disease; MafB Transcription Factor/genetics; Polymorphism, Single Nucleotide; Animals; Asian People/genetics; Female; Genome-Wide Association Study; Genotype; Humans; Mice; White People/genetics
19.
Proc Natl Acad Sci U S A ; 107(16): 7401-6, 2010 Apr 20.
Article in English | MEDLINE | ID: mdl-20385819

ABSTRACT

We executed a genome-wide association scan for age-related macular degeneration (AMD) in 2,157 cases and 1,150 controls. Our results validate AMD susceptibility loci near CFH (P < 10^-75), ARMS2 (P < 10^-59), C2/CFB (P < 10^-20), C3 (P < 10^-9), and CFI (P < 10^-6). We compared our top findings with the Tufts/Massachusetts General Hospital genome-wide association study of advanced AMD (821 cases, 1,709 controls) and genotyped 30 promising markers in additional individuals (up to 7,749 cases and 4,625 controls). With these data, we identified a susceptibility locus near TIMP3 (overall P = 1.1 × 10^-11), a metalloproteinase involved in degradation of the extracellular matrix and previously implicated in early-onset maculopathy. In addition, our data revealed strong association signals with alleles at two loci (LIPC, P = 1.3 × 10^-7; CETP, P = 7.4 × 10^-7) that were previously associated with high-density lipoprotein cholesterol (HDL-c) levels in blood. Consistent with the hypothesis that HDL metabolism is associated with AMD pathogenesis, we also observed association with AMD of HDL-c-associated alleles near LPL (P = 3.0 × 10^-3) and ABCA1 (P = 5.6 × 10^-4). Multilocus analysis including all susceptibility loci showed that 329 of 331 individuals (99%) with the highest-risk genotypes were cases, and 85% of these had advanced AMD. Our studies extend the catalog of AMD associated loci, help identify individuals at high risk of disease, and provide clues about underlying cellular pathways that should eventually lead to new therapies.


Subject(s)
Genetic Predisposition to Disease; Lipoproteins, HDL/metabolism; Macular Degeneration/genetics; Tissue Inhibitor of Metalloproteinase-3/genetics; Alleles; Case-Control Studies; Chromosome Mapping; Complement Factor I/genetics; Genetic Variation; Genome-Wide Association Study; Genotype; Humans; Polymorphism, Single Nucleotide; Regression Analysis; Risk; Tissue Inhibitor of Metalloproteinase-3/physiology
20.
Genet Epidemiol ; 34(4): 364-72, 2010 May.
Article in English | MEDLINE | ID: mdl-20091798

ABSTRACT

Genome-wide association studies (GWAS) have emerged as powerful means for identifying genetic loci related to complex diseases. However, the role of environment and its potential to interact with key loci has not been adequately addressed in most GWAS. Networks of collaborative studies involving different study populations and multiple phenotypes provide a powerful approach for addressing the challenges in analysis and interpretation shared across studies. The Gene, Environment Association Studies (GENEVA) consortium was initiated to: identify genetic variants related to complex diseases; identify variations in gene-trait associations related to environmental exposures; and ensure rapid sharing of data through the database of Genotypes and Phenotypes. GENEVA consists of several academic institutions, including a coordinating center, two genotyping centers and 14 independently designed studies of various phenotypes, as well as several Institutes and Centers of the National Institutes of Health led by the National Human Genome Research Institute. Minimum detectable effect sizes include relative risks ranging from 1.24 to 1.57 and proportions of variance explained ranging from 0.0097 to 0.02. Given the large number of research participants (N>80,000), an important feature of GENEVA is harmonization of common variables, which allows analyses of additional traits. Environmental exposure information available from most studies also enables testing of gene-environment interactions. Facilitated by its sizeable infrastructure for promoting collaboration, GENEVA has established a unified framework for genotyping, data quality control, analysis and interpretation. By maximizing knowledge obtained through collaborative GWAS incorporating environmental exposure information, GENEVA aims to enhance our understanding of disease etiology, potentially identifying opportunities for intervention.


Subject(s)
Genome-Wide Association Study; Environment; Genotype; Humans; Models, Genetic; Molecular Epidemiology/methods; Phenotype; Polymorphism, Genetic; Population Groups; Quality Control; Quantitative Trait Loci; Risk