1.
Alzheimers Dement ; 20(5): 3290-3304, 2024 May.
Article in English | MEDLINE | ID: mdl-38511601

ABSTRACT

INTRODUCTION: Genome-wide association studies (GWAS) have identified loci associated with Alzheimer's disease (AD) but did not identify specific causal genes or variants within those loci. Analysis of whole genome sequence (WGS) data, which interrogates the entire genome and captures rare variations, may identify causal variants within GWAS loci. METHODS: We performed single common variant association analysis and rare variant aggregate analyses in the pooled population (N cases = 2184, N controls = 2383) and targeted analyses in subpopulations using WGS data from the Alzheimer's Disease Sequencing Project (ADSP). The analyses were restricted to variants within 100 kb of 83 previously identified GWAS lead variants. RESULTS: Seventeen variants were significantly associated with AD within five genomic regions implicating the genes OARD1/NFYA/TREML1, JAZF1, FERMT2, and SLC24A4. KAT8 was implicated by both single variant and rare variant aggregate analyses. DISCUSSION: This study demonstrates the utility of leveraging WGS to gain insights into AD loci identified via GWAS.


Subject(s)
Alzheimer Disease, Genome-Wide Association Study, Whole Genome Sequencing, Humans, Alzheimer Disease/genetics, Female, Male, Genetic Predisposition to Disease/genetics, Aged, Single Nucleotide Polymorphism/genetics, Genetic Variation/genetics
2.
Genome Res ; 29(1): 125-134, 2019 01.
Article in English | MEDLINE | ID: mdl-30514702

ABSTRACT

Genotype imputation is widely used in genome-wide association studies to boost variant density, allowing increased power in association testing. Many studies currently include pedigree data due to increasing interest in rare variants coupled with the availability of appropriate analysis tools. The performance of population-based (subjects are unrelated) imputation methods is well established. However, the performance of family- and population-based imputation methods on family data has been subject to much less scrutiny. Here, we extensively compare several family- and population-based imputation methods on family data of large pedigrees with both European and African ancestry. Our comparison includes many widely used family- and population-based tools and another method, Ped_Pop, which combines family- and population-based imputation results. We also compare four subject selection strategies for full sequencing to serve as the reference panel for imputation: GIGI-Pick, ExomePicks, PRIMUS, and random selection. Moreover, we compare two imputation accuracy metrics: the Imputation Quality Score and Pearson's correlation R² for predicting power of association analysis using imputation results. Our results show that (1) GIGI outperforms Merlin; (2) family-based imputation outperforms population-based imputation for rare variants but not for common ones; (3) combining family- and population-based imputation outperforms all imputation approaches for all minor allele frequencies; (4) GIGI-Pick gives the best selection strategy based on the R² criterion; and (5) R² is the best measure of imputation accuracy. Our study is the first to extensively evaluate the imputation performance of many available family- and population-based tools on the same family data and provides guidelines for future studies.
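
The Pearson R² accuracy metric named in this abstract can be illustrated with a minimal sketch. It assumes that R² is computed per variant as the squared correlation between true allele counts (0, 1, 2) and imputed dosages; the function name `pearson_r2` and the input layout are hypothetical and are not taken from the paper.

```python
def pearson_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts and imputed dosages.

    Both inputs are equal-length sequences with one entry per subject
    (an assumed layout for this illustration).
    """
    n = len(true_genotypes)
    if n != len(imputed_dosages) or n < 2:
        raise ValueError("need two equal-length inputs with at least 2 subjects")

    mean_t = sum(true_genotypes) / n
    mean_d = sum(imputed_dosages) / n

    # Sums of squares and cross-products around the means.
    cov = sum((t - mean_t) * (d - mean_d)
              for t, d in zip(true_genotypes, imputed_dosages))
    var_t = sum((t - mean_t) ** 2 for t in true_genotypes)
    var_d = sum((d - mean_d) ** 2 for d in imputed_dosages)

    # R² is undefined when either vector is constant (e.g. a monomorphic variant).
    if var_t == 0 or var_d == 0:
        return float("nan")
    return (cov * cov) / (var_t * var_d)
```

A value near 1 indicates that the imputed dosages track the true genotypes closely; the metric is undefined for variants with no variation in the evaluated subjects, which the sketch reports as NaN.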


Subject(s)
Black People/genetics, Family, Human Genome, White People/genetics, Female, Humans, Male
3.
Brief Bioinform ; 20(1): 245-253, 2019 01 18.
Article in English | MEDLINE | ID: mdl-28968627

ABSTRACT

Genome-wide association studies have been an important approach used to localize trait loci, with primary focus on common variants. The multiple rare variant-common disease hypothesis may explain the missing heritability remaining after accounting for identified common variants. Advances of sequencing technologies with their decreasing costs, coupled with methodological advances in the context of association studies in large samples, now make the study of rare variants at a genome-wide scale feasible. The resurgence of family-based association designs because of their advantage in studying rare variants has also stimulated more methods development, mainly based on linear mixed models (LMMs). Other tests such as score tests can have advantages over the LMMs, but to date have mainly been proposed for single-marker association tests. In this article, we extend several score tests (corrected χ², WQLS, and SKAT) to the multiple variant association framework. We evaluate and compare their statistical performances relative to the LMM. Moreover, we show that three tests can be cast as the difference between marker allele frequencies (AFs) estimated in each of the groups of affected and unaffected subjects. We show that these tests are flexible, as they can be based on related, unrelated or both related and unrelated subjects. They also make feasible an increasingly common design that only sequences a subset of affected subjects (related or unrelated) and uses for comparison publicly available AFs estimated in a group of healthy subjects. Finally, we show the great impact of linkage disequilibrium on the performance of all these tests.


Subject(s)
Genome-Wide Association Study/statistics & numerical data, Case-Control Studies, Computational Biology/methods, Computer Simulation, Feasibility Studies, Female, Gene Frequency, Genetic Variation, High-Throughput Nucleotide Sequencing/statistics & numerical data, Humans, Linear Models, Linkage Disequilibrium, Male, Genetic Models, Pedigree, Quantitative Trait Loci, DNA Sequence Analysis/statistics & numerical data
4.
Genomics ; 111(4): 808-818, 2019 07.
Article in English | MEDLINE | ID: mdl-29857119

ABSTRACT

The Alzheimer's Disease Sequencing Project (ADSP) performed whole genome sequencing (WGS) of 584 subjects from 111 multiplex families at three sequencing centers. Genotype calling of single nucleotide variants (SNVs) and insertion-deletion variants (indels) was performed centrally using GATK-HaplotypeCaller and Atlas V2. The ADSP Quality Control (QC) Working Group applied QC protocols to project-level variant call format files (VCFs) from each pipeline, and developed and implemented a novel protocol, termed "consensus calling," to combine genotype calls from both pipelines into a single high-quality set. QC was applied to autosomal bi-allelic SNVs and indels, and included pipeline-recommended QC filters, variant-level QC, and sample-level QC. Low-quality variants or genotypes were excluded, and sample outliers were noted. Quality was assessed by examining Mendelian inconsistencies (MIs) among 67 parent-offspring pairs, and MIs were used to establish additional genotype-specific filters for GATK calls. After QC, 578 subjects remained. Pipeline-specific QC excluded ~12.0% of GATK and 14.5% of Atlas SNVs. Between pipelines, ~91% of SNV genotypes across all QCed variants were concordant; 4.23% and 4.56% of genotypes were exclusive to Atlas or GATK, respectively; the remaining ~0.01% of discordant genotypes were excluded. For indels, variant-level QC excluded ~36.8% of GATK and 35.3% of Atlas indels. Between pipelines, ~55.6% of indel genotypes were concordant; while 10.3% and 28.3% were exclusive to Atlas or GATK, respectively; and ~0.29% of discordant genotypes were excluded. The final WGS consensus dataset contains 27,896,774 SNVs and 3,133,926 indels and is publicly available.
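
The concordant, pipeline-exclusive, and discordant categories reported in this abstract can be illustrated with a toy sketch. It is not the ADSP protocol: the function `consensus_call`, the dictionary layout, and the rule that a genotype called by only one pipeline is retained are all assumptions made for illustration. Only the exclusion of discordant genotypes is stated in the abstract.

```python
def consensus_call(gatk_calls, atlas_calls):
    """Merge two pipelines' genotype calls into one consensus set.

    Each input maps (sample_id, variant_id) -> genotype string such as "0/1".
    A missing key (or a value of None) means that pipeline made no call.
    Returns the consensus genotypes and a count of each category.
    """
    consensus = {}
    counts = {"concordant": 0, "gatk_only": 0, "atlas_only": 0, "discordant": 0}

    for key in sorted(set(gatk_calls) | set(atlas_calls)):
        g = gatk_calls.get(key)
        a = atlas_calls.get(key)
        if g is None and a is None:
            continue  # neither pipeline produced a usable call
        if g is not None and a is not None:
            if g == a:
                consensus[key] = g
                counts["concordant"] += 1
            else:
                # Pipelines disagree: the genotype is excluded.
                counts["discordant"] += 1
        elif g is not None:
            consensus[key] = g  # assumption: pipeline-exclusive calls are kept
            counts["gatk_only"] += 1
        else:
            consensus[key] = a  # assumption: pipeline-exclusive calls are kept
            counts["atlas_only"] += 1
    return consensus, counts
```

Dividing each count by the total number of keys gives percentages comparable in spirit to those quoted above, although the real protocol also applies pipeline-specific and genotype-specific filters before merging.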


Subject(s)
Alzheimer Disease/genetics, Genome-Wide Association Study/standards, Genotyping Techniques/standards, Quality Control, Whole Genome Sequencing/standards, Algorithms, Female, Genome-Wide Association Study/methods, Genotype, Genotyping Techniques/methods, Humans, Male, Genetic Polymorphism, Whole Genome Sequencing/methods
5.
Genet Epidemiol ; 42(6): 500-515, 2018 09.
Article in English | MEDLINE | ID: mdl-29862559

ABSTRACT

Multipoint linkage analysis is an important approach for localizing disease-associated loci in pedigrees. Linkage analysis, however, is sensitive to misspecification of marker allele frequencies. Pedigrees from recently admixed populations are particularly susceptible to this problem because of the challenge of accurately accounting for population structure. Therefore, increasing emphasis on use of multiethnic samples in genetic studies requires reevaluation of best practices, given data currently available. Typical strategies have been to compute allele frequencies from the sample, or to use marker allele frequencies determined by admixture proportions averaged over the entire sample. However, admixture proportions vary among pedigrees and throughout the genome in a family-specific manner. Here, we evaluate several approaches to model admixture in linkage analysis, providing different levels of detail about ancestral origin. To perform our evaluations, for specification of marker allele frequencies, we used data on 67 Caribbean Hispanic admixed families from the Alzheimer's Disease Sequencing Project. Our results show that choice of admixture model has an effect on the linkage analysis results. Variant-specific admixture proportions, computed for individual families, provide the most detailed regional admixture estimates, and, as such, are the most appropriate allele frequencies for linkage analysis. This likely decreases the number of false-positive results, and is straightforward to implement.


Subject(s)
Alzheimer Disease/genetics, Gene Pool, Hispanic or Latino/genetics, Pedigree, Phylogeny, DNA Sequence Analysis, Caribbean Region, Ethnicity, Family, Female, Gene Frequency/genetics, Genetic Linkage, Population Genetics, Humans, Lod Score, Male, Genetic Models, Principal Component Analysis
6.
Bioinformatics ; 34(9): 1591-1593, 2018 05 01.
Article in English | MEDLINE | ID: mdl-29267877

ABSTRACT

Summary: Genome-wide association studies have become common over the last ten years, with a shift towards targeting rare variants, especially in pedigree data. Despite lower costs, sequencing for rare variants still remains expensive. To have a relatively large sample with acceptable cost, imputation approaches may be used, such as GIGI for pedigree data. GIGI is an imputation method that handles large pedigrees and is particularly good for rare variant imputation. GIGI requires a subset of individuals in a pedigree to be fully sequenced, while other individuals are sequenced only at relevant markers. The imputation will infer the missing genotypes at untyped markers. Running GIGI on large pedigrees for large numbers of markers can be very time consuming. We present GIGI-Quick as a method to efficiently split GIGI's input, run GIGI in parallel and efficiently merge the output to reduce the runtime with the number of cores. This allows obtaining imputation results faster, and therefore all subsequent association analyses. Availability and implementation: GIGI-Quick is open source and publicly available via: https://cse-git.qcri.org/Imputation/GIGI-Quick. Contact: msaad@hbku.edu.qa. Supplementary information: Supplementary data are available at Bioinformatics online.
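
The split, run-in-parallel, and merge pattern described in this abstract can be sketched generically. This is not GIGI-Quick's code or command-line interface: `impute_chunk` is a hypothetical stand-in for one imputation run on a block of markers, and threads are used here only so that the sketch is self-contained. A real workflow would more likely launch separate processes.

```python
from concurrent.futures import ThreadPoolExecutor


def split_markers(markers, n_chunks):
    """Split a marker list into at most n_chunks contiguous blocks."""
    if not markers:
        return []
    n_chunks = max(1, min(n_chunks, len(markers)))
    size = -(-len(markers) // n_chunks)  # ceiling division
    return [markers[i:i + size] for i in range(0, len(markers), size)]


def impute_chunk(chunk):
    """Hypothetical placeholder for one imputation job on a block of markers."""
    return [(marker, "imputed:" + marker) for marker in chunk]


def run_parallel(markers, n_workers):
    """Impute each block concurrently, then merge results in marker order."""
    chunks = split_markers(markers, n_workers)
    if not chunks:
        return []
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map yields results in the order of the submitted chunks,
        # so concatenating them restores the original marker order.
        parts = list(pool.map(impute_chunk, chunks))
    return [row for part in parts for row in part]
```

Splitting by contiguous marker blocks keeps the merge step trivial, because the per-block outputs only need to be concatenated in order.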


Subject(s)
Genome-Wide Association Study, Genotype, Pedigree, Software
7.
Alzheimers Dement ; 15(12): 1524-1532, 2019 12.
Article in English | MEDLINE | ID: mdl-31606368

ABSTRACT

INTRODUCTION: Although the relationship between APOE and Alzheimer's disease (AD) is well established in populations of European descent, the effects of APOE and ancestry on AD risk in diverse populations are not well understood. METHODS: Logistic mixed model regression and survival analyses were performed in a sample of 3067 Caribbean Hispanics and 3028 individuals of European descent to assess the effects of APOE genotype, local ancestry, and genome-wide ancestry on AD risk and age at onset. RESULTS: Among the Caribbean Hispanics, individuals with African-derived ancestry at APOE had 39% lower odds of AD than individuals with European-derived APOE, after adjusting for APOE genotype, age, and genome-wide ancestry. While APOE E2 and E4 effects on AD risk and age at onset were significant in the Caribbean Hispanics, they were substantially attenuated compared with those in European ancestry individuals. DISCUSSION: These results suggest that additional genetic variation in the APOE region influences AD risk beyond APOE E2/E3/E4.


Subject(s)
Alzheimer Disease, Apolipoproteins E/genetics, Black People/genetics, Hispanic or Latino/genetics, White People/genetics, Age of Onset, Aged, Alzheimer Disease/epidemiology, Alzheimer Disease/genetics, Caribbean Region/ethnology, Ethnicity/statistics & numerical data, Female, Genotype, Humans, Male, Middle Aged
8.
Hum Genet ; 137(10): 807-815, 2018 Oct.
Article in English | MEDLINE | ID: mdl-30276537

ABSTRACT

Hundreds of genes have been implicated in autism spectrum disorders (ASDs). In genetically heterogeneous conditions, large families with multiple affected individuals provide strong evidence implicating a rare variant, and replication of the same variant in multiple families is unusual. We previously published linkage analyses and follow-up exome sequencing in seven large families with ASDs, implicating 14 rare exome variants. These included rs200195897, which was transmitted to four affected individuals in one family. We attempted replication of those variants in the MSSNG database. MSSNG is a unique resource for replication of ASD risk loci, containing whole genome sequence (WGS) on thousands of individuals diagnosed with ASDs and family members. For each exome variant, we obtained all carriers and their relatives in MSSNG, using a TDT test to quantify evidence for transmission and association. We replicated the transmission of rs200195897 to four affected individuals in three additional families. rs200195897 was also present in three singleton affected individuals, and no unaffected individuals other than transmitting parents. We identified two additional rare variants (rs566472488 and rs185038034) transmitted with rs200195897 on 1p36.33. Sanger sequencing confirmed the presence of these variants in the original family segregating rs200195897. To our knowledge, this is the first example of a rare haplotype being transmitted with ASD in multiple families. The candidate risk variants include a missense mutation in SAMD11, an intronic variant in NOC2L, and a regulatory region variant close to both genes. NOC2L is a transcription repressor, and several genes involved in transcription regulation have been previously associated with ASDs.
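
For readers unfamiliar with the transmission disequilibrium test (TDT) mentioned in this abstract, the classical McNemar-type statistic can be sketched as follows. The abstract does not say which TDT implementation the authors used, so this is the textbook form only, and the function name `tdt` is hypothetical. With b transmissions and c non-transmissions of the variant allele from heterozygous parents to affected offspring, the statistic is (b − c)² / (b + c), referred to a chi-square distribution with one degree of freedom.

```python
from math import erfc, sqrt


def tdt(transmitted, untransmitted):
    """Classical TDT statistic and asymptotic p-value.

    transmitted / untransmitted: counts of heterozygous-parent transmissions
    in which the variant allele was or was not passed to an affected child.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Survival function of a chi-square with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

For rare variants the number of informative transmissions is small, so the asymptotic p-value is only approximate and an exact binomial test may be more appropriate.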


Subject(s)
Autism Spectrum Disorder/genetics, Eye Proteins/genetics, Genetic Loci, Haplotypes, Missense Mutation, Genetic Polymorphism, Repressor Proteins/genetics, Female, Humans, Male, Risk Factors
9.
Dement Geriatr Cogn Disord ; 45(1-2): 1-17, 2018.
Article in English | MEDLINE | ID: mdl-29486463

ABSTRACT

BACKGROUND/AIMS: The Alzheimer's Disease Sequencing Project (ADSP) aims to identify novel genes influencing Alzheimer's disease (AD). Variants within genes known to cause dementias other than AD have previously been associated with AD risk. We describe evidence of co-segregation and associations between variants in dementia genes and clinically diagnosed AD within the ADSP. METHODS: We summarize the properties of known pathogenic variants within dementia genes, describe the co-segregation of variants annotated as "pathogenic" in ClinVar and new candidates observed in ADSP families, and test for associations between rare variants in dementia genes in the ADSP case-control study. The participants were clinically evaluated for AD, and they represent European, Caribbean Hispanic, and isolate Dutch populations. RESULTS/CONCLUSIONS: Pathogenic variants in dementia genes were predominantly rare and conserved coding changes. Pathogenic variants within ARSA, CSF1R, and GRN were observed, and candidate variants in GRN and CHMP2B were nominated in ADSP families. An independent case-control study provided evidence of an association between variants in TREM2, APOE, ARSA, CSF1R, PSEN1, and MAPT and risk of AD. Variants in genes which cause dementing disorders may influence the clinical diagnosis of AD in a small proportion of cases within the ADSP.


Subject(s)
Alzheimer Disease/genetics, Dementia/genetics, Nerve Tissue Proteins/genetics, Aged, Aged 80 and over, Alzheimer Disease/epidemiology, Case-Control Studies, Cohort Studies, Dementia/epidemiology, Female, Genetic Variation, Genome-Wide Association Study, Humans, Male, Single Nucleotide Polymorphism, Prevalence, DNA Sequence Analysis
10.
Am J Hum Genet ; 94(2): 257-67, 2014 Feb 06.
Article in English | MEDLINE | ID: mdl-24507777

ABSTRACT

The use of large pedigrees is an effective design for identifying rare functional variants affecting heritable traits. Cost-effective studies using sequence data can be achieved via pedigree-based genotype imputation in which some subjects are sequenced and missing genotypes are inferred on the remaining subjects. Because of high cost, it is important to carefully prioritize subjects for sequencing. Here, we introduce a statistical framework that enables systematic comparison among subject-selection choices for sequencing. We introduce a metric "local coverage," which allows the use of inferred inheritance vectors to measure genotype-imputation ability specifically in a region of interest, such as one with prior evidence of linkage. In the absence of linkage information, we can instead use a "genome-wide coverage" metric computed with the pedigree structure. These metrics enable the development of a method that identifies efficient selection choices for sequencing. As implemented in GIGI-Pick, this method also flexibly allows initial manual selection of subjects and optimizes selections within the constraint that only some subjects might be available for sequencing. In the present study, we used simulations to compare GIGI-Pick with PRIMUS, ExomePicks, and common ad hoc methods of selecting subjects. In genotype imputation of both common and rare alleles, GIGI-Pick substantially outperformed all other methods considered and had the added advantage of incorporating prior linkage information. We also used a real pedigree to demonstrate the utility of our approach in identifying causal mutations. Our work enables prioritization of subjects for sequencing to facilitate dissection of the genetic basis of heritable traits.


Subject(s)
Genetic Linkage/physiology, Genetic Models, Pedigree, Sequence Analysis/methods, Algorithms, Alleles, Female, Genetic Association Studies, Genotype, Humans, Male, Markov Chains, Monte Carlo Method, Phenotype, Single Nucleotide Polymorphism, Software, Statistics as Topic
11.
Am J Hum Genet ; 92(4): 504-16, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23561844

ABSTRACT

Recent emergence of the common-disease-rare-variant hypothesis has renewed interest in the use of large pedigrees for identifying rare causal variants. Genotyping with modern sequencing platforms is increasingly common in the search for such variants but remains expensive and often is limited to only a few subjects per pedigree. In population-based samples, genotype imputation is widely used so that additional genotyping is not needed. We now introduce an analogous approach that enables computationally efficient imputation in large pedigrees. Our approach samples inheritance vectors (IVs) from a Markov Chain Monte Carlo sampler by conditioning on genotypes from a sparse set of framework markers. Missing genotypes are probabilistically inferred from these IVs along with observed dense genotypes that are available on a subset of subjects. We implemented our approach in the Genotype Imputation Given Inheritance (GIGI) program and evaluated the approach on both simulated and real large pedigrees. With a real pedigree, we also compared imputed results obtained from this approach with those from the population-based imputation program BEAGLE. We demonstrated that our pedigree-based approach imputes many alleles with high accuracy. It is much more accurate for calling rare alleles than is population-based imputation and does not require an outside reference sample. We also evaluated the effect of varying other parameters, including the marker type and density of the framework panel, threshold for calling genotypes, and population allele frequencies. By leveraging information from existing genotypes already assayed on large pedigrees, our approach can facilitate cost-effective use of sequence data in the pursuit of rare causal variants.


Subject(s)
Human Genome, Genotype, Genetic Models, Single Nucleotide Polymorphism/genetics, Algorithms, Female, Gene Frequency, Genome-Wide Association Study, Humans, Male, Markov Chains, Monte Carlo Method, Pedigree
12.
Am J Hum Genet ; 93(6): 1035-45, 2013 Dec 05.
Article in English | MEDLINE | ID: mdl-24268658

ABSTRACT

Hypertriglyceridemia (HTG) is a heritable risk factor for cardiovascular disease. Investigating the genetics of HTG may identify new drug targets. There are ~35 known single-nucleotide variants (SNVs) that explain only ~10% of variation in triglyceride (TG) level. Because of the genetic heterogeneity of HTG, a family study design is optimal for identification of rare genetic variants with large effect size because the same mutation can be observed in many relatives and cosegregation with TG can be tested. We considered HTG in a five-generation family of European American descent (n = 121), ascertained for familial combined hyperlipidemia. By using Bayesian Markov chain Monte Carlo joint oligogenic linkage and association analysis, we detected linkage to chromosomes 7 and 17. Whole-exome sequence data revealed shared, highly conserved, private missense SNVs in both SLC25A40 on chr7 and PLD2 on chr17. Jointly, these SNVs explained 49% of the genetic variance in TG; however, only the SLC25A40 SNV was significantly associated with TG (p = 0.0001). This SNV, c.374A>G, causes a highly disruptive p.Tyr125Cys substitution just outside the second helical transmembrane region of the SLC25A40 inner mitochondrial membrane transport protein. Whole-gene testing in subjects from the Exome Sequencing Project confirmed the association between TG and SLC25A40 rare, highly conserved, coding variants (p = 0.03). These results suggest a previously undescribed pathway for HTG and illustrate the power of large pedigrees in the search for rare, causal variants.


Subject(s)
Exome, Genetic Association Studies, Genetic Linkage, Hypertriglyceridemia/genetics, Mitochondrial Membrane Transport Proteins/genetics, Adolescent, Adult, Aged, Aged 80 and over, Child, Human Chromosomes Pair 17, Human Chromosomes Pair 7, Female, Genetic Predisposition to Disease, High-Throughput Nucleotide Sequencing, Humans, Hypertriglyceridemia/metabolism, Male, Middle Aged, Phenotype, Single Nucleotide Polymorphism, Triglycerides/blood, Young Adult
13.
Bioinformatics ; 31(23): 3790-8, 2015 Dec 01.
Article in English | MEDLINE | ID: mdl-26231429

ABSTRACT

MOTIVATION: Huge genetic datasets with dense marker panels are now common. With the availability of sequence data and recognition of importance of rare variants, smaller studies based on pedigrees are again also common. Pedigree-based samples often start with a dense marker panel, a subset of which may be used for linkage analysis to reduce computational burden and to limit linkage disequilibrium between single-nucleotide polymorphisms (SNPs). Programs attempting to select markers for linkage panels exist but lack flexibility. RESULTS: We developed a pedigree-based analysis pipeline (PBAP) suite of programs geared towards SNPs and sequence data. PBAP performs quality control, marker selection and file preparation. PBAP sets up files for MORGAN, which can handle analyses for small and large pedigrees, typically human, and results can be used with other programs and for downstream analyses. We evaluate and illustrate its features with two real datasets. AVAILABILITY AND IMPLEMENTATION: PBAP scripts may be downloaded from http://faculty.washington.edu/wijsman/software.shtml. CONTACT: wijsman@uw.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Autism Spectrum Disorder/genetics, Genetic Linkage, Genetic Markers, Single Nucleotide Polymorphism/genetics, Software, Female, Humans, Linkage Disequilibrium, Male, Pedigree, Quality Control
14.
BMC Genet ; 17 Suppl 2: 9, 2016 Feb 03.
Article in English | MEDLINE | ID: mdl-26866700

ABSTRACT

Participants in the family-based analysis group at Genetic Analysis Workshop 19 addressed diverse topics, all of which used the family data. Topics addressed included questions of study design and data quality control (QC), genotype imputation to augment available sequence data, and linkage and/or association analyses. Results show that pedigree-based tests that are sensitive to genotype error may be useful for QC. Imputation quality improved with inclusion of small amounts of pedigree information used to phase the data in evaluation of 5 commonly used approaches for imputation in samples of (typically) unrelated subjects. It improved still further when pedigree-based imputation using larger pedigrees was also added. An important distinction was made between methods that do versus do not make use of Mendelian transmission in pedigrees, because this serves as a key difference between underlying models and assumptions. Methods that model relatedness generally had higher power in association testing than did analyses that carry out testing in the presence of a transmission model, but this may reflect details of implementation and/or ability of more general methods to jointly include data from larger pedigrees. In either case, for single nucleotide polymorphism-set approaches, weights that incorporate information on functional effects may be more useful than those that are based only on allele frequencies. The overall results demonstrate that family data continue to provide important information in the search for trait loci.


Subject(s)
Pedigree, Single Nucleotide Polymorphism, Data Accuracy, Genotype, Humans, Quality Control
15.
Genet Epidemiol ; 38(7): 579-90, 2014 Nov.
Article in English | MEDLINE | ID: mdl-25132070

ABSTRACT

In the last two decades, complex traits have become the main focus of genetic studies. The hypothesis that both rare and common variants are associated with complex traits is increasingly being discussed. Family-based association studies using relatively large pedigrees are suitable for both rare and common variant identification. Because of the high cost of sequencing technologies, imputation methods are important for increasing the amount of information at low cost. A recent family-based imputation method, Genotype Imputation Given Inheritance (GIGI), is able to handle large pedigrees and accurately impute rare variants, but does less well for common variants where population-based methods perform better. Here, we propose a flexible approach to combine imputation data from both family- and population-based methods. We also extend the Sequence Kernel Association Test for Rare and Common variants (SKAT-RC), originally proposed for data from unrelated subjects, to family data in order to make use of such imputed data. We call this extension "famSKAT-RC." We compare the performance of famSKAT-RC and several other existing burden and kernel association tests. In simulated pedigree sequence data, our results show an increase of imputation accuracy from use of our combining approach. Also, they show an increase of power of the association tests with this approach over the use of either family- or population-based imputation methods alone, in the context of rare and common variants. Moreover, our results show better performance of famSKAT-RC compared to the other considered tests, in most scenarios investigated here.


Subject(s)
Genetic Association Studies, Single Nucleotide Polymorphism, Computer Simulation, Genetic Predisposition to Disease, Genotype, Humans, Linkage Disequilibrium, Genetic Models, Multivariate Analysis, Pedigree, Phenotype, Software
16.
Genet Epidemiol ; 38(1): 1-9, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24243664

ABSTRACT

Recently, the "Common Disease-Multiple Rare Variants" hypothesis has received much attention, especially with current availability of next-generation sequencing. Family-based designs are well suited for discovery of rare variants, with large and carefully selected pedigrees enriching for multiple copies of such variants. However, sequencing a large number of samples is still prohibitive. Here, we evaluate a cost-effective strategy (pseudosequencing) to detect association with rare variants in large pedigrees. This strategy consists of sequencing a small subset of subjects, genotyping the remaining sampled subjects on a set of sparse markers, and imputing the untyped markers in the remaining subjects conditional on the sequenced subjects and pedigree information. We used a recent pedigree imputation method (GIGI), which is able to efficiently handle large pedigrees and accurately impute rare variants. We used burden and kernel association tests, famWS and famSKAT, both of which account for family relationships; famSKAT additionally accounts for heterogeneity of allelic effect. We simulated pedigree sequence data and compared the power of association tests for pseudosequence data, a subset of sequence data used for imputation, and all subjects sequenced. We also compared, within the pseudosequence data, the power of association test using best-guess genotypes and allelic dosages. Our results show that the pseudosequencing strategy considerably improves the power to detect association with rare variants. They also show that the use of allelic dosages results in much higher power than use of best-guess genotypes in these family-based data. Moreover, famSKAT shows greater power than famWS in most of the scenarios we considered.
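
The distinction between allelic dosages and best-guess genotypes drawn in this abstract can be illustrated with a small sketch. It assumes that imputation yields, for each subject and variant, three posterior probabilities for carrying 0, 1, or 2 copies of the alternate allele; the function names are hypothetical and the abstract does not describe the authors' implementation.

```python
def allelic_dosage(probs):
    """Expected number of alternate alleles given (P(0), P(1), P(2))."""
    p0, p1, p2 = probs
    return p1 + 2.0 * p2


def best_guess(probs):
    """Most probable genotype (0, 1, or 2 alternate alleles).

    Ties are resolved in favour of the lower allele count.
    """
    return max(range(3), key=lambda g: probs[g])
```

With probabilities (0.5, 0.4, 0.1) the best guess is 0 copies, whereas the dosage is about 0.6; the dosage retains the uncertainty that a hard call discards, which is one plausible reason for the power difference the abstract reports.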


Subject(s)
Genetic Association Studies/methods, Genetic Variation/genetics, Genotype, Pedigree, DNA Sequence Analysis, Alleles, Genome-Wide Association Study, Haplotypes, Humans, Linkage Disequilibrium, Genetic Models, Research Design, DNA Sequence Analysis/economics, Software
17.
Genet Epidemiol ; 38(4): 291-9, 2014 May.
Article in English | MEDLINE | ID: mdl-24718985

ABSTRACT

Detection of genotyping errors is a necessary step to minimize false results in genetic analysis. This is especially important when the rate of genotyping errors is high, as has been reported for high-throughput sequence data. To detect genotyping errors in pedigrees, Mendelian inconsistent (MI) error checks exist, as do multi-point methods that flag Mendelian consistent (MC) errors for sparse multi-allelic markers. However, few methods exist for detecting MC genotyping errors, particularly for dense variants on large pedigrees. Here, we introduce an efficient method to detect MC errors even for very dense variants (e.g., SNPs and sequencing data) on pedigrees that may be large. Our method first samples inheritance vectors (IVs) using a moderately sparse but informative set of markers using a Markov chain Monte Carlo-based sampler. Using sampled IVs, we considered two test statistics to detect MC genotyping errors: the percentage of IVs inconsistent with observed genotypes (A1) or the posterior probability of error configurations (A2). Using simulations, we show that this method, even with the simpler A1 statistic, is effective for detecting MC genotyping errors in dense variants, with sensitivity almost as high as the theoretical best sensitivity possible. We also evaluate the effectiveness of this method as a function of parameters, when including the observed pattern for genotype, density of framework markers, error rate, allele frequencies, and number of sampled inheritance vectors. Our approach provides a line of defense against false findings based on the use of dense variants in pedigrees.


Subject(s)
Genotype , Genotyping Techniques , Pedigree , Research Design , Alleles , Humans , Markov Chains , Monte Carlo Method , Polymorphism, Single Nucleotide/genetics
18.
Genet Epidemiol ; 38 Suppl 1: S21-8, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25112184

ABSTRACT

When analyzing family data, we dream of perfectly informative data, even whole-genome sequences (WGSs) for all family members. Reality intervenes, and we find that next-generation sequencing (NGS) data have errors and are often too expensive or impossible to collect on everyone. The Genetic Analysis Workshop 18 working groups on quality control and dropping WGSs through families using a genome-wide association framework focused on finding, correcting, and using errors within the available sequence and family data, developing methods to infer and analyze missing sequence data among relatives, and testing for linkage and association with simulated blood pressure. We found that single-nucleotide polymorphisms, NGS data, and imputed data are generally concordant but that errors are particularly likely at rare variants, for homozygous genotypes, within regions with repeated sequences or structural variants, and within sequence data imputed from unrelated individuals. Admixture complicated identification of cryptic relatedness, but information from Mendelian transmission improved error detection and provided an estimate of the de novo mutation rate. Computationally, fast rule-based imputation was accurate but could not cover as many loci or subjects as more computationally demanding probability-based methods. Incorporating population-level data into pedigree-based imputation methods improved results. Observed data outperformed imputed data in association testing, but imputed data were also useful. We discuss the strengths and weaknesses of existing methods and suggest possible future directions, such as improving communication between data collectors and data analysts, establishing thresholds for and improving imputation quality, and incorporating error into imputation and analytical models.


Subject(s)
Genome-Wide Association Study , High-Throughput Nucleotide Sequencing/standards , Genetic Association Studies , Genetic Linkage , Genotype , Humans , Mendelian Randomization Analysis , Pedigree , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
19.
Hum Genet ; 134(10): 1055-68, 2015 Oct.
Article in English | MEDLINE | ID: mdl-26204995

ABSTRACT

Autism spectrum disorders (ASDs) are a group of neurodevelopmental disorders, characterized by impairment in communication and social interactions, and by repetitive behaviors. ASDs are highly heritable, and estimates of the number of risk loci range from hundreds to >1000. We considered 7 extended families (size 12-47 individuals), each with ≥3 individuals affected by ASD. All individuals were genotyped with dense SNP panels. A small subset of each family was typed with whole exome sequence (WES). We used a 3-step approach for variant identification. First, we used family-specific parametric linkage analysis of the SNP data to identify regions of interest. Second, we filtered variants in these regions based on frequency and function, obtaining exactly 200 candidates. Third, we compared two approaches to narrowing this list further. We used information from the SNP data to impute exome variant dosages into those without WES. We regressed affected status on variant allele dosage, using pedigree-based kinship matrices to account for relationships. The p value for the test of the null hypothesis that variant allele dosage is unrelated to phenotype was used to indicate strength of evidence supporting the variant. A cutoff of p = 0.05 gave 28 variants. As an alternative third filter, we required Mendelian inheritance in those with WES, resulting in 70 variants. The imputation- and association-based approach was effective. We identified four strong candidate genes for ASD (SEZ6L, HISPPD1, FEZF1, SAMD11), all of which have been previously implicated in other studies, or have a strong biological argument for their relevance.
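The abstract does not say how the pedigree-based kinship matrices were computed. As a rough illustration only, the standard recursive definition of kinship coefficients can be sketched as follows. The function name `kinship_matrix` and the input layout are hypothetical, and the sketch assumes that each individual has either both parents in the pedigree or neither, and that parents are listed before their children.

```python
def kinship_matrix(pedigree):
    """Kinship coefficients from a pedigree (illustrative sketch).

    pedigree: list of (person_id, father_id, mother_id) tuples, with parents
    listed before their children and (id, None, None) for founders.
    Returns a dict mapping (id_a, id_b) to the kinship coefficient.
    """
    ids = [person for person, _, _ in pedigree]
    phi = {}
    for idx, (person, father, mother) in enumerate(pedigree):
        # Kinship with everyone who appears earlier in the list.
        for other in ids[:idx]:
            if father is None:
                value = 0.0  # founders are assumed unrelated to earlier individuals
            else:
                value = 0.5 * (phi[(father, other)] + phi[(mother, other)])
            phi[(person, other)] = phi[(other, person)] = value
        # Self-kinship: 1/2 for founders, raised if the parents are related.
        if father is None:
            phi[(person, person)] = 0.5
        else:
            phi[(person, person)] = 0.5 * (1.0 + phi[(father, mother)])
    return phi


# Hypothetical nuclear family: two founders and two full siblings.
family = [
    ("dad", None, None),
    ("mom", None, None),
    ("sib1", "dad", "mom"),
    ("sib2", "dad", "mom"),
]
phi = kinship_matrix(family)
```

For this family the sketch gives 0.25 for parent-child and full-sibling pairs and 0.5 for each individual's kinship with themselves. In a regression of affected status on variant allele dosage, a matrix of this kind (often scaled by 2) would typically define the covariance among relatives; the exact model used in the study is not stated in the abstract.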


Subject(s)
Autism Spectrum Disorder/genetics , Eye Proteins/genetics , Membrane Proteins/genetics , Phosphotransferases (Phosphate Group Acceptor)/genetics , Transcription Factors/genetics , Exome , Female , Gene Frequency , Genes, Dominant , Genetic Association Studies , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Male , Models, Genetic , Polymorphism, Single Nucleotide , Repressor Proteins , Sequence Analysis, DNA
20.
Hum Hered ; 78(1): 1-8, 2014.
Article in English | MEDLINE | ID: mdl-24969160

ABSTRACT

OBJECTIVES: A particular approach to the visualization of descent of founder DNA copies in a pedigree has been suggested, which helps to understand haplotype sharing patterns among subjects of interest. However, the approach does not provide the information in an ideal format to show haplotype sharing patterns. Therefore, we aimed to find an efficient way to visualize such sharing patterns and to demonstrate that our tool provides useful information for finding an informative subset of subjects for a sequence study. METHODS: The visualization package, SharedHap, computes and visualizes a novel metric, the SharedHap proportion, which quantifies haplotype sharing among a set of subjects of interest. We applied SharedHap to simulated and real pedigree datasets to illustrate the approach. RESULTS: SharedHap successfully represents haplotype sharing patterns that contribute to linkage signals in both simulated and real datasets. Using the visualizations we were also able to find ideal sets of subjects for sequencing studies. CONCLUSIONS: Our novel metric that can be computed using the SharedHap package provides useful information about haplotype sharing patterns among subjects of interest. The visualization of the SharedHap proportion provides useful information in pedigree studies, allowing for a better selection of candidate subjects for use in further sequencing studies.


Subject(s)
Computational Biology/methods , Genetics, Population/methods , Haplotypes , Pedigree , Computer Simulation , Female , Founder Effect , Humans , Male , Reproducibility of Results , Software