Results 1 - 20 of 22
1.
Nature ; 2024 May 20.
Article in English | MEDLINE | ID: mdl-38768635

ABSTRACT

Rare coding variants that significantly impact function provide insights into the biology of a gene [1-3]. However, ascertaining their frequency requires large sample sizes [4-8]. Here, we present a catalogue of human protein-coding variation, derived from exome sequencing of 983,578 individuals across diverse populations. 23% of the Regeneron Genetics Center Million Exome data (RGC-ME) comes from non-European individuals of African, East Asian, Indigenous American, Middle Eastern, and South Asian ancestry. This catalogue includes over 10.4 million missense and 1.1 million predicted loss-of-function (pLOF) variants. We identify individuals with rare biallelic pLOF variants in 4,848 genes, 1,751 of which have not been previously reported. From precise quantitative estimates of selection against heterozygous loss-of-function, we identify 3,988 loss-of-function intolerant genes, including 86 that were previously assessed as tolerant and 1,153 lacking established disease annotation. We also define regions of missense depletion at high resolution. Notably, 1,482 genes have regions depleted of missense variants despite being tolerant to pLOF variants. Finally, we estimate that 3% of individuals have a clinically actionable genetic variant, and that 11,773 variants reported in ClinVar with unknown significance are likely to be deleterious cryptic splice sites. To facilitate variant interpretation and genetics-informed precision medicine, we make this important resource of coding variation from the RGC-ME accessible via a public variant allele frequency browser.

3.
Nature ; 622(7984): 784-793, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821707

ABSTRACT

The Mexico City Prospective Study is a prospective cohort of more than 150,000 adults recruited two decades ago from the urban districts of Coyoacán and Iztapalapa in Mexico City [1]. Here we generated genotype and exome-sequencing data for all individuals and whole-genome sequencing data for 9,950 selected individuals. We describe high levels of relatedness and substantial heterogeneity in ancestry composition across individuals. Most sequenced individuals had admixed Indigenous American, European and African ancestry, with extensive admixture from Indigenous populations in central, southern and southeastern Mexico. Indigenous Mexican segments of the genome had lower levels of coding variation but an excess of homozygous loss-of-function variants compared with segments of African and European origin. We estimated ancestry-specific allele frequencies at 142 million genomic variants, with an effective sample size of 91,856 for Indigenous Mexican ancestry at exome variants, all available through a public browser. Using whole-genome sequencing, we developed an imputation reference panel that outperforms existing panels at common variants in individuals with high proportions of central, southern and southeastern Indigenous Mexican ancestry. Our work illustrates the value of genetic studies in diverse populations and provides foundational imputation and allele frequency resources for future genetic studies in Mexico and in the United States, where the Hispanic/Latino population is predominantly of Mexican descent.


Subjects
Exome Sequencing, Human Genome, Genotype, Hispanic or Latino, Adult, Humans, Africa/ethnology, Americas/ethnology, Europe/ethnology, Gene Frequency/genetics, Population Genetics, Human Genome/genetics, Genotyping Techniques, Hispanic or Latino/genetics, Homozygote, Loss of Function Mutation/genetics, Mexico, Prospective Studies
4.
bioRxiv ; 2023 Nov 02.
Article in English | MEDLINE | ID: mdl-37214792

ABSTRACT

Coding variants that have significant impact on function can provide insights into the biology of a gene but are typically rare in the population. Identifying and ascertaining the frequency of such rare variants requires very large sample sizes. Here, we present the largest catalog of human protein-coding variation to date, derived from exome sequencing of 985,830 individuals of diverse ancestry to serve as a rich resource for studying rare coding variants. Individuals of African, Admixed American, East Asian, Middle Eastern, and South Asian ancestry account for 20% of this exome dataset. Our catalog of variants includes approximately 10.5 million missense (54% novel) and 1.1 million predicted loss-of-function (pLOF) variants (65% novel, 53% observed only once). We identified individuals with rare homozygous pLOF variants in 4,874 genes, and for 1,838 of these this work is the first to document at least one pLOF homozygote. Additional insights from the RGC-ME dataset include 1) improved estimates of selection against heterozygous loss-of-function and identification of 3,459 genes intolerant to loss-of-function, 83 of which were previously assessed as tolerant to loss-of-function and 1,241 that lack disease annotations; 2) identification of regions depleted of missense variation in 457 genes that are tolerant to loss-of-function; 3) functional interpretation for 10,708 variants of unknown or conflicting significance reported in ClinVar as cryptic splice sites using splicing score thresholds based on empirical variant deleteriousness scores derived from RGC-ME; and 4) an observation that approximately 3% of sequenced individuals carry a clinically actionable genetic variant in the ACMG SF 3.1 list of genes. We make this important resource of coding variation available to the public through a variant allele frequency browser.
We anticipate that this report and the RGC-ME dataset will serve as a valuable reference for understanding rare coding variation and help advance precision medicine efforts.

5.
Commun Biol ; 5(1): 540, 2022 06 03.
Article in English | MEDLINE | ID: mdl-35661827

ABSTRACT

To better understand the genetics of hearing loss, we performed a genome-wide association meta-analysis with 125,749 cases and 469,497 controls across five cohorts. We identified 53 loci affecting hearing loss risk, including common coding variants in COL9A3 and TMPRSS3. Through exome sequencing of 108,415 cases and 329,581 controls, we observed rare coding associations with 11 Mendelian hearing loss genes, including additive effects in known hearing loss genes GJB2 (Gly12fs; odds ratio [OR] = 1.21, P = 4.2 × 10^-11) and SLC26A5 (gene burden; OR = 1.96, P = 2.8 × 10^-17). We also identified hearing loss associations with rare coding variants in FSCN2 (OR = 1.14, P = 1.9 × 10^-15) and KLHDC7B (OR = 2.14, P = 5.2 × 10^-30). Our results suggest a shared etiology between Mendelian and common hearing loss in adults. This work illustrates the potential of large-scale exome sequencing to elucidate the genetic architecture of common disorders where both common and rare variation contribute to risk.


Subjects
Genome-Wide Association Study, Hearing Loss, Exome/genetics, Genetic Variation, Genome-Wide Association Study/methods, Hearing Loss/genetics, Humans, Membrane Proteins/genetics, Neoplasm Proteins/genetics, Serine Endopeptidases/genetics, Exome Sequencing
6.
Eur J Neurol ; 29(4): 1174-1180, 2022 04.
Article in English | MEDLINE | ID: mdl-34935254

ABSTRACT

BACKGROUND AND PURPOSE: Muscular A-type lamin-interacting protein (MLIP) is most abundantly expressed in cardiac and skeletal muscle. In vitro and animal studies have shown its regulatory role in myoblast differentiation and in organization of myonuclear positioning in skeletal muscle, as well as in cardiomyocyte adaptation and cardiomyopathy. We report the association of biallelic truncating variation in the MLIP gene with human disease in five individuals from two unrelated pedigrees. METHODS: Clinical evaluation and exome sequencing were performed in two unrelated families with elevated creatine kinase level. RESULTS: Family 1. A 6-year-old girl born to consanguineous parents of Arab-Muslim origin presented with myalgia, early fatigue after mild-to-moderate physical exertion, and elevated creatine kinase levels up to 16,000 U/L. Exome sequencing revealed a novel homozygous nonsense variant, c.2530C>T; p.Arg844Ter, in the MLIP gene. Family 2. Three individuals from two distantly related families of Old Order Amish ancestry presented with elevated creatine kinase levels, one of whom also presented with abnormal electrocardiography results. On exome sequencing, all showed homozygosity for a novel nonsense MLIP variant c.1825A>T; p.Lys609Ter. Another individual from this pedigree, who had sinus arrhythmia and for whom creatine kinase level was not available, was also homozygous for this variant. CONCLUSIONS: Our findings suggest that biallelic truncating variants in MLIP result in myopathy characterized by hyperCKemia. Moreover, these cases of MLIP-related disease may indicate that at least in some instances this condition is associated with muscle decompensation and fatigability during low-to-moderate intensity muscle exertion as well as possible cardiac involvement.


Subjects
Cardiomyopathies, Muscular Diseases, Physiological Adaptation, Animals, Humans, Muscular Diseases/genetics, Myalgia, Pedigree
7.
Genet Epidemiol ; 45(6): 664-681, 2021 09.
Article in English | MEDLINE | ID: mdl-34184762

ABSTRACT

Serum alanine aminotransferase (ALT) and aspartate aminotransferase (AST) are biomarkers for liver health. Here we report the largest genome-wide association analysis to date of serum ALT and AST levels in over 388,000 people of European ancestry from UK Biobank and DiscovEHR. Eleven million imputed markers with a minor allele frequency (MAF) ≥ 0.5% were analyzed. Overall, 300 ALT and 336 AST independent genome-wide significant associations were identified. Among them, 81 ALT and 61 AST associations are reported for the first time. A genome-wide interaction study identified 9 ALT and 12 AST independent associations significantly modified by body mass index (BMI), including several previously reported potential liver disease therapeutic targets, for example, PNPLA3, HSD17B13, and MARC1. While further work is necessary to understand the effect of ALT and AST-associated variants on liver disease, the weighted burden of significant BMI-modified signals is significantly associated with liver disease outcomes. In summary, this study identifies genetic associations which offer an important step forward in understanding the genetic architecture of serum ALT and AST levels. Significant interactions between BMI and genetic loci not only highlight the important role of adiposity in liver damage but also shed light on the genetic etiology of liver disease in obese individuals.


Subjects
Alanine Transaminase/blood, Aspartate Aminotransferases/blood, Body Mass Index, Genome-Wide Association Study, Humans
8.
Sci Rep ; 11(1): 5595, 2021 03 10.
Article in English | MEDLINE | ID: mdl-33692434

ABSTRACT

Inflammatory bowel disease (IBD), clinically defined as Crohn's disease (CD), ulcerative colitis (UC), or IBD-unclassified, results in chronic inflammation of the gastrointestinal tract in genetically susceptible hosts. Pediatric onset IBD represents ≥25% of all IBD diagnoses and often presents with intestinal stricturing, perianal disease, and failed response to conventional treatments. NOD2 was the first and is the most replicated locus associated with adult IBD, to date. However, its role in pediatric onset IBD is not well understood. We performed whole-exome sequencing on a cohort of 1,183 patients with pediatric onset IBD (ages 0-18.5 years). We identified 92 probands with biallelic rare and low frequency NOD2 variants accounting for approximately 8% of our cohort, suggesting a Mendelian inheritance pattern of disease. Additionally, we investigated the contribution of recessive inheritance of NOD2 alleles in adult IBD patients from a large clinical population cohort. We found that recessive inheritance of NOD2 variants explains ~7% of cases in this adult IBD cohort, including ~10% of CD cases, confirming the observations from our pediatric IBD cohort. Exploration of EHR data showed that several of these adult IBD patients obtained their initial IBD diagnosis before 18 years of age, consistent with early onset disease. While it has been previously reported that carriers of more than one NOD2 risk allele have increased susceptibility to Crohn's disease (CD), our data formally demonstrate that recessive inheritance of NOD2 alleles is a mechanistic driver of early onset IBD, specifically CD, likely due to loss of NOD2 protein function. Collectively, our findings show that recessive inheritance of rare and low frequency deleterious NOD2 variants accounts for 7-10% of CD cases and implicate NOD2 as a Mendelian disease gene for early onset Crohn's disease.


Subjects
Ulcerative Colitis/genetics, Crohn Disease/genetics, Mutation, Nod2 Signaling Adaptor Protein/genetics, Adolescent, Adult, Age of Onset, Child, Preschool Child, Ulcerative Colitis/metabolism, Crohn Disease/metabolism, Female, Humans, Infant, Newborn Infant, Male
9.
Mol Genet Metab Rep ; 26: 100699, 2021 Mar.
Article in English | MEDLINE | ID: mdl-33457206

ABSTRACT

Iron-sulfur clusters (FeSCs) are vital components of a variety of essential proteins, most prominently within mitochondrial respiratory chain complexes I-III; Fe-S assembly and distribution is performed via multi-step pathways. Variants affecting several proteins in these pathways have been described in genetic disorders, including severe mitochondrial disease. Here we describe a Christian Arab kindred with two infants that died due to a mitochondrial disorder involving Fe-S containing respiratory chain complexes and a third sibling who survived the initial crisis. A homozygous missense variant in NFS1: c.215G>A; p.Arg72Gln was detected by whole exome sequencing. The NFS1 gene encodes a cysteine desulfurase, which, in complex with ISD11 and ACP, initiates the first step of Fe-S formation. Arginine at position 72 plays a role in NFS1-ISD11 complex formation; therefore, its substitution with glutamine is expected to affect complex stability and function. Interestingly, this is the only pathogenic variant ever reported in the NFS1 gene, previously described once in an Old Order Mennonite family presenting a similar phenotype with intra-familial variability in patient outcomes. Analysis of datasets from both populations did not show a common haplotype, suggesting this variant is a recurrent de novo variant. Our report of the second case of NFS1-related mitochondrial disease corroborates the pathogenicity of this recurring variant and implicates it as a hot-spot variant. While the genetic resolution allows for prenatal diagnosis for the family, it also raises critical clinical questions regarding follow-up and possible treatment options of severely affected and healthy homozygous individuals with mitochondrial co-factor therapy or cysteine supplementation.

10.
HGG Adv ; 2(3): 100039, 2021 Jul 08.
Article in English | MEDLINE | ID: mdl-35047837

ABSTRACT

Parent-of-origin (PoO) effects refer to the differential phenotypic impacts of genetic variants dependent on their parental inheritance due to imprinting. While PoO effects can influence complex traits, they may be poorly captured by models that do not differentiate the parental origin of the variant. The aim of this study was to conduct a genome-wide screen for PoO effects on a broad range of clinical traits derived from electronic health records (EHR) in the DiscovEHR study enriched with familial relationships. Using pairwise kinship estimates from genetic data and demographic data, we identified 22,051 offspring among 134,049 individuals in the DiscovEHR study. PoO of ~9 million variants was assigned in the offspring by comparing offspring and parental genotypes and haplotypes. We then performed genome-wide PoO association analyses across 154 quantitative and 611 binary traits extracted from EHR. Of the 732 significant PoO associations identified (p < 5 × 10^-8), we attempted to replicate 274 PoO associations in the UK Biobank study with 5,015 offspring and replicated 9 PoO associations (p < 0.05). In summary, our study implements a bioinformatic and statistical approach to examine PoO effects genome-wide in a large population study enriched with familial relationships and systematically characterizes PoO effects on hundreds of clinical traits derived from EHR. Our results suggest that, while the statistical power to detect PoO effects remains modest, accurately modeling PoO effects has the potential to find new associations that may have been missed by the standard additive model, further enhancing the mechanistic understanding of genetic influence on complex traits.
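
The abstract above says parent of origin was assigned by comparing offspring and parental genotypes and haplotypes, without giving the procedure. The following is only a minimal sketch of the genotype-only part of that idea, assuming biallelic variants coded as alternate-allele counts (0, 1, 2). The function name and coding are illustrative assumptions, not the study's pipeline, and sites that stay ambiguous here are the ones where haplotype information would be needed.

```python
from typing import Optional


def parent_of_origin(child: int, mother: int, father: int) -> Optional[str]:
    """Infer which parent transmitted the alternate allele to a heterozygous child.

    Genotypes are alternate-allele counts (0, 1 or 2); this coding is an
    assumption of the sketch. Returns "maternal" or "paternal" when the trio
    genotypes alone settle the question, and None when the child is not
    heterozygous, the trio is Mendelian-inconsistent, or both parents could
    have transmitted the alternate allele.
    """
    if child != 1:
        return None
    # The alt allele came from the mother only if she carries it
    # and the father is able to supply the reference allele.
    maternal_ok = mother >= 1 and father <= 1
    # Symmetric condition for a paternal origin.
    paternal_ok = father >= 1 and mother <= 1
    if maternal_ok and not paternal_ok:
        return "maternal"
    if paternal_ok and not maternal_ok:
        return "paternal"
    return None  # ambiguous (e.g. both parents heterozygous) or inconsistent
```

For example, a heterozygous child with a heterozygous mother and a homozygous-reference father is assigned "maternal", while a trio in which all three are heterozygous returns None.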

11.
Nature ; 586(7831): 749-756, 2020 10.
Article in English | MEDLINE | ID: mdl-33087929

ABSTRACT

The UK Biobank is a prospective study of 502,543 individuals, combining extensive phenotypic and genotypic data with streamlined access for researchers around the world [1]. Here we describe the release of exome-sequence data for the first 49,960 study participants, revealing approximately 4 million coding variants (of which around 98.6% have a frequency of less than 1%). The data include 198,269 autosomal predicted loss-of-function (LOF) variants, a more than 14-fold increase compared to the imputed sequence. Nearly all genes (more than 97%) had at least one carrier with a LOF variant, and most genes (more than 69%) had at least ten carriers with a LOF variant. We illustrate the power of characterizing LOF variants in this population through association analyses across 1,730 phenotypes. In addition to replicating established associations, we found novel LOF variants with large effects on disease traits, including PIEZO1 on varicose veins, COL6A1 on corneal resistance, MEPE on bone density, and IQGAP2 and GMPR on blood cell traits. We further demonstrate the value of exome sequencing by surveying the prevalence of pathogenic variants of clinical importance, and show that 2% of this population has a medically actionable variant. Furthermore, we characterize the penetrance of cancer in carriers of pathogenic BRCA1 and BRCA2 variants. Exome sequences from the first 49,960 participants highlight the promise of genome sequencing in large population-based studies and are now accessible to the scientific community.


Subjects
Genetic Databases, Exome Sequencing, Exome/genetics, Loss of Function Mutation/genetics, Phenotype, Aged, Bone Density/genetics, Collagen Type VI/genetics, Demography, Female, BRCA1 Genes, BRCA2 Genes, Genotype, Humans, Ion Channels/genetics, Male, Middle Aged, Neoplasms/genetics, Penetrance, Peptide Fragments/genetics, United Kingdom, Varicose Veins/genetics, ras GTPase-Activating Proteins/genetics
12.
Eur J Hum Genet ; 28(9): 1243-1264, 2020 09.
Article in English | MEDLINE | ID: mdl-32376988

ABSTRACT

Previously we reported the identification of a homozygous COL27A1 (c.2089G>C; p.Gly697Arg) missense variant and proposed it as a founder allele in Puerto Rico segregating with Steel syndrome (STLS, MIM #615155), a rare osteochondrodysplasia characterized by short stature, congenital bilateral hip dysplasia, carpal coalitions, and scoliosis. We now report segregation of this variant in five probands from the initial clinical report defining the syndrome and an additional family of Puerto Rican descent with multiple affected adult individuals. We modeled the orthologous variant in murine Col27a1 and found it recapitulates some of the major Steel syndrome associated skeletal features including reduced body length, scoliosis, and a more rounded skull shape. Characterization of the in vivo murine model shows abnormal collagen deposition in the extracellular matrix and disorganization of the proliferative zone of the growth plate. We report additional COL27A1 pathogenic variant alleles identified in unrelated consanguineous Turkish kindreds suggesting Clan Genomics and identity-by-descent homozygosity contributing to disease in this population. The hypothesis that carrier states for this autosomal recessive osteochondrodysplasia may contribute to common complex traits is further explored in a large clinical population cohort. Our findings augment our understanding of COL27A1 biology and its role in skeletal development, and expand the functional allelic architecture in this gene underlying both rare and common disease phenotypes.


Subjects
Multiple Abnormalities/genetics, Fibrillar Collagens/genetics, Founder Effect, Hip Dislocation/genetics, Scoliosis/genetics, Multiple Abnormalities/pathology, Adolescent, Animals, Bone Development, Child, Preschool Child, Consanguinity, Extracellular Matrix/metabolism, Extracellular Matrix/pathology, Female, Fibrillar Collagens/metabolism, Gene Frequency, Hip Dislocation/pathology, Homozygote, Humans, Male, Mice, Inbred C57BL Mice, Mutation, Pedigree, Scoliosis/pathology, Syndrome
13.
Diabetes ; 69(2): 249-258, 2020 02.
Article in English | MEDLINE | ID: mdl-31836692

ABSTRACT

Lipodystrophies are a group of disorders characterized by absence or loss of adipose tissue and abnormal fat distribution, commonly accompanied by metabolic dysregulation. Although considered rare disorders, their prevalence in the general population is not well understood. We aimed to evaluate the clinical and genetic prevalence of lipodystrophy disorders in a large clinical care cohort. We interrogated the electronic health record (EHR) information of >1.3 million adults from the Geisinger Health System for lipodystrophy diagnostic codes. We estimate a clinical prevalence of disease of 1 in 20,000 individuals. We performed genetic analyses in individuals with available genomic data to identify variants associated with inherited lipodystrophies and examined their EHR for comorbidities associated with lipodystrophy. We identified 16 individuals carrying the p.R482Q pathogenic variant in LMNA associated with Dunnigan familial partial lipodystrophy. Four had a clinical diagnosis of lipodystrophy, whereas the remaining had no documented clinical diagnosis despite having accompanying metabolic abnormalities. We observed a lipodystrophy-associated variant carrier frequency of 1 in 3,082 individuals in our cohort with substantial burden of metabolic dysregulation. We estimate a genetic prevalence of disease of ∼1 in 7,000 in the general population. Partial lipodystrophy is an underdiagnosed condition, and its prevalence, as defined molecularly, is higher than previously reported. Genetically guided stratification of patients with common metabolic disorders, like diabetes and dyslipidemia, is an important step toward precision medicine.


Subjects
Electronic Health Records, Lipodystrophy/epidemiology, Lipodystrophy/genetics, Population Surveillance, Adult, Female, Genetic Predisposition to Disease, Genomics, Humans, Male, United States/epidemiology
14.
Am J Hum Genet ; 102(5): 874-889, 2018 05 03.
Article in English | MEDLINE | ID: mdl-29727688

ABSTRACT

Large-scale human genetics studies are ascertaining increasing proportions of populations as they continue growing in both number and scale. As a result, the amount of cryptic relatedness within these study cohorts is growing rapidly and has significant implications on downstream analyses. We demonstrate this growth empirically among the first 92,455 exomes from the DiscovEHR cohort and, via a custom simulation framework we developed called SimProgeny, show that these measures are in line with expectations given the underlying population and ascertainment approach. For example, within DiscovEHR we identified ∼66,000 close (first- and second-degree) relationships, involving 55.6% of study participants. Our simulation results project that >70% of the cohort will be involved in these close relationships, given that DiscovEHR scales to 250,000 recruited individuals. We reconstructed 12,574 pedigrees by using these relationships (including 2,192 nuclear families) and leveraged them for multiple applications. The pedigrees substantially improved the phasing accuracy of 20,947 rare, deleterious compound heterozygous mutations. Reconstructed nuclear families were critical for identifying 3,415 de novo mutations in ∼1,783 genes. Finally, we demonstrate the segregation of known and suspected disease-causing mutations, including a tandem duplication that occurs in LDLR and causes familial hypercholesterolemia, through reconstructed pedigrees. In summary, this work highlights the prevalence of cryptic relatedness expected among large healthcare population-genomic studies and demonstrates several analyses that are uniquely enabled by large amounts of cryptic relatedness.
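
The abstract above counts first- and second-degree relationships but does not say which estimator or cut-offs were used. As an illustration only, the sketch below bins pairwise kinship coefficients into degrees using the widely used powers-of-two boundaries (a pair is degree d when its kinship coefficient exceeds 2^-(d+1.5)), then computes the fraction of samples involved in at least one close relationship. The thresholds and function names are assumptions of this sketch, not the DiscovEHR method.

```python
from typing import Iterable, Tuple


def relationship_degree(kinship: float) -> str:
    """Bin a kinship coefficient into a relationship class.

    The boundary for degree d is 2 ** -(d + 1.5), i.e. about 0.354, 0.177,
    0.0884 and 0.0442. These cut-offs are an assumption of this sketch.
    """
    if kinship > 2 ** -1.5:
        return "duplicate/MZ twin"
    for degree, label in ((1, "first-degree"), (2, "second-degree"), (3, "third-degree")):
        if kinship > 2 ** -(degree + 1.5):
            return label
    return "unrelated or more distant"


def fraction_in_close_relationships(
    pairs: Iterable[Tuple[str, str, float]], n_samples: int
) -> float:
    """Fraction of samples that appear in at least one first- or second-degree pair."""
    involved = set()
    for sample_a, sample_b, kinship in pairs:
        if relationship_degree(kinship) in ("first-degree", "second-degree"):
            involved.update((sample_a, sample_b))
    return len(involved) / n_samples
```

With toy input such as one parent-child pair (kinship 0.25), one half-sibling pair (0.125) and one unrelated pair (0.01) in a cohort of ten samples, four of the ten samples are involved in a close relationship.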


Subjects
Exome/genetics, Precision Medicine, Cohort Studies, Computer Simulation, Electronic Health Records, Exons/genetics, Family, Female, Population Genetics, Geography, Heterozygote, Humans, Male, Mutation/genetics, Pedigree, Phenotype, Reproducibility of Results
15.
Science ; 354(6319), 2016 Dec 23.
Article in English | MEDLINE | ID: mdl-28008009

ABSTRACT

The DiscovEHR collaboration between the Regeneron Genetics Center and Geisinger Health System couples high-throughput sequencing to an integrated health care system using longitudinal electronic health records (EHRs). We sequenced the exomes of 50,726 adult participants in the DiscovEHR study to identify ~4.2 million rare single-nucleotide variants and insertion/deletion events, of which ~176,000 are predicted to result in a loss of gene function. Linking these data to EHR-derived clinical phenotypes, we find clinical associations supporting therapeutic targets, including genes encoding drug targets for lipid lowering, and identify previously unidentified rare alleles associated with lipid levels and other blood level traits. About 3.5% of individuals harbor deleterious variants in 76 clinically actionable genes. The DiscovEHR data set provides a blueprint for large-scale precision medicine initiatives and genomics-guided therapeutic discovery.


Subjects
Integrated Delivery of Health Care, Disease/genetics, Electronic Health Records, Exome/genetics, High-Throughput Nucleotide Sequencing, Adult, Drug Design, Gene Frequency, Genomics, Humans, Hypolipidemic Agents/pharmacology, INDEL Mutation, Lipids/blood, Molecular Targeted Therapy, Single Nucleotide Polymorphism, DNA Sequence Analysis
16.
Am J Hum Genet ; 99(1): 154-62, 2016 07 07.
Article in English | MEDLINE | ID: mdl-27374771

ABSTRACT

Accurate estimation of shared ancestry is an important component of many genetic studies; current prediction tools accurately estimate pairwise genetic relationships up to the ninth degree. Pedigree-aware distant-relationship estimation (PADRE) combines relationship likelihoods generated by estimation of recent shared ancestry (ERSA) with likelihoods from family networks reconstructed by pedigree reconstruction and identification of a maximum unrelated set (PRIMUS), improving the power to detect distant relationships between pedigrees. Using PADRE, we estimated relationships from simulated pedigrees and three extended pedigrees, correctly predicting 20% more fourth- through ninth-degree simulated relationships than when using ERSA alone. By leveraging pedigree information, PADRE can even identify genealogical relationships between individuals who are genetically unrelated. For example, although 95% of 13th-degree relatives are genetically unrelated, in simulations, PADRE correctly predicted 50% of 13th-degree relationships to within one degree of relatedness. The improvement in prediction accuracy was consistent between simulated and actual pedigrees. We also applied PADRE to the HapMap3 CEU samples and report new cryptic relationships and validation of previously described relationships between families. PADRE greatly expands the range of relationships that can be estimated by using genetic data in pedigrees.


Subjects
Algorithms, Haplotypes/genetics, Pedigree, Female, Humans, Male, Genetic Models, Reproducibility of Results
17.
Bioinformatics ; 32(4): 596-8, 2016 Feb 15.
Article in English | MEDLINE | ID: mdl-26515822

ABSTRACT

PRIMUS is a pedigree reconstruction algorithm that uses estimates of genome-wide identity by descent to reconstruct pedigrees consistent with observed genetic data. However, when genetic data for individuals within a pedigree are missing, often multiple pedigrees can be reconstructed that fit the data. We report a major expansion of PRIMUS that uses mitochondrial (mtDNA) and non-recombining Y chromosome (NRY) haplotypes to eliminate many pedigree structures that are inconsistent with the genetic data. We demonstrate that discordances in mtDNA and NRY haplotypes substantially reduce the number of potential pedigrees, and often lead to the identification of the correct pedigree. AVAILABILITY AND IMPLEMENTATION: We have implemented the PRIMUS updates in Perl, and they are available at primus.gs.washington.edu.


Subjects
Human Y Chromosomes/genetics, Mitochondrial DNA/genetics, Haplotypes/genetics, Software, Algorithms, Computer Simulation, Population Genetics, Humans, Linkage Disequilibrium, Pedigree
18.
Genome Med ; 7(1): 54, 2015.
Article in English | MEDLINE | ID: mdl-26195989

ABSTRACT

BACKGROUND: Besides its growing importance in clinical diagnostics and understanding the genetic basis of Mendelian and complex diseases, whole exome sequencing (WES) is a rich source of additional information of potential clinical utility for physicians, patients and their families. We analyzed the frequency and nature of single nucleotide variants (SNVs) considered secondary findings and recessive disease allele carrier status in the exomes of 8554 individuals from a large, randomly sampled cohort study and 2514 patients from a study of presumed Mendelian disease having undergone WES. METHODS: We used the same sequencing platform and data processing pipeline to analyze all samples and characterized the distributions of reported pathogenic (ClinVar, Human Gene Mutation Database (HGMD)) and predicted deleterious variants in the pre-specified American College of Medical Genetics and Genomics (ACMG) secondary findings and recessive disease genes in different ethnic groups. RESULTS: In the 56 ACMG secondary findings genes, the average number of predicted deleterious variants per individual was 0.74, and the mean number of ClinVar reported pathogenic variants was 0.06. We observed an average of 10 deleterious and 0.78 ClinVar reported pathogenic variants per individual in 1423 autosomal recessive disease genes. By repeatedly sampling pairs of exomes, 0.5% of the randomly generated couples were at 25% risk of having an affected offspring for an autosomal recessive disorder based on the ClinVar variants. CONCLUSIONS: By investigating reported pathogenic and novel, predicted deleterious variants we estimated the lower and upper limits of the population fraction for which exome sequencing may reveal additional medically relevant information. We suggest that the observed wide range for the lower and upper limits of these frequency numbers will be gradually reduced due to improvement in classification databases and prediction algorithms.
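
The RESULTS section above estimates the share of at-risk couples by repeatedly sampling pairs of exomes. A rough sketch of that kind of resampling follows, assuming each individual is summarised as the set of autosomal recessive genes in which they carry a reported pathogenic variant, and that a couple counts as at 25% risk when the two sets share a gene. The data layout, function name and toy gene sets are hypothetical; the study's actual sampling scheme is not described in the abstract.

```python
import random
from typing import Dict, Set


def at_risk_couple_fraction(
    carrier_genes: Dict[str, Set[str]], n_pairs: int, seed: int = 0
) -> float:
    """Monte Carlo estimate of the fraction of random couples carrying
    pathogenic variants in the same recessive disease gene.

    carrier_genes maps a sample ID to the genes in which that person is a
    heterozygous carrier (a simplifying assumption of this sketch).
    """
    rng = random.Random(seed)          # fixed seed keeps the estimate reproducible
    sample_ids = sorted(carrier_genes)
    at_risk = 0
    for _ in range(n_pairs):
        person_a, person_b = rng.sample(sample_ids, 2)  # two distinct individuals
        if carrier_genes[person_a] & carrier_genes[person_b]:
            at_risk += 1               # both carry a variant in a shared gene
    return at_risk / n_pairs


# Toy input, not real data: two of four individuals carry a variant in the same gene.
toy = {"s1": {"CFTR"}, "s2": {"CFTR"}, "s3": {"GJB2"}, "s4": set()}
estimate = at_risk_couple_fraction(toy, n_pairs=10_000)
```

In the toy input exactly one of the six possible pairs shares a gene, so the estimate should settle near 1/6 as the number of sampled pairs grows.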

19.
Am J Hum Genet ; 95(5): 553-64, 2014 Nov 06.
Article in English | MEDLINE | ID: mdl-25439724

ABSTRACT

Understanding and correctly utilizing relatedness among samples is essential for genetic analysis; however, managing sample records and pedigrees can often be error prone and incomplete. Data sets ascertained by random sampling often harbor cryptic relatedness that can be leveraged in genetic analyses for maximizing power. We have developed a method that uses genome-wide estimates of pairwise identity by descent to identify families and quickly reconstruct and score all possible pedigrees that fit the genetic data by using up to third-degree relatives, and we have included it in the software package PRIMUS (Pedigree Reconstruction and Identification of the Maximally Unrelated Set). Here, we validate its performance on simulated, clinical, and HapMap pedigrees. Among these samples, we demonstrate that PRIMUS can verify reported pedigree structures and identify cryptic relationships. Finally, we show that PRIMUS reconstructed pedigrees, all of which were previously unknown, for 203 families from a cohort collected in Starr County, TX (1,890 samples).
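
The first step described above, using pairwise identity-by-descent estimates to identify families, can be pictured as finding connected components in a graph whose edges are sufficiently related pairs. The sketch below does this with a small union-find structure. It is an illustration under assumptions, not PRIMUS code: the input tuples, the function name and the 0.1 cut-off on the IBD proportion are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def family_networks(
    samples: Iterable[str],
    ibd_pairs: Iterable[Tuple[str, str, float]],
    min_ibd_proportion: float = 0.1,  # illustrative cut-off, an assumption
) -> List[List[str]]:
    """Group samples into family networks (connected components of related pairs)."""
    samples = list(samples)
    parent: Dict[str, str] = {s: s for s in samples}

    def find(x: str) -> str:
        # Walk to the representative of x's component, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sample_a, sample_b, ibd_proportion in ibd_pairs:
        if ibd_proportion >= min_ibd_proportion:
            parent[find(sample_a)] = find(sample_b)  # merge the two components

    groups = defaultdict(list)
    for s in samples:
        groups[find(s)].append(s)
    # Report only networks with at least two members, in a stable order.
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Pedigree reconstruction and scoring within each network, which is the substance of the method in the abstract, is not attempted here.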


Subjects
Population Genetics/methods, Pedigree, Chronic Obstructive Pulmonary Disease/genetics, Software, Base Sequence, Computer Simulation, Exome/genetics, Gene Frequency, Humans, Molecular Sequence Data, DNA Sequence Analysis, Texas
20.
Genet Epidemiol ; 37(2): 136-41, 2013 Feb.
Article in English | MEDLINE | ID: mdl-22996348

ABSTRACT

Many statistical analyses of genetic data rely on the assumption of independence among samples. Consequently, relatedness is either modeled in the analysis or samples are removed to "clean" the data of any pairwise relatedness above a tolerated threshold. Current methods do not maximize the number of unrelated individuals retained for further analysis, and this is a needless loss of resources. We report a novel application of graph theory that identifies the maximum set of unrelated samples in any dataset given a user-defined threshold of relatedness as well as all networks of related samples. We have implemented this method into an open source program called Pedigree Reconstruction and Identification of a Maximum Unrelated Set, PRIMUS. We show that PRIMUS outperforms the three existing methods, allowing researchers to retain up to 50% more unrelated samples. A unique strength of PRIMUS is its ability to weight the maximum clique selection using additional criteria (e.g. affected status and data missingness). PRIMUS is a permanent solution to identifying the maximum number of unrelated samples for a genetic analysis.
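
Choosing the largest set of mutually unrelated samples is the maximum independent set problem on the relatedness graph (equivalently, a maximum clique in its complement). The sketch below is a small exact branch-and-bound solver to make the idea concrete. It is not the PRIMUS implementation, it ignores the weighting criteria mentioned above, and its worst-case running time is exponential, so it is only suitable for small family networks. The function name and input format are assumptions.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple


def max_unrelated_set(
    samples: Iterable[str], related_pairs: Iterable[Tuple[str, str]]
) -> List[str]:
    """Return one largest subset of samples containing no related pair."""
    samples = list(samples)
    related: Dict[str, Set[str]] = {s: set() for s in samples}
    for a, b in related_pairs:
        related[a].add(b)
        related[b].add(a)

    def solve(candidates: FrozenSet[str]) -> FrozenSet[str]:
        if not candidates:
            return frozenset()
        # Branch on the sample with the most relatives still in play.
        v = max(sorted(candidates), key=lambda s: len(related[s] & candidates))
        if not related[v] & candidates:
            return candidates  # nobody left is related: keep everyone
        # Option 1: drop v. Option 2: keep v and drop all of its relatives.
        without_v = solve(candidates - {v})
        with_v = frozenset({v}) | solve(candidates - {v} - related[v])
        return with_v if len(with_v) > len(without_v) else without_v

    return sorted(solve(frozenset(samples)))
```

For a parent with three children who are unrelated to one another in the input, dropping the parent keeps three samples, whereas a rule that removes a sample from each related pair in arbitrary order could keep only one.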


Subjects
Genome-Wide Association Study, Theoretical Models, Algorithms, HapMap Project, Humans