1.
Genome Res ; 29(10): 1567-1577, 2019 10.
Article in English | MEDLINE | ID: mdl-31575651

ABSTRACT

Germline mutation rates in humans have been estimated for a variety of mutation types, including single-nucleotide and large structural variants. Here, we directly measure the germline retrotransposition rate for the three active retrotransposon elements: L1, Alu, and SVA. We used three tools for calling mobile element insertions (MEIs) (MELT, RUFUS, and TranSurVeyor) on blood-derived whole-genome sequence (WGS) data from 599 CEPH individuals, comprising 33 three-generation pedigrees. We identified 26 de novo MEIs in 437 births. The retrotransposition rate estimate for Alu elements, one in 40 births, is roughly half the rate estimated using phylogenetic analyses, a difference in magnitude similar to that observed for single-nucleotide variants. The L1 retrotransposition rate is one in 63 births and is within range of previous estimates (1:20-1:200 births). The SVA retrotransposition rate, one in 63 births, is much higher than the previous estimate of one in 900 births. Our large, three-generation pedigrees allowed us to assess parent-of-origin effects and the timing of insertion events in either gametogenesis or early embryonic development. We find a statistically significant paternal bias in Alu retrotransposition. Our study represents the first in-depth analysis of the rate and dynamics of human retrotransposition from WGS data in three-generation human pedigrees.
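
As a rough illustration of how a "one in N births" figure relates to raw counts, the sketch below uses only the two totals quoted in the abstract above (26 de novo MEIs, 437 births) to compute the combined rate across all three element families. The helper name `one_in_n` is hypothetical, and the abstract does not give per-family counts, so none are assumed here.

```python
# Totals quoted in the abstract; per-family counts are not given there.
de_novo_meis = 26   # de novo mobile element insertions, all families combined
births = 437        # births screened across the pedigrees


def one_in_n(events: int, total_births: int) -> float:
    """Return N such that the rate reads 'one event in N births'."""
    return total_births / events


# Combined rate of de novo insertions per birth.
overall_rate = de_novo_meis / births

print(f"combined rate: {overall_rate:.4f} insertions per birth")
print(f"about one in {one_in_n(de_novo_meis, births):.1f} births")
```

Under these totals the combined rate works out to roughly one de novo insertion in 17 births; the per-family rates in the abstract come from the study's own counts and are not re-derived here.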


Subject(s)
Interspersed Repetitive Sequences/genetics , Phylogeny , Retroelements/genetics , Whole Genome Sequencing , Alu Elements/genetics , Animals , Female , Hominidae/blood , Hominidae/genetics , Humans , Long Interspersed Nucleotide Elements/genetics , Male , Mutation , Pedigree , Polymorphism, Single Nucleotide/genetics
2.
Nucleic Acids Res ; 48(6): e36, 2020 04 06.
Article in English | MEDLINE | ID: mdl-32067044

ABSTRACT

Alu retrotransposons account for more than 10% of the human genome, and insertions of these elements create structural variants segregating in human populations. Such polymorphic Alus are powerful markers to understand population structure, and they represent variants that can greatly impact genome function, including gene expression. Accurate genotyping of Alus and other mobile elements has been challenging. Indeed, we found that Alu genotypes previously called for the 1000 Genomes Project are sometimes erroneous, which poses significant problems for phasing these insertions with other variants that comprise the haplotype. To ameliorate this issue, we introduce a new pipeline - TypeTE - which genotypes Alu insertions from whole-genome sequencing data. Starting from a list of polymorphic Alus, TypeTE identifies the hallmarks (poly-A tail and target site duplication) and orientation of Alu insertions using local re-assembly to reconstruct presence and absence alleles. Genotype likelihoods are then computed after re-mapping sequencing reads to the reconstructed alleles. Using a high-quality set of PCR-based genotyping of >200 loci, we show that TypeTE improves genotype accuracy from 83% to 92% in the 1000 Genomes dataset. TypeTE can be readily adapted to other retrotransposon families and brings a valuable toolbox addition for population genomics.


Subject(s)
Interspersed Repetitive Sequences/genetics , Mutagenesis, Insertional/genetics , Software , Whole Genome Sequencing/methods , Databases, Genetic , Gene Frequency/genetics , Genetic Loci , Genetics, Population , Genome, Human , Genotype , Humans
3.
Proc Natl Acad Sci U S A ; 112(45): 13833-8, 2015 Nov 10.
Article in English | MEDLINE | ID: mdl-26504230

ABSTRACT

Pleistocene residential sites with multiple contemporaneous human burials are extremely rare in the Americas. We report mitochondrial genomic variation in the first multiple mitochondrial genomes from a single prehistoric population: two infant burials (USR1 and USR2) from a common interment at the Upward Sun River Site in central Alaska dating to ∼11,500 cal B.P. Using a targeted capture method and next-generation sequencing, we determined that the USR1 infant possessed variants that define mitochondrial lineage C1b, whereas the USR2 genome falls at the root of lineage B2, allowing us to refine younger coalescence age estimates for these two clades. C1b and B2 are rare to absent in modern populations of northern North America. Documentation of these lineages at this location in the Late Pleistocene provides evidence for the extent of mitochondrial diversity in early Beringian populations, which supports the expectations of the Beringian Standstill Model.


Subject(s)
DNA, Mitochondrial/genetics , Genetic Variation , Haplotypes/genetics , Human Migration/history , Models, Theoretical , Phylogeny , Alaska , Archaeology/methods , Base Sequence , Bayes Theorem , Burial/history , Evolution, Molecular , Geography , High-Throughput Nucleotide Sequencing , History, Ancient , Humans , Infant , Likelihood Functions , Models, Genetic , Molecular Sequence Data , Oligonucleotides/genetics
4.
BMC Genomics ; 18(1): 396, 2017 05 22.
Article in English | MEDLINE | ID: mdl-28532386

ABSTRACT

BACKGROUND: The cost of Whole Genome Sequencing (WGS) has decreased tremendously in recent years due to advances in next-generation sequencing technologies. Nevertheless, the cost of carrying out large-scale cohort studies using WGS is still daunting. Past simulation studies with coverage at ~2x have shown promise for using low coverage WGS in studies focused on variant discovery, association study replications, and population genomics characterization. However, the performance of low coverage WGS in populations with a complex history and no reference panel remains to be determined. RESULTS: South Indian populations are known to have a complex population structure and are an example of a major population group that lacks adequate reference panels. To test the performance of extremely low-coverage WGS (EXL-WGS) in populations with a complex history and to provide a reference resource for South Indian populations, we performed EXL-WGS on 185 South Indian individuals from eight populations to ~1.6x coverage. Using two variant discovery pipelines, SNPTools and GATK, we generated a consensus call set that has ~90% sensitivity for identifying common variants (minor allele frequency ≥ 10%). Imputation further improves the sensitivity of our call set. In addition, we obtained high-coverage for the whole mitochondrial genome to infer the maternal lineage evolutionary history of the Indian samples. CONCLUSIONS: Overall, we demonstrate that EXL-WGS with imputation can be a valuable study design for variant discovery with a dramatically lower cost than standard WGS, even in populations with a complex history and without available reference data. In addition, the South Indian EXL-WGS data generated in this study will provide a valuable resource for future Indian genomic studies.


Subject(s)
Asian People/genetics , Metagenomics , Whole Genome Sequencing , Genetic Variation , Genome, Mitochondrial/genetics , Humans
5.
Genome Res ; 23(7): 1170-81, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23599355

ABSTRACT

Alu retrotransposons are the most numerous and active mobile elements in humans, causing genetic disease and creating genomic diversity. Mobile element scanning (ME-Scan) enables comprehensive and affordable identification of mobile element insertions (MEI) using targeted high-throughput sequencing of multiplexed MEI junction libraries. In a single experiment, ME-Scan identifies nearly all AluYb8 and AluYb9 elements, with high sensitivity for both rare and common insertions, in 169 individuals of diverse ancestry. ME-Scan detects heterozygous insertions in single individuals with 91% sensitivity. Insertion presence or absence states determined by ME-Scan are 95% concordant with those determined by locus-specific PCR assays. By sampling diverse populations from Africa, South Asia, and Europe, we are able to identify 5799 Alu insertions, including 2524 novel ones, some of which occur in exons. Sub-Saharan populations and a Pygmy group in particular carry numerous intermediate-frequency Alu insertions that are absent in non-African groups. There is a significant dearth of exon-interrupting insertions among common Alu polymorphisms, but the density of singleton Alu insertions is constant across exonic and nonexonic regions. In one case, a validated novel singleton Alu interrupts a protein-coding exon of FAM187B. This implies that exonic Alu insertions are generally deleterious and thus eliminated by natural selection, but not so quickly that they cannot be observed as extremely rare variants.


Subject(s)
Alu Elements , Genome, Human , High-Throughput Nucleotide Sequencing , Mutagenesis, Insertional , Retroelements , DNA Replication , Exons , Genetic Loci , High-Throughput Nucleotide Sequencing/methods , Humans , Polymorphism, Genetic , Population Groups/genetics , Reproducibility of Results , Sensitivity and Specificity , Transcription, Genetic
6.
PLoS Genet ; 9(7): e1003634, 2013.
Article in English | MEDLINE | ID: mdl-23874230

ABSTRACT

Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related with adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR), neighboring Tibetan populations (including high-altitude candidates EPAS1, PKLR, and CYP2E1), as well as genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. Whole-genome sequence from a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.


Subject(s)
Adaptation, Physiological/genetics , Altitude Sickness/genetics , Genome, Human , Selection, Genetic , Acclimatization/genetics , Acclimatization/physiology , Alleles , Altitude , Altitude Sickness/pathology , Asian People/genetics , Gene Frequency , Genetics, Population , Genome-Wide Association Study , Humans , Mongolia , Phenotype , Polymorphism, Single Nucleotide , Sequence Analysis, DNA
7.
Genome Res ; 21(5): 768-74, 2011 May.
Article in English | MEDLINE | ID: mdl-21324875

ABSTRACT

Accurate estimation of recent shared ancestry is important for genetics, evolution, medicine, conservation biology, and forensics. Established methods estimate kinship accurately for first-degree through third-degree relatives. We demonstrate that chromosomal segments shared by two individuals due to identity by descent (IBD) provide much additional information about shared ancestry. We developed a maximum-likelihood method for the estimation of recent shared ancestry (ERSA) from the number and lengths of IBD segments derived from high-density SNP or whole-genome sequence data. We used ERSA to estimate relationships from SNP genotypes in 169 individuals from three large, well-defined human pedigrees. ERSA is accurate to within one degree of relationship for 97% of first-degree through fifth-degree relatives and 80% of sixth-degree and seventh-degree relatives. We demonstrate that ERSA's statistical power approaches the maximum theoretical limit imposed by the fact that distant relatives frequently share no DNA through a common ancestor. ERSA greatly expands the range of relationships that can be estimated from genetic data and is implemented in a freely available software package.


Subject(s)
Inheritance Patterns/genetics , Likelihood Functions , Models, Genetic , Models, Statistical , Pedigree , Chromosome Mapping , DNA/genetics , Family , Genetic Linkage , Humans , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , Principal Component Analysis , Software
8.
Am J Obstet Gynecol ; 210(4): 321.e1-321.e21, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24594138

ABSTRACT

OBJECTIVE: We hypothesized that genetic variation affects responsiveness to 17-alpha hydroxyprogesterone caproate (17P) for recurrent preterm birth prevention. STUDY DESIGN: Women of European ancestry with ≥1 spontaneous singleton preterm birth at <34 weeks' gestation who received 17P were recruited prospectively and classified as a 17P responder or nonresponder by the difference in delivery gestational age between 17P-treated and -untreated pregnancies. Samples underwent whole exome sequencing. Coding variants were compared between responders and nonresponders with the use of the Variant Annotation, Analysis, and Search Tool (VAAST), which is a probabilistic search tool for the identification of disease-causing variants, and were compared with a Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway candidate gene list. Genes with the highest VAAST scores were then classified by the online Protein ANalysis THrough Evolutionary Relationships (PANTHER) system into known gene ontology molecular functions and biologic processes. Gene distributions within these classifications were compared with an online reference population to identify over- and under- represented gene sets. RESULTS: Fifty women (9 nonresponders) were included. Responders delivered 9.2 weeks longer with 17P vs 1.3 weeks' gestation for nonresponders (P < .001). A genome-wide search for genetic differences implicated the NOS1 gene to be the most likely associated gene from among genes on the KEGG candidate gene list (P < .00095). PANTHER analysis revealed several over-represented gene ontology categories that included cell adhesion, cell communication, signal transduction, nitric oxide signal transduction, and receptor activity (all with significant Bonferroni-corrected probability values). CONCLUSION: We identified sets of over-represented genes in key processes among responders to 17P, which is the first step in the application of pharmacogenomics to preterm birth prevention.


Subject(s)
Estrogen Antagonists/administration & dosage , Hydroxyprogesterones/administration & dosage , Premature Birth/prevention & control , 17 alpha-Hydroxyprogesterone Caproate , Case-Control Studies , Exome , Female , Genetic Variation , Humans , Nitric Oxide Synthase/genetics , Nitric Oxide Synthase Type I/genetics , Pharmacogenetics , Pregnancy , Prospective Studies , Secondary Prevention , Sequence Analysis, DNA/methods
9.
medRxiv ; 2024 May 05.
Article in English | MEDLINE | ID: mdl-38746151

ABSTRACT

While genome sequencing has transformed medicine by elucidating the genetic underpinnings of both rare and common complex disorders, its utility to predict clinical outcomes remains understudied. Here, we used artificial intelligence (AI) technologies to explore the predictive value of genome sequencing in forecasting clinical outcomes following surgery for congenital heart defects (CHD). We report results for a cohort of 2,253 CHD patients from the Pediatric Cardiac Genomics Consortium with a broad range of complex heart defects, pre- and post-operative clinical variables and exome sequencing. Damaging genotypes in chromatin-modifying and cilia-related genes were associated with an elevated risk of adverse post-operative outcomes, including mortality, cardiac arrest and prolonged mechanical ventilation. The impact of damaging genotypes was further amplified in the context of specific CHD phenotypes, surgical complexity and extra-cardiac anomalies. The absence of a damaging genotype in chromatin-modifying and cilia-related genes was also informative, reducing the risk for adverse postoperative outcomes. Thus, genome sequencing enriches the ability to forecast outcomes following congenital cardiac surgery.

10.
bioRxiv ; 2024 Aug 05.
Article in English | MEDLINE | ID: mdl-39149261

ABSTRACT

Using five complementary short- and long-read sequencing technologies, we phased and assembled >95% of each diploid human genome in a four-generation, 28-member family (CEPH 1463) allowing us to systematically assess de novo mutations (DNMs) and recombination. From this family, we estimate an average of 192 DNMs per generation, including 75.5 de novo single-nucleotide variants (SNVs), 7.4 non-tandem repeat indels, 79.6 de novo indels or structural variants (SVs) originating from tandem repeats, 7.7 centromeric de novo SVs and SNVs, and 12.4 de novo Y chromosome events per generation. STRs and VNTRs are the most mutable with 32 loci exhibiting recurrent mutation through the generations. We accurately assemble 288 centromeres and six Y chromosomes across the generations, documenting de novo SVs, and demonstrate that the DNM rate varies by an order of magnitude depending on repeat content, length, and sequence identity. We show a strong paternal bias (75-81%) for all forms of germline DNM, yet we estimate that 17% of de novo SNVs are postzygotic in origin with no paternal bias. We place all this variation in the context of a high-resolution recombination map (~3.5 kbp breakpoint resolution). We observe a strong maternal recombination bias (1.36 maternal:paternal ratio) with a consistent reduction in the number of crossovers with increasing paternal (r=0.85) and maternal (r=0.65) age. However, we observe no correlation between meiotic crossover locations and de novo SVs, arguing against non-allelic homologous recombination as a predominant mechanism. The use of multiple orthogonal technologies, near-telomere-to-telomere phased genome assemblies, and a multi-generation family to assess transmission has created the most comprehensive, publicly available "truth set" of all classes of genomic variants. The resource can be used to test and benchmark new algorithms and technologies to understand the most fundamental processes underlying human genetic variation.

11.
Mol Biol Evol ; 29(1): 101-11, 2012 Jan.
Article in English | MEDLINE | ID: mdl-21816865

ABSTRACT

Inflammatory bowel disease 5 (IBD5) is a 250 kb haplotype on chromosome 5 that is associated with an increased risk of Crohn's disease in Europeans. The OCTN1 gene is centrally located on IBD5 and encodes a transporter of the antioxidant ergothioneine (ET). The 503F variant of OCTN1 is strongly associated with IBD5 and is a gain-of-function mutation that increases absorption of ET. Although 503F has been implicated as the variant potentially responsible for Crohn's disease susceptibility at IBD5, there is little evidence beyond statistical association to support its role in disease causation. We hypothesize that 503F is a recent adaptation in Europeans that swept to relatively high frequency and that disease association at IBD5 results not from 503F itself, but from one or more nearby hitchhiking variants, in the genes IRF1 or IL5. To test for evidence of recent positive selection on the 503F allele, we employed the iHS statistic, which was significant in the European CEU HapMap population (P=0.0007) and European Human Genome Diversity Panel populations (P≤0.01). To evaluate the hypothesis of disease-variant hitchhiking, we performed haplotype association tests on high-density microarray data in a sample of 1,868 Crohn's disease cases and 5,550 controls. We found that 503F haplotypes with recombination breakpoints between OCTN1 and IRF1 or IL5 were not associated with disease (odds ratio [OR]: 1.05, P=0.21). In contrast, we observed strong disease association for 503F haplotypes with no recombination between these three genes (OR: 1.24, P=2.6×10⁻⁸), as expected if the sweeping haplotype harbored one or more disease-causing mutations in IRF1 or IL5. To further evaluate these disease-gene candidates, we obtained expression data from lower gastrointestinal biopsies of healthy individuals and Crohn's disease patients. We observed a 72% increase in gene expression of IRF1 among Crohn's disease patients (P=0.0006) and no significant difference in expression of OCTN1.
Collectively, these data indicate that the 503F variant has increased in frequency due to recent positive selection and that disease-causing variants in linkage disequilibrium with 503F have hitchhiked to relatively high frequency, thus forming the IBD5 risk haplotype. Finally, our association results and expression data support IRF1 as a strong candidate for Crohn's disease causation.


Subject(s)
Crohn Disease/genetics , Organic Cation Transport Proteins/genetics , Case-Control Studies , Colon/metabolism , Computer Simulation , Gene Frequency , Haplotypes , Humans , Interferon Regulatory Factor-1/genetics , Linkage Disequilibrium , Oligonucleotide Array Sequence Analysis , Polymorphism, Single Nucleotide , RNA, Messenger/analysis , Selection, Genetic , Symporters , White People/genetics
12.
BMC Genet ; 14: 30, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23617681

ABSTRACT

BACKGROUND: Because of the role of inflammation in preterm birth (PTB), polymorphisms in and near the interleukin-6 gene (IL6) have been association study targets. Several previous studies have assessed the association between PTB and a single nucleotide polymorphism (SNP), rs1800795, located in the IL6 gene promoter region. Their results have been inconsistent and SNP frequencies have varied strikingly among different populations. We therefore conducted a meta-analysis with subgroup analysis by population strata to: (1) reduce the confounding effect of population structure, (2) increase sample size and statistical power, and (3) elucidate the association between rs1800795 and PTB. RESULTS: We reviewed all published papers for PTB phenotype and SNP rs1800795 genotype. Maternal genotype and fetal genotype were analyzed separately and the analyses were stratified by population. The PTB phenotype was defined as gestational age (GA) < 37 weeks, but results from earlier GA were selected when available. All studies were compared by genotype (CC versus CG+GG), based on functional studies. For the maternal genotype analysis, 1,165 PTBs and 3,830 term controls were evaluated. Populations were stratified into women of European descent (for whom the most data were available) and women of heterogeneous origin or admixed populations. All ancestry was self-reported. Women of European descent had a summary odds ratio (OR) of 0.68, (95% confidence interval (CI) 0.51 - 0.91), indicating that the CC genotype is protective against PTB. The result for non-European women was not statistically significant (OR 1.01, 95% CI 0.59 - 1.75). For the fetal genotype analysis, four studies were included; there was no significant association with PTB (OR 0.98, 95% CI 0.72 - 1.33). Sensitivity analysis showed that preterm premature rupture of membrane (PPROM) may be a confounding factor contributing to phenotype heterogeneity.
CONCLUSIONS: IL6 SNP rs1800795 genotype CC is protective against PTB in women of European descent. It is not significant in other heterogeneous or admixed populations, or in fetal genotype analysis. Population structure is an important confounding factor that should be controlled for in studies of PTB.


Subject(s)
Interleukin-6/genetics , Polymorphism, Single Nucleotide , Premature Birth/genetics , Female , Humans , Premature Birth/epidemiology , Promoter Regions, Genetic , White People/genetics
13.
BMC Genet ; 13: 39, 2012 May 20.
Article in English | MEDLINE | ID: mdl-22606979

ABSTRACT

BACKGROUND: Populations of the Americas were founded by early migrants from Asia, and some have experienced recent genetic admixture. To better characterize the native and non-native ancestry components in populations from the Americas, we analyzed 815,377 autosomal SNPs, mitochondrial hypervariable segments I and II, and 36 Y-chromosome STRs from 24 Mesoamerican Totonacs and 23 South American Bolivians. RESULTS AND CONCLUSIONS: We analyzed common genomic regions from native Bolivian and Totonac populations to identify 324 highly predictive Native American ancestry informative markers (AIMs). As few as 40-50 of these AIMs perform nearly as well as large panels of random genome-wide SNPs for predicting and estimating Native American ancestry and admixture levels. These AIMs have greater New World vs. Old World specificity than previous AIMs sets. We identify highly-divergent New World SNPs that coincide with high-frequency haplotypes found at similar frequencies in all populations examined, including the HGDP Pima, Maya, Colombian, Karitiana, and Surui American populations. Some of these regions are potential candidates for positive selection. European admixture in the Bolivian sample is approximately 12%, though individual estimates range from 0-48%. We estimate that the admixture occurred ~360-384 years ago. Little evidence of European or African admixture was found in Totonac individuals. Bolivians with pre-Columbian mtDNA and Y-chromosome haplogroups had 5-30% autosomal European ancestry, demonstrating the limitations of Y-chromosome and mtDNA haplogroups and the need for autosomal ancestry informative markers for assessing ancestry in admixed populations.


Subject(s)
American Indian or Alaska Native/genetics , Bolivia/ethnology , DNA, Mitochondrial , Emigration and Immigration , Genetics, Population , Humans , Mexico/ethnology , Phylogeography , Polymorphism, Single Nucleotide , Selection, Genetic
14.
Genome Biol ; 23(1): 253, 2022 12 12.
Article in English | MEDLINE | ID: mdl-36510265

ABSTRACT

BACKGROUND: Short tandem repeats (STRs) compose approximately 3% of the genome, and mutations at STR loci have been linked to dozens of human diseases including amyotrophic lateral sclerosis, Friedreich ataxia, Huntington disease, and fragile X syndrome. Improving our understanding of these mutations would increase our knowledge of the mutational dynamics of the genome and may uncover additional loci that contribute to disease. To estimate the genome-wide pattern of mutations at STR loci, we analyze blood-derived whole-genome sequencing data for 544 individuals from 29 three-generation CEPH pedigrees. These pedigrees contain both sets of grandparents, the parents, and an average of 9 grandchildren per family. RESULTS: We use HipSTR to identify de novo STR mutations in the 2nd generation of these pedigrees and require transmission to the third generation for validation. Analyzing approximately 1.6 million STR loci, we estimate the empirical de novo STR mutation rate to be 5.24 × 10⁻⁵ mutations per locus per generation. Perfect repeats mutate about 2× more often than imperfect repeats. De novo STRs are significantly enriched in Alu elements. CONCLUSIONS: Approximately 30% of new STR mutations occur within Alu elements, which compose only 11% of the genome, but only 10% are found in LINE-1 insertions, which compose 17% of the genome. Phasing these mutations to the parent of origin shows that parental transmission biases vary among families. We estimate the average number of de novo genome-wide STR mutations per individual to be approximately 85, which is similar to the average number of observed de novo single nucleotide variants.
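
The figures in the abstract above can be cross-checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not the authors' method: it multiplies the quoted number of loci by the quoted per-locus rate, and divides each element's share of new STR mutations by its share of the genome. The function name `fold_enrichment` is hypothetical, and the study may have derived its genome-wide estimate of approximately 85 differently.

```python
# Values quoted in the abstract (all approximate).
str_loci = 1_600_000          # STR loci analyzed
rate_per_locus = 5.24e-5      # de novo mutations per locus per generation

# Expected de novo STR mutations per generation across the analyzed loci.
expected_mutations = str_loci * rate_per_locus


def fold_enrichment(share_of_mutations: float, share_of_genome: float) -> float:
    """Ratio of observed mutation share to genome share (1.0 means no enrichment)."""
    return share_of_mutations / share_of_genome


alu_fold = fold_enrichment(0.30, 0.11)     # Alu: 30% of mutations, 11% of genome
line1_fold = fold_enrichment(0.10, 0.17)   # LINE-1: 10% of mutations, 17% of genome

print(f"expected de novo STR mutations: {expected_mutations:.1f}")
print(f"Alu fold enrichment: {alu_fold:.2f}")
print(f"LINE-1 fold enrichment: {line1_fold:.2f}")
```

The product comes to about 84, close to the abstract's estimate of approximately 85, while the ratios suggest roughly 2.7-fold enrichment in Alu elements and depletion (about 0.6-fold) in LINE-1 insertions.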


Subject(s)
Extended Family , Microsatellite Repeats , Humans , Mutation , Pedigree , Genome
15.
Proc Natl Acad Sci U S A ; 105(29): 10149-54, 2008 Jul 22.
Article in English | MEDLINE | ID: mdl-18626011

ABSTRACT

The ETS gene family is frequently involved in chromosome translocations that cause human cancer, including prostate cancer, leukemia, and sarcoma. However, the mechanisms by which oncogenic ETS proteins, which are DNA-binding transcription factors, target genes necessary for tumorigenesis are not well understood. Ewing's sarcoma serves as a paradigm for the entire class of ETS-associated tumors because nearly all cases harbor recurrent chromosomal translocations involving ETS genes. The most common translocation in Ewing's sarcoma encodes the EWS/FLI oncogenic transcription factor. We used whole genome localization (ChIP-chip) to identify target genes that are directly bound by EWS/FLI. Analysis of the promoters of these genes demonstrated a significant over-representation of highly repetitive GGAA-containing elements (microsatellites). In a parallel approach, we found that EWS/FLI uses GGAA microsatellites to regulate the expression of some of its target genes including NR0B1, a gene required for Ewing's sarcoma oncogenesis. The microsatellite in the NR0B1 promoter bound EWS/FLI in vitro and in vivo and was both necessary and sufficient to confer EWS/FLI regulation to a reporter gene. Genome wide computational studies demonstrated that GGAA microsatellites were enriched close to EWS/FLI-up-regulated genes but not down-regulated genes. Mechanistic studies demonstrated that the ability of EWS/FLI to bind DNA and modulate gene expression through these repetitive elements depended on the number of consecutive GGAA motifs. These findings illustrate an unprecedented route to specificity for ETS proteins and use of microsatellites in tumorigenesis.


Subject(s)
Microsatellite Repeats , Oncogene Proteins, Fusion/genetics , Response Elements , Sarcoma, Ewing/genetics , Transcription Factors/genetics , Base Sequence , Cell Line, Tumor , Chromatin Immunoprecipitation , DNA, Neoplasm/genetics , Humans , Molecular Sequence Data , Proto-Oncogene Protein c-fli-1 , RNA-Binding Protein EWS , Transfection
16.
Genomics ; 96(4): 199-210, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20643205

ABSTRACT

High-throughput genotyping data are useful for making inferences about human evolutionary history. However, the populations sampled to date are unevenly distributed, and some areas (e.g., South and Central Asia) have rarely been sampled in large-scale studies. To assess human genetic variation more evenly, we sampled 296 individuals from 13 worldwide populations that are not covered by previous studies. By combining these samples with a data set from our laboratory and the HapMap II samples, we assembled a final dataset of ~250,000 SNPs in 850 individuals from 40 populations. With more uniform sampling, the estimate of global genetic differentiation (F(ST)) substantially decreases from ~16% with the HapMap II samples to ~11%. A panel of copy number variations typed in the same populations shows patterns of diversity similar to the SNP data, with highest diversity in African populations. This unique sample collection also permits new inferences about human evolutionary history. The comparison of haplotype variation among populations supports a single out-of-Africa migration event and suggests that the founding population of Eurasia may have been relatively large but isolated from Africans for a period of time. We also found a substantial affinity between populations from central Asia (Kyrgyzstani and Mongolian Buryat) and America, suggesting a central Asian contribution to New World founder populations.


Subject(s)
Genetic Variation , Genetics, Medical/statistics & numerical data , Genetics, Medical/trends , Genetics, Population/methods , DNA Copy Number Variations , Demography/methods , Demography/statistics & numerical data , Demography/trends , Genetic Speciation , Genetic Variation/physiology , Genetics, Medical/methods , Genetics, Population/statistics & numerical data , Genetics, Population/trends , Genome-Wide Association Study/methods , Genome-Wide Association Study/statistics & numerical data , Genome-Wide Association Study/trends , Genotype , Geography , Haplotypes , Humans , Phylogeny , Polymorphism, Single Nucleotide , Sampling Studies
17.
Genes (Basel) ; 12(5), 2021 04 27.
Article in English | MEDLINE | ID: mdl-33925651

ABSTRACT

There is strong evidence for a genetic contribution to non-syndromic congenital heart defects (CHDs). However, exome- and genome-wide studies conducted at the variant and gene-level have identified few genome-wide significant CHD-related genes. Gene-set analyses are a useful complement to such studies and candidate gene-set analyses of rare variants have provided insight into the genetics of CHDs. However, similar analyses have not been conducted using data on common genetic variants. Consequently, we conducted common variant analyses of 15 CHD candidate gene-sets, using data from two common types of CHDs: conotruncal heart defects (1431 cases) and left ventricular outflow tract defects (509 cases). After Bonferroni correction for evaluation of multiple gene-sets, the cytoskeletal gene-set was significantly associated with conotruncal heart defects (βS = 0.09; 95% confidence interval (CI) 0.03-0.15). This association was stronger when analyses were restricted to the sub-set of cytoskeletal genes that have been observed to harbor rare damaging genotypes in at least two CHD cases (βS = 0.32, 95% CI 0.08-0.56). These findings add to the evidence linking cytoskeletal genes to CHDs and suggest that, for cytoskeletal genes, common variation may contribute to the risk of CHDs.


Subject(s)
Cytoskeleton/genetics , Heart Defects, Congenital/genetics , Polymorphism, Single Nucleotide/genetics , Case-Control Studies , Genome, Human/genetics , Genotype , Humans , Risk Factors
18.
BMC Genomics ; 11: 410, 2010 Jun 30.
Article in English | MEDLINE | ID: mdl-20591181

ABSTRACT

BACKGROUND: Mobile elements (MEs) are diverse, common and dynamic inhabitants of nearly all genomes. ME transposition generates a steady stream of polymorphic genetic markers, deleterious and adaptive mutations, and substrates for further genomic rearrangements. Research on the impacts, population dynamics, and evolution of MEs is constrained by the difficulty of ascertaining rare polymorphic ME insertions that occur against a large background of pre-existing fixed elements and then genotyping them in many individuals. RESULTS: Here we present a novel method for identifying nearly all insertions of a ME subfamily in the whole genomes of multiple individuals and simultaneously genotyping (for presence or absence) those insertions that are variable in the population. We use ME-specific primers to construct DNA libraries that contain the junctions of all ME insertions of the subfamily, with their flanking genomic sequences, from many individuals. Individual-specific "index" sequences are designed into the oligonucleotide adapters used to construct the individual libraries. These libraries are then pooled and sequenced using a ME-specific sequencing primer. Mobile element insertion loci of the target subfamily are uniquely identified by their junction sequence, and all insertion junctions are linked to their individual libraries by the corresponding index sequence. To test this method's feasibility, we apply it to the human AluYb8 and AluYb9 subfamilies. In four individuals, we identified a total of 2,758 AluYb8 and AluYb9 insertions, including nearly all those that are present in the reference genome, as well as 487 that are not. Index counts show the sequenced products from each sample reflect the intended proportions to within 1%. At a sequencing depth of 355,000 paired reads per sample, the sensitivity and specificity of ME-Scan are both approximately 95%. 
CONCLUSIONS: Mobile Element Scanning (ME-Scan) is an efficient method for quickly genotyping mobile element insertions with very high sensitivity and specificity. In light of recent improvements to high-throughput sequencing technology, it should be possible to employ ME-Scan to genotype insertions of almost any mobile element family in many individuals from any species.
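
The index-based genotyping step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the representation of reads as (index sequence, junction identifier) pairs, and the example sequences are all hypothetical, and real data would also require read alignment and error handling that are omitted here.

```python
from collections import defaultdict


def genotype_by_index(reads, index_to_sample):
    """Group insertion junctions by the individuals whose index they carry.

    reads: iterable of (index_seq, junction_id) pairs (hypothetical format).
    index_to_sample: dict mapping each index sequence to a sample name.
    Returns a dict: junction_id -> set of sample names carrying it.
    """
    presence = defaultdict(set)
    for index_seq, junction_id in reads:
        sample = index_to_sample.get(index_seq)
        if sample is None:
            continue  # read with an unrecognized index is skipped
        presence[junction_id].add(sample)
    return dict(presence)


def presence_matrix(presence, samples):
    """Turn the junction -> carriers mapping into presence/absence calls."""
    return {
        locus: {s: (s in carriers) for s in samples}
        for locus, carriers in presence.items()
    }


# Toy pooled library: two indexed individuals, one read with an unknown index.
index_to_sample = {"ACGT": "ind1", "TTAG": "ind2"}
reads = [
    ("ACGT", "chr1:100"),
    ("TTAG", "chr1:100"),
    ("ACGT", "chr2:5"),
    ("NNNN", "chr3:9"),
]
presence = genotype_by_index(reads, index_to_sample)
matrix = presence_matrix(presence, ["ind1", "ind2"])
```

In this toy input, the junction labelled chr1:100 is called present in both individuals and chr2:5 only in ind1, which mirrors the presence/absence genotyping the abstract describes for pooled, indexed libraries.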


Subject(s)
Genome Components/genetics , Sequence Analysis, DNA/methods , Base Sequence , Genetic Loci/genetics , Genotype , Humans , Reproducibility of Results , Retroelements/genetics
19.
Ann Hum Genet ; 74(2): 184-8, 2010 Mar.
Article in English | MEDLINE | ID: mdl-20201939

ABSTRACT

Heart failure is a leading cause of death of people in South Asia, and cardiomyopathy is a major cause of heart failure. Myosin binding protein C (MYBPC3) is expressed in the heart muscle, where it regulates the cardiac response to adrenergic stimulation and is important for the structural integrity of the sarcomere. Mutations in the MYBPC3 gene are associated with hypertrophic or dilated cardiomyopathies. A 25-base-pair deletion in intron 32 causes skipping of the downstream exon and is associated with familial cardiomyopathy. To date, this deletion is found primarily in India and South Asia, although it is also found at low frequency in Southeast Asia. In order to better characterise the distribution of this variant, we determined its frequency in 447 individuals from 19 populations, including 10 populations from India and neighbouring populations from Pakistan and Nepal. The deletion frequency is over 8% in some of our Indian samples, and it is not present in any of the populations we sampled outside of India. The differences in the deletion frequencies among populations in India are consistent with patterns of variation previously reported and with patterns we observed among Indian populations based on high-density SNP chip data. Our results indicate that the MYBPC3 deletion is primarily found among Indian populations and that its distribution is consistent with genome-wide patterns of variation in India.


Subject(s)
Cardiomyopathies/genetics , Carrier Proteins/genetics , Gene Deletion , Cardiomyopathies/metabolism , Humans , India , Nepal , Pakistan
20.
Genome Biol Evol ; 12(6): 779-794, 2020 06 01.
Article in English | MEDLINE | ID: mdl-32359137

ABSTRACT

Ongoing retrotransposition of Alu, LINE-1, and SINE-VNTR-Alu elements generates diversity and variation among human populations. Previous analyses investigating the population genetics of mobile element insertions (MEIs) have been limited by population ascertainment bias or by relatively small numbers of populations and low sequencing coverage. Here, we use 296 individuals representing 142 global populations from the Simons Genome Diversity Project (SGDP) to discover and characterize MEI diversity from deeply sequenced whole-genome data. We report 5,742 MEIs not originally reported by the 1000 Genomes Project and show that high sampling diversity leads to a 4- to 7-fold increase in MEI discovery rates over the original 1000 Genomes Project data. As a result of negative selection, nonreference polymorphic MEIs are underrepresented within genes, and MEIs within genes are often found in the transcriptional orientation opposite that of the gene. Globally, 80% of Alu subfamilies predate the expansion of modern humans from Africa. Polymorphic MEIs show heterozygosity gradients that decrease from Africa to Eurasia to the Americas, and the number of MEIs found uniquely in a single individual is also distributed in this general pattern. The maximum fraction of MEI diversity partitioned among the seven major SGDP population groups (FST) is 7.4%, similar to, but slightly lower than, previous estimates and likely attributable to the diverse sampling strategy of the SGDP. Finally, we utilize these MEIs to extrapolate the primary Native American shared ancestry component back to Asia and provide new evidence from genome-wide identical-by-descent genetic markers that adds support for a southeastern Siberian origin for most Native Americans.


Subject(s)
Alu Elements , Genetic Variation , Genome, Human , Long Interspersed Nucleotide Elements , Humans , Phylogeography