1.
Nat Commun ; 12(1): 6442, 2021 11 08.
Article En | MEDLINE | ID: mdl-34750360

The genetic architecture of atrial fibrillation (AF) encompasses low impact, common genetic variants and high impact, rare variants. Here, we characterize a high impact AF-susceptibility allele, KCNQ1 R231H, and describe its transcontinental geographic distribution and history. Induced pluripotent stem cell-derived cardiomyocytes procured from risk allele carriers exhibit abbreviated action potential duration, consistent with a gain-of-function effect. Using identity-by-descent (IBD) networks, we estimate the broad- and fine-scale population ancestry of risk allele carriers and their relatives. Analysis of ancestral migration routes reveals ancestors who inhabited Denmark in the 1700s, migrated to the Northeastern United States in the early 1800s, and traveled across the Midwest to arrive in Utah in the late 1800s. IBD/coalescent-based allele dating analysis reveals a relatively recent origin of the AF risk allele (~5000 years). Thus, our approach broadens the scope of study for disease susceptibility alleles to the context of human migration and ancestral origins.


Atrial Fibrillation/genetics , Genetic Predisposition to Disease/genetics , KCNQ1 Potassium Channel/genetics , Mutation, Missense , Polymorphism, Single Nucleotide , Action Potentials , Alleles , Denmark , Emigrants and Immigrants , Female , Genotype , Geography , Humans , Induced Pluripotent Stem Cells/cytology , Induced Pluripotent Stem Cells/metabolism , Male , Middle Aged , Myocytes, Cardiac/cytology , Myocytes, Cardiac/metabolism , Myocytes, Cardiac/physiology , Pedigree , Risk Factors , Utah
2.
BMC Bioinformatics ; 22(1): 459, 2021 Sep 25.
Article En | MEDLINE | ID: mdl-34563119

BACKGROUND: We present ARCHes, a fast and accurate haplotype-based approach for inferring an individual's ancestry composition. Our approach works by modeling haplotype diversity from a large, admixed cohort of hundreds of thousands, then annotating those models with population information from reference panels of known ancestry. RESULTS: The running time of ARCHes does not depend on the size of a reference panel because training and testing are separate processes, and the inferred population-annotated haplotype models can be written to disk and reused to label large test sets in parallel (in our experiments, it averages less than one minute to assign ancestry from 32 populations using 10 CPUs). We test ARCHes on public data from the 1000 Genomes Project and the Human Genome Diversity Project (HGDP) as well as simulated examples of known admixture. CONCLUSIONS: Our results demonstrate that ARCHes outperforms RFMix at correctly assigning both global and local ancestry at finer population scales regardless of the amount of population admixture.


Genetics, Population , Genome, Human , Haplotypes , Humans , Polymorphism, Single Nucleotide
3.
G3 (Bethesda) ; 9(9): 2863-2878, 2019 09 04.
Article En | MEDLINE | ID: mdl-31484785

We present a massive investigation into the genetic basis of human lifespan. Beginning with a genome-wide association (GWA) study using a de-identified snapshot of the unique AncestryDNA database - more than 300,000 genotyped individuals linked to pedigrees of over 400,000,000 people - we mapped six genome-wide significant loci associated with parental lifespan. We compared these results to a GWA analysis of the traditional lifespan proxy trait, age, and found only one locus, APOE, to be associated with both age and lifespan. By combining the AncestryDNA results with those of an independent UK Biobank dataset, we conducted a meta-analysis of more than 650,000 individuals and identified fifteen parental lifespan-associated loci. Beyond just those significant loci, our genome-wide set of polymorphisms accounts for up to 8% of the variance in human lifespan; this value represents a large fraction of the heritability estimated from phenotypic correlations between relatives.


Genome-Wide Association Study/methods , Longevity/genetics , Aged , Aged, 80 and over , Apolipoproteins E/genetics , Carrier Proteins/genetics , Databases, Genetic , Female , Humans , Male , Nuclear Proteins/genetics , Pedigree , Polymorphism, Single Nucleotide , Prospective Studies , Proto-Oncogene Proteins/genetics
4.
Genetics ; 210(3): 1109-1124, 2018 11.
Article En | MEDLINE | ID: mdl-30401766

Human life span is a phenotype that integrates many aspects of health and environment into a single ultimate quantity: the elapsed time between birth and death. Though it is widely believed that long life runs in families for genetic reasons, estimates of life span "heritability" are consistently low (∼15-30%). Here, we used pedigree data from Ancestry public trees, including hundreds of millions of historical persons, to estimate the heritability of human longevity. Although "nominal heritability" estimates based on correlations among genetic relatives agreed with prior literature, the majority of that correlation was also captured by correlations among nongenetic (in-law) relatives, suggestive of highly assortative mating around life span-influencing factors (genetic and/or environmental). We used structural equation modeling to account for assortative mating, and concluded that the true heritability of human longevity for birth cohorts across the 1800s and early 1900s was well below 10%, and that it has been generally overestimated due to the effect of assortative mating.


Longevity/genetics , Reproduction , Female , Humans , Male , Models, Genetic , Pedigree
5.
Nat Commun ; 8: 14238, 2017 02 07.
Article En | MEDLINE | ID: mdl-28169989

Despite strides in characterizing human history from genetic polymorphism data, progress in identifying genetic signatures of recent demography has been limited. Here we identify very recent fine-scale population structure in North America from a network of over 500 million genetic (identity-by-descent, IBD) connections among 770,000 genotyped individuals of US origin. We detect densely connected clusters within the network and annotate these clusters using a database of over 20 million genealogical records. Recent population patterns captured by IBD clustering include immigrants such as Scandinavians and French Canadians; groups with continental admixture such as Puerto Ricans; settlers such as the Amish and Appalachians who experienced geographic or cultural isolation; and broad historical trends, including reduced north-south gene flow. Our results yield a detailed historical portrait of North America after European settlement and support substantial genetic heterogeneity in the United States beyond that uncovered by previous studies.


Demography/statistics & numerical data , Genetics, Population/methods , Population Dynamics/trends , Population/genetics , Cluster Analysis , Demography/methods , Emigrants and Immigrants , Gene Flow/genetics , Genotyping Techniques , Haplotypes/genetics , Humans , Polymorphism, Single Nucleotide , Population Dynamics/statistics & numerical data , Sequence Analysis, DNA , United States/ethnology
6.
PLoS Genet ; 9(11): e1003925, 2013 Nov.
Article En | MEDLINE | ID: mdl-24244192

The Caribbean basin is home to some of the most complex interactions in recent history among previously diverged human populations. Here, we investigate the population genetic history of this region by characterizing patterns of genome-wide variation among 330 individuals from three of the Greater Antilles (Cuba, Puerto Rico, Hispaniola), two mainland (Honduras, Colombia), and three Native South American (Yukpa, Bari, and Warao) populations. We combine these data with a unique database of genomic variation in over 3,000 individuals from diverse European, African, and Native American populations. We use local ancestry inference and tract length distributions to test different demographic scenarios for the pre- and post-colonial history of the region. We develop a novel ancestry-specific PCA (ASPCA) method to reconstruct the sub-continental origin of Native American, European, and African haplotypes from admixed genomes. We find that the most likely source of the indigenous ancestry in Caribbean islanders is a Native South American component shared among inland Amazonian tribes, Central America, and the Yucatan peninsula, suggesting extensive gene flow across the Caribbean in pre-Columbian times. We find evidence of two pulses of African migration. The first pulse--which today is reflected by shorter, older ancestry tracts--consists of a genetic component more similar to coastal West African regions involved in early stages of the trans-Atlantic slave trade. The second pulse--reflected by longer, younger tracts--is more similar to present-day West-Central African populations, supporting historical records of later transatlantic deportation. Surprisingly, we also identify a Latino-specific European component that has significantly diverged from its parental Iberian source populations, presumably as a result of small European founder population size. 
We demonstrate that the ancestral components in admixed genomes can be traced back to distinct sub-continental source populations with far greater resolution than previously thought, even when limited pre-Columbian Caribbean haplotypes have survived.


Black People/genetics , Gene Flow , Genetics, Population , Indians, North American/genetics , White People/genetics , Caribbean Region , DNA, Mitochondrial/genetics , Demography , Genomics , Haplotypes , Hispanic or Latino/genetics , Humans
7.
PLoS Genet ; 9(12): e1004023, 2013.
Article En | MEDLINE | ID: mdl-24385924

There is great scientific and popular interest in understanding the genetic history of populations in the Americas. We wish to understand when different regions of the continent were inhabited, where settlers came from, and how current inhabitants relate genetically to earlier populations. Recent studies unraveled parts of the genetic history of the continent using genotyping arrays and uniparental markers. The 1000 Genomes Project provides a unique opportunity for improving our understanding of population genetic history by providing over a hundred sequenced low coverage genomes and exomes from Colombian (CLM), Mexican-American (MXL), and Puerto Rican (PUR) populations. Here, we explore the genomic contributions of African, European, and especially Native American ancestry to these populations. Estimated Native American ancestry is 48% in MXL, 25% in CLM, and 13% in PUR. Native American ancestry in PUR is most closely related to populations surrounding the Orinoco River basin, confirming the Southern American ancestry of the Taíno people of the Caribbean. We present new methods to estimate the allele frequencies in the Native American fraction of the populations, and model their distribution using a demographic model for three ancestral Native American populations. These ancestral populations likely split in close succession: the most likely scenario, based on a peopling of the Americas 16 thousand years ago (kya), supports that the MXL Ancestors split 12.2kya, with a subsequent split of the ancestors to CLM and PUR 11.7kya. The model also features effective populations of 62,000 in Mexico, 8,700 in Colombia, and 1,900 in Puerto Rico. Modeling Identity-by-descent (IBD) and ancestry tract length, we show that post-contact populations also differ markedly in their effective sizes and migration patterns, with Puerto Rico showing the smallest effective size and the earlier migration from Europe. 
Finally, we compare IBD and ancestry assignments to find evidence for relatedness among European founders to the three populations.


Gene Frequency/genetics , Genetics, Population , Human Migration , Indians, North American/genetics , Black People/genetics , Chromosome Mapping , Exome , Genome, Human , Hispanic or Latino/genetics , Human Genome Project , Humans , Mexican Americans/genetics , Mexico , Puerto Rico , Racial Groups/genetics , White People/genetics
8.
Hum Biol ; 84(4): 343-64, 2012 Aug.
Article En | MEDLINE | ID: mdl-23249312

Identifying ancestry along each chromosome in admixed individuals provides a wealth of information for understanding the population genetic history of admixture events and is valuable for admixture mapping and identifying recent targets of selection. We present PCAdmix (available at https://sites.google.com/site/pcadmix/home ), a Principal Components-based algorithm for determining ancestry along each chromosome from a high-density, genome-wide set of phased single-nucleotide polymorphism (SNP) genotypes of admixed individuals. We compare our method to HAPMIX on simulated data from two ancestral populations, and we find high concordance between the methods. Our method also has better accuracy than LAMP when applied to three-population admixture, a situation as yet unaddressed by HAPMIX. Finally, we apply our method to a data set of four Latino populations with European, African, and Native American ancestry. We find evidence of assortative mating in each of the four populations, and we identify regions of shared ancestry that may be recent targets of selection and could serve as candidate regions for admixture-based association mapping.


Chromosomes, Human , Genotype , Models, Genetic , Polymorphism, Single Nucleotide , Population Dynamics , Principal Component Analysis/methods , Racial Groups/genetics , Algorithms , Computer Simulation , Genomics , Humans , Phylogeography , United States
9.
Nat Genet ; 44(12): 1294-301, 2012 Dec.
Article En | MEDLINE | ID: mdl-23104008

To further investigate susceptibility loci identified by genome-wide association studies, we genotyped 5,500 SNPs across 14 associated regions in 8,000 samples from a control group and 3 diseases: type 2 diabetes (T2D), coronary artery disease (CAD) and Graves' disease. We defined, using Bayes theorem, credible sets of SNPs that were 95% likely, based on posterior probability, to contain the causal disease-associated SNPs. In 3 of the 14 regions, TCF7L2 (T2D), CTLA4 (Graves' disease) and CDKN2A-CDKN2B (T2D), much of the posterior probability rested on a single SNP, and, in 4 other regions (CDKN2A-CDKN2B (CAD) and CDKAL1, FTO and HHEX (T2D)), the 95% sets were small, thereby excluding most SNPs as potentially causal. Very few SNPs in our credible sets had annotated functions, illustrating the limitations in understanding the mechanisms underlying susceptibility to common diseases. Our results also show the value of more detailed mapping to target sequences for functional studies.
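
The credible-set idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes exactly one causal SNP per region and an equal prior for every SNP, so each posterior is that SNP's Bayes factor divided by the sum of Bayes factors. The function name, SNP labels and Bayes factor values are all hypothetical.

```python
def credible_set(bayes_factors, coverage=0.95):
    """Return the smallest set of SNPs whose posteriors sum to >= coverage.

    Assumption (not from the source): one causal SNP per region and a flat
    prior, so posterior_i = BF_i / sum(BF).
    """
    total = sum(bayes_factors.values())
    posteriors = {snp: bf / total for snp, bf in bayes_factors.items()}
    # Rank SNPs from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)

    selected, cumulative = [], 0.0
    for snp, prob in ranked:
        selected.append(snp)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected, cumulative


# Hypothetical Bayes factors for four SNPs in one region.
example_bfs = {"snp_a": 900.0, "snp_b": 60.0, "snp_c": 30.0, "snp_d": 10.0}
snps, mass = credible_set(example_bfs)
```

With these illustrative values, one SNP carries most of the posterior mass and the 95% set contains only two of the four SNPs, which mirrors the abstract's observation that small credible sets exclude most SNPs as potentially causal.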


Coronary Artery Disease/genetics , Diabetes Mellitus, Type 2/genetics , Genetic Loci , Genetic Predisposition to Disease , Genome-Wide Association Study , Graves Disease/genetics , Alpha-Ketoglutarate-Dependent Dioxygenase FTO , Bayes Theorem , CTLA-4 Antigen/genetics , Cyclin-Dependent Kinase 5/genetics , Cyclin-Dependent Kinase Inhibitor p15/genetics , Genes, p16 , Homeodomain Proteins/genetics , Humans , Polymorphism, Single Nucleotide , Proteins/genetics , Transcription Factor 7-Like 2 Protein/genetics , Transcription Factors/genetics , tRNA Methyltransferases
10.
Am J Hum Genet ; 91(4): 660-71, 2012 Oct 05.
Article En | MEDLINE | ID: mdl-23040495

Full sequencing of individual human genomes has greatly expanded our understanding of human genetic variation and population history. Here, we present a systematic analysis of 50 human genomes from 11 diverse global populations sequenced at high coverage. Our sample includes 12 individuals who have admixed ancestry and who have varying degrees of recent (within the last 500 years) African, Native American, and European ancestry. We found over 21 million single-nucleotide variants that contribute to a 1.75-fold range in nucleotide heterozygosity across diverse human genomes. This heterozygosity ranged from a high of one heterozygous site per kilobase in west African genomes to a low of 0.57 heterozygous sites per kilobase in segments inferred to have diploid Native American ancestry from the genomes of Mexican and Puerto Rican individuals. We show evidence of all three continental ancestries in the genomes of Mexican, Puerto Rican, and African American populations, and the genome-wide statistics are highly consistent across individuals from a population once ancestry proportions have been accounted for. Using a generalized linear model, we identified subtle variations across populations in the proportion of neutral versus deleterious variation and found that genome-wide statistics vary in admixed populations even once ancestry proportions have been factored in. We further infer that multiple periods of gene flow shaped the diversity of admixed populations in the Americas: 70% of the European ancestry in today's African Americans dates back to European gene flow happening only 7-8 generations ago.
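
As a quick check on the figures in this abstract, the 1.75-fold range follows directly from the two reported heterozygosity extremes. The variable names below are illustrative; only the two rates come from the abstract.

```python
# Heterozygous sites per kilobase, as reported in the abstract.
high_het_per_kb = 1.0   # west African genomes
low_het_per_kb = 0.57   # segments with diploid Native American ancestry

# Fold range between the most and least heterozygous genomes or segments.
fold_range = high_het_per_kb / low_het_per_kb
print(f"fold range: {fold_range:.2f}")
```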


Genome, Human , Haplotypes/genetics , Population/genetics , Racial Groups/genetics , Genetics, Population/methods , Heterozygote , Humans , Polymorphism, Single Nucleotide
11.
J Neurol Neurosurg Psychiatry ; 83(8): 793-5, 2012 Aug.
Article En | MEDLINE | ID: mdl-22626946

OBJECTIVE: Pregnancy has a well documented effect on relapse risk in multiple sclerosis (MS). Prospective studies have reported a significant decline by two-thirds in the rate of relapses during the third trimester of pregnancy and a significant increase by two-thirds during the first 3 months postpartum. However, it is unclear as to whether there are any long term effects on disability. METHODS: Data were collated from clinical records and family histories systematically collected from the University of British Columbia MS Clinic. RESULTS: Clinical and term pregnancy data were available from 2105 female MS patients. MS patients having children after MS onset took the longest time to reach an Expanded Disability Status Scale (EDSS) score of 6 (mean 22.9 years) and patients having children before MS onset were the quickest (mean 13.2 years). However, these effects were not related to term pregnancy and were fully accounted for by age of MS onset. CONCLUSIONS: Pregnancy had no effect on the time to reach an EDSS score 6. As MS predominantly affects women of childbearing age, women with MS can be reassured that term pregnancies do not appear to have any long term effects on disability.


Multiple Sclerosis/etiology , Pregnancy Complications/epidemiology , Activities of Daily Living , Adult , Age of Onset , Cohort Studies , Female , Humans , Maternal Age , Multiple Sclerosis/pathology , Parity , Pregnancy , Pregnancy Complications/pathology , Pregnancy Outcome , Young Adult
12.
PLoS Genet ; 8(1): e1002397, 2012 Jan.
Article En | MEDLINE | ID: mdl-22253600

North African populations are distinct from sub-Saharan Africans based on cultural, linguistic, and phenotypic attributes; however, the time and the extent of genetic divergence between populations north and south of the Sahara remain poorly understood. Here, we interrogate the multilayered history of North Africa by characterizing the effect of hypothesized migrations from the Near East, Europe, and sub-Saharan Africa on current genetic diversity. We present dense, genome-wide SNP genotyping array data (730,000 sites) from seven North African populations, spanning from Egypt to Morocco, and one Spanish population. We identify a gradient of likely autochthonous Maghrebi ancestry that increases from east to west across northern Africa; this ancestry is likely derived from "back-to-Africa" gene flow more than 12,000 years ago (ya), prior to the Holocene. The indigenous North African ancestry is more frequent in populations with historical Berber ethnicity. In most North African populations we also see substantial shared ancestry with the Near East, and to a lesser extent sub-Saharan Africa and Europe. To estimate the time of migration from sub-Saharan populations into North Africa, we implement a maximum likelihood dating method based on the distribution of migrant tracts. In order to first identify migrant tracts, we assign local ancestry to haplotypes using a novel, principal component-based analysis of three ancestral populations. We estimate that a migration of western African origin into Morocco began about 40 generations ago (approximately 1,200 ya); a migration of individuals with Nilotic ancestry into Egypt occurred about 25 generations ago (approximately 750 ya). Our genomic data reveal an extraordinarily complex history of migrations, involving at least five ancestral populations, into North Africa.
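
The conversions from generations to years quoted in this abstract imply a fixed generation time. The sketch below recovers it from the two reported pairs. The 30-year value is inferred from the abstract's numbers rather than stated in it, and the function name is hypothetical.

```python
def implied_generation_time(generations, years_ago):
    """Years per generation implied by a (generations, years) pair."""
    return years_ago / generations


# Pairs reported in the abstract.
morocco = implied_generation_time(40, 1200)  # western African migration into Morocco
egypt = implied_generation_time(25, 750)     # Nilotic migration into Egypt
print(morocco, egypt)
```

Both pairs give the same value, so the abstract's year estimates are consistent with a single assumed generation time.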


Black People/genetics , Gene Flow/genetics , Genetic Variation , Population Dynamics , Population , Africa South of the Sahara/ethnology , Africa, Northern , Black People/history , DNA, Mitochondrial/genetics , Egypt, Ancient , Emigration and Immigration , Europe , Gene Pool , Genomics , Genotype , Haplotypes , History, Ancient , Humans , Middle East , Morocco , Polymorphism, Single Nucleotide , White People/genetics , White People/history
13.
PLoS Genet ; 7(9): e1002280, 2011 Sep.
Article En | MEDLINE | ID: mdl-21935354

Whole-genome sequencing harbors unprecedented potential for characterization of individual and family genetic variation. Here, we develop a novel synthetic human reference sequence that is ethnically concordant and use it for the analysis of genomes from a nuclear family with history of familial thrombophilia. We demonstrate that the use of the major allele reference sequence results in improved genotype accuracy for disease-associated variant loci. We infer recombination sites to the lowest median resolution demonstrated to date (< 1,000 base pairs). We use family inheritance state analysis to control sequencing error and inform family-wide haplotype phasing, allowing quantification of genome-wide compound heterozygosity. We develop a sequence-based methodology for Human Leukocyte Antigen typing that contributes to disease risk prediction. Finally, we advance methods for analysis of disease and pharmacogenomic risk across the coding and non-coding genome that incorporate phased variant data. We show these methods are capable of identifying multigenic risk for inherited thrombophilia and informing the appropriate pharmacological therapy. These ethnicity-specific, family-based approaches to interpretation of genetic variation are emblematic of the next generation of genetic risk assessment using whole-genome sequencing.


DNA Mutational Analysis/methods , Genes, Synthetic , Genetic Variation , Genome-Wide Association Study/methods , Thrombophilia/genetics , Alleles , Base Sequence , Female , Genetic Predisposition to Disease , Genome, Human , Genotype , Haplotypes , Humans , Male , Pedigree , Reference Standards , Risk Assessment , Sequence Alignment , Sequence Analysis, DNA
14.
Nature ; 464(7289): 713-20, 2010 Apr 01.
Article En | MEDLINE | ID: mdl-20360734

Copy number variants (CNVs) account for a major proportion of human genetic polymorphism and have been predicted to have an important role in genetic susceptibility to common disease. To address this we undertook a large, direct genome-wide study of association between CNVs and eight common human diseases. Using a purpose-designed array we typed approximately 19,000 individuals into distinct copy-number classes at 3,432 polymorphic CNVs, including an estimated 50% of all common CNVs larger than 500 base pairs. We identified several biological artefacts that lead to false-positive associations, including systematic CNV differences between DNAs derived from blood and cell lines. Association testing and follow-up replication analyses confirmed three loci where CNVs were associated with disease (IRGM for Crohn's disease, HLA for Crohn's disease, rheumatoid arthritis and type 1 diabetes, and TSPAN8 for type 2 diabetes), although in each case the locus had previously been identified in single nucleotide polymorphism (SNP)-based studies, reflecting our observation that most common CNVs that are well-typed on our array are well tagged by SNPs and so have been indirectly explored through SNP studies. We conclude that common CNVs that can be typed on existing platforms are unlikely to contribute greatly to the genetic basis of common human diseases.


DNA Copy Number Variations/genetics , Disease , Genetic Predisposition to Disease/genetics , Genome-Wide Association Study , Arthritis, Rheumatoid/genetics , Case-Control Studies , Crohn Disease/genetics , Diabetes Mellitus/genetics , Gene Frequency/genetics , Humans , Nucleic Acid Hybridization , Oligonucleotide Array Sequence Analysis , Pilot Projects , Polymorphism, Single Nucleotide/genetics , Quality Control
15.
J Hum Genet ; 54(9): 547-9, 2009 Sep.
Article En | MEDLINE | ID: mdl-19629136

Multiple sclerosis (MS) is a complex neurological trait. Allelic variation in the MHC class II region exerts the single strongest effect on MS genetic risk. The clinical onset of the disease is extremely variable, and can range from the first to the ninth decade of life. Epidemiological studies have suggested a modest genetic component to the age of onset (AO) of MS. Previous studies have shown that HLA-DRB1*1501 may be associated with a younger AO. Here, we sought to uncover any effect of HLA-DRB1*1501 on the AO of MS in a large Canadian cohort. A total of 1816 MS patients were genotyped for HLA-DRB1. Patients carrying HLA-DRB1*1501 were shown to have a small, but significantly lower, AO than patients without the allele (P=0.03). HLA-DRB1*1501 was also shown to reduce the mean AO in both progressive and relapsing forms of the disease. An investigation of parent-of-origin effects indicated that the lower AO for HLA-DRB1*1501 patients arises from maternally transmitted HLA-DRB1*1501 haplotypes (maternal HLA-DRB1*1501 mean AO=28.4 years, paternal=30.3 years; P=0.009). HLA-DRB1*1501 exerts a modest, but significant effect on the AO of all forms of MS. Parent-of-origin effects at the MHC are further implicated in MS disease pathogenesis.


HLA-DR Antigens/genetics , Haplotypes/genetics , Multiple Sclerosis/genetics , Adult , Age of Onset , Alleles , Canada , Female , Genetic Predisposition to Disease , Genotype , HLA-DR Antigens/immunology , HLA-DRB1 Chains , Humans , Male , Parents , Phenotype , Risk Factors
16.
Genome Biol ; 9(11): R165, 2008.
Article En | MEDLINE | ID: mdl-19025653

Whole genome tiling arrays are a key tool for profiling global genetic and expression variation. In this study we present our methods for detecting transcript level variation, splicing variation and allele specific expression in Arabidopsis thaliana. We also developed a generalized hidden Markov model for profiling transcribed fragment variation de novo. Our study demonstrates that whole genome tiling arrays are a powerful platform for dissecting natural transcriptome variation across multiple dimensions and at high resolution.


Arabidopsis/genetics , Gene Expression Profiling , Genome, Plant , Polymorphism, Genetic , Alternative Splicing , Arabidopsis/metabolism , Gene Expression Regulation, Plant , Markov Chains
17.
Proc Natl Acad Sci U S A ; 103(39): 14412-6, 2006 Sep 26.
Article En | MEDLINE | ID: mdl-16971485

Many Saccharomyces cerevisiae duplicate genes that were derived from an ancient whole-genome duplication (WGD) unexpectedly show a small synonymous divergence (K(S)), a higher sequence similarity to each other than to orthologues in Saccharomyces bayanus, or slow evolution compared with the orthologue in Kluyveromyces waltii, a non-WGD species. This decelerated evolution was attributed to gene conversion between duplicates. Using approximately 300 WGD gene pairs in four species and their orthologues in non-WGD species, we show that codon-usage bias and protein-sequence conservation are two important causes for decelerated evolution of duplicate genes, whereas gene conversion is effective only in the presence of strong codon-usage bias or protein-sequence conservation. Furthermore, we find that change in mutation pattern or in tDNA copy number changed codon-usage bias and increased the K(S) distance between K. waltii and S. cerevisiae. Intriguingly, some proteins showed fast evolution before the radiation of WGD species but little or no sequence divergence between orthologues and paralogues thereafter, indicating that functional conservation after the radiation may also be responsible for decelerated evolution in duplicates.


Codon/genetics , Gene Duplication , Phylogeny , Yeasts/genetics , DNA, Fungal/genetics , Genes, Fungal/genetics
18.
Mol Biol Evol ; 23(6): 1136-43, 2006 Jun.
Article En | MEDLINE | ID: mdl-16527865

In Saccharomyces, an ancient whole-genome duplication (WGD) and widespread duplicate gene deletion resulted in extensive reorganization of adjacent gene relationships. We have studied the evolution of adjacent gene pairs' identity, orientation, and spacing following whole-genome duplication and deletion (WGD-D) using comparative genomic analyses and simulations. Surveying adjacent gene organization across the Saccharomyces species complex, we find a genome-wide bias toward divergently and convergently transcribed gene pairs in all species but a reduction in this bias in the species that underwent WGD-D. Among neutral models of WGD-D, only single-gene deletion can produce the appropriate reduction in orientation bias and recapitulate the pattern of short, highly dispersed deletions we observe in Saccharomyces cerevisiae. To characterize the dynamics of WGD-D, we trace the conservation and creation of adjacent gene pairs along the S. cerevisiae lineage. We find that newly created adjacencies have a tandem orientation bias, while adjacencies conserved from prior to WGD-D have the same divergent-convergent bias as found in the species that diverged before WGD. We also find that adjacent gene pairs produced by WGD-D gained greater intergenic spacing but that this is reduced in the older adjacencies. Given this, and the preponderance of short deleted blocks, we argue that the deletion phase of WGD-D occurred primarily by small inactivating mutations followed by numerous small deletions. Newly created adjacent gene pairs also have an initial increase in mean log2 expression ratios and maximal expression levels, suggesting that increased intergenic spacing caused a genome-wide reduction in transcriptional interference.


Gene Deletion , Gene Duplication , Genome, Fungal , Saccharomyces cerevisiae/genetics , Saccharomyces/genetics , Evolution, Molecular , Gene Expression , Genes, Fungal , Oligonucleotide Array Sequence Analysis
19.
Proc Natl Acad Sci U S A ; 103(7): 2232-6, 2006 Feb 14.
Article En | MEDLINE | ID: mdl-16461903

The question of how duplicate genes are retained in a population remains controversial. The duplication-degeneration-complementation model, which involves no positive selection, stipulates a higher retention rate of duplicate genes in a small population than in a large one. This model has been accepted by many evolutionists. However, we found considerably more retentions and fewer losses of duplicate genes in the mouse genome than in the human genome, although the population size of rodents is in general larger than that of primates. Indeed, in nearly every interval of synonymous divergence between duplicate genes, the number of gene retentions in mouse is larger than that in human. Our findings suggest a more important role of positive selection in duplicate retention than duplication-degeneration-complementation. In addition, certain functional categories show a higher tendency of lineage-specific expansion than expected, suggesting lineage-specific selection or functional bias in retained duplicates.


Gene Duplication , Genes, Duplicate/genetics , Genome, Human/genetics , Genome/genetics , Selection, Genetic , Animals , Cell Lineage/genetics , Humans , Mice , Models, Genetic
...