ABSTRACT
Human populations have experienced dramatic growth since the Neolithic revolution. Recent studies that sequenced a very large number of individuals observed an extreme excess of rare variants and provided clear evidence of recent rapid growth in effective population size, although estimates have varied greatly among studies. All these studies were based on protein-coding genes, in which variants are also impacted by natural selection. In this study, we introduce targeted sequencing data for studying recent human history with minimal confounding by natural selection. We sequenced loci far from genes that meet a wide array of additional criteria such that mutations in these loci are putatively neutral. As population structure also skews allele frequencies, we sequenced 500 individuals of relatively homogeneous ancestry by first analyzing the population structure of 9,716 European Americans. We used very high coverage sequencing to reliably call rare variants and fit an extensive array of models of recent European demographic history to the site frequency spectrum. The best-fit model estimates ~3.4% growth per generation during the last ~140 generations, resulting in a population size increase of two orders of magnitude. This model fits the data very well, largely due to our observation that assumptions of more ancient demography can impact estimates of recent growth. This observation and results also shed light on the discrepancy in demographic estimates among recent studies.
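The growth figures in this abstract can be checked with a short calculation. The sketch below assumes the rate compounds once per generation at a constant 3.4% over 140 generations; the paper's fitted model may be parameterized differently (for example as continuous exponential growth), and the function name `fold_increase` is illustrative only.

```python
import math


def fold_increase(rate_per_generation: float, generations: int) -> float:
    """Fold change in population size under constant compound growth.

    Assumption (not from the paper): discrete compounding, once per generation.
    """
    return (1.0 + rate_per_generation) ** generations


# Rounded values quoted in the abstract.
fold = fold_increase(0.034, 140)
orders_of_magnitude = math.log10(fold)

print(f"fold increase: {fold:.1f}")
print(f"orders of magnitude: {orders_of_magnitude:.2f}")
```

Under these assumptions the fold increase comes out slightly above 100, which is consistent with the stated increase of two orders of magnitude.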
Subjects
Genetic Variation , Genetic Models , Population Growth , Base Sequence , Population Genetics , Humans , Molecular Sequence Data , Principal Component Analysis , DNA Sequence Analysis , United States , White People/genetics
ABSTRACT
Accurately determining the distribution of rare variants is an important goal of human genetics, but resequencing of a sample large enough for this purpose has been unfeasible until now. Here, we applied Sanger sequencing of genomic PCR amplicons to resequence the diabetes-associated genes KCNJ11 and HHEX in 13,715 people (10,422 European Americans and 3,293 African Americans) and validated amplicons potentially harbouring rare variants using 454 pyrosequencing. We observed far more variation (expected variant-site count ~578) than would have been predicted on the basis of earlier surveys, which could only capture the distribution of common variants. By comparison with earlier estimates based on common variants, our model shows a clear genetic signal of accelerating population growth, suggesting that humanity harbours a myriad of rare, deleterious variants, and that disease risk and the burden of disease in contemporary populations may be heavily influenced by the distribution of rare variants.
ABSTRACT
We present a highly accurate method for identifying genes with conserved RNA secondary structure by searching multiple sequence alignments of a large set of candidate orthologs for correlated arrangements of reverse-complementary regions. This approach is growing increasingly feasible as the genomes of ever more organisms are sequenced. A program called msari implements this method and is significantly more accurate than existing methods in the context of automatically generated alignments, making it particularly applicable to high-throughput scans. In our tests, it discerned clustalw-generated multiple sequence alignments of signal recognition particle or RNaseP orthologs from controls with 89.1% sensitivity at 97.5% specificity and with 74.4% sensitivity with no false positives in 494 controls. We used msari to conduct a comprehensive scan for secondary structure in mRNAs of coding genes, and we found many genes with known mRNA secondary structure and compelling evidence for secondary structure in other genes. msari uses a method for coping with sequence redundancy that is likely to have applications in a large set of other comparison-based search methods. The program is available for download from http://theory.csail.mit.edu/MSARi.
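For readers unfamiliar with the accuracy figures quoted above, the following minimal sketch shows how sensitivity and specificity are conventionally computed from classification counts. The counts used here are hypothetical and are not taken from the msari evaluation; the function names are illustrative only.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of real positives (e.g. structured-RNA alignments) that are detected."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives: int, false_positives: int) -> float:
    """Fraction of controls that are correctly rejected."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts, chosen only to illustrate the definitions.
sens = sensitivity(true_positives=80, false_negatives=20)
spec = specificity(true_negatives=95, false_positives=5)

print(f"sensitivity: {sens:.3f}")
print(f"specificity: {spec:.3f}")
```

A stricter score threshold typically trades sensitivity for specificity, which is the pattern the abstract reports: lower sensitivity at the threshold where no control is called positive.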