ABSTRACT
Here, we demonstrate how comparative sequence analysis facilitates genome-wide base-pair-level interpretation of individual genetic variation and address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and, second, whether population-specific or globally-present polymorphisms contribute more to functional variation in any given individual. Neither has been definitively answered by analyses of existing variation data because of a focus on coding polymorphisms, ascertainment biases in favor of common variation, and a lack of base-pair-level resolution for identifying functional variants. We resequenced 575 amplicons within 432 individuals at genomic sites enriched for evolutionary constraint and also analyzed variation within three published human genomes. We find that single-site measures of evolutionary constraint derived from mammalian multiple sequence alignments are strongly predictive of reductions in modern-day genetic diversity across a range of annotation categories and across the allele frequency spectrum from rare (<1%) to high frequency (>10% minor allele frequency). Furthermore, we show that putatively functional variation in an individual genome is dominated by polymorphisms that do not change protein sequence and that originate from our shared ancestral population and commonly segregate in human populations. These observations show that common, noncoding alleles contribute substantially to human phenotypes and that constraint-based analyses will be of value to identify phenotypically relevant variants in individual genomes.
Subjects
Alleles , Gene Frequency , Genetic Variation , Human Genome , Sequence Alignment , Amino Acid Sequence , Animals , Base Sequence , Biological Evolution , Genetic Testing , Genome , Genomics , Humans , Mammals/genetics , Phenotype , Genetic Polymorphism , Nucleic Acid Regulatory Sequences
ABSTRACT
As the final sequencing of the human genome has now been completed, we present the results of the largest examination of the quality of the finished DNA sequence. The completed study covers the major contributing sequencing centres and is based on a rigorous combination of laboratory experiments and computational analysis.
Subjects
Computational Biology/standards , Human Genome , Human Genome Project , DNA Sequence Analysis/standards , Base Pairing , Computational Biology/trends , Humans , Quality Control , Research Design , Sensitivity and Specificity , DNA Sequence Analysis/trends
ABSTRACT
Human chromosome 16 features one of the highest levels of segmentally duplicated sequence among the human autosomes. We report here the 78,884,754 base pairs of finished chromosome 16 sequence, representing over 99.9% of its euchromatin. Manual annotation revealed 880 protein-coding genes confirmed by 1,670 aligned transcripts, 19 transfer RNA genes, 341 pseudogenes and three RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukaemia. Several large-scale structural polymorphisms spanning hundreds of kilobase pairs were identified and result in gene content differences among humans. Whereas the segmental duplications of chromosome 16 are enriched in the relatively gene-poor pericentromere of the p arm, some are involved in recent gene duplication and conversion events that are likely to have had an impact on the evolution of primates and human disease susceptibility.
Subjects
Human Chromosomes, Pair 16/genetics , Gene Duplication , Physical Chromosome Mapping , Animals , Genes/genetics , Genomics , Heterochromatin/genetics , Humans , Molecular Sequence Data , Genetic Polymorphism/genetics , DNA Sequence Analysis , Synteny/genetics
ABSTRACT
Chromosome 5 is one of the largest human chromosomes and contains numerous intrachromosomal duplications, yet it has one of the lowest gene densities. This is partially explained by numerous gene-poor regions that display a remarkable degree of noncoding conservation with non-mammalian vertebrates, suggesting that they are functionally constrained. In total, we compiled 177.7 million base pairs of highly accurate finished sequence containing 923 manually curated protein-coding genes including the protocadherin and interleukin gene families. We also completely sequenced versions of the large chromosome-5-specific internal duplications. These duplications are very recent evolutionary events and probably have a mechanistic role in human physiological variation, as deletions in these regions are the cause of debilitating disorders including spinal muscular atrophy.
Subjects
Human Chromosomes, Pair 5/genetics , DNA Sequence Analysis , Animals , Base Composition , Cadherins/genetics , Conserved Sequence/genetics , Gene Duplication , Genes/genetics , Inborn Genetic Diseases/genetics , Genomics , Humans , Interleukins/genetics , Molecular Sequence Data , Spinal Muscular Atrophy/genetics , Pan troglodytes/genetics , Physical Chromosome Mapping , Pseudogenes/genetics , Synteny/genetics , Vertebrates/genetics
ABSTRACT
Chromosome 19 has the highest gene density of all human chromosomes, more than double the genome-wide average. The large clustered gene families, corresponding high G + C content, CpG islands and density of repetitive DNA indicate a chromosome rich in biological and evolutionary significance. Here we describe 55.8 million base pairs of highly accurate finished sequence representing 99.9% of the euchromatin portion of the chromosome. Manual curation of gene loci reveals 1,461 protein-coding genes and 321 pseudogenes. Among these are genes directly implicated in mendelian disorders, including familial hypercholesterolaemia and insulin-resistant diabetes. Nearly one-quarter of these genes belong to tandemly arranged families, encompassing more than 25% of the chromosome. Comparative analyses show a fascinating picture of conservation and divergence, revealing large blocks of gene orthology with rodents, scattered regions with more recent gene family expansions and deletions, and segments of coding and non-coding conservation with the distant fish species Takifugu.