Results 1-20 of 39
1.
Cell ; 149(4): 912-22, 2012 May 11.
Article in English | MEDLINE | ID: mdl-22559943

ABSTRACT

Gene duplication is an important source of phenotypic change and adaptive evolution. We leverage a haploid hydatidiform mole to identify highly identical sequences missing from the reference genome, confirming that the cortical development gene Slit-Robo Rho GTPase-activating protein 2 (SRGAP2) duplicated three times exclusively in humans. We show that the promoter and first nine exons of SRGAP2 duplicated from 1q32.1 (SRGAP2A) to 1q21.1 (SRGAP2B) ∼3.4 million years ago (mya). Two larger duplications later copied SRGAP2B to chromosome 1p12 (SRGAP2C) and to proximal 1q21.1 (SRGAP2D) ∼2.4 and ∼1 mya, respectively. Sequence and expression analyses show that SRGAP2C is the most likely duplicate to encode a functional protein and is among the most fixed human-specific duplicate genes. Our data suggest a mechanism in which incomplete duplication created a novel gene function (antagonizing parental SRGAP2 function) immediately "at birth" 2-3 mya, a time corresponding to the transition from Australopithecus to Homo and the beginning of neocortex expansion.


Subjects
Molecular Evolution, GTPase-Activating Proteins/genetics, Primates/genetics, Genomic Segmental Duplications, Animals, DNA Copy Number Variations, Female, Medical Genetics, Humans, Hydatidiform Mole/genetics, Fluorescence In Situ Hybridization, Mammals/genetics, Molecular Sequence Data, Pregnancy
2.
Cell ; 141(7): 1159-70, 2010 Jun 25.
Article in English | MEDLINE | ID: mdl-20602998

ABSTRACT

Highly active (i.e., "hot") long interspersed element-1 (LINE-1 or L1) sequences comprise the bulk of retrotransposition activity in the human genome; however, the abundance of hot L1s in the human population remains largely unexplored. Here, we used a fosmid-based, paired-end DNA sequencing strategy to identify 68 full-length L1s that are differentially present among individuals but are absent from the human genome reference sequence. The majority of these L1s were highly active in a cultured cell retrotransposition assay. Genotyping 26 elements revealed that two L1s are only found in Africa and that two more are absent from the H952 subset of the Human Genome Diversity Panel. Therefore, these results suggest that hot L1s are more abundant in the human population than previously appreciated, and that ongoing L1 retrotransposition continues to be a major source of interindividual genetic variation.


Subjects
Human Genome, Long Interspersed Nucleotide Elements, Base Sequence, Gene Frequency, Population Genetics, Humans, Molecular Sequence Data, Phylogeny
3.
Cell ; 143(5): 837-47, 2010 Nov 24.
Article in English | MEDLINE | ID: mdl-21111241

ABSTRACT

Understanding the prevailing mutational mechanisms responsible for human genome structural variation requires uniformity in the discovery of allelic variants and precision in terms of breakpoint delineation. We develop a resource based on capillary end sequencing of 13.8 million fosmid clones from 17 human genomes and characterize the complete sequence of 1054 large structural variants corresponding to 589 deletions, 384 insertions, and 81 inversions. We analyze the 2081 breakpoint junctions and infer their potential mechanisms of origin. Three mechanisms account for the bulk of germline structural variation: microhomology-mediated processes involving short (2-20 bp) stretches of sequence (28%), nonallelic homologous recombination (22%), and L1 retrotransposition (19%). The high quality and long-range continuity of the sequence reveal more complex mutational mechanisms, including repeat-mediated inversions and gene conversion, that are most often missed by other methods, such as comparative genomic hybridization, single nucleotide polymorphism microarrays, and next-generation sequencing.
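The class counts and mechanism percentages reported above can be tallied as a quick consistency check. This sketch assumes the three percentages are shares of the same pool of events, which the abstract does not state explicitly. The dictionary keys are illustrative labels.

```python
# Structural-variant counts and mechanism shares as reported in the abstract.
sv_counts = {"deletions": 589, "insertions": 384, "inversions": 81}
mechanism_pct = {
    "microhomology-mediated (2-20 bp)": 28,
    "nonallelic homologous recombination": 22,
    "L1 retrotransposition": 19,
}

total_svs = sum(sv_counts.values())
explained_pct = sum(mechanism_pct.values())

print(f"Total structural variants: {total_svs}")
print(f"Covered by the three main mechanisms: {explained_pct}%")
print(f"Left to other or unassigned mechanisms: {100 - explained_pct}%")
```

The three classes sum to the 1054 variants reported. The three named mechanisms together cover 69%, which leaves 31% for mechanisms the abstract does not break down.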


Subjects
Human Genome, Genomic Structural Variation, Mutation, Base Sequence, Gene Conversion, Humans, Molecular Sequence Data, DNA Sequence Analysis
4.
Nature ; 536(7615): 205-9, 2016 Aug 11.
Article in English | MEDLINE | ID: mdl-27487209

ABSTRACT

Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage, a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11.2 rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.


Subjects
Human Chromosome Pair 16/genetics, DNA Copy Number Variations/genetics, Molecular Evolution, Genetic Predisposition to Disease, Proteins/genetics, Animals, Autistic Disorder/genetics, Chromosome Breakage, Gene Duplication, Homeostasis/genetics, Humans, Iron/metabolism, Pan troglodytes/genetics, Pongo/genetics, Proteins/analysis, Genetic Recombination, Species Specificity, Time Factors
5.
Proc Natl Acad Sci U S A ; 116(13): 6260-6269, 2019 Mar 26.
Article in English | MEDLINE | ID: mdl-30850542

ABSTRACT

R-loops are abundant three-stranded nucleic-acid structures that form in cis during transcription. Experimental evidence suggests that R-loop formation is affected by DNA sequence and topology. However, the exact manner by which these factors interact to determine R-loop susceptibility is unclear. To investigate this, we developed a statistical mechanical equilibrium model of R-loop formation in superhelical DNA. In this model, the energy involved in forming an R-loop includes four terms: junctional and base-pairing energies, and energies associated with superhelicity and with the torsional winding of the displaced DNA single strand around the RNA:DNA hybrid. This model shows that the significant energy barrier imposed by the formation of junctions can be overcome in two ways. First, base-pairing energy can favor RNA:DNA over DNA:DNA duplexes in favorable sequences. Second, R-loops, by absorbing negative superhelicity, partially or fully relax the rest of the DNA domain, thereby returning it to a lower energy state. In vitro transcription assays confirmed that R-loops cause plasmid relaxation and that negative superhelicity is required for R-loops to form, even in a favorable region. Single-molecule R-loop footprinting following in vitro transcription showed strong agreement between theoretical predictions and experimental mapping of stable R-loop positions, and further revealed the impact of DNA topology on the R-loop distribution landscape. Our results clarify the interplay between base sequence and DNA superhelicity in controlling R-loop stability. They also reveal R-loops as powerful and reversible topology sinks that cells may use to nonenzymatically relieve superhelical stress during transcription.
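The four energy contributions listed above can be summarized schematically. This is a sketch and not the authors' published equations. The configuration label s, the symbols, and the Boltzmann weighting over configurations are assumptions based on the phrase "statistical mechanical equilibrium model". The state with no R-loop is taken as the zero-energy reference.

$$
\Delta G(s) = G_{\mathrm{junction}}(s) + \Delta G_{\mathrm{bp}}(s) + \Delta G_{\mathrm{sh}}(s) + G_{\mathrm{tw}}(s),
\qquad
P(s) = \frac{e^{-\Delta G(s)/RT}}{\sum_{s'} e^{-\Delta G(s')/RT}}
$$

In this notation the junction term is a positive barrier. The barrier can be offset in two ways: ΔG_bp can be negative in favorable sequences, and ΔG_sh becomes negative when an R-loop absorbs negative superhelicity and relaxes the rest of the domain. These correspond to the two routes the abstract describes.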


Subjects
Base Sequence, Superhelical DNA/chemistry, DNA/chemistry, Nucleic Acid Conformation, Single-Stranded DNA/chemistry, Genetic Models, Nucleic Acid Hybridization, Plasmids/chemistry, RNA/chemistry, Genetic Transcription
6.
Nature ; 517(7536): 608-11, 2015 Jan 29.
Article in English | MEDLINE | ID: mdl-25383537

ABSTRACT

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome; 78% of these gaps carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


Subjects
Genetic Variation/genetics, Human Genome/genetics, Genomics, DNA Sequence Analysis/methods, Chromosome Inversion/genetics, Human Chromosome Pair 10/genetics, Molecular Cloning, GC Rich Sequence/genetics, Haploidy, Humans, Insertional Mutagenesis/genetics, Reference Standards, Tandem Repeat Sequences/genetics
7.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subjects
Genetic Variation/genetics, Human Genome/genetics, Physical Chromosome Mapping, Amino Acid Sequence, Genetic Predisposition to Disease, Medical Genetics, Population Genetics, Genome-Wide Association Study, Genomics, Genotype, Haplotypes/genetics, Homozygote, Humans, Molecular Sequence Data, Mutation Rate, Single Nucleotide Polymorphism/genetics, Quantitative Trait Loci/genetics, DNA Sequence Analysis, Sequence Deletion/genetics
8.
Nature ; 499(7459): 471-5, 2013 Jul 25.
Article in English | MEDLINE | ID: mdl-23823723

ABSTRACT

Most great ape genetic variation remains uncharacterized; however, its study is critical for understanding population history, recombination, selection and susceptibility to disease. Here we sequence to high coverage a total of 79 wild- and captive-born individuals representing all six great ape species and seven subspecies and report 88.8 million single nucleotide polymorphisms. Our analysis provides support for genetically distinct populations within each species, signals of gene flow, and the split of common chimpanzees into two distinct groups: Nigeria-Cameroon/western and central/eastern populations. We find extensive inbreeding in almost all wild populations, with eastern gorillas being the most extreme. Inferred effective population sizes have varied radically over time in different lineages and this appears to have a profound effect on the genetic diversity at, or close to, genes in almost all species. We discover and assign 1,982 loss-of-function variants throughout the human and great ape lineages, determining that the rate of gene loss has not been different in the human branch compared to other internal branches in the great ape phylogeny. This comprehensive catalogue of great ape genome diversity provides a framework for understanding evolution and a resource for more effective management of wild and captive great ape populations.


Subjects
Genetic Variation, Hominidae/genetics, Africa, Animals, Wild Animals/genetics, Zoo Animals/genetics, Southeast Asia, Molecular Evolution, Gene Flow/genetics, Population Genetics, Genome/genetics, Gorilla gorilla/classification, Gorilla gorilla/genetics, Hominidae/classification, Humans, Inbreeding, Pan paniscus/classification, Pan paniscus/genetics, Pan troglodytes/classification, Pan troglodytes/genetics, Phylogeny, Single Nucleotide Polymorphism/genetics, Population Density
9.
Nature ; 485(7397): 246-50, 2012 Apr 04.
Article in English | MEDLINE | ID: mdl-22495309

ABSTRACT

It is well established that autism spectrum disorders (ASD) have a strong genetic component; however, for at least 70% of cases, the underlying genetic cause is unknown. Under the hypothesis that de novo mutations underlie a substantial fraction of the risk for developing ASD in families with no previous history of ASD or related phenotypes (so-called sporadic or simplex families), we sequenced all coding regions of the genome (the exome) for parent-child trios exhibiting sporadic ASD, including 189 new trios and 20 that were previously reported. We also sequenced the exomes of 50 unaffected siblings corresponding to these new (n = 31) and previously reported trios (n = 19), for a total of 677 individual exomes from 209 families. Here we show that de novo point mutations are overwhelmingly paternal in origin (4:1 bias) and positively correlated with paternal age, consistent with the modest increased risk for children of older fathers to develop ASD. Moreover, 39% (49 of 126) of the most severe or disruptive de novo mutations map to a highly interconnected β-catenin/chromatin remodelling protein network ranked significantly for autism candidate genes. In proband exomes, recurrent protein-altering mutations were observed in two genes: CHD8 and NTNG1. Mutation screening of six candidate genes in 1,703 ASD probands identified additional de novo, protein-altering mutations in GRIN2B, LAMC3 and SCN1A. Combined with copy number variant (CNV) data, these results indicate extreme locus heterogeneity but also provide a target for future discovery, diagnostics and therapeutics.
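The sample sizes in this abstract can be cross-checked with a few lines of arithmetic. This is a minimal sketch that assumes each parent-child trio contributes exactly three exomes (mother, father, affected child). The variable names are illustrative.

```python
# Figures copied from the abstract; "one trio = three exomes" is an assumption.
new_trios, prior_trios = 189, 20
new_siblings, prior_siblings = 31, 19

families = new_trios + prior_trios            # one trio per family
siblings = new_siblings + prior_siblings      # unaffected siblings
exomes = 3 * families + siblings              # mother + father + proband, plus siblings

# Share of severe de novo mutations that map to the beta-catenin/chromatin network
network_hits, severe_total = 49, 126
network_share = network_hits / severe_total

print(families, siblings, exomes, f"{network_share:.0%}")
```

The totals reproduce the 209 families, 50 siblings, 677 exomes, and 39% stated above. This supports reading each trio as three sequenced individuals.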


Subjects
Autistic Disorder/genetics, Exome/genetics, Exons/genetics, Point Mutation/genetics, Protein Interaction Maps/genetics, DNA-Binding Proteins/genetics, GPI-Linked Proteins/genetics, Genetic Predisposition to Disease/genetics, Humans, Laminin/genetics, NAV1.1 Voltage-Gated Sodium Channel, Nerve Tissue Proteins/genetics, Netrins, Parents, N-Methyl-D-Aspartate Receptors/genetics, Reproducibility of Results, Siblings, Signal Transduction, Sodium Channels/genetics, Stochastic Processes, Transcription Factors/genetics, Tumor Suppressor Protein p53/metabolism, beta Catenin/metabolism
10.
Proc Natl Acad Sci U S A ; 112(52): E7223-9, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26668394

ABSTRACT

NK-lysin is an antimicrobial peptide and effector protein in the host innate immune system. It is coded by a single gene in humans and most other mammalian species. In this study, we provide evidence for the existence of four NK-lysin genes in a repetitive region on cattle chromosome 11. The NK2A, NK2B, and NK2C genes are tandemly arrayed as three copies in ∼30-35-kb segments, located 41.8 kb upstream of NK1. All four genes are functional, albeit with differential tissue expression. NK1, NK2A, and NK2B exhibited the highest expression in intestine Peyer's patch, whereas NK2C was expressed almost exclusively in lung. The four peptide products were synthesized ex vivo, and their antimicrobial effects against both Gram-positive and Gram-negative bacteria were confirmed with a bacteria-killing assay. Transmission electron microscopy indicated that bovine NK-lysins exert their antimicrobial activity by lytic action on bacterial cell membranes. In summary, the single NK-lysin gene in other mammals has expanded to a four-member gene family by tandem duplications in cattle; all four genes are transcribed, and the synthetic peptides corresponding to the core regions are biologically active and likely contribute to innate immunity in ruminants.


Subjects
Cattle/genetics, Gene Dosage, Multigene Family, Proteolipids/genetics, Amino Acid Sequence, Animals, Base Sequence, Mammalian Chromosomes/genetics, Escherichia coli/drug effects, Escherichia coli/growth & development, Escherichia coli/ultrastructure, Gene Expression Profiling, Gene Order, Transmission Electron Microscopy, Molecular Sequence Data, Organ Specificity/genetics, Peptides/pharmacology, Phylogeny, Proteolipids/classification, Proteolipids/pharmacology, Amino Acid Sequence Homology, Nucleic Acid Sequence Homology
11.
Genome Res ; 24(4): 688-96, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24418700

ABSTRACT

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and found that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved its structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher-quality finished state.
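A quick calculation shows the scale that 99.994% identity implies. This assumes identity is measured per aligned base over the whole 1.3-Mbp region. The abstract does not say how indels or unaligned bases were counted, so the result is only an order-of-magnitude guide.

```python
# Back-of-envelope: how many base differences does 99.994% identity imply?
region_bp = 1.3e6     # 1.3-Mbp region of 17q21.31 (rounded value from the abstract)
identity = 0.99994    # PacBio vs. Sanger identity reported in the abstract

expected_differences = region_bp * (1 - identity)
print(f"About {expected_differences:.0f} differing bases")
```

The result is roughly 78 differences, which is the same order of magnitude as the 44 differences the authors targeted for Illumina validation.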


Subjects
Human Chromosome Pair 17/genetics, Bacterial Genome/genetics, High-Throughput Nucleotide Sequencing/methods, Animals, Bacterial Artificial Chromosomes/genetics, Humans, Mice, Molecular Sequence Data, Pan troglodytes/genetics
12.
Am J Hum Genet ; 92(2): 221-37, 2013 Feb 07.
Article in English | MEDLINE | ID: mdl-23375656

ABSTRACT

Rare copy-number variants (CNVs) have been implicated in autism and intellectual disability. These variants are large and affect many genes but lack clear specificity toward autism as opposed to developmental-delay phenotypes. We exploited the repeat architecture of the genome to target segmental duplication-mediated rearrangement hotspots (n = 120, median size 1.78 Mbp, range 240 kbp to 13 Mbp) and smaller hotspots flanked by repetitive sequence (n = 1,247, median size 79 kbp, range 3-96 kbp) in 2,588 autistic individuals from simplex and multiplex families and in 580 controls. Our analysis identified several recurrent large hotspot events, including association with 1q21 duplications, which are more likely to be identified in individuals with autism than in those with developmental delay (p = 0.01; OR = 2.7). Within larger hotspots, we also identified smaller atypical CNVs that implicated CHD1L and ACACA for the 1q21 and 17q12 deletions, respectively. Our analysis, however, suggested no overall increase in the burden of smaller hotspots in autistic individuals as compared to controls. By focusing on gene-disruptive events, we identified recurrent CNVs, including DPP10, PLCB1, TRPM1, NRXN1, FHIT, and HYDIN, that are enriched in autism. We found that as the size of deletions increases, nonverbal IQ significantly decreases, but there is no impact on autism severity; and as the size of duplications increases, autism severity significantly increases but nonverbal IQ is not affected. The absence of an increased burden of smaller CNVs in individuals with autism and the failure of most large hotspots to refine to single genes are consistent with a model in which imbalance of multiple genes contributes to a disease state.


Subjects
Pervasive Child Development Disorders/genetics, DNA Copy Number Variations/genetics, Genetic Association Studies, Genetic Predisposition to Disease, Genomic Segmental Duplications/genetics, Case-Control Studies, Child, Chromosome Deletion, Chromosome Duplication/genetics, Exons/genetics, Gene Rearrangement/genetics, Human Genome/genetics, Humans, Phenotype
13.
Am J Hum Genet ; 92(4): 530-46, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23541343

ABSTRACT

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.
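The abstract reports stratification as Fst = 0.3-0.5 without naming the estimator. As a point of reference, the sketch below computes a textbook Nei-style F_ST for a single biallelic locus from per-population allele frequencies. The function name, the equal population weighting, and the example frequencies are assumptions made for illustration. CNVs with several copy-number states would need a multi-allelic extension.

```python
from statistics import mean


def fst_biallelic(freqs: list[float]) -> float:
    """Nei-style F_ST = (H_T - H_S) / H_T for one biallelic locus.

    freqs holds the frequency of one allele in each population; populations
    are weighted equally (a simplifying assumption).
    """
    h_s = mean(2 * p * (1 - p) for p in freqs)  # mean within-population heterozygosity
    p_bar = mean(freqs)                         # pooled allele frequency
    h_t = 2 * p_bar * (1 - p_bar)               # heterozygosity of the pooled population
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical frequencies of a CNV allele in two populations (not data from the study).
print(f"{fst_biallelic([0.2, 0.8]):.2f}")  # strongly differentiated populations
print(f"{fst_biallelic([0.5, 0.5]):.2f}")  # identical populations, no differentiation
```

With these hypothetical frequencies, the first case gives a value inside the 0.3-0.5 range the authors report, and identical populations give zero.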


Subjects
DNA Copy Number Variations/genetics, Gene Fusion/genetics, Immunoglobulin Heavy Chain Genes, Haplotypes/genetics, Hydatidiform Mole/genetics, Immunoglobulin Heavy Chains/genetics, Immunoglobulin Variable Region/genetics, Alleles, Bacterial Artificial Chromosomes, Female, Population Genetics, Genotype, Humans, Molecular Sequence Data, Pregnancy, DNA Sequence Analysis, V(D)J Recombination
14.
Genome Res ; 23(1): 46-59, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23064749

ABSTRACT

Core duplicons in the human genome represent ancestral duplication modules shared by the majority of intrachromosomal duplication blocks within a given chromosome. These cores are associated with the emergence of novel gene families in the hominoid lineage, but their genomic organization and gene characterization among other primates are largely unknown. Here, we investigate the genomic organization and expression of the core duplicon on chromosome 17 that led to the expansion of LRRC37 during primate evolution. A comparison of the LRRC37 gene family organization in human, orangutan, macaque, marmoset, and lemur genomes shows the presence of both orthologous and species-specific gene copies in all primate lineages. Expression profiling in mouse, macaque, and human tissues reveals that the ancestral expression of LRRC37 was restricted to the testis. In the hominid lineage, the expression pattern of LRRC37 became increasingly ubiquitous, with significantly higher levels of expression in the cerebellum and thymus, and showed a remarkable diversity of alternative splice forms. Transfection studies in HeLa cells indicate that the human FLAG-tagged recombinant LRRC37 protein is secreted after cleavage of a transmembrane precursor and that its overexpression can induce filopodia formation.


Subjects
Molecular Evolution, Multigene Family/genetics, Primates/genetics, Proteins/genetics, Alternative Splicing, Animals, Base Sequence, Cerebellum/metabolism, Mammalian Chromosomes/genetics, DNA/chemistry, Gene Duplication, Gene Expression Profiling, Genome/genetics, HeLa Cells, Humans, Leucine-Rich Repeat Proteins, Male, Mice, Molecular Sequence Data, Organ Specificity, Proteins/metabolism, Testis/metabolism, Thymus Gland/metabolism, Genetic Transcription/genetics
15.
Genome Res ; 23(11): 1763-73, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24077392

ABSTRACT

Ape chromosomes homologous to human chromosomes 14 and 15 were generated by a fission event of an ancestral submetacentric chromosome, where the two chromosomes were joined head-to-tail. The hominoid ancestral chromosome most closely resembles the macaque chromosome 7. In this work, we provide insights into the evolution of human chromosomes 14 and 15, performing a comparative study between macaque boundary region 14/15 and the orthologous human regions. We construct a 1.6-Mb contig of macaque BAC clones in the region orthologous to the ancestral hominoid fission site and use it to define the structural changes that occurred on human 14q pericentromeric and 15q subtelomeric regions. We characterize the novel euchromatin-heterochromatin transition region (∼20 Mb) acquired during the neocentromere establishment on chromosome 14, and find it was mainly derived through pericentromeric duplications from ancestral hominoid chromosomes homologous to human 2q14-qter and 10. Further, we show a relationship between evolutionary hotspots and low-copy repeat loci for chromosome 15, revealing a possible role of segmental duplications not only in mediating but also in "stitching" together rearrangement breakpoints.


Subjects
Human Chromosome Pair 14/genetics, Human Chromosome Pair 15/genetics, Mammalian Chromosomes/genetics, Molecular Evolution, Hominidae/genetics, Genomic Segmental Duplications, Animals, Chromosome Breakpoints, Chromosome Duplication, Bacterial Artificial Chromosomes, Molecular Cloning, Euchromatin/genetics, Heterochromatin/genetics, Humans, Molecular Sequence Data, Phylogeny
16.
Genome Res ; 23(9): 1373-82, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23825009

ABSTRACT

Copy number variation (CNV) contributes to disease and has restructured the genomes of great apes. The diversity and rate of this process, however, have not been extensively explored among great ape lineages. We analyzed 97 deeply sequenced great ape and human genomes and estimate that 16% (469 Mb) of the hominid genome has been affected by recent CNV. We identify a comprehensive set of fixed gene deletions (n = 340) and duplications (n = 405) as well as >13.5 Mb of sequence that has been specifically lost on the human lineage. We compared the diversity and rates of copy number and single nucleotide variation across the hominid phylogeny. We find that CNV diversity partially correlates with single nucleotide diversity (r² = 0.5) and recapitulates the phylogeny of apes with few exceptions. Duplications significantly outpace deletions (2.8-fold). The load of segregating duplications remains significantly higher in bonobos, Western chimpanzees, and Sumatran orangutans, populations that have experienced recent genetic bottlenecks (P = 0.0014, 0.02, and 0.0088, respectively). The rate of fixed deletion has been more clocklike with the exception of the chimpanzee lineage, where we observe a twofold increase in the chimpanzee-bonobo ancestor (P = 4.79 × 10⁻⁹) and increased deletion load among Western chimpanzees (P = 0.002). The latter includes the first genomic disorder in a chimpanzee with features resembling Smith-Magenis syndrome mediated by a chimpanzee-specific increase in segmental duplication complexity. We hypothesize that demographic effects, such as bottlenecks, have contributed to larger and more gene-rich segments being deleted in the chimpanzee lineage and that this effect, more generally, may account for episodic bursts in CNV during hominid evolution.
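The two figures for the CNV-affected fraction can be related by simple division. This sketch assumes that "16%" is rounded to the nearest percent. It also assumes that 469 Mb and 16% refer to the same denominator (the portion of the hominid genome that was assessed), which the abstract implies but does not define.

```python
# What total does "16% (469 Mb)" imply for the denominator?
affected_mb = 469
reported_fraction = 0.16  # "16%", assumed rounded to the nearest whole percent

implied_total_mb = affected_mb / reported_fraction
# 16% could stand for anything from 15.5% to 16.5%, which bounds the implied total.
smallest_total_mb = affected_mb / 0.165
largest_total_mb = affected_mb / 0.155

print(f"{implied_total_mb:.0f} Mb (plausible range {smallest_total_mb:.0f}-{largest_total_mb:.0f} Mb)")
```

The implied denominator of roughly 2.9 Gb is close to the size of a human genome, so the two reported numbers are mutually consistent.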


Subjects
DNA Copy Number Variations, Molecular Evolution, Hominidae/genetics, Phylogeny, Animals, Base Sequence, Gene Deletion, Gene Duplication, Genetic Load, Human Genome, Humans, Molecular Sequence Data, Pedigree, Single Nucleotide Polymorphism, DNA Sequence Analysis
17.
Proc Natl Acad Sci U S A ; 110(33): 13457-62, 2013 Aug 13.
Article in English | MEDLINE | ID: mdl-23884656

ABSTRACT

We analyzed 83 fully sequenced great ape genomes for mobile element insertions, predicting a total of 49,452 fixed and polymorphic Alu and long interspersed element 1 (L1) insertions not present in the human reference assembly and assigning each retrotransposition event to a different time point during great ape evolution. We used these homoplasy-free markers to construct a mobile element insertion-based phylogeny of humans and great apes and demonstrate their differential power to discern ape subspecies and populations. Within this context, we find a good correlation between L1 diversity and single-nucleotide polymorphism heterozygosity (r² = 0.65), in contrast to Alu repeats, which show little correlation (r² = 0.07). We estimate that the "rate" of Alu retrotransposition has differed 15-fold among these lineages. Humans, chimpanzees, and bonobos show the highest rates of Alu accumulation, the latter two since their divergence 1.5 Mya. The L1 insertion rate, in contrast, has remained relatively constant, with rates differing by less than a factor of three. We conclude that Alu retrotransposition has been the most variable form of genetic variation during recent human-great ape evolution, with increases and decreases occurring over very short periods of evolutionary time.


Subjects
Genetic Variation , Genome/genetics , Hominidae/genetics , Phylogeny , Alu Elements/genetics , Animals , Cluster Analysis , DNA Primers/genetics , Genomics , Hominidae/classification , Humans , Likelihood Functions , Long Interspersed Nucleotide Elements/genetics , Polymerase Chain Reaction , Polymorphism, Single Nucleotide , Principal Component Analysis , Species Specificity
18.
Genes Immun ; 16(1): 24-34, 2015.
Article in English | MEDLINE | ID: mdl-25338678

ABSTRACT

Germline variation at immunoglobulin (IG) loci is critical for pathogen-mediated immunity, but establishing complete haplotype sequences in these regions has been problematic because of their complex sequence architecture and the diploid nature of source DNA. We sequenced BAC clones from the effectively haploid human hydatidiform mole cell line CHM1htert across the light chain IG loci, kappa (IGK) and lambda (IGL), creating single-haplotype representations of these regions. The IGL haplotype generated here is 1.25 Mb of contiguous sequence and includes four novel IGLV alleles, one novel IGLC allele, and an 11.9-kb insertion. The CH17 IGK haplotype consists of two contigs, a 644-kb proximal and a 466-kb distal contig, separated by a large gap of unknown size; these assemblies added 49 kb of unique sequence extending into this gap. Our analysis also characterized seven novel IGKV alleles and a 16.7-kb region exhibiting signatures of interlocus sequence exchange between the distal and proximal IGKV gene clusters. Genetic diversity in IGK and IGL was compared with that of the IG heavy chain (IGH) locus within the same haploid genome, revealing threefold (relative to IGK) and sixfold (relative to IGL) higher diversity in the IGH locus, potentially associated with higher levels of segmental duplication and the telomeric location of IGH.
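
The abstract does not state how genetic diversity was measured for the threefold and sixfold comparisons. One simple proxy, assumed here purely for illustration, is the number of single-nucleotide differences per aligned kilobase between two haplotypes of the same locus; the function name `snv_per_kb` and the toy sequences are hypothetical.

```python
def snv_per_kb(hap1, hap2):
    """Single-nucleotide differences per kb between two aligned haplotypes.

    Alignment columns where either sequence has a gap ('-') or an ambiguous
    base ('N') are skipped, so only comparable positions are counted.
    """
    if len(hap1) != len(hap2):
        raise ValueError("haplotypes must be aligned to the same length")
    compared = 0
    differences = 0
    for x, y in zip(hap1.upper(), hap2.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        differences += x != y
    if compared == 0:
        raise ValueError("no comparable positions")
    return 1000 * differences / compared


# Hypothetical toy alignments for two loci.
locus_high = snv_per_kb("AAAA", "TTAA")  # 2 differences in 4 bases
locus_low = snv_per_kb("AAAA", "TAAA")   # 1 difference in 4 bases
print(locus_high / locus_low)  # 2.0
```

A fold difference between loci is then the ratio of their per-kb values; normalizing by comparable positions keeps gaps and unresolved bases from inflating or deflating the estimate.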


Subjects
Genes, Immunoglobulin Light Chain , Hydatidiform Mole/genetics , Cell Line, Tumor , Chromosomes, Artificial, Bacterial , Female , Genes, Immunoglobulin Heavy Chain , Humans , Molecular Sequence Data , Polymorphism, Single Nucleotide , Pregnancy
19.
Genome Res ; 22(8): 1525-32, 2012 Aug.
Article in English | MEDLINE | ID: mdl-22585873

ABSTRACT

While exome sequencing is readily amenable to single-nucleotide variant discovery, the sparse and nonuniform nature of the exome capture reaction has hindered exome-based detection and characterization of genic copy number variation. We developed a novel method that uses singular value decomposition (SVD) normalization to discover rare genic copy number variants (CNVs) and to genotype copy number polymorphic (CNP) loci from exome sequencing data with high sensitivity and specificity. We estimate the precision of our algorithm using 122 trios (366 exomes) and show that the method reliably predicts (94% overall precision) both de novo and inherited rare CNVs involving three or more consecutive exons. We demonstrate that exome-based genotyping of CNPs strongly correlates with whole-genome data (median r² = 0.91), especially for loci with fewer than eight copies, and can estimate the absolute copy number of multi-allelic genes with high accuracy (78% call level). The resulting user-friendly computational pipeline, CoNIFER (copy number inference from exome reads), can reliably discover disruptive genic CNVs missed by standard approaches and should have broad application in human genetic studies of disease.
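
The abstract names SVD normalization and a three-consecutive-exon requirement but gives no further detail. The sketch below is a hedged, dependency-free illustration, not CoNIFER's code or API: it assumes a samples x exons matrix that is already standardized per exon (for example, z-scores of read depth), removes the `k` strongest singular components by power iteration with deflation (in practice a library routine such as `numpy.linalg.svd` would be used), and then calls runs of at least three consecutive exons beyond a threshold. The function names and the threshold of 1.5 are assumptions.

```python
import math


def _leading_triplet(m, iters=100):
    """Leading singular value and vectors of m (list of rows) by power iteration."""
    n_rows, n_cols = len(m), len(m[0])
    v = [1.0] * n_cols  # deterministic start, assumed not orthogonal to the top vector
    for _ in range(iters):
        u = [sum(row[j] * v[j] for j in range(n_cols)) for row in m]  # u = M v
        w = [sum(m[i][j] * u[i] for i in range(n_rows)) for j in range(n_cols)]  # w = M^T u
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, [0.0] * n_rows, [0.0] * n_cols
        v = [x / norm for x in w]
    u = [sum(row[j] * v[j] for j in range(n_cols)) for row in m]
    sigma = math.sqrt(sum(x * x for x in u))
    if sigma == 0.0:
        return 0.0, [0.0] * n_rows, v
    return sigma, [x / sigma for x in u], v


def svd_normalize(matrix, k):
    """Remove the k strongest singular components from a samples x exons matrix."""
    residual = [list(map(float, row)) for row in matrix]
    for _ in range(k):
        sigma, u, v = _leading_triplet(residual)
        # Deflation: subtract the rank-1 component sigma * u v^T.
        for i, row in enumerate(residual):
            for j in range(len(row)):
                row[j] -= sigma * u[i] * v[j]
    return residual


def call_runs(values, threshold=1.5, min_exons=3):
    """Return (kind, first_exon, last_exon) for runs of >= min_exons consecutive
    exons whose normalized signal is >= threshold ("dup") or <= -threshold ("del")."""
    calls = []
    i, n = 0, len(values)
    while i < n:
        if values[i] >= threshold:
            kind, inside = "dup", (lambda x: x >= threshold)
        elif values[i] <= -threshold:
            kind, inside = "del", (lambda x: x <= -threshold)
        else:
            i += 1
            continue
        j = i
        while j < n and inside(values[j]):
            j += 1
        if j - i >= min_exons:
            calls.append((kind, i, j - 1))
        i = j
    return calls
```

A typical use would be `normalized = svd_normalize(z_scores, k)` followed by `call_runs(normalized[s])` for each sample `s`. The idea is that the strongest components capture capture-efficiency and batch effects shared across many samples, while a rare CNV in one sample survives in the residual; how many components to remove is a judgement call, and no particular value is implied here.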


Subjects
Computational Biology/methods , DNA Copy Number Variations , Exome , Genotyping Techniques/methods , Algorithms , Autistic Disorder/genetics , Exons , Genetic Loci , Genome, Human , HapMap Project , Humans , Reproducibility of Results , Sensitivity and Specificity , User-Computer Interface
20.
Am J Hum Genet ; 88(3): 317-32, 2011 Mar 11.
Article in English | MEDLINE | ID: mdl-21397061

ABSTRACT

Copy-number variants (CNVs) can reach appreciable frequencies in the human population, and recent discoveries have shown that several of these copy-number polymorphisms (CNPs) are associated with human diseases, including lupus, psoriasis, Crohn disease, and obesity. Despite new advances, significant biases remain in CNP discovery and genotyping. We developed a method based on single-channel microarray intensity data, benchmarked it against copy numbers determined from sequencing read depth, and obtained genotypes for 1495 CNPs in 487 human DNA samples of diverse ethnic backgrounds. The microarray targeted CNPs in segmental duplication-rich regions, as well as insertions of sequence not represented in the reference genome assembly or on standard SNP microarray platforms. We observe that CNPs in segmental duplications are more likely to be population differentiated than CNPs in unique regions (p = 0.015) and that biallelic CNPs show greater stratification than frequency-matched SNPs (p = 0.0026). Although biallelic CNPs show a strong correlation of copy number with flanking SNP genotypes, the majority of multicopy CNPs do not (only 40% have r > 0.8). We selected a subset of CNPs for further characterization in 1876 additional samples from 62 populations; this revealed striking population-differentiated structural variants in genes of clinical significance, such as OCLN, a tight junction protein involved in hepatitis C viral entry. Our microarray design allows these variants to be rapidly tested for disease association, and our results suggest that CNPs (especially those that cannot be imputed from SNP genotypes) might have contributed disproportionately to human diversity and selection.
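
The abstract distinguishes CNPs whose copy number tracks flanking SNP genotypes (and can therefore be imputed) from those that do not, using r > 0.8 as the cut-off. A minimal sketch of that check, assuming per-sample integer copy numbers and SNP allele dosages coded 0/1/2, takes the best |r| over flanking SNPs; the absolute value is used because the sign depends on which allele is counted. The function names, SNP IDs, and genotypes are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_tag_snp(copy_numbers, snp_dosages, threshold=0.8):
    """Find the flanking SNP whose allele dosage best tracks copy number.

    Returns (snp_id, |r|, taggable), where taggable is True if |r| > threshold.
    """
    best_id, best_r = None, 0.0
    for snp_id, dosages in snp_dosages.items():
        r = abs(pearson_r(copy_numbers, dosages))
        if r > best_r:
            best_id, best_r = snp_id, r
    return best_id, best_r, best_r > threshold


# Hypothetical genotypes for six samples.
copy_numbers = [2, 3, 4, 2, 3, 4]
flanking = {
    "snp_a": [0, 1, 2, 0, 1, 2],  # dosage rises with copy number
    "snp_b": [1, 0, 1, 0, 1, 0],  # unrelated to copy number
}
print(best_tag_snp(copy_numbers, flanking))  # ('snp_a', 1.0, True)
```

A multicopy CNP whose best |r| stays below the threshold would fall into the group that, per the abstract, cannot be reliably imputed from SNP genotypes and must be genotyped directly.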


Subjects
DNA Copy Number Variations/genetics , Genetics, Population , Comparative Genomic Hybridization , Genetic Loci/genetics , Genotype , Geography , Humans , Linkage Disequilibrium/genetics , Mutagenesis, Insertional/genetics , Polymorphism, Single Nucleotide/genetics