Results 1 - 20 of 54
1.
Mol Biol Evol ; 38(7): 2767-2777, 2021 06 25.
Article in English | MEDLINE | ID: mdl-33749787

ABSTRACT

Seasonal influenza viruses repeatedly infect humans in part because they rapidly change their antigenic properties and evade host immune responses, necessitating frequent updates of the vaccine composition. Accurate predictions of strains circulating in the future could therefore improve the vaccine match. Here, we studied the predictability of frequency dynamics and fixation of amino acid substitutions. Current frequency was the strongest predictor of eventual fixation, as expected in neutral evolution. Other properties, such as occurrence in previously characterized epitopes or high Local Branching Index (LBI) had little predictive power. Parallel evolution was found to be moderately predictive of fixation. Although the LBI had little power to predict frequency dynamics, it was still successful at picking strains representative of future populations. The latter is due to a tendency of the LBI to be high for consensus-like sequences that are closer to the future than the average sequence. Simulations of models of adapting populations, in contrast, show clear signals of predictability. This indicates that the evolution of influenza HA and NA, while driven by strong selection pressure to change, is poorly described by common models of directional selection such as traveling fitness waves.


Subject(s)
Evolution, Molecular , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Influenza A Virus, H1N1 Subtype/genetics , Influenza A Virus, H3N2 Subtype/genetics , Neuraminidase/genetics , Adaptation, Biological/genetics , Amino Acid Substitution , Influenza A Virus, H1N1 Subtype/enzymology , Influenza A Virus, H3N2 Subtype/enzymology , Models, Genetic
2.
Nature ; 536(7615): 205-9, 2016 08 11.
Article in English | MEDLINE | ID: mdl-27487209

ABSTRACT

Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates, including more recently the genomes of archaic hominins. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage--a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11.2 rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.


Subject(s)
Chromosomes, Human, Pair 16/genetics , DNA Copy Number Variations/genetics , Evolution, Molecular , Genetic Predisposition to Disease , Proteins/genetics , Animals , Autistic Disorder/genetics , Chromosome Breakage , Gene Duplication , Homeostasis/genetics , Humans , Iron/metabolism , Pan troglodytes/genetics , Pongo/genetics , Proteins/analysis , Recombination, Genetic , Species Specificity , Time Factors
3.
Nature ; 517(7536): 608-11, 2015 Jan 29.
Article in English | MEDLINE | ID: mdl-25383537

ABSTRACT

The human genome is arguably the most complete mammalian reference assembly, yet more than 160 euchromatic gaps remain and aspects of its structural variation remain poorly understood ten years after its completion. To identify missing sequence and genetic variation, here we sequence and analyse a haploid human genome (CHM1) using single-molecule, real-time DNA sequencing. We close or extend 55% of the remaining interstitial gaps in the human GRCh37 reference genome--78% of which carried long runs of degenerate short tandem repeats, often several kilobases in length, embedded within (G+C)-rich genomic regions. We resolve the complete sequence of 26,079 euchromatic structural variants at the base-pair level, including inversions, complex insertions and long tracts of tandem repeats. Most have not been previously reported, with the greatest increases in sensitivity occurring for events less than 5 kilobases in size. Compared to the human reference, we find a significant insertional bias (3:1) in regions corresponding to complex insertions and long short tandem repeats. Our results suggest a greater complexity of the human genome in the form of variation of longer and more complex repetitive DNA that can now be largely resolved with the application of this longer-read sequencing technology.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Genomics , Sequence Analysis, DNA/methods , Chromosome Inversion/genetics , Chromosomes, Human, Pair 10/genetics , Cloning, Molecular , GC Rich Sequence/genetics , Haploidy , Humans , Mutagenesis, Insertional/genetics , Reference Standards , Tandem Repeat Sequences/genetics
4.
Nature ; 526(7571): 75-81, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26432246

ABSTRACT

Structural variants are implicated in numerous diseases and make up the majority of varying nucleotides among human genomes. Here we describe an integrated set of eight structural variant classes comprising both balanced and unbalanced variants, which we constructed using short-read DNA sequencing data and statistically phased onto haplotype blocks in 26 human populations. Analysing this set, we identify numerous gene-intersecting structural variants exhibiting population stratification and describe naturally occurring homozygous gene knockouts that suggest the dispensability of a variety of human genes. We demonstrate that structural variants are enriched on haplotypes identified by genome-wide association studies and exhibit enrichment for expression quantitative trait loci. Additionally, we uncover appreciable levels of structural variant complexity at different scales, including genic loci subject to clusters of repeated rearrangement and complex structural variants with multiple breakpoints likely to have formed through individual mutational events. Our catalogue will enhance future studies into structural variant demography, functional impact and disease association.


Subject(s)
Genetic Variation/genetics , Genome, Human/genetics , Physical Chromosome Mapping , Amino Acid Sequence , Genetic Predisposition to Disease , Genetics, Medical , Genetics, Population , Genome-Wide Association Study , Genomics , Genotype , Haplotypes/genetics , Homozygote , Humans , Molecular Sequence Data , Mutation Rate , Polymorphism, Single Nucleotide/genetics , Quantitative Trait Loci/genetics , Sequence Analysis, DNA , Sequence Deletion/genetics
5.
Proc Natl Acad Sci U S A ; 115(35): E8276-E8285, 2018 08 28.
Article in English | MEDLINE | ID: mdl-30104379

ABSTRACT

Human influenza virus rapidly accumulates mutations in its major surface protein hemagglutinin (HA). The evolutionary success of influenza virus lineages depends on how these mutations affect HA's functionality and antigenicity. Here we experimentally measure the effects on viral growth in cell culture of all single amino acid mutations to the HA from a recent human H3N2 influenza virus strain. We show that mutations that are measured to be more favorable for viral growth are enriched in evolutionarily successful H3N2 viral lineages relative to mutations that are measured to be less favorable for viral growth. Therefore, despite the well-known caveats about cell-culture measurements of viral fitness, such measurements can still be informative for understanding evolution in nature. We also compare our measurements for H3 HA to similar data previously generated for a distantly related H1 HA and find substantial differences in which amino acids are preferred at many sites. For instance, the H3 HA has less disparity in mutational tolerance between the head and stalk domains than the H1 HA. Overall, our work suggests that experimental measurements of mutational effects can be leveraged to help understand the evolutionary fates of viral lineages in nature, but only when the measurements are made on a viral strain similar to the ones being studied in nature.


Subject(s)
Evolution, Molecular , Hemagglutinin Glycoproteins, Influenza Virus , Influenza A Virus, H3N2 Subtype , Mutation , Animals , Dogs , Hemagglutinin Glycoproteins, Influenza Virus/chemistry , Hemagglutinin Glycoproteins, Influenza Virus/genetics , Humans , Influenza A Virus, H3N2 Subtype/chemistry , Influenza A Virus, H3N2 Subtype/genetics , Madin Darby Canine Kidney Cells , Mutagenesis , Protein Domains
6.
Proc Natl Acad Sci U S A ; 115(19): E4433-E4442, 2018 05 08.
Article in English | MEDLINE | ID: mdl-29686068

ABSTRACT

Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ∼360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ∼25-35 Mya and CFHR1 and CFHR3 ∼7-13 Mya). Remarkably, all evolutionary breakpoints share a common ∼4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10^-8, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10^-3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.


Subject(s)
Evolution, Molecular , Macular Degeneration/genetics , Macular Degeneration/pathology , Mutation , Polymorphism, Single Nucleotide , Selection, Genetic , Animals , Complement Factor H/genetics , Exons , Genetic Predisposition to Disease , Genotype , Haplotypes , Humans , Multigene Family , Phenotype , Primates , Risk Factors
7.
Genome Res ; 27(5): 677-685, 2017 05.
Article in English | MEDLINE | ID: mdl-27895111

ABSTRACT

In an effort to more fully understand the full spectrum of human genetic variation, we generated deep single-molecule, real-time (SMRT) sequencing data from two haploid human genomes. By using an assembly-based approach (SMRT-SV), we systematically assessed each genome independently for structural variants (SVs) and indels resolving the sequence structure of 461,553 genetic variants from 2 bp to 28 kbp in length. We find that >89% of these variants have been missed as part of analysis of the 1000 Genomes Project even after adjusting for more common variants (MAF > 1%). We estimate that this theoretical human diploid differs by as much as ∼16 Mbp with respect to the human reference, with long-read sequencing data providing a fivefold increase in sensitivity for genetic variants ranging in size from 7 bp to 1 kbp compared with short-read sequence data. Although a large fraction of genetic variants were not detected by short-read approaches, once the alternate allele is sequence-resolved, we show that 61% of SVs can be genotyped in short-read sequence data sets with high accuracy. Uncoupling discovery from genotyping thus allows for the majority of this missed common variation to be genotyped in the human population. Interestingly, when we repeat SV detection on a pseudodiploid genome constructed in silico by merging the two haploids, we find that ∼59% of the heterozygous SVs are no longer detected by SMRT-SV. These results indicate that haploid resolution of long-read sequencing data will significantly increase sensitivity of SV detection.


Subject(s)
Contig Mapping/methods , Genome, Human , Genomic Structural Variation , Haploidy , Sequence Analysis, DNA/methods , Contig Mapping/standards , Human Genome Project , Humans , Sequence Analysis, DNA/standards
8.
Nature ; 513(7517): 195-201, 2014 Sep 11.
Article in English | MEDLINE | ID: mdl-25209798

ABSTRACT

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ∼5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.


Subject(s)
Genome/genetics , Hylobates/classification , Hylobates/genetics , Karyotype , Phylogeny , Animals , Evolution, Molecular , Hominidae/classification , Hominidae/genetics , Humans , Molecular Sequence Data , Retroelements/genetics , Selection, Genetic , Transcription Termination, Genetic
9.
Mol Biol Evol ; 35(4): 837-854, 2018 04 01.
Article in English | MEDLINE | ID: mdl-29272536

ABSTRACT

Variation in regulatory DNA is thought to drive phenotypic variation, evolution, and disease. Prior studies of regulatory DNA and transcription factors across animal species highlighted a fundamental conundrum: Transcription factor binding domains and cognate binding sites are conserved, while regulatory DNA sequences are not. It remains unclear how conserved transcription factors and dynamic regulatory sites produce conserved expression patterns across species. Here, we explore regulatory DNA variation and its functional consequences within Arabidopsis thaliana, using chromatin accessibility to delineate regulatory DNA genome-wide. Unlike in previous cross-species comparisons, the positional homology of regulatory DNA is maintained among A. thaliana ecotypes and less nucleotide divergence has occurred. Of the ∼50,000 regulatory sites in A. thaliana, we found that 15% varied in accessibility among ecotypes. Some of these accessibility differences were associated with extensive, previously unannotated sequence variation, encompassing many deletions and ancient hypervariable alleles. Unexpectedly, for the majority of such regulatory sites, nearby gene expression was unaffected. Nevertheless, regulatory sites with high levels of sequence variation and differential chromatin accessibility were the most likely to be associated with differential gene expression. Finally, and most surprising, we found that the vast majority of differentially accessible sites show no underlying sequence variation. We argue that these surprising results highlight the necessity to consider higher-order regulatory context in evaluating regulatory variation and predicting its phenotypic consequences.


Subject(s)
Arabidopsis/genetics , Ecotype , Regulatory Elements, Transcriptional , Arabidopsis/metabolism , Base Sequence , Deoxyribonuclease I , Genomic Structural Variation , Sequence Analysis, DNA
10.
Am J Hum Genet ; 98(1): 58-74, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26749308

ABSTRACT

We performed whole-genome sequencing (WGS) of 208 genomes from 53 families affected by simplex autism. For the majority of these families, no copy-number variant (CNV) or candidate de novo gene-disruptive single-nucleotide variant (SNV) had been detected by microarray or whole-exome sequencing (WES). We integrated multiple CNV and SNV analyses and extensive experimental validation to identify additional candidate mutations in eight families. We report that compared to control individuals, probands showed a significant (p = 0.03) enrichment of de novo and private disruptive mutations within fetal CNS DNase I hypersensitive sites (i.e., putative regulatory regions). This effect was only observed within 50 kb of genes that have been previously associated with autism risk, including genes where dosage sensitivity has already been established by recurrent disruptive de novo protein-coding mutations (ARID1B, SCN2A, NR3C2, PRKCA, and DSCAM). In addition, we provide evidence of gene-disruptive CNVs (in DISC1, WNT7A, RBFOX1, and MBD5), as well as smaller de novo CNVs and exon-specific SNVs missed by exome sequencing in neurodevelopmental genes (e.g., CANX, SAE1, and PIK3CA). Our results suggest that the detection of smaller, often multiple CNVs affecting putative regulatory elements might help explain additional risk of simplex autism.


Subject(s)
Autistic Disorder/genetics , DNA/genetics , Genome, Human , Exome , Female , Humans , Male , Pedigree , Polymorphism, Single Nucleotide
11.
Genome Res ; 26(11): 1453-1467, 2016 11.
Article in English | MEDLINE | ID: mdl-27803192

ABSTRACT

Recurrent rearrangements of Chromosome 8p23.1 are associated with congenital heart defects and developmental delay. The complexity of this region has led to inconsistencies in the current reference assembly, confounding studies of genetic variation. Using comparative sequence-based approaches, we generated a high-quality 6.3-Mbp alternate reference assembly of an inverted Chromosome 8p23.1 haplotype. Comparison with nonhuman primates reveals a 746-kbp duplicative transposition and two separate inversion events that arose in the last million years of human evolution. The breakpoints associated with these rearrangements map to an ape-specific interchromosomal core duplicon that clusters at sites of evolutionary inversion (P = 7.8 × 10^-5). Refinement of microdeletion breakpoints identifies a subgroup of patients that map to the same interchromosomal core involved in the evolutionary formation of the duplication blocks. Our results define a higher-order genomic instability element that has shaped the structure of specific chromosomes during primate evolution contributing to rearrangements associated with inversion and disease.


Subject(s)
Evolution, Molecular , Genetic Predisposition to Disease , Genomic Instability , Segmental Duplications, Genomic , Animals , Chromosome Breakpoints , Chromosome Deletion , Chromosomes, Human, Pair 8/genetics , Humans , Primates/genetics
12.
Bioinformatics ; 34(23): 4121-4123, 2018 12 01.
Article in English | MEDLINE | ID: mdl-29790939

ABSTRACT

Summary: Understanding the spread and evolution of pathogens is important for effective public health measures and surveillance. Nextstrain consists of a database of viral genomes, a bioinformatics pipeline for phylodynamics analysis, and an interactive visualization platform. Together these present a real-time view into the evolution and spread of a range of viral pathogens of high public health importance. The visualization integrates sequence data with other data types such as geographic information, serology, or host species. Nextstrain compiles our current understanding into a single accessible location, open to health professionals, epidemiologists, virologists and the public alike. Availability and implementation: All code (predominantly JavaScript and Python) is freely available from github.com/nextstrain and the web-application is available at nextstrain.org.


Subject(s)
Computational Biology , Evolution, Molecular , Genome, Viral , Software , Viruses/pathogenicity , Databases, Genetic
13.
Nucleic Acids Res ; 45(D1): D804-D811, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27907889

ABSTRACT

Whole-exome and whole-genome sequencing have facilitated the large-scale discovery of de novo variants in human disease. To date, most de novo discovery through next-generation sequencing focused on congenital heart disease and neurodevelopmental disorders (NDDs). Currently, de novo variants are one of the most significant risk factors for NDDs with a substantial overlap of genes involved in more than one NDD. To facilitate better usage of published data, provide standardization of annotation, and improve accessibility, we created denovo-db (http://denovo-db.gs.washington.edu), a database for human de novo variants. As of July 2016, denovo-db contained 40 different studies and 32,991 de novo variants from 23,098 trios. Database features include basic variant information (chromosome location, change, type); detailed annotation at the transcript and protein levels; severity scores; frequency; validation status; and, most importantly, the phenotype of the individual with the variant. We included a feature on our browsable website to download any query result, including a downloadable file of the full database with additional variant details. denovo-db provides necessary information for researchers to compare their data to other individuals with the same phenotype and also to controls allowing for a better understanding of the biology of de novo variants and their contribution to disease.


Subject(s)
Computational Biology/methods , Databases, Nucleic Acid , Genetic Variation , Germ-Line Mutation , Polymorphism, Single Nucleotide , Genetic Association Studies , Humans , Molecular Sequence Annotation , Web Browser
14.
Proc Natl Acad Sci U S A ; 112(52): E7223-9, 2015 Dec 29.
Article in English | MEDLINE | ID: mdl-26668394

ABSTRACT

NK-lysin is an antimicrobial peptide and effector protein in the host innate immune system. It is coded by a single gene in humans and most other mammalian species. In this study, we provide evidence for the existence of four NK-lysin genes in a repetitive region on cattle chromosome 11. The NK2A, NK2B, and NK2C genes are tandemly arrayed as three copies in ∼30-35-kb segments, located 41.8 kb upstream of NK1. All four genes are functional, albeit with differential tissue expression. NK1, NK2A, and NK2B exhibited the highest expression in intestine Peyer's patch, whereas NK2C was expressed almost exclusively in lung. The four peptide products were synthesized ex vivo, and their antimicrobial effects against both Gram-positive and Gram-negative bacteria were confirmed with a bacteria-killing assay. Transmission electron microscopy indicated that bovine NK-lysins exhibited their antimicrobial activities by lytic action in the cell membranes. In summary, the single NK-lysin gene in other mammals has expanded to a four-member gene family by tandem duplications in cattle; all four genes are transcribed, and the synthetic peptides corresponding to the core regions are biologically active and likely contribute to innate immunity in ruminants.


Subject(s)
Cattle/genetics , Gene Dosage , Multigene Family , Proteolipids/genetics , Amino Acid Sequence , Animals , Base Sequence , Chromosomes, Mammalian/genetics , Escherichia coli/drug effects , Escherichia coli/growth & development , Escherichia coli/ultrastructure , Gene Expression Profiling , Gene Order , Microscopy, Electron, Transmission , Molecular Sequence Data , Organ Specificity/genetics , Peptides/pharmacology , Phylogeny , Proteolipids/classification , Proteolipids/pharmacology , Sequence Homology, Amino Acid , Sequence Homology, Nucleic Acid
15.
Genome Res ; 24(12): 2066-76, 2014 12.
Article in English | MEDLINE | ID: mdl-25373144

ABSTRACT

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content show this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicate misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.


Subject(s)
Genome, Human , Haplotypes , Hydatidiform Mole/genetics , Alleles , Chromosome Mapping , Chromosomes, Artificial, Bacterial , Computational Biology/methods , Female , Genomics/methods , Heterozygote , High-Throughput Nucleotide Sequencing , Humans , Polymorphism, Single Nucleotide , Pregnancy , Repetitive Sequences, Nucleic Acid , Segmental Duplications, Genomic , Sequence Analysis, DNA
16.
Genome Res ; 24(4): 688-96, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24418700

ABSTRACT

Obtaining high-quality sequence continuity of complex regions of recent segmental duplication remains one of the major challenges of finishing genome assemblies. In the human and mouse genomes, this was achieved by targeting large-insert clones using costly and laborious capillary-based sequencing approaches. Sanger shotgun sequencing of clone inserts, however, has now been largely abandoned, leaving most of these regions unresolved in newer genome assemblies generated primarily by next-generation sequencing hybrid approaches. Here we show that it is possible to resolve regions that are complex in a genome-wide context but simple in isolation for a fraction of the time and cost of traditional methods using long-read single molecule, real-time (SMRT) sequencing and assembly technology from Pacific Biosciences (PacBio). We sequenced and assembled BAC clones corresponding to a 1.3-Mbp complex region of chromosome 17q21.31, demonstrating 99.994% identity to Sanger assemblies of the same clones. We targeted 44 differences using Illumina sequencing and find that PacBio and Sanger assemblies share a comparable number of validated variants, albeit with different sequence context biases. Finally, we targeted a poorly assembled 766-kbp duplicated region of the chimpanzee genome and resolved the structure and organization for a fraction of the cost and time of traditional finishing approaches. Our data suggest a straightforward path for upgrading genomes to a higher quality finished state.


Subject(s)
Chromosomes, Human, Pair 17/genetics , Genome, Bacterial/genetics , High-Throughput Nucleotide Sequencing/methods , Animals , Chromosomes, Artificial, Bacterial/genetics , Humans , Mice , Molecular Sequence Data , Pan troglodytes/genetics
17.
Am J Hum Genet ; 92(4): 530-46, 2013 Apr 04.
Article in English | MEDLINE | ID: mdl-23541343

ABSTRACT

The immunoglobulin heavy-chain locus (IGH) encodes variable (IGHV), diversity (IGHD), joining (IGHJ), and constant (IGHC) genes and is responsible for antibody heavy-chain biosynthesis, which is vital to the adaptive immune response. Programmed V-(D)-J somatic rearrangement and the complex duplicated nature of the locus have impeded attempts to reconcile its genomic organization based on traditional B-lymphocyte derived genetic material. As a result, sequence descriptions of germline variation within IGHV are lacking, haplotype inference using traditional linkage disequilibrium methods has been difficult, and the human genome reference assembly is missing several expressed IGHV genes. By using a hydatidiform mole BAC clone resource, we present the most complete haplotype of IGHV, IGHD, and IGHJ gene regions derived from a single chromosome, representing an alternate assembly of ∼1 Mbp of high-quality finished sequence. From this we add 101 kbp of previously uncharacterized sequence, including functional IGHV genes, and characterize four large germline copy-number variants (CNVs). In addition to this germline reference, we identify and characterize eight CNV-containing haplotypes from a panel of nine diploid genomes of diverse ethnic origin, discovering previously unmapped IGHV genes and an additional 121 kbp of insertion sequence. We genotype four of these CNVs by using PCR in 425 individuals from nine human populations. We find that all four are highly polymorphic and show considerable evidence of stratification (Fst = 0.3-0.5), with the greatest differences observed between African and Asian populations. These CNVs exhibit weak linkage disequilibrium with SNPs from two commercial arrays in most of the populations tested.


Subject(s)
DNA Copy Number Variations/genetics , Gene Fusion/genetics , Genes, Immunoglobulin Heavy Chain , Haplotypes/genetics , Hydatidiform Mole/genetics , Immunoglobulin Heavy Chains/genetics , Immunoglobulin Variable Region/genetics , Alleles , Chromosomes, Artificial, Bacterial , Female , Genetics, Population , Genotype , Humans , Molecular Sequence Data , Pregnancy , Sequence Analysis, DNA , V(D)J Recombination
18.
Genome Res ; 23(11): 1763-73, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24077392

ABSTRACT

Ape chromosomes homologous to human chromosomes 14 and 15 were generated by a fission event of an ancestral submetacentric chromosome, where the two chromosomes were joined head-to-tail. The hominoid ancestral chromosome most closely resembles the macaque chromosome 7. In this work, we provide insights into the evolution of human chromosomes 14 and 15, performing a comparative study between macaque boundary region 14/15 and the orthologous human regions. We construct a 1.6-Mb contig of macaque BAC clones in the region orthologous to the ancestral hominoid fission site and use it to define the structural changes that occurred on human 14q pericentromeric and 15q subtelomeric regions. We characterize the novel euchromatin-heterochromatin transition region (∼20 Mb) acquired during the neocentromere establishment on chromosome 14, and find it was mainly derived through pericentromeric duplications from ancestral hominoid chromosomes homologous to human 2q14-qter and 10. Further, we show a relationship between evolutionary hotspots and low-copy repeat loci for chromosome 15, revealing a possible role of segmental duplications not only in mediating but also in "stitching" together rearrangement breakpoints.


Subject(s)
Chromosomes, Human, Pair 14/genetics , Chromosomes, Human, Pair 15/genetics , Chromosomes, Mammalian/genetics , Evolution, Molecular , Hominidae/genetics , Segmental Duplications, Genomic , Animals , Chromosome Breakpoints , Chromosome Duplication , Chromosomes, Artificial, Bacterial , Cloning, Molecular , Euchromatin/genetics , Heterochromatin/genetics , Humans , Molecular Sequence Data , Phylogeny
19.
Genome Res ; 23(9): 1373-82, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23825009

ABSTRACT

Copy number variation (CNV) contributes to disease and has restructured the genomes of great apes. The diversity and rate of this process, however, have not been extensively explored among great ape lineages. We analyzed 97 deeply sequenced great ape and human genomes and estimate 16% (469 Mb) of the hominid genome has been affected by recent CNV. We identify a comprehensive set of fixed gene deletions (n = 340) and duplications (n = 405) as well as >13.5 Mb of sequence that has been specifically lost on the human lineage. We compared the diversity and rates of copy number and single nucleotide variation across the hominid phylogeny. We find that CNV diversity partially correlates with single nucleotide diversity (r² = 0.5) and recapitulates the phylogeny of apes with few exceptions. Duplications significantly outpace deletions (2.8-fold). The load of segregating duplications remains significantly higher in bonobos, Western chimpanzees, and Sumatran orangutans, populations that have experienced recent genetic bottlenecks (P = 0.0014, 0.02, and 0.0088, respectively). The rate of fixed deletion has been more clocklike with the exception of the chimpanzee lineage, where we observe a twofold increase in the chimpanzee-bonobo ancestor (P = 4.79 × 10⁻⁹) and increased deletion load among Western chimpanzees (P = 0.002). The latter includes the first genomic disorder in a chimpanzee with features resembling Smith-Magenis syndrome mediated by a chimpanzee-specific increase in segmental duplication complexity. We hypothesize that demographic effects, such as bottlenecks, have contributed to larger and more gene-rich segments being deleted in the chimpanzee lineage and that this effect, more generally, may account for episodic bursts in CNV during hominid evolution.


Subject(s)
DNA Copy Number Variations , Evolution, Molecular , Hominidae/genetics , Phylogeny , Animals , Base Sequence , Gene Deletion , Gene Duplication , Genetic Load , Genome, Human , Humans , Molecular Sequence Data , Pedigree , Polymorphism, Single Nucleotide , Sequence Analysis, DNA