Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 20 de 25
Filtrar
Mais filtros










Base de dados
Intervalo de ano de publicação
1.
Mol Biol Evol ; 2021 Aug 31.
Artigo em Inglês | MEDLINE | ID: mdl-34464971

RESUMO

Human centromeres are mainly composed of alpha satellite DNA hierarchically organized as higher-order repeats (HORs). Alpha satellite dynamics is shown by sequence homogenization in centromeric arrays and by its transfer to other centromeric locations, for example during the maturation of new centromeres. We identified during prenatal aneuploidy diagnosis by FISH a de novo insertion of alpha satellite DNA from the centromere of chromosome 18 (D18Z1) into cytoband 15q26. Although bound by CENP-B, this locus did not acquire centromeric functionality as demonstrated by lack of constriction and absence of CENP-A binding. The insertion was associated with a 2.8 kbp deletion and likely occurred in the paternal germline. The site was enriched in long terminal repeats (LTRs) and located ∼10 Mbp from the location where a centromere was ancestrally seeded and became inactive in the common ancestor of humans and apes 20-25 million years ago. Long-read mapping to the T2T-CHM13 human genome assembly revealed that the insertion derives from a specific region of chromosome 18 centromeric 12-mer HOR array in which the monomer size follows a regular pattern. The rearrangement did not directly disrupt any gene or predicted regulatory element and did not alter the methylation status of the surrounding region, consistent with the absence of phenotypic consequences in the carrier. This case demonstrates a likely rare but new class of structural variation that we name 'alpha satellite insertion'. It also expands our knowledge on alphoid DNA dynamics and conveys the possibility that alphoid arrays can relocate near vestigial centromeric sites.

2.
Nat Commun ; 12(1): 5118, 2021 08 25.
Artigo em Inglês | MEDLINE | ID: mdl-34433829

RESUMO

TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated given the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. Here, our results of TCAF copy number expansion, selection signals in hominins, and differential TCAF2 expression between haplogroups and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply TCAF diversification among hominins potentially in response to cold or dietary adaptations.


Assuntos
Duplicação Gênica , Hominidae/genética , Proteínas de Membrana/genética , Seleção Genética , Animais , Variações do Número de Cópias de DNA , Evolução Molecular , Genoma Humano , Haplótipos , Humanos , Homem de Neandertal , Filogenia
3.
Am J Hum Genet ; 108(8): 1436-1449, 2021 08 05.
Artigo em Inglês | MEDLINE | ID: mdl-34216551

RESUMO

Despite widespread clinical genetic testing, many individuals with suspected genetic conditions lack a precise diagnosis, limiting their opportunity to take advantage of state-of-the-art treatments. In some cases, testing reveals difficult-to-evaluate structural differences, candidate variants that do not fully explain the phenotype, single pathogenic variants in recessive disorders, or no variants in genes of interest. Thus, there is a need for better tools to identify a precise genetic diagnosis in individuals when conventional testing approaches have been exhausted. We performed targeted long-read sequencing (T-LRS) using adaptive sampling on the Oxford Nanopore platform on 40 individuals, 10 of whom lacked a complete molecular diagnosis. We computationally targeted up to 151 Mbp of sequence per individual and searched for pathogenic substitutions, structural variants, and methylation differences using a single data source. We detected all genomic aberrations-including single-nucleotide variants, copy number changes, repeat expansions, and methylation differences-identified by prior clinical testing. In 8/8 individuals with complex structural rearrangements, T-LRS enabled more precise resolution of the mutation, leading to changes in clinical management in one case. In ten individuals with suspected Mendelian conditions lacking a precise genetic diagnosis, T-LRS identified pathogenic or likely pathogenic variants in six and variants of uncertain significance in two others. T-LRS accurately identifies pathogenic structural variants, resolves complex rearrangements, and identifies Mendelian variants not detected by other technologies. T-LRS represents an efficient and cost-effective strategy to evaluate high-priority genes and regions or complex clinical testing results.


Assuntos
Aberrações Cromossômicas , Análise Citogenética/métodos , Doenças Genéticas Inatas/diagnóstico , Doenças Genéticas Inatas/genética , Predisposição Genética para Doença , Genoma Humano , Mutação , Variações do Número de Cópias de DNA , Feminino , Testes Genéticos , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Cariotipagem , Masculino , Análise de Sequência de DNA
4.
Nature ; 594(7861): 77-81, 2021 06.
Artigo em Inglês | MEDLINE | ID: mdl-33953399

RESUMO

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation1,2. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes1,3-5 and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Assuntos
Evolução Molecular , Genoma/genética , Genômica , Pan paniscus/genética , Filogenia , Animais , Fator de Iniciação 4A em Eucariotos/genética , Feminino , Genes , Gorilla gorilla/genética , Anotação de Sequência Molecular/normas , Pan troglodytes/genética , Pongo/genética , Duplicações Segmentares Genômicas , Análise de Sequência de DNA
5.
Nature ; 593(7857): 101-107, 2021 05.
Artigo em Inglês | MEDLINE | ID: mdl-33828295

RESUMO

The complete assembly of each human chromosome is essential for understanding human biology and evolution1,2. Here we use complementary long-read sequencing technologies to complete the linear assembly of human chromosome 8. Our assembly resolves the sequence of five previously long-standing gaps, including a 2.08-Mb centromeric α-satellite array, a 644-kb copy number polymorphism in the ß-defensin gene cluster that is important for disease risk, and an 863-kb variable number tandem repeat at chromosome 8q21.2 that can function as a neocentromere. We show that the centromeric α-satellite array is generally methylated except for a 73-kb hypomethylated region of diverse higher-order α-satellites enriched with CENP-A nucleosomes, consistent with the location of the kinetochore. In addition, we confirm the overall organization and methylation pattern of the centromere in a diploid human genome. Using a dual long-read sequencing approach, we complete high-quality draft assemblies of the orthologous centromere from chromosome 8 in chimpanzee, orangutan and macaque to reconstruct its evolutionary history. Comparative and phylogenetic analyses show that the higher-order α-satellite structure evolved in the great ape ancestor with a layered symmetry, in which more ancient higher-order repeats locate peripherally to monomeric α-satellites. We estimate that the mutation rate of centromeric satellite DNA is accelerated by more than 2.2-fold compared to the unique portions of the genome, and this acceleration extends into the flanking sequence.


Assuntos
Cromossomos Humanos Par 8/química , Cromossomos Humanos Par 8/genética , Evolução Molecular , Animais , Linhagem Celular , Centrômero/química , Centrômero/genética , Centrômero/metabolismo , Cromossomos Humanos Par 8/fisiologia , Metilação de DNA , DNA Satélite/genética , Epigênese Genética , Feminino , Humanos , Macaca mulatta/genética , Masculino , Repetições Minissatélites/genética , Pan troglodytes/genética , Filogenia , Pongo abelii/genética , Telômero/química , Telômero/genética , Telômero/metabolismo
6.
Nat Commun ; 12(1): 1935, 2021 04 28.
Artigo em Inglês | MEDLINE | ID: mdl-33911078

RESUMO

Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells where haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend phase blocks of partially-phased diploid assembles to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, skipping variant calling, and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets generated as part of the Vertebrate Genomes Project (VGP), including human, cow, and zebra finch, for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without having parental data and performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97% compared to 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.


Assuntos
Mapeamento de Sequências Contíguas/métodos , Genoma Humano/genética , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Algoritmos , Animais , Bovinos , Haplótipos/genética , Humanos , Polimorfismo de Nucleotídeo Único/genética , Peixe-Zebra/genética
7.
Science ; 372(6537)2021 04 02.
Artigo em Inglês | MEDLINE | ID: mdl-33632895

RESUMO

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.


Assuntos
Variação Genética , Genoma Humano , Haplótipos , Feminino , Genótipo , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Mutação INDEL , Sequências Repetitivas Dispersas , Masculino , Grupos Populacionais/genética , Locos de Características Quantitativas , Retroelementos , Análise de Sequência de DNA , Inversão de Sequência , Sequenciamento Completo do Genoma
8.
Nat Biotechnol ; 39(3): 302-308, 2021 03.
Artigo em Inglês | MEDLINE | ID: mdl-33288906

RESUMO

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing1,2 with continuous long-read or high-fidelity3 sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.


Assuntos
Genoma Humano , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Pais , Análise de Sequência de DNA/métodos , Análise de Célula Única/métodos , Algoritmos , Haplótipos , Humanos , Porto Rico/etnologia
9.
Science ; 370(6523)2020 12 18.
Artigo em Inglês | MEDLINE | ID: mdl-33335035

RESUMO

The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.


Assuntos
Predisposição Genética para Doença , Genoma , Macaca mulatta/genética , Polimorfismo de Nucleotídeo Único , Animais , Variação Genética , Humanos , Anotação de Sequência Molecular , Sequenciamento Completo do Genoma
10.
J Med Genet ; 2020 Oct 15.
Artigo em Inglês | MEDLINE | ID: mdl-33060287

RESUMO

Current clinical approaches for mutation discovery are based on short sequence reads (100-300 bp) of exons and flanking splice sites targeted by multigene panels or whole exomes. Short-read sequencing is highly accurate for detection of single nucleotide variants, small indels and simple copy number differences but is of limited use for identifying complex insertions and deletions and other structural rearrangements. We used CRISPR-Cas9 to excise complete BRCA1 and BRCA2 genomic regions from lymphoblast cells of patients with breast cancer, then sequenced these regions with long reads (>10 000 bp) to fully characterise all non-coding regions for structural variation. In a family severely affected with early-onset bilateral breast cancer and with negative (normal) results by gene panel and exome sequencing, we identified an intronic SINE-VNTR-Alu retrotransposon insertion that led to the creation of a pseudoexon in the BRCA1 message and introduced a premature truncation. This combination of CRISPR-Cas9 excision and long-read sequencing reveals a class of complex, damaging and otherwise cryptic mutations that may be particularly frequent in tumour suppressor genes replete with intronic repeats.

11.
Genome Biol ; 21(1): 202, 2020 08 10.
Artigo em Inglês | MEDLINE | ID: mdl-32778141

RESUMO

BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP). RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis. CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.

12.
Nat Genet ; 52(2): 146-159, 2020 02.
Artigo em Inglês | MEDLINE | ID: mdl-32060489

RESUMO

In many repeat diseases, such as Huntington's disease (HD), ongoing repeat expansions in affected tissues contribute to disease onset, progression and severity. Inducing contractions of expanded repeats by exogenous agents is not yet possible. Traditional approaches would target proteins driving repeat mutations. Here we report a compound, naphthyridine-azaquinolone (NA), that specifically binds slipped-CAG DNA intermediates of expansion mutations, a previously unsuspected target. NA efficiently induces repeat contractions in HD patient cells as well as en masse contractions in medium spiny neurons of HD mouse striatum. Contractions are specific for the expanded allele, independently of DNA replication, require transcription across the coding CTG strand and arise by blocking repair of CAG slip-outs. NA-induced contractions depend on active expansions driven by MutSß. NA injections in HD mouse striatum reduce mutant HTT protein aggregates, a biomarker of HD pathogenesis and severity. Repeat-structure-specific DNA ligands are a novel avenue to contract expanded repeats.


Assuntos
Proteína Huntingtina/genética , Doença de Huntington/genética , Naftiridinas/farmacologia , Quinolonas/farmacologia , Expansão das Repetições de Trinucleotídeos/efeitos dos fármacos , Animais , Corpo Estriado/efeitos dos fármacos , DNA/metabolismo , Reparo de Erro de Pareamento de DNA/efeitos dos fármacos , Replicação do DNA/efeitos dos fármacos , Modelos Animais de Doenças , Humanos , Proteína Huntingtina/metabolismo , Doença de Huntington/tratamento farmacológico , Doença de Huntington/patologia , Masculino , Camundongos , Camundongos Transgênicos , Instabilidade de Microssatélites , Mutação , Ribonucleases/metabolismo , Proteína de Ligação a TATA-Box/genética , Transcrição Genética
13.
Ann Hum Genet ; 84(2): 125-140, 2020 03.
Artigo em Inglês | MEDLINE | ID: mdl-31711268

RESUMO

The sequence and assembly of human genomes using long-read sequencing technologies has revolutionized our understanding of structural variation and genome organization. We compared the accuracy, continuity, and gene annotation of genome assemblies generated from either high-fidelity (HiFi) or continuous long-read (CLR) datasets from the same complete hydatidiform mole human genome. We find that the HiFi sequence data assemble an additional 10% of duplicated regions and more accurately represent the structure of tandem repeats, as validated with orthogonal analyses. As a result, an additional 5 Mbp of pericentromeric sequences are recovered in the HiFi assembly, resulting in a 2.5-fold increase in the NG50 within 1 Mbp of the centromere (HiFi 480.6 kbp, CLR 191.5 kbp). Additionally, the HiFi genome assembly was generated in significantly less time with fewer computational resources than the CLR assembly. Although the HiFi assembly has significantly improved continuity and accuracy in many complex regions of the genome, it still falls short of the assembly of centromeric DNA and the largest regions of segmental duplication using existing assemblers. Despite these shortcomings, our results suggest that HiFi may be the most effective standalone technology for de novo assembly of human genomes.


Assuntos
Biomarcadores/análise , Variação Genética , Genoma Humano , Haploidia , Mola Hidatiforme/genética , Análise de Sequência de DNA/métodos , Análise de Célula Única/métodos , Feminino , Sequenciamento de Nucleotídeos em Larga Escala , Humanos , Anotação de Sequência Molecular , Gravidez
14.
Am J Hum Genet ; 105(5): 947-958, 2019 11 07.
Artigo em Inglês | MEDLINE | ID: mdl-31668704

RESUMO

Human-specific duplications at chromosome 16p11.2 mediate recurrent pathogenic 600 kbp BP4-BP5 copy-number variations, which are among the most common genetic causes of autism. These copy-number polymorphic duplications are under positive selection and include three to eight copies of BOLA2, a gene involved in the maturation of cytosolic iron-sulfur proteins. To investigate the potential advantage provided by the rapid expansion of BOLA2, we assessed hematological traits and anemia prevalence in 379,385 controls and individuals who have lost or gained copies of BOLA2: 89 chromosome 16p11.2 BP4-BP5 deletion carriers and 56 reciprocal duplication carriers in the UK Biobank. We found that the 16p11.2 deletion is associated with anemia (18/89 carriers, 20%, p = 4e-7, OR = 5), particularly iron-deficiency anemia. We observed similar enrichments in two clinical 16p11.2 deletion cohorts, which included 6/63 (10%) and 7/20 (35%) unrelated individuals with anemia, microcytosis, low serum iron, or low blood hemoglobin. Upon stratification by BOLA2 copy number, our data showed an association between low BOLA2 dosage and the above phenotypes (8/15 individuals with three copies, 53%, p = 1e-4). In parallel, we analyzed hematological traits in mice carrying the 16p11.2 orthologous deletion or duplication, as well as Bola2+/- and Bola2-/- animals. The Bola2-deficient mice and the mice carrying the deletion showed early evidence of iron deficiency, including a mild decrease in hemoglobin, lower plasma iron, microcytosis, and an increased red blood cell zinc-protoporphyrin-to-heme ratio. Our results indicate that BOLA2 participates in iron homeostasis in vivo, and its expansion has a potential adaptive role in protecting against iron deficiency.


Assuntos
Anemia/genética , Transtorno Autístico/genética , Duplicação Cromossômica/genética , Cromossomos Humanos Par 16/genética , Homeostase/genética , Proteínas/genética , Animais , Deleção Cromossômica , Transtornos Cromossômicos/genética , Variações do Número de Cópias de DNA/genética , Feminino , Genótipo , Heterozigoto , Humanos , Ferro , Masculino , Fenótipo
15.
Science ; 366(6463)2019 10 18.
Artigo em Inglês | MEDLINE | ID: mdl-31624180

RESUMO

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


Assuntos
Introgressão Genética , Animais , Duplicação Cromossômica , Cromossomos Humanos Par 16/genética , Cromossomos Humanos Par 8/genética , Variações do Número de Cópias de DNA , Evolução Molecular , Genoma Humano , Haplótipos , Hominidae/genética , Humanos , Melanesia , Modelos Genéticos , Homem de Neandertal/genética , Polimorfismo Genético , Seleção Genética , Sequenciamento Completo do Genoma
16.
Genome Res ; 28(10): 1566-1576, 2018 10.
Artigo em Inglês | MEDLINE | ID: mdl-30228200

RESUMO

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.


Assuntos
Encéfalo/metabolismo , Duplicações Segmentares Genômicas , Análise de Sequência de DNA/métodos , Análise de Sequência de RNA/métodos , Evolução Molecular , Duplicação Gênica , Perfilação da Expressão Gênica , Humanos , Anotação de Sequência Molecular , Família Multigênica , Fases de Leitura Aberta , Pseudogenes
17.
Science ; 360(6393)2018 06 08.
Artigo em Inglês | MEDLINE | ID: mdl-29880660

RESUMO

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.


Assuntos
Evolução Molecular , Genoma Humano , Hominidae/genética , Animais , Mapeamento de Sequências Contíguas , Variação Genética , Humanos , Anotação de Sequência Molecular , Análise de Sequência de DNA
18.
Proc Natl Acad Sci U S A ; 115(19): E4433-E4442, 2018 05 08.
Artigo em Inglês | MEDLINE | ID: mdl-29686068

RESUMO

Structural variation and single-nucleotide variation of the complement factor H (CFH) gene family underlie several complex genetic diseases, including age-related macular degeneration (AMD) and atypical hemolytic uremic syndrome (AHUS). To understand its diversity and evolution, we performed high-quality sequencing of this ∼360-kbp locus in six primate lineages, including multiple human haplotypes. Comparative sequence analyses reveal two distinct periods of gene duplication leading to the emergence of four CFH-related (CFHR) gene paralogs (CFHR2 and CFHR4 ∼25-35 Mya and CFHR1 and CFHR3 ∼7-13 Mya). Remarkably, all evolutionary breakpoints share a common ∼4.8-kbp segment corresponding to an ancestral CFHR gene promoter that has expanded independently throughout primate evolution. This segment is recurrently reused and juxtaposed with a donor duplication containing exons 8 and 9 from ancestral CFH, creating four CFHR fusion genes that include lineage-specific members of the gene family. Combined analysis of >5,000 AMD cases and controls identifies a significant burden of a rare missense mutation that clusters at the N terminus of CFH [P = 5.81 × 10-8, odds ratio (OR) = 9.8 (3.67-Infinity)]. A bipolar clustering pattern of rare nonsynonymous mutations in patients with AMD (P < 10-3) and AHUS (P = 0.0079) maps to functional domains that show evidence of positive selection during primate evolution. Our structural variation analysis in >2,400 individuals reveals five recurrent rearrangement breakpoints that show variable frequency among AMD cases and controls. These data suggest a dynamic and recurrent pattern of mutation critical to the emergence of new CFHR genes but also in the predisposition to complex human genetic disease phenotypes.


Assuntos
Evolução Molecular , Degeneração Macular/genética , Degeneração Macular/patologia , Mutação , Polimorfismo de Nucleotídeo Único , Seleção Genética , Animais , Fator H do Complemento/genética , Éxons , Predisposição Genética para Doença , Genótipo , Haplótipos , Humanos , Família Multigênica , Fenótipo , Primatas , Fatores de Risco
19.
Cell ; 172(5): 897-909.e21, 2018 02 22.
Artigo em Inglês | MEDLINE | ID: mdl-29474918

RESUMO

X-linked Dystonia-Parkinsonism (XDP) is a Mendelian neurodegenerative disease that is endemic to the Philippines and is associated with a founder haplotype. We integrated multiple genome and transcriptome assembly technologies to narrow the causal mutation to the TAF1 locus, which included a SINE-VNTR-Alu (SVA) retrotransposition into intron 32 of the gene. Transcriptome analyses identified decreased expression of the canonical cTAF1 transcript among XDP probands, and de novo assembly across multiple pluripotent stem-cell-derived neuronal lineages discovered aberrant TAF1 transcription that involved alternative splicing and intron retention (IR) in proximity to the SVA that was anti-correlated with overall TAF1 expression. CRISPR/Cas9 excision of the SVA rescued this XDP-specific transcriptional signature and normalized TAF1 expression in probands. These data suggest an SVA-mediated aberrant transcriptional mechanism associated with XDP and may provide a roadmap for layered technologies and integrated assembly-based analyses for other unsolved Mendelian disorders.


Assuntos
Distúrbios Distônicos/genética , Doenças Genéticas Ligadas ao Cromossomo X/genética , Genoma Humano , Transcriptoma/genética , Processamento Alternativo/genética , Elementos Alu/genética , Sequência de Bases , Sistemas CRISPR-Cas/genética , Estudos de Coortes , Família , Feminino , Loci Gênicos , Haplótipos/genética , Sequenciamento de Nucleotídeos em Larga Escala , Histona Acetiltransferases/genética , Histona Acetiltransferases/metabolismo , Humanos , Células-Tronco Pluripotentes Induzidas/metabolismo , Íntrons/genética , Masculino , Repetições Minissatélites/genética , Modelos Genéticos , Degeneração Neural/genética , Degeneração Neural/patologia , Células-Tronco Neurais/metabolismo , Neurônios/metabolismo , RNA Mensageiro/genética , RNA Mensageiro/metabolismo , Elementos Nucleotídeos Curtos e Dispersos , Fatores Associados à Proteína de Ligação a TATA/genética , Fatores Associados à Proteína de Ligação a TATA/metabolismo , Fator de Transcrição TFIID/genética , Fator de Transcrição TFIID/metabolismo
SELEÇÃO DE REFERÊNCIAS
DETALHE DA PESQUISA
...