ABSTRACT
Direct comparisons of human and non-human primate brains can reveal molecular pathways underlying remarkable specializations of the human brain. However, chimpanzee tissue is inaccessible during neocortical neurogenesis when differences in brain size first appear. To identify human-specific features of cortical development, we leveraged recent innovations that permit generating pluripotent stem cell-derived cerebral organoids from chimpanzee. Despite metabolic differences, organoid models preserve gene regulatory networks related to primary cell types and developmental processes. We further identified 261 differentially expressed genes in human compared to both chimpanzee organoids and macaque cortex, enriched for recent gene duplications, and including multiple regulators of PI3K-AKT-mTOR signaling. We observed increased activation of this pathway in human radial glia, dependent on two receptors upregulated specifically in human: INSR and ITGB8. Our findings establish a platform for systematic analysis of molecular changes contributing to human brain development and evolution.
Subjects
Cerebral Cortex/cytology , Organoids/metabolism , Animals , Biological Evolution , Brain/cytology , Cell Culture Techniques/methods , Cell Differentiation/genetics , Cerebral Cortex/metabolism , Gene Regulatory Networks/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Macaca , Neurogenesis/genetics , Organoids/growth & development , Pan troglodytes , Pluripotent Stem Cells/cytology , Single-Cell Analysis , Species Specificity , Transcriptome/genetics
ABSTRACT
Genetic changes causing brain size expansion in human evolution have remained elusive. Notch signaling is essential for radial glia stem cell proliferation and is a determinant of neuronal number in the mammalian cortex. We find that three paralogs of human-specific NOTCH2NL are highly expressed in radial glia. Functional analysis reveals that different alleles of NOTCH2NL have varying potencies to enhance Notch signaling by interacting directly with NOTCH receptors. Consistent with a role in Notch signaling, NOTCH2NL ectopic expression delays differentiation of neuronal progenitors, while deletion accelerates differentiation into cortical neurons. Furthermore, NOTCH2NL genes provide the breakpoints in 1q21.1 distal deletion/duplication syndrome, where duplications are associated with macrocephaly and autism and deletions with microcephaly and schizophrenia. Thus, the emergence of human-specific NOTCH2NL genes may have contributed to the rapid evolution of the larger human neocortex, accompanied by loss of genomic stability at the 1q21.1 locus and resulting recurrent neurodevelopmental disorders.
Subjects
Brain/embryology , Cerebral Cortex/physiology , Neurogenesis/physiology , Notch2 Receptor/metabolism , Signal Transduction , Animals , Cell Differentiation , Embryonic Stem Cells/metabolism , Female , Gene Deletion , Reporter Genes , Gorilla gorilla , HEK293 Cells , Humans , Neocortex/cytology , Neural Stem Cells/metabolism , Neuroglia/metabolism , Neurons/metabolism , Pan troglodytes , Notch2 Receptor/genetics , RNA Sequence Analysis
ABSTRACT
The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.
Subjects
Molecular Evolution , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Genomic Segmental Duplications , DNA Sequence Analysis
ABSTRACT
New genome assemblies have been arriving at a rapidly increasing pace, thanks to decreases in sequencing costs and improvements in third-generation sequencing technologies [1-3]. For example, the number of vertebrate genome assemblies currently in the NCBI (National Center for Biotechnology Information) database [4] increased by more than 50% to 1,485 assemblies in the year from July 2018 to July 2019. In addition to this influx of assemblies from different species, new human de novo assemblies [5] are being produced, which enable the analysis of not only small polymorphisms, but also complex, large-scale structural differences between human individuals and haplotypes. This coming era and its unprecedented amount of data offer the opportunity to uncover many insights into genome evolution but also present challenges in how to adapt current analysis methods to meet the increased scale. Cactus [6], a reference-free multiple genome alignment program, has been shown to be highly accurate, but the existing implementation scales poorly with increasing numbers of genomes, and struggles in regions of highly duplicated sequences. Here we describe progressive extensions to Cactus to create Progressive Cactus, which enables the reference-free alignment of tens to thousands of large vertebrate genomes while maintaining high alignment quality. We describe results from an alignment of more than 600 amniote genomes, which is to our knowledge the largest multiple vertebrate genome alignment created so far.
Subjects
Genome/genetics , Genomics/methods , Sequence Alignment/methods , Software , Vertebrates/genetics , Amnion , Animals , Computer Simulation , Genomics/standards , Haplotypes , Humans , Quality Control , Sequence Alignment/standards , Software/standards
ABSTRACT
The GENCODE project annotates human and mouse genes and transcripts supported by experimental data with high accuracy, providing a foundational resource that supports genome biology and clinical genomics. GENCODE annotation processes make use of primary data and bioinformatic tools and analysis generated both within the consortium and externally to support the creation of transcript structures and the determination of their function. Here, we present improvements to our annotation infrastructure, bioinformatics tools, and analysis, and the advances they support in the annotation of the human and mouse genomes including: the completion of first pass manual annotation for the mouse reference genome; targeted improvements to the annotation of genes associated with SARS-CoV-2 infection; collaborative projects to achieve convergence across reference annotation databases for the annotation of human and mouse protein-coding genes; and the first GENCODE manually supervised automated annotation of lncRNAs. Our annotation is accessible via Ensembl, the UCSC Genome Browser and https://www.gencodegenes.org.
Subjects
COVID-19/prevention & control , Computational Biology/methods , Genetic Databases , Genomics/methods , Molecular Sequence Annotation/methods , SARS-CoV-2/genetics , Animals , COVID-19/epidemiology , COVID-19/virology , Epidemics , Humans , Internet , Mice , Pseudogenes/genetics , Long Noncoding RNA/genetics , SARS-CoV-2/metabolism , SARS-CoV-2/physiology , Genetic Transcription/genetics
ABSTRACT
Large-scale population analyses coupled with advances in technology have demonstrated that the human genome is more diverse than originally thought. To date, this diversity has largely been uncovered using short-read whole-genome sequencing. However, these short-read approaches fail to give a complete picture of a genome. They struggle to identify structural events, cannot access repetitive regions, and fail to resolve the human genome into haplotypes. Here, we describe an approach that retains long range information while maintaining the advantages of short reads. Starting from ~1 ng of high molecular weight DNA, we produce barcoded short-read libraries. Novel informatic approaches allow for the barcoded short reads to be associated with their original long molecules producing a novel data type known as "Linked-Reads". This approach allows for simultaneous detection of small and large variants from a single library. In this manuscript, we show the advantages of Linked-Reads over standard short-read approaches for reference-based analysis. Linked-Reads allow mapping to 38 Mb of sequence not accessible to short reads, adding sequence in 423 difficult-to-sequence genes including disease-relevant genes STRC, SMN1, and SMN2. Both Linked-Read whole-genome and whole-exome sequencing identify complex structural variations, including balanced events and single exon deletions and duplications. Further, Linked-Reads extend the region of high-confidence calls by 68.9 Mb. The data presented here show that Linked-Reads provide a scalable approach for comprehensive genome analysis that is not possible using short reads alone.
Subjects
Genome-Wide Association Study/methods , Genetic Polymorphism , Whole Genome Sequencing/methods , Cell Line , Human Genome , Humans , Intercellular Signaling Peptides and Proteins , Membrane Proteins/genetics , Survival of Motor Neuron 1 Protein/genetics , Survival of Motor Neuron 2 Protein/genetics
ABSTRACT
The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.
Subjects
Human Genome/genetics , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Sequence Annotation/methods , RNA/genetics , Rats
ABSTRACT
SUMMARY: Bulk RNA sequencing studies have demonstrated that human leukocyte antigen (HLA) genes may be expressed in a cell type-specific and allele-specific fashion. Single-cell gene expression assays have the potential to further resolve these expression patterns, but currently available methods do not perform allele-specific quantification at the molecule level. Here, we present scHLAcount, a post-processing workflow for single-cell RNA-seq data that computes allele-specific molecule counts of the HLA genes based on a personalized reference constructed from the sample's HLA genotypes. AVAILABILITY AND IMPLEMENTATION: scHLAcount is available under the MIT license at https://github.com/10XGenomics/scHLAcount. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
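The idea of allele-specific quantification at the molecule level can be illustrated with a minimal sketch. This is not scHLAcount's implementation: the record layout (cell barcode, UMI, allele label) and the function name `count_molecules` are assumptions made purely for illustration. Each distinct (barcode, UMI) pair assigned to an allele is counted once, so repeated reads of the same molecule do not inflate the counts.

```python
from collections import defaultdict


def count_molecules(reads):
    """Count unique molecules per (cell barcode, allele).

    `reads` is an iterable of (cell_barcode, umi, allele) tuples. This layout
    is a hypothetical simplification, not scHLAcount's actual input format.
    """
    seen = set()
    counts = defaultdict(int)
    for barcode, umi, allele in reads:
        key = (barcode, umi, allele)
        if key in seen:
            continue  # same molecule observed again (e.g. a PCR duplicate)
        seen.add(key)
        counts[(barcode, allele)] += 1
    return dict(counts)


# Toy input: the allele labels are illustrative only.
reads = [
    ("CELL1", "AAAC", "A*02:01"),
    ("CELL1", "AAAC", "A*02:01"),  # duplicate read of the same molecule
    ("CELL1", "GGTT", "A*02:01"),
    ("CELL1", "CCAT", "A*24:02"),
    ("CELL2", "AAAC", "A*24:02"),
]
molecule_counts = count_molecules(reads)
```

In this toy input, CELL1 carries two molecules of the first allele and one of the second, which is the kind of per-cell, per-allele table the abstract describes.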
Subjects
Single-Cell Analysis , Software , Alleles , Gene Expression , Humans , RNA Sequence Analysis , Workflow
ABSTRACT
The accurate identification and description of the genes in the human and mouse genomes is a fundamental requirement for high quality analysis of data informing both genome biology and clinical genomics. Over the last 15 years, the GENCODE consortium has been producing reference quality gene annotations to provide this foundational resource. The GENCODE consortium includes both experimental and computational biology groups who work together to improve and extend the GENCODE gene annotation. Specifically, we generate primary data, create bioinformatics tools and provide analysis to support the work of expert manual gene annotators and automated gene annotation pipelines. In addition, manual and computational annotation workflows use any and all publicly available data and analysis, along with the research literature to identify and characterise gene loci to the highest standard. GENCODE gene annotations are accessible via the Ensembl and UCSC Genome Browsers, the Ensembl FTP site, Ensembl Biomart, Ensembl Perl and REST APIs as well as https://www.gencodegenes.org.
Subjects
Genetic Databases , Human Genome/genetics , Genomics , Pseudogenes/genetics , Animals , Computational Biology , Humans , Internet , Mice , Molecular Sequence Annotation , Software
ABSTRACT
The UCSC Genome Browser (https://genome.ucsc.edu) provides a web interface for exploring annotated genome assemblies. The assemblies and annotation tracks are updated on an ongoing basis: 12 assemblies and more than 28 tracks were added in the past year. Two recent additions are a display of CRISPR/Cas9 guide sequences and an interactive navigator for gene interactions. Other upgrades from the past year include a command-line version of the Variant Annotation Integrator, support for Human Genome Variation Society variant nomenclature input and output, and a revised highlighting tool that now supports multiple simultaneous regions and colors.
Subjects
Genetic Databases , Genome , Web Browser , CRISPR-Cas Systems , Data Display , Gene Regulatory Networks , Human Genome , Humans , Molecular Sequence Annotation , Terminology as Topic , User-Computer Interface
ABSTRACT
Sequences encoding Olduvai (DUF1220) protein domains show the largest human-specific increase in copy number of any coding region in the genome and have been linked to human brain evolution. Most human-specific copies of Olduvai (119/165) are encoded by three NBPF genes that are adjacent to three human-specific NOTCH2NL genes that have been shown to promote cortical neurogenesis. Here, employing genomic, phylogenetic, and transcriptomic evidence, we show that these NOTCH2NL/NBPF gene pairs evolved jointly, as two-gene units, very recently in human evolution, and are likely co-regulated. Remarkably, while three NOTCH2NL paralogs were added, adjacent Olduvai sequences hyper-amplified, adding 119 human-specific copies. The data suggest that human-specific Olduvai domains and adjacent NOTCH2NL genes may function in a coordinated, complementary fashion to promote neurogenesis and human brain expansion in a dosage-related manner.
Subjects
Biological Evolution , Brain/growth & development , Brain/metabolism , Carrier Proteins/genetics , Human Genome , Notch2 Receptor/genetics , Genomics , Humans , Phylogeny , Protein Domains
ABSTRACT
Speed, single-base sensitivity and long read lengths make nanopores a promising technology for high-throughput sequencing. We evaluated and optimized the performance of the MinION nanopore sequencer using M13 genomic DNA and used expectation maximization to obtain robust maximum-likelihood estimates for insertion, deletion and substitution error rates (4.9%, 7.8% and 5.1%, respectively). Over 99% of high-quality 2D MinION reads mapped to the reference at a mean identity of 85%. We present a single-nucleotide-variant detection tool that uses maximum-likelihood parameter estimates and marginalization over many possible read alignments to achieve precision and recall of up to 99%. By pairing our high-confidence alignment strategy with long MinION reads, we resolved the copy number for a cancer-testis gene family (CT47) within an unresolved region of human chromosome Xq24.
Subjects
High-Throughput Nucleotide Sequencing/methods , Nanopores , Algorithms , Gene Dosage , Humans , Neoplasms/genetics
ABSTRACT
Reptiles exhibit a variety of modes of sex determination, including both temperature-dependent and genetic mechanisms. Among those species with genetic sex determination, sex chromosomes of varying heterogamety (XX/XY and ZZ/ZW) have been observed with different degrees of differentiation. Karyotype studies have demonstrated that Gila monsters (Heloderma suspectum) have ZZ/ZW sex determination and this system is likely homologous to the ZZ/ZW system in the Komodo dragon (Varanus komodoensis), but little else is known about their sex chromosomes. Here, we report the assembly and analysis of the Gila monster genome. We generated a de novo draft genome assembly for a male using 10X Genomics technology. We further generated and analyzed short-read whole genome sequencing and whole transcriptome sequencing data for three males and three females. By comparing female and male genomic data, we identified four putative Z chromosome scaffolds. These putative Z chromosome scaffolds are homologous to Z-linked scaffolds identified in the Komodo dragon. Further, by analyzing RNAseq data, we observed evidence of incomplete dosage compensation between the Gila monster Z chromosome and autosomes and a lack of balance in Z-linked expression between the sexes. In particular, we observe lower expression of the Z in females (ZW) than males (ZZ) on a global basis, though we find evidence suggesting local gene-by-gene compensation. This pattern has been observed in most other ZZ/ZW systems studied to date and may represent a general pattern for female heterogamety in vertebrates.
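One common way to compare female and male genomic data in a ZZ/ZW system is by read depth: a Z-linked scaffold is present in one copy in females (ZW) and two in males (ZZ), so its female-to-male depth ratio should sit near 0.5, while autosomal scaffolds sit near 1. The sketch below illustrates that reasoning only. The abstract does not specify the authors' procedure, and the function name, input layout and tolerance are assumptions.

```python
import math


def putative_z_scaffolds(female_depth, male_depth, tolerance=0.3):
    """Return scaffolds whose female/male depth ratio is close to 0.5.

    Inputs map scaffold name -> mean read depth, assumed to be already
    normalized to each sample's genome-wide depth. `tolerance` is an
    arbitrary illustrative threshold on the log2 scale.
    """
    hits = []
    for scaffold, f in female_depth.items():
        m = male_depth.get(scaffold, 0.0)
        if f <= 0 or m <= 0:
            continue  # no usable coverage in one of the sexes
        log2_ratio = math.log2(f / m)
        # One Z copy in females versus two in males gives log2(1/2) = -1.
        if abs(log2_ratio + 1.0) <= tolerance:
            hits.append(scaffold)
    return sorted(hits)


# Toy normalized depths (hypothetical values, not data from the study).
female = {"scaf1": 1.02, "scaf2": 0.51, "scaf3": 0.98, "scaf4": 0.47}
male = {"scaf1": 1.00, "scaf2": 1.01, "scaf3": 1.03, "scaf4": 0.99}
z_candidates = putative_z_scaffolds(female, male)
```

With these toy values, only `scaf2` and `scaf4` fall near the half-depth expectation and would be flagged as Z-chromosome candidates.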
Subjects
Venomous Animals , Heloderma suspectum , Lizards , Animals , Male , Female , Lizards/genetics , Sex Chromosomes/genetics , Karyotype , Genetic Dosage Compensation
ABSTRACT
Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.
ABSTRACT
Since its initial release in 2000, the human reference genome has covered only the euchromatic fraction of the genome, leaving important heterochromatic regions unfinished. Addressing the remaining 8% of the genome, the Telomere-to-Telomere (T2T) Consortium presents a complete 3.055 billion-base pair sequence of a human genome, T2T-CHM13, that includes gapless assemblies for all chromosomes except Y, corrects errors in the prior references, and introduces nearly 200 million base pairs of sequence containing 1956 gene predictions, 99 of which are predicted to be protein coding. The completed regions include all centromeric satellite arrays, recent segmental duplications, and the short arms of all five acrocentric chromosomes, unlocking these complex regions of the genome to variational and functional studies.
Subjects
Human Genome , Human Genome Project , DNA Sequence Analysis/standards , Cell Line , Bacterial Artificial Chromosomes/genetics , Human Chromosomes/genetics , Humans , Reference Values
ABSTRACT
We performed shallow single-cell sequencing of genomic DNA across 1475 cells from a cell-line, COLO829, to resolve overall complexity and clonality. This melanoma tumor-line has been previously characterized by multiple technologies and is a benchmark for evaluating somatic alterations. In some of these studies, COLO829 has shown conflicting and/or indeterminate copy number and, thus, single-cell sequencing provides a tool for gaining insight. Following shallow single-cell sequencing, we first identified at least four major sub-clones by discriminant analysis of principal components of single-cell copy number data. Based on clustering, break-point and loss of heterozygosity analysis of aggregated data from sub-clones, we identified distinct hallmark events that were validated within bulk sequencing and spectral karyotyping. In summary, COLO829 exhibits a classical Dutrillaux's monosomic/trisomic pattern of karyotype evolution with endoreduplication, where consistent sub-clones emerge from the loss/gain of abnormal chromosomes. Overall, our results demonstrate how shallow copy number profiling can uncover hidden biological insights.
Subjects
Melanoma/genetics , Melanoma/pathology , Single-Cell Analysis/methods , Tumor Cell Line , DNA Copy Number Variations , Humans , Karyotyping , Loss of Heterozygosity , DNA Sequence Analysis
ABSTRACT
Single-cell CRISPR screens enable the exploration of mammalian gene function and genetic regulatory networks. However, use of this technology has been limited by reliance on indirect indexing of single-guide RNAs (sgRNAs). Here we present direct-capture Perturb-seq, a versatile screening approach in which expressed sgRNAs are sequenced alongside single-cell transcriptomes. Direct-capture Perturb-seq enables detection of multiple distinct sgRNA sequences from individual cells and thus allows pooled single-cell CRISPR screens to be easily paired with combinatorial perturbation libraries that contain dual-guide expression vectors. We demonstrate the utility of this approach for high-throughput investigations of genetic interactions and, leveraging this ability, dissect epistatic interactions between cholesterol biogenesis and DNA repair. Using direct capture Perturb-seq, we also show that targeting individual genes with multiple sgRNAs per cell improves efficacy of CRISPR interference and activation, facilitating the use of compact, highly active CRISPR libraries for single-cell screens. Last, we show that hybridization-based target enrichment permits sensitive, specific sequencing of informative transcripts from single-cell RNA-seq experiments.
Subjects
CRISPR-Cas Systems , Nucleic Acid Amplification Techniques/methods , Kinetoplastida Guide RNA/genetics , Gene Expression Regulation , Gene Targeting , HEK293 Cells , High-Throughput Nucleotide Sequencing , Humans , Single-Cell Analysis , Transcriptome
ABSTRACT
An amendment to this paper has been published and can be accessed via a link at the top of the paper.
ABSTRACT
New technologies and analysis methods are enabling genomic structural variants (SVs) to be detected with ever-increasing accuracy, resolution and comprehensiveness. To help translate these methods to routine research and clinical practice, we developed a sequence-resolved benchmark set for identification of both false-negative and false-positive germline large insertions and deletions. To create this benchmark for a broadly consented son in a Personal Genome Project trio with broadly available cells and DNA, the Genome in a Bottle Consortium integrated 19 sequence-resolved variant calling methods from diverse technologies. The final benchmark set contains 12,745 isolated, sequence-resolved insertion (7,281) and deletion (5,464) calls ≥50 base pairs (bp). The Tier 1 benchmark regions, for which any extra calls are putative false positives, cover 2.51 Gbp and 5,262 insertions and 4,095 deletions supported by ≥1 diploid assembly. We demonstrate that the benchmark set reliably identifies false negatives and false positives in high-quality SV callsets from short-, linked- and long-read sequencing and optical mapping.
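How a benchmark set yields false negatives and false positives can be shown with a short sketch. It assumes variants are represented as (chromosome, position, type, length) tuples and matched exactly, which is a deliberate simplification; real structural-variant comparison tools allow tolerance in position and size. The function name `benchmark_callset` is hypothetical, and all calls are assumed to lie within the benchmark regions, where any extra call counts as a putative false positive.

```python
def benchmark_callset(calls, benchmark):
    """Compare a call set against a benchmark set by exact matching.

    Both arguments are iterables of hashable variant records. Exact matching
    is an illustrative assumption, not how SV benchmarking tools work.
    """
    calls, benchmark = set(calls), set(benchmark)
    tp = len(calls & benchmark)   # calls confirmed by the benchmark
    fp = len(calls - benchmark)   # extra calls: putative false positives
    fn = len(benchmark - calls)   # benchmark variants the call set missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall}


# Toy records (hypothetical, not taken from the benchmark itself).
benchmark = [("chr1", 1000, "DEL", 120),
             ("chr1", 5000, "INS", 300),
             ("chr2", 700, "DEL", 60)]
calls = [("chr1", 1000, "DEL", 120),
         ("chr2", 700, "DEL", 60),
         ("chr3", 900, "INS", 80)]
result = benchmark_callset(calls, benchmark)
```

Here two calls match the benchmark, one call is extra and one benchmark variant is missed, so precision and recall are both two thirds.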