Results 1 - 20 of 40
1.
Genome Res ; 2024 Jun 21.
Article in English | MEDLINE | ID: mdl-38906680

ABSTRACT

Transcription and translation are intertwined processes where mRNA isoforms are crucial intermediaries. However, methodological limitations in analyzing translation at the mRNA isoform level have left gaps in our understanding of critical biological processes. To address these gaps, we developed an integrated computational and experimental framework called long-read Ribo-STAMP (LR-Ribo-STAMP) that capitalizes on advancements in long-read sequencing and RNA-base editing-mediated technologies to simultaneously profile translation and transcription at both gene and mRNA isoform levels. We also developed the EditsC metric to quantify editing and leverage the single-molecule, full-length transcript information provided by long-read sequencing. Here, we report concordance between gene-level translation profiles obtained with long-read and short-read Ribo-STAMP. We show that LR-Ribo-STAMP successfully profiles translation of mRNA isoforms and links regulatory features, such as upstream open reading frames (uORFs), to translation measurements. We apply LR-Ribo-STAMP to discovering translational differences at both gene and isoform levels in a triple-negative breast cancer cell line under normoxia and hypoxia and find that LR-Ribo-STAMP effectively delineates orthogonal transcriptional and translation shifts between conditions. We also discover regulatory elements that distinguish translational differences at the isoform level. We highlight GRK6, where hypoxia is observed to increase expression and translation of a shorter mRNA isoform, giving rise to a truncated protein without the AGC Kinase domain. Overall, LR-Ribo-STAMP is an important advance in our repertoire of methods that measure mRNA translation with isoform sensitivity.

2.
Nature ; 594(7861): 77-81, 2021 06.
Article in English | MEDLINE | ID: mdl-33953399

ABSTRACT

The divergence of chimpanzee and bonobo provides one of the few examples of recent hominid speciation [1,2]. Here we describe a fully annotated, high-quality bonobo genome assembly, which was constructed without guidance from reference genomes by applying a multiplatform genomics approach. We generate a bonobo genome assembly in which more than 98% of genes are completely annotated and 99% of the gaps are closed, including the resolution of about half of the segmental duplications and almost all of the full-length mobile elements. We compare the bonobo genome to those of other great apes [1,3-5] and identify more than 5,569 fixed structural variants that specifically distinguish the bonobo and chimpanzee lineages. We focus on genes that have been lost, changed in structure or expanded in the last few million years of bonobo evolution. We produce a high-resolution map of incomplete lineage sorting and estimate that around 5.1% of the human genome is genetically closer to chimpanzee or bonobo and that more than 36.5% of the genome shows incomplete lineage sorting if we consider a deeper phylogeny including gorilla and orangutan. We also show that 26% of the segments of incomplete lineage sorting between human and chimpanzee or human and bonobo are non-randomly distributed and that genes within these clustered segments show significant excess of amino acid replacement compared to the rest of the genome.


Subject(s)
Evolution, Molecular , Genome/genetics , Genomics , Pan paniscus/genetics , Phylogeny , Animals , Eukaryotic Initiation Factor-4A/genetics , Female , Genes , Gorilla gorilla/genetics , Molecular Sequence Annotation/standards , Pan troglodytes/genetics , Pongo/genetics , Segmental Duplications, Genomic , Sequence Analysis, DNA
3.
Nat Methods ; 18(5): 507-519, 2021 05.
Article in English | MEDLINE | ID: mdl-33963355

ABSTRACT

RNA-binding proteins (RBPs) are critical regulators of gene expression and RNA processing that are required for gene function. Yet the dynamics of RBP regulation in single cells is unknown. To address this gap in understanding, we developed STAMP (Surveying Targets by APOBEC-Mediated Profiling), which efficiently detects RBP-RNA interactions. STAMP does not rely on ultraviolet cross-linking or immunoprecipitation and, when coupled with single-cell capture, can identify RBP-specific and cell-type-specific RNA-protein interactions for multiple RBPs and cell types in single, pooled experiments. Pairing STAMP with long-read sequencing yields RBP target sites in an isoform-specific manner. Finally, Ribo-STAMP leverages small ribosomal subunits to measure transcriptome-wide ribosome association in single cells. STAMP enables the study of RBP-RNA interactomes and translational landscapes with unprecedented cellular resolution.


Subject(s)
RNA-Binding Proteins/metabolism , RNA/metabolism , Single-Cell Analysis/methods , Animals , Binding Sites , Gene Expression Profiling , HEK293 Cells , Humans , Nanopore Sequencing , RNA/chemistry , RNA-Binding Proteins/chemistry , Sequence Analysis, RNA , Transcriptome
4.
Nucleic Acids Res ; 50(14): 7801-7815, 2022 08 12.
Article in English | MEDLINE | ID: mdl-35253883

ABSTRACT

Centromeres are the chromosomal loci essential for faithful chromosome segregation during cell division. Although centromeres are transcribed and produce non-coding RNAs (cenRNAs) that affect centromere function, we still lack a mechanistic understanding of how centromere transcription is regulated. Here, using a targeted RNA isoform sequencing approach, we identified the transcriptional landscape at and surrounding all centromeres in budding yeast. Overall, cenRNAs are derived from transcription readthrough of pericentromeric regions but rarely span the entire centromere and are a complex mixture of molecules that are heterogeneous in abundance, orientation, and sequence. While most pericentromeres are transcribed throughout the cell cycle, centromere accessibility to the transcription machinery is restricted to S-phase. This temporal restriction is dependent on Cbf1, a centromere-binding transcription factor, that we demonstrate acts locally as a transcriptional roadblock. Cbf1 deletion leads to an accumulation of cenRNAs at all phases of the cell cycle which correlates with increased chromosome mis-segregation that is partially rescued when the roadblock activity is restored. We propose that a Cbf1-mediated transcriptional roadblock protects yeast centromeres from untimely transcription to ensure genomic stability.


Centromeres are essential chromosomal regions that do not encode gene products and instead ensure the accurate partitioning of chromosomes during cell division. Despite the lack of genes, transcription has been detected at centromeres. It has not been clear where this centromeric RNA comes from and how it is regulated. In this study, the authors identified all of the centromeric RNAs at and around budding yeast centromeres during the cell cycle. Unlike RNAs that encode proteins, centromeric RNAs are a complex mixture of transcripts that arise when adjacent RNAs continue into the centromere. The authors found that most transcription is blocked at the centromere border by a protein called Cbf1. This mechanism shields the centromere from untimely transcription to ensure genome stability.


Subject(s)
Centromere , Saccharomyces cerevisiae Proteins , Basic Helix-Loop-Helix Leucine Zipper Transcription Factors/metabolism , Centromere/genetics , Centromere/metabolism , Chromosome Segregation/genetics , Saccharomyces cerevisiae/genetics , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae Proteins/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism , Transcription, Genetic
5.
Genome Res ; 28(10): 1566-1576, 2018 10.
Article in English | MEDLINE | ID: mdl-30228200

ABSTRACT

Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.


Subject(s)
Brain/metabolism , Segmental Duplications, Genomic , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Evolution, Molecular , Gene Duplication , Gene Expression Profiling , Humans , Molecular Sequence Annotation , Multigene Family , Open Reading Frames , Pseudogenes
6.
Genome Res ; 28(7): 1029-1038, 2018 07.
Article in English | MEDLINE | ID: mdl-29884752

ABSTRACT

The recent introductions of low-cost, long-read, and read-cloud sequencing technologies coupled with intense efforts to develop efficient algorithms have made affordable, high-quality de novo sequence assembly a realistic proposition. The result is an explosion of new, ultracontiguous genome assemblies. To compare these genomes, we need robust methods for genome annotation. We describe the fully open source Comparative Annotation Toolkit (CAT), which provides a flexible way to simultaneously annotate entire clades and identify orthology relationships. We show that CAT can be used to improve annotations on the rat genome, annotate the great apes, annotate a diverse set of mammals, and annotate personal, diploid human genomes. We demonstrate the resulting discovery of novel genes, isoforms, and structural variants (even in genomes as well studied as rat and the great apes) and how these annotations improve cross-species RNA expression experiments.


Subject(s)
Genome, Human/genetics , Algorithms , Animals , High-Throughput Nucleotide Sequencing/methods , Humans , Molecular Sequence Annotation/methods , RNA/genetics , Rats
7.
Genet Med ; 21(2): 477-486, 2019 02.
Article in English | MEDLINE | ID: mdl-29955105

ABSTRACT

PURPOSE: Rh antigens can provoke severe alloimmune reactions, particularly in high-risk transfusion contexts, such as sickle cell disease. Rh antigens are encoded by the paralogs, RHD and RHCE, located in one of the most complex genetic loci. Our goal was to characterize RH genetic variation in multi-ethnic cohorts, with the focus on detecting RH structural variation (SV).
METHODS: We customized analytical methods to estimate paralog-specific copy number from next-generation sequencing (NGS) data. We applied these methods to clinically characterized samples, including four World Health Organization (WHO) genotyping references and 1135 Asian and Native American blood donors. Subsequently, we surveyed 1715 African American samples from the Jackson Heart Study.
RESULTS: Most samples in each dataset exhibited SV. SV detection enabled prediction of the immunogenic RhD and RhC antigens in concordance (>99%) with serological phenotyping. RhC antigen expression was associated with exon 2 hybrid alleles (RHCE*CE-D(2)-CE). Clinically relevant exon 4-7 hybrid alleles (RHD*D-CE(4-7)-D) and exon 9 hybrid alleles (RHCE*CE-D(9)-CE) were prevalent in African Americans.
CONCLUSION: This study shows custom NGS methods can accurately detect RH SV, and that SV is important to inform prediction of relevant RH alleles. Additionally, this study provides the first large NGS survey of RH alleles in African Americans.


Subject(s)
Anemia, Sickle Cell/genetics , Genomics , High-Throughput Nucleotide Sequencing , Rh-Hr Blood-Group System/genetics , Black or African American/genetics , Alleles , Anemia, Sickle Cell/epidemiology , Anemia, Sickle Cell/physiopathology , Asian People/genetics , DNA Copy Number Variations/genetics , Ethnicity/genetics , Female , Genomic Structural Variation/genetics , Humans , Indians, North American/genetics , Male , Rh-Hr Blood-Group System/chemistry , Rh-Hr Blood-Group System/immunology , World Health Organization
8.
Nucleic Acids Res ; 43(18): e116, 2015 Oct 15.
Article in English | MEDLINE | ID: mdl-26040699

ABSTRACT

We developed an innovative hybrid sequencing approach, IDP-fusion, to detect fusion genes, determine fusion sites and identify and quantify fusion isoforms. IDP-fusion is the first method to study gene fusion events by integrating Third Generation Sequencing long reads and Second Generation Sequencing short reads. We applied IDP-fusion to PacBio data and Illumina data from the MCF-7 breast cancer cells. Compared with the existing tools, IDP-fusion detects fusion genes at higher precision and a very low false positive rate. The results show that IDP-fusion will be useful for unraveling the complexity of multiple fusion splices and fusion isoforms within tumorigenesis-relevant fusion genes.


Subject(s)
Carcinogenesis/genetics , Gene Expression Profiling , Gene Fusion , High-Throughput Nucleotide Sequencing/methods , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Female , Humans , MCF-7 Cells , Protein Isoforms/genetics , Protein Isoforms/metabolism , Sequence Alignment
9.
Proc Natl Acad Sci U S A ; 110(50): E4821-30, 2013 Dec 10.
Article in English | MEDLINE | ID: mdl-24282307

ABSTRACT

Although transcriptional and posttranscriptional events are detected in RNA-Seq data from second-generation sequencing, full-length mRNA isoforms are not captured. On the other hand, third-generation sequencing, which yields much longer reads, has current limitations of lower raw accuracy and throughput. Here, we combine second-generation sequencing and third-generation sequencing with a custom-designed method for isoform identification and quantification to generate a high-confidence isoform dataset for human embryonic stem cells (hESCs). We report 8,084 RefSeq-annotated isoforms detected as full-length and an additional 5,459 isoforms predicted through statistical inference. Over one-third of these are novel isoforms, including 273 RNAs from gene loci that have not previously been identified. Further characterization of the novel loci indicates that a subset is expressed in pluripotent cells but not in diverse fetal and adult tissues; moreover, their reduced expression perturbs the network of pluripotency-associated genes. Results suggest that gene identification, even in well-characterized human cell lines and tissues, is likely far from complete.


Subject(s)
Alternative Splicing/genetics , Embryonic Stem Cells/metabolism , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics , Transcriptome/genetics , Embryonic Stem Cells/chemistry , Humans , Male
10.
Nat Methods ; 7(12): 995-1001, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21057495

ABSTRACT

Classical approaches to determine structures of noncoding RNA (ncRNA) probed only one RNA at a time with enzymes and chemicals, using gel electrophoresis to identify reactive positions. To accelerate RNA structure inference, we developed fragmentation sequencing (FragSeq), a high-throughput RNA structure probing method that uses high-throughput RNA sequencing of fragments generated by digestion with nuclease P1, which specifically cleaves single-stranded nucleic acids. In experiments probing the entire mouse nuclear transcriptome, we accurately and simultaneously mapped single-stranded RNA regions in multiple ncRNAs with known structure. We probed in two cell types to verify reproducibility. We also identified and experimentally validated structured regions in ncRNAs with, to our knowledge, no previously reported probing data.


Subject(s)
Gene Expression Profiling/methods , RNA/chemistry , RNA/genetics , Animals , Base Pairing , Base Sequence , Chromosome Mapping/methods , DNA Primers , Gene Library , Histones/genetics , Humans , Mice , Models, Molecular , Molecular Sequence Data , Neurons/physiology , Nucleic Acid Conformation , RNA, Untranslated/chemistry
11.
bioRxiv ; 2023 Sep 27.
Article in English | MEDLINE | ID: mdl-37808736

ABSTRACT

Resolving the molecular basis of a Mendelian condition (MC) remains challenging owing to the diverse mechanisms by which genetic variants cause disease. To address this, we developed a synchronized long-read genome, methylome, epigenome, and transcriptome sequencing approach, which enables accurate single-nucleotide, insertion-deletion, and structural variant calling and diploid de novo genome assembly, and permits the simultaneous elucidation of haplotype-resolved CpG methylation, chromatin accessibility, and full-length transcript information in a single long-read sequencing run. Application of this approach to an Undiagnosed Diseases Network (UDN) participant with a chromosome X;13 balanced translocation of uncertain significance revealed that this translocation disrupted the functioning of four separate genes (NBEA, PDK3, MAB21L1, and RB1) previously associated with single-gene MCs. Notably, the function of each gene was disrupted via a distinct mechanism that required integration of the four 'omes' to resolve. These included nonsense-mediated decay, fusion transcript formation, enhancer adoption, transcriptional readthrough silencing, and inappropriate X chromosome inactivation of autosomal genes. Overall, this highlights the utility of synchronized long-read multi-omic profiling for mechanistically resolving complex phenotypes.

12.
G3 (Bethesda) ; 12(3), 2022 03 04.
Article in English | MEDLINE | ID: mdl-35100340

ABSTRACT

Understanding hibernation in brown bears (Ursus arctos) can provide insight into some human diseases. During hibernation, brown bears experience periods of insulin resistance, physical inactivity, extreme bradycardia, obesity, and the absence of urine production. These states closely mimic aspects of human diseases such as type 2 diabetes, muscle atrophy, as well as renal and heart failure. The reversibility of these states from hibernation to active season enables the identification of mediators with possible therapeutic value for humans. Recent studies have identified genes and pathways that are differentially expressed between active and hibernation seasons in bears. However, little is known about the role of differential expression of gene isoforms on hibernation physiology. To identify both distinct and novel mRNA isoforms, full-length RNA-sequencing (Iso-Seq) was performed on adipose, skeletal muscle, and liver from three individual bears sampled during both active and hibernation seasons. The existing reference genome annotation was improved by combining it with the Iso-Seq data. Short-read RNA-sequencing data from six individuals were mapped to the new reference annotation to quantify differential isoform usage (DIU) between tissues and seasons. We identified differentially expressed isoforms in all three tissues, to varying degrees. Adipose had a high level of DIU with isoform switching, regardless of whether the genes were differentially expressed. Our analyses revealed that DIU, even in the absence of differential gene expression, is an important mechanism for modulating genes during hibernation. These findings demonstrate the value of isoform expression studies and will serve as the basis for deeper exploration into hibernation biology.


Subject(s)
Diabetes Mellitus, Type 2 , Gene Expression Regulation , Hibernation , Ursidae , Adipose Tissue/metabolism , Animals , Diabetes Mellitus, Type 2/genetics , Diabetes Mellitus, Type 2/metabolism , Hibernation/genetics , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , Ursidae/genetics , Ursidae/metabolism
13.
Nat Commun ; 12(1): 5118, 2021 08 25.
Article in English | MEDLINE | ID: mdl-34433829

ABSTRACT

TRP channel-associated factor 1/2 (TCAF1/TCAF2) proteins antagonistically regulate the cold-sensor protein TRPM8 in multiple human tissues. Understanding their significance has been complicated because the locus spans a gap-ridden region with complex segmental duplications in GRCh38. Using long-read sequencing, we sequence-resolve the locus, annotate full-length TCAF models in primate genomes, and show substantial human-specific TCAF copy number variation. We identify two human super haplogroups, H4 and H5, and establish that TCAF duplications originated ~1.7 million years ago but diversified only in Homo sapiens by recurrent structural mutations. Conversely, in all archaic-hominin samples the fixation for a specific H4 haplotype without duplication is likely due to positive selection. Together, our results on TCAF copy number expansion, selection signals in hominins, differential TCAF2 expression between haplogroups, and high TCAF2 and TRPM8 expression in liver and prostate in modern-day humans imply that TCAF diversified among hominins, potentially in response to cold or dietary adaptations.


Subject(s)
Gene Duplication , Hominidae/genetics , Membrane Proteins/genetics , Selection, Genetic , Animals , DNA Copy Number Variations , Evolution, Molecular , Genome, Human , Haplotypes , Humans , Neanderthals , Phylogeny
14.
Elife ; 9, 2020 12 02.
Article in English | MEDLINE | ID: mdl-33263279

ABSTRACT

Our understanding of the beads-on-a-string arrangement of nucleosomes has been built largely on high-resolution sequence-agnostic imaging methods and sequence-resolved bulk biochemical techniques. To bridge the divide between these approaches, we present the single-molecule adenine methylated oligonucleosome sequencing assay (SAMOSA). SAMOSA is a high-throughput single-molecule sequencing method that combines adenine methyltransferase footprinting and single-molecule real-time DNA sequencing to natively and nondestructively measure nucleosome positions on individual chromatin fibres. SAMOSA data allow unbiased classification of single-molecular 'states' of nucleosome occupancy on individual chromatin fibres. We leverage this to estimate nucleosome regularity and spacing on single chromatin fibres genome-wide, at predicted transcription factor binding motifs, and across human epigenomic domains. Our analyses suggest that chromatin comprises both regular and irregular single-molecular oligonucleosome patterns that differ subtly in their relative abundance across epigenomic domains. This irregularity is particularly striking in constitutive heterochromatin, which has typically been viewed as a conformationally static entity. Our proof-of-concept study provides a powerful new methodology for studying nucleosome organization at a previously intractable resolution and offers up new avenues for modeling and visualizing higher order chromatin structure.


Subject(s)
Chromatin/genetics , DNA/genetics , High-Throughput Nucleotide Sequencing , Nucleosomes/genetics , Single Molecule Imaging , Acetylation , Binding Sites , Chromatin/chemistry , Chromatin/metabolism , DNA/chemistry , DNA/metabolism , Epigenesis, Genetic , Histones/chemistry , Histones/genetics , Histones/metabolism , Humans , K562 Cells , Nucleic Acid Conformation , Nucleosomes/chemistry , Nucleosomes/metabolism , Proof of Concept Study , Protein Conformation , Protein Processing, Post-Translational , Site-Specific DNA-Methyltransferase (Adenine-Specific)/metabolism , Transcription Factors/genetics , Transcription Factors/metabolism
15.
Nat Commun ; 11(1): 2326, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32393825

ABSTRACT

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every coding gene, the majority of alternative isoforms remains uncharacterized due to (i) vast differences of overall levels between different isoforms expressed from common genes, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As a proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude when compared to unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.


Subject(s)
Open Reading Frames/genetics , Sequence Analysis, RNA/methods , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reference Standards , Transcription Factors/genetics
16.
Genome Biol ; 21(1): 202, 2020 08 10.
Article in English | MEDLINE | ID: mdl-32778141

ABSTRACT

BACKGROUND: The complex interspersed pattern of segmental duplications in humans is responsible for rearrangements associated with neurodevelopmental disease, including the emergence of novel genes important in human brain evolution. We investigate the evolution of LCR16a, a putative driver of this phenomenon that encodes one of the most rapidly evolving human-ape gene families, nuclear pore interacting protein (NPIP).
RESULTS: Comparative analysis shows that LCR16a has independently expanded in five primate lineages over the last 35 million years of primate evolution. The expansions are associated with independent lineage-specific segmental duplications flanking LCR16a leading to the emergence of large interspersed duplication blocks at non-orthologous chromosomal locations in each primate lineage. The intron-exon structure of the NPIP gene family has changed dramatically throughout primate evolution with different branches showing characteristic gene models yet maintaining an open reading frame. In the African ape lineage, we detect signatures of positive selection that occurred after a transition to more ubiquitous expression among great ape tissues when compared to Old World and New World monkeys. Mouse transgenic experiments from baboon and human genomic loci confirm these expression differences and suggest that the broader ape expression pattern arose due to mutational changes that emerged in cis.
CONCLUSIONS: LCR16a promotes serial interspersed duplications and creates hotspots of genomic instability that appear to be an ancient property of primate genomes. Dramatic changes to NPIP gene structure and altered tissue expression preceded major bouts of positive selection in the African ape lineage, suggestive of a gene undergoing strong adaptive evolution.


Subject(s)
Evolution, Molecular , Gene Duplication , Primates/genetics , Segmental Duplications, Genomic , Animals , Biodiversity , Brain , Chromosome Mapping , Chromosomes , Exons , Gene Fusion , Genome, Human , Genomic Instability , Hominidae , Humans , Phylogeny
17.
Science ; 370(6523), 2020 12 18.
Article in English | MEDLINE | ID: mdl-33335035

ABSTRACT

The rhesus macaque (Macaca mulatta) is the most widely studied nonhuman primate (NHP) in biomedical research. We present an updated reference genome assembly (Mmul_10, contig N50 = 46 Mbp) that increases the sequence contiguity 120-fold and annotate it using 6.5 million full-length transcripts, thus improving our understanding of gene content, isoform diversity, and repeat organization. With the improved assembly of segmental duplications, we discovered new lineage-specific genes and expanded gene families that are potentially informative in studies of evolution and disease susceptibility. Whole-genome sequencing (WGS) data from 853 rhesus macaques identified 85.7 million single-nucleotide variants (SNVs) and 10.5 million indel variants, including potentially damaging variants in genes associated with human autism and developmental delay, providing a framework for developing noninvasive NHP models of human disease.


Subject(s)
Genetic Predisposition to Disease , Genome , Macaca mulatta/genetics , Polymorphism, Single Nucleotide , Animals , Genetic Variation , Humans , Molecular Sequence Annotation , Whole Genome Sequencing
18.
Science ; 366(6463), 2019 10 18.
Article in English | MEDLINE | ID: mdl-31624180

ABSTRACT

Copy number variants (CNVs) are subject to stronger selective pressure than single-nucleotide variants, but their roles in archaic introgression and adaptation have not been systematically investigated. We show that stratified CNVs are significantly associated with signatures of positive selection in Melanesians and provide evidence for adaptive introgression of large CNVs at chromosomes 16p11.2 and 8p21.3 from Denisovans and Neanderthals, respectively. Using long-read sequence data, we reconstruct the structure and complex evolutionary history of these polymorphisms and show that both encode positively selected genes absent from most human populations. Our results collectively suggest that large CNVs originating in archaic hominins and introgressed into modern humans have played an important role in local population adaptation and represent an insufficiently studied source of large-scale genetic variation.


Subject(s)
Genetic Introgression , Animals , Chromosome Duplication , Chromosomes, Human, Pair 16/genetics , Chromosomes, Human, Pair 8/genetics , DNA Copy Number Variations , Evolution, Molecular , Genome, Human , Haplotypes , Hominidae/genetics , Humans , Melanesia , Models, Genetic , Neanderthals/genetics , Polymorphism, Genetic , Selection, Genetic , Whole Genome Sequencing
19.
Mol Cell Biol ; 25(22): 10005-16, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16260614

ABSTRACT

A vertebrate homologue of the Fox-1 protein from C. elegans was recently shown to bind to the element GCAUG and to act as an inhibitor of alternative splicing patterns in muscle. The element UGCAUG is a splicing enhancer element found downstream of numerous neuron-specific exons. We show here that mouse Fox-1 (mFox-1) and another homologue, Fox-2, are both specifically expressed in neurons in addition to muscle and heart. The mammalian Fox genes are very complex transcription units that generate transcripts from multiple promoters and with multiple internal exons whose inclusion is regulated. These genes produce a large family of proteins with variable N and C termini and internal deletions. We show that the overexpression of both Fox-1 and Fox-2 isoforms specifically activates splicing of neuronally regulated exons. This splicing activation requires UGCAUG enhancer elements. Conversely, RNA interference-mediated knockdown of Fox protein expression inhibits splicing of UGCAUG-dependent exons. These experiments show that this large family of proteins regulates splicing in the nervous system. They do this through a splicing enhancer function, in addition to their apparent negative effects on splicing in vertebrate muscle and in worms.


Subject(s)
Caenorhabditis elegans Proteins/physiology , Carrier Proteins/physiology , Neurons/metabolism , RNA Splicing , RNA-Binding Proteins/physiology , Repressor Proteins/physiology , Animals , Blotting, Northern , Blotting, Western , Brain/metabolism , Caenorhabditis elegans , Caenorhabditis elegans Proteins/genetics , Carrier Proteins/genetics , Cross-Linking Reagents/pharmacology , Enhancer Elements, Genetic , Exons , Fibronectins/metabolism , Gene Expression Regulation , HeLa Cells , Hippocampus/metabolism , Humans , Immunohistochemistry , Introns , Mice , Models, Genetic , Muscles/metabolism , Plasmids/metabolism , Promoter Regions, Genetic , Protein Binding , Protein Isoforms , Protein Structure, Tertiary , RNA Interference , RNA Splicing Factors , RNA, Messenger/metabolism , RNA, Small Interfering/metabolism , RNA-Binding Proteins/genetics , Repressor Proteins/genetics , Reverse Transcriptase Polymerase Chain Reaction , Tissue Distribution , Transfection
20.
Science ; 360(6393), 2018 06 08.
Article in English | MEDLINE | ID: mdl-29880660

ABSTRACT

Genetic studies of human evolution require high-quality contiguous ape genome assemblies that are not guided by the human reference. We coupled long-read sequence assembly and full-length complementary DNA sequencing with a multiplatform scaffolding approach to produce ab initio chimpanzee and orangutan genome assemblies. By comparing these with two long-read de novo human genome assemblies and a gorilla genome assembly, we characterized lineage-specific and shared great ape genetic variation ranging from single- to mega-base pair-sized variants. We identified ~17,000 fixed human-specific structural variants identifying genic and putative regulatory changes that have emerged in humans since divergence from nonhuman apes. Interestingly, these variants are enriched near genes that are down-regulated in human compared to chimpanzee cerebral organoids, particularly in cells analogous to radial glial neural progenitors.


Subject(s)
Evolution, Molecular , Genome, Human , Hominidae/genetics , Animals , Contig Mapping , Genetic Variation , Humans , Molecular Sequence Annotation , Sequence Analysis, DNA