Results 1 - 20 of 32
1.
Cell ; 187(5): 1059-1075, 2024 Feb 29.
Article in English | MEDLINE | ID: mdl-38428388

ABSTRACT

Human genetics has emerged as one of the most dynamic areas of biology, with a broadening societal impact. In this review, we discuss recent achievements, ongoing efforts, and future challenges in the field. Advances in technology, statistical methods, and the growing scale of research efforts have all provided many insights into the processes that have given rise to the current patterns of genetic variation. Vast maps of genetic associations with human traits and diseases have allowed characterization of their genetic architecture. Finally, studies of molecular and cellular effects of genetic variants have provided insights into biological processes underlying disease. Many outstanding questions remain, but the field is well poised for groundbreaking discoveries as it increases the use of genetic data to understand both the history of our species and its applications to improve human health.


Subject(s)
Human Genetics , Humans , Genetic Variation , Multifactorial Inheritance , Phenotype
2.
Cell ; 177(4): 1022-1034.e6, 2019 05 02.
Article in English | MEDLINE | ID: mdl-31051098

ABSTRACT

Early genome-wide association studies (GWASs) led to the surprising discovery that, for typical complex traits, most of the heritability is due to huge numbers of common variants with tiny effect sizes. Previously, we argued that new models are needed to understand these patterns. Here, we provide a formal model in which genetic contributions to complex traits are partitioned into direct effects from core genes and indirect effects from peripheral genes acting in trans. We propose that most heritability is driven by weak trans-eQTL SNPs, whose effects are mediated through peripheral genes to impact the expression of core genes. In particular, if the core genes for a trait tend to be co-regulated, then the effects of peripheral variation can be amplified such that nearly all of the genetic variance is driven by weak trans effects. Thus, our model proposes a framework for understanding key features of the architecture of complex traits.


Subject(s)
Gene Expression Regulation/genetics , Heredity/genetics , Multifactorial Inheritance/genetics , Databases, Genetic , Gene Expression/genetics , Gene Expression Profiling/methods , Genetic Variation/genetics , Genome-Wide Association Study , Humans , Models, Theoretical , Phenotype , Polymorphism, Genetic/genetics , Quantitative Trait Loci/genetics
3.
Cell ; 176(3): 535-548.e24, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30661751

ABSTRACT

The splicing of pre-mRNAs into mature transcripts is remarkable for its precision, but the mechanisms by which the cellular machinery achieves such specificity are incompletely understood. Here, we describe a deep neural network that accurately predicts splice junctions from an arbitrary pre-mRNA transcript sequence, enabling precise prediction of noncoding genetic variants that cause cryptic splicing. Synonymous and intronic mutations with predicted splice-altering consequence validate at a high rate on RNA-seq and are strongly deleterious in the human population. De novo mutations with predicted splice-altering consequence are significantly enriched in patients with autism and intellectual disability compared to healthy controls and validate against RNA-seq in 21 out of 28 of these patients. We estimate that 9%-11% of pathogenic mutations in patients with rare genetic disorders are caused by this previously underappreciated class of disease variation.


Subject(s)
Forecasting/methods , RNA Precursors/genetics , RNA Splicing/genetics , Algorithms , Alternative Splicing/genetics , Autistic Disorder/genetics , Deep Learning , Exons/genetics , Humans , Intellectual Disability/genetics , Introns/genetics , Neural Networks, Computer , RNA Precursors/metabolism , RNA Splice Sites/genetics , RNA Splice Sites/physiology
4.
Cell ; 176(3): 663-675.e19, 2019 01 24.
Article in English | MEDLINE | ID: mdl-30661756

ABSTRACT

In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.


Subject(s)
Gene Frequency/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Alleles , Euchromatin/genetics , Genomics/methods , Humans , Minisatellite Repeats/genetics , Sequence Analysis, DNA/methods
5.
Cell ; 169(7): 1177-1186, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28622505

ABSTRACT

A central goal of genetics is to understand the links between genetic variation and disease. Intuitively, one might expect disease-causing variants to cluster into key pathways that drive disease etiology. But for complex traits, association signals tend to be spread across most of the genome, including near many genes without an obvious connection to disease. We propose that gene regulatory networks are sufficiently interconnected such that all genes expressed in disease-relevant cells are liable to affect the functions of core disease-related genes and that most heritability can be explained by effects on genes outside core pathways. We refer to this hypothesis as an "omnigenic" model.


Subject(s)
Disease/genetics , Multifactorial Inheritance , Animals , Genetic Diseases, Inborn/genetics , Genome-Wide Association Study , Genomics , Humans , Polymorphism, Single Nucleotide
6.
Mol Cell ; 82(24): 4681-4699.e8, 2022 12 15.
Article in English | MEDLINE | ID: mdl-36435176

ABSTRACT

Long introns with short exons in vertebrate genes are thought to require spliceosome assembly across exons (exon definition), rather than introns, thereby requiring transcription of an exon to splice an upstream intron. Here, we developed CoLa-seq (co-transcriptional lariat sequencing) to investigate the timing and determinants of co-transcriptional splicing genome wide. Unexpectedly, 90% of all introns, including long introns, can splice before transcription of a downstream exon, indicating that exon definition is not obligatory for most human introns. Still, splicing timing varies dramatically across introns, and various genetic elements determine this variation. Strong U2AF2 binding to the polypyrimidine tract predicts early splicing, explaining exon definition-independent splicing. Together, our findings question the essentiality of exon definition and reveal features beyond intron and exon length that are determinative for splicing timing.


Subject(s)
Alternative Splicing , RNA Splicing , Humans , Base Sequence , Introns/genetics , Exons/genetics
7.
Nature ; 608(7923): 569-577, 2022 08.
Article in English | MEDLINE | ID: mdl-35922514

ABSTRACT

A major challenge in human genetics is to identify the molecular mechanisms of trait-associated and disease-associated variants. To achieve this, quantitative trait locus (QTL) mapping of genetic variants with intermediate molecular phenotypes such as gene expression and splicing has been widely adopted. However, despite successes, the molecular basis for a considerable fraction of trait-associated and disease-associated variants remains unclear. Here we show that ADAR-mediated adenosine-to-inosine RNA editing, a post-transcriptional event vital for suppressing cellular double-stranded RNA (dsRNA)-mediated innate immune interferon responses, is an important potential mechanism underlying genetic variants associated with common inflammatory diseases. We identified and characterized 30,319 cis-RNA editing QTLs (edQTLs) across 49 human tissues. These edQTLs were significantly enriched in genome-wide association study signals for autoimmune and immune-mediated diseases. Colocalization analysis of edQTLs with disease risk loci further pinpointed key, putatively immunogenic dsRNAs formed by expected inverted repeat Alu elements as well as unexpected, highly over-represented cis-natural antisense transcripts. Furthermore, inflammatory disease risk variants, in aggregate, were associated with reduced editing of nearby dsRNAs and induced interferon responses in inflammatory diseases. This unique directional effect agrees with the established mechanism that lack of RNA editing by ADAR1 leads to the specific activation of the dsRNA sensor MDA5 and subsequent interferon responses and inflammation. Our findings implicate cellular dsRNA editing and sensing as a previously underappreciated mechanism of common inflammatory diseases.


Subject(s)
Adenosine Deaminase , Genetic Predisposition to Disease , Immune System Diseases , Inflammation , RNA Editing , RNA, Double-Stranded , Adenosine/metabolism , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Alu Elements/genetics , Autoimmune Diseases/genetics , Autoimmune Diseases/immunology , Autoimmune Diseases/pathology , Genome-Wide Association Study , Humans , Immune System Diseases/genetics , Immune System Diseases/immunology , Immune System Diseases/pathology , Immunity, Innate , Inflammation/genetics , Inflammation/immunology , Inflammation/pathology , Inosine/metabolism , Interferon-Induced Helicase, IFIH1/metabolism , Interferons/genetics , Interferons/immunology , Quantitative Trait Loci/genetics , RNA Editing/genetics , RNA, Double-Stranded/genetics , RNA-Binding Proteins/metabolism
8.
Genome Res ; 31(4): 698-712, 2021 04.
Article in English | MEDLINE | ID: mdl-33741686

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) technology is poised to replace bulk cell RNA sequencing for many biological and medical applications as it allows users to measure gene expression levels in a cell type-specific manner. However, data produced by scRNA-seq often exhibit batch effects that can be specific to a cell type, to a sample, or to an experiment, which prevent integration or comparisons across multiple experiments. Here, we present Dmatch, a method that leverages an external expression atlas of human primary cells and kernel density matching to align multiple scRNA-seq experiments for downstream biological analysis. Dmatch facilitates alignment of scRNA-seq data sets with cell types that may overlap only partially and thus allows integration of multiple distinct scRNA-seq experiments to extract biological insights. In simulation, Dmatch compares favorably to other alignment methods, both in terms of reducing sample-specific clustering and in terms of avoiding overcorrection. When applied to scRNA-seq data collected from clinical samples in a healthy individual and five autoimmune disease patients, Dmatch enabled cell type-specific differential gene expression comparisons across biopsy sites and disease conditions and uncovered a shared population of pro-inflammatory monocytes across biopsy sites in RA patients. We further show that Dmatch increases the number of eQTLs mapped from population scRNA-seq data. Dmatch is fast, scalable, and improves the utility of scRNA-seq for several important applications. Dmatch is freely available online.


Subject(s)
RNA-Seq/methods , Single-Cell Analysis/methods , Cluster Analysis , Gene Expression Profiling , Humans
9.
PLoS Genet ; 15(4): e1008045, 2019 04.
Article in English | MEDLINE | ID: mdl-31002671

ABSTRACT

Quantification of gene expression levels at the single cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs). To this end, we generated single cell RNA-seq (scRNA-seq) data from induced pluripotent stem cells (iPSCs) derived from 53 Yoruba individuals. We collected data for a median of 95 cells per individual and a total of 5,447 single cells, and identified 235 mean expression QTLs (eQTLs) at 10% FDR, of which 79% replicate in bulk RNA-seq data from the same individuals. We further identified 5 vQTLs at 10% FDR, but demonstrate that these can also be explained as effects on mean expression. Our study suggests that dispersion QTLs (dQTLs) which could alter the variance of expression independently of the mean can have larger fold changes, but explain less phenotypic variance than eQTLs. We estimate 4,015 individuals as a lower bound to achieve 80% power to detect the strongest dQTLs in iPSCs. These results will guide the design of future studies on understanding the genetic control of gene expression variance.


Subject(s)
Induced Pluripotent Stem Cells/metabolism , Quantitative Trait Loci , Black People/genetics , Cell Line , Computer Simulation , Gene Expression Profiling , Genetic Variation , Genome-Wide Association Study , Humans , Models, Genetic , Nigeria , Phenotype , Sequence Analysis, RNA , Single-Cell Analysis
10.
Genome Res ; 28(1): 122-131, 2018 01.
Article in English | MEDLINE | ID: mdl-29208628

ABSTRACT

Induced pluripotent stem cells (iPSCs) are an essential tool for studying cellular differentiation and cell types that are otherwise difficult to access. We investigated the use of iPSCs and iPSC-derived cells to study the impact of genetic variation on gene regulation across different cell types and as models for studies of complex disease. To do so, we established a panel of iPSCs from 58 well-studied Yoruba lymphoblastoid cell lines (LCLs); 14 of these lines were further differentiated into cardiomyocytes. We characterized regulatory variation across individuals and cell types by measuring gene expression levels, chromatin accessibility, and DNA methylation. Our analysis focused on a comparison of inter-individual regulatory variation across cell types. While most cell-type-specific regulatory quantitative trait loci (QTLs) lie in chromatin that is open only in the affected cell types, we found that 20% of cell-type-specific regulatory QTLs are in shared open chromatin. This observation motivated us to develop a deep neural network to predict open chromatin regions from DNA sequence alone. Using this approach, we were able to use the sequences of segregating haplotypes to predict the effects of common SNPs on cell-type-specific chromatin accessibility.


Subject(s)
Cell Differentiation , Chromatin Assembly and Disassembly , Chromatin/metabolism , DNA Methylation , Genetic Loci , Induced Pluripotent Stem Cells/metabolism , Myocytes, Cardiac/metabolism , Cell Line , Chromatin/genetics , Humans , Induced Pluripotent Stem Cells/cytology , Myocytes, Cardiac/cytology
11.
Bioinformatics ; 36(17): 4609-4615, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32315392

ABSTRACT

MOTIVATION: Next-generation sequencing is rapidly improving diagnostic rates in rare Mendelian diseases, but even with whole genome or whole exome sequencing, the majority of cases remain unsolved. Increasingly, RNA sequencing is being used to solve many cases that evade diagnosis through sequencing alone. Specifically, the detection of aberrant splicing in many rare disease patients suggests that identifying RNA splicing outliers is particularly useful for determining causal Mendelian disease genes. However, there is as yet a paucity of statistical methodologies to detect splicing outliers. RESULTS: We developed LeafCutterMD, a new statistical framework that significantly improves the previously published LeafCutter in the context of detecting outlier splicing events. Through simulations and analysis of real patient data, we demonstrate that LeafCutterMD has better power than the state-of-the-art methodology while controlling false-positive rates. When applied to a cohort of disease-affected probands from the Mayo Clinic Center for Individualized Medicine, LeafCutterMD recovered all aberrantly spliced genes that had previously been identified by manual curation efforts. AVAILABILITY AND IMPLEMENTATION: The source code for this method is available under the open-source Apache 2.0 license in the latest release of the LeafCutter software package available online at http://davidaknowles.github.io/leafcutter. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome , Rare Diseases , Algorithms , High-Throughput Nucleotide Sequencing , Humans , RNA Splicing , Rare Diseases/diagnosis , Rare Diseases/genetics , Sequence Analysis, RNA , Software
12.
Nature ; 513(7518): 375-381, 2014 Sep 18.
Article in English | MEDLINE | ID: mdl-25186727

ABSTRACT

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.


Subject(s)
Cichlids/classification , Cichlids/genetics , Evolution, Molecular , Genetic Speciation , Genome/genetics , Africa, Eastern , Animals , DNA Transposable Elements/genetics , Gene Duplication/genetics , Gene Expression Regulation/genetics , Genomics , Lakes , MicroRNAs/genetics , Phylogeny , Polymorphism, Genetic/genetics
13.
Genome Res ; 25(1): 1-13, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25524026

ABSTRACT

Ninety-four percent of mammalian protein-coding exons exceed 51 nucleotides (nt) in length. The paucity of micro-exons (≤ 51 nt) suggests that their recognition and correct processing by the splicing machinery present greater challenges than for longer exons. Yet, because thousands of human genes harbor processed micro-exons, specialized mechanisms may be in place to promote their splicing. Here, we survey deep genomic data sets to define 13,085 micro-exons and to study their splicing mechanisms and molecular functions. More than 60% of annotated human micro-exons exhibit a high level of sequence conservation, an indicator of functionality. While most human micro-exons require splicing-enhancing genomic features to be processed, the splicing of hundreds of micro-exons is enhanced by the adjacent binding of splice factors in the introns of pre-messenger RNAs. Notably, splicing of a significant number of micro-exons was found to be facilitated by the binding of RBFOX proteins, which promote their inclusion in the brain, muscle, and heart. Our analyses suggest that accurate regulation of micro-exon inclusion by RBFOX proteins and PTBP1 plays an important role in the maintenance of tissue-specific protein-protein interactions.


Subject(s)
Alternative Splicing , Exons , Heterogeneous-Nuclear Ribonucleoproteins/metabolism , Polypyrimidine Tract-Binding Protein/metabolism , RNA-Binding Proteins/metabolism , Animals , Brain/metabolism , Chromosome Mapping , Conserved Sequence , Gene Expression Regulation , Genomics , Heterogeneous-Nuclear Ribonucleoproteins/genetics , Humans , Introns , Mice , Nucleotides/genetics , Polypyrimidine Tract-Binding Protein/genetics , Protein Interaction Domains and Motifs , RNA Splicing Factors , RNA, Messenger , RNA-Binding Proteins/genetics
14.
Nat Methods ; 12(6): 519-22, 2015 Jun.
Article in English | MEDLINE | ID: mdl-25915121

ABSTRACT

The simultaneous sequencing of a single cell's genome and transcriptome offers a powerful means to dissect genetic variation and its effect on gene expression. Here we describe G&T-seq, a method for separating and sequencing genomic DNA and full-length mRNA from single cells. By applying G&T-seq to over 220 single cells from mice and humans, we discovered cellular properties that could not be inferred from DNA or RNA sequencing alone.


Subject(s)
DNA/genetics , Genomics/methods , Nucleic Acid Amplification Techniques/methods , RNA, Messenger/genetics , Animals , Cell Line, Tumor , Humans , Mice
15.
Bioinformatics ; 29(2): 160-5, 2013 Jan 15.
Article in English | MEDLINE | ID: mdl-23162087

ABSTRACT

MOTIVATION: The ready availability of next-generation sequencing has led to a situation where it is easy to produce very fragmentary genome assemblies. We present a pipeline, SWiPS (Scaffolding With Protein Sequences), that uses orthologous proteins to improve low quality genome assemblies. The protein sequences are used as guides to scaffold existing contigs, while simultaneously allowing the gene structure to be predicted by homology. RESULTS: To perform, SWiPS does not depend on a high N50 or whole proteins being encoded on a single contig. We tested our algorithm on simulated next-generation data from Ciona intestinalis, real next-generation data from Drosophila melanogaster, a complex genome assembly of Homo sapiens and the low coverage Sanger sequence assembly of Callorhinchus milii. The improvements in N50 are of the order of ~20% for the C. intestinalis and H. sapiens assemblies, which is significant, considering the large size of intergenic regions in these eukaryotes. Using the CEGMA pipeline to assess the gene space represented in the genome assemblies, the number of genes retrieved increased by >110% for C. milii and from 20 to 40% for C. intestinalis. The scaffold error rates are low: 85-90% of scaffolds are fully correct, and >95% of local contig joins are correct. AVAILABILITY: SWiPS is available freely for download at http://www.well.ox.ac.uk/~yli142/swips.html. CONTACT: yang.li@well.ox.ac.uk or copley@well.ox.ac.uk
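The abstract above reports improvements in N50 without defining the metric. As a rough illustration only, the sketch below computes N50 under its common definition: the length of the contig or scaffold at which the running total, taken from longest to shortest, first reaches half of the total assembly length. The function name `n50` and the example lengths are hypothetical and are not taken from SWiPS or its data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Assumes the common definition: sort lengths from longest to shortest
    and return the length at which the cumulative sum first reaches at
    least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy assembly: joining two contigs (30 + 20) into one scaffold.
before = [50, 40, 30, 20, 10]
after = [50, 50, 40, 10]
relative_gain = (n50(after) - n50(before)) / n50(before)
```

In this toy case the total length is unchanged by scaffolding, so any rise in N50 comes purely from joining contigs, which is the kind of change the reported percentage improvements describe.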


Subject(s)
Algorithms , Genomics/methods , Sequence Homology, Amino Acid , Animals , Ciona intestinalis , Contig Mapping , Drosophila melanogaster/genetics , Fishes/genetics , High-Throughput Nucleotide Sequencing , Humans , Proteins/genetics , Sequence Analysis, DNA
16.
bioRxiv ; 2023 Oct 16.
Article in English | MEDLINE | ID: mdl-37745605

ABSTRACT

Alternative splicing (AS) is pervasive in human genes, yet the specific function of most AS events remains unknown. It is widely assumed that the primary function of AS is to diversify the proteome; however, AS can also influence gene expression levels by producing transcripts rapidly degraded by nonsense-mediated decay (NMD). Currently, there are no precise estimates for how often the coupling of AS and NMD (AS-NMD) impacts gene expression levels because rapidly degraded NMD transcripts are challenging to capture. To better understand the impact of AS on gene expression levels, we analyzed population-scale genomic data in lymphoblastoid cell lines across eight molecular assays that capture gene regulation before, during, and after transcription and cytoplasmic decay. Sequencing nascent mRNA transcripts revealed frequent aberrant splicing of human introns, which results in remarkably high levels of mRNA transcripts subject to NMD. We estimate that ~15% of all protein-coding transcripts are degraded by NMD, and this estimate increases to nearly half of all transcripts for lowly expressed genes with many introns. Leveraging genetic variation across cell lines, we find that GWAS trait-associated loci explained by AS are similarly likely to associate with NMD-induced expression level differences as with differences in protein isoform usage. Additionally, we used the splice-switching drug risdiplam to perturb AS at hundreds of genes, finding that ~3/4 of the splicing perturbations induce NMD. Thus, we conclude that AS-NMD substantially impacts the expression levels of most human genes. Our work further suggests that much of the molecular impact of AS is mediated by changes in protein expression levels rather than diversification of the proteome.

17.
Nat Genet ; 55(3): 461-470, 2023 03.
Article in English | MEDLINE | ID: mdl-36797366

ABSTRACT

Obesity-associated morbidity is exacerbated by abdominal obesity, which can be measured as the waist-to-hip ratio adjusted for the body mass index (WHRadjBMI). Here we identify genes associated with obesity and WHRadjBMI and characterize allele-sensitive enhancers that are predicted to regulate WHRadjBMI genes in women. We found that several waist-to-hip ratio-associated variants map within primate-specific Alu retrotransposons harboring a DNA motif associated with adipocyte differentiation. This suggests that a genetic component of adipose distribution in humans may involve co-option of retrotransposons as adipose enhancers. We evaluated the role of the strongest female WHRadjBMI-associated gene, SNX10, in adipose biology. We determined that it is required for human adipocyte differentiation and function and participates in diet-induced adipose expansion in female mice, but not males. Our data identify genes and regulatory mechanisms that underlie female-specific adipose distribution and mediate metabolic dysfunction in women.


Subject(s)
Obesity , Retroelements , Humans , Female , Animals , Mice , Obesity/genetics , Obesity/metabolism , Adiposity/genetics , Body Mass Index , Waist-Hip Ratio , Adipose Tissue/metabolism , Sorting Nexins/genetics , Sorting Nexins/metabolism
18.
Genome Biol ; 23(1): 103, 2022 04 21.
Article in English | MEDLINE | ID: mdl-35449021

ABSTRACT

Recent progress in deep learning has greatly improved the prediction of RNA splicing from DNA sequence. Here, we present Pangolin, a deep learning model to predict splice site strength in multiple tissues. Pangolin outperforms state-of-the-art methods for predicting RNA splicing on a variety of prediction tasks. Pangolin improves prediction of the impact of genetic variants on RNA splicing, including common, rare, and lineage-specific genetic variation. In addition, Pangolin identifies loss-of-function mutations with high accuracy and recall, particularly for mutations that are not missense or nonsense, demonstrating remarkable potential for identifying pathogenic variants.


Subject(s)
Pangolins , RNA Splicing , Animals , Base Sequence , Mutation , RNA Splice Sites
19.
Genome Biol ; 22(1): 291, 2021 10 14.
Article in English | MEDLINE | ID: mdl-34649612

ABSTRACT

BACKGROUND: Alternative cleavage and polyadenylation (APA), an RNA processing event, occurs in over 70% of human protein-coding genes. APA results in mRNA transcripts with distinct 3' ends. Most APA occurs within 3' UTRs, which harbor regulatory elements that can impact mRNA stability, translation, and localization. RESULTS: APA can be profiled using a number of established computational tools that infer polyadenylation sites from standard, short-read RNA-seq datasets. Here, we benchmarked a number of such tools (TAPAS, QAPA, DaPars2, GETUTR, and APATrap) against 3'-Seq, a specialized RNA-seq protocol that enriches for reads at the 3' ends of genes, and Iso-Seq, a Pacific Biosciences (PacBio) single-molecule full-length RNA-seq method, in their ability to identify polyadenylation sites and quantify polyadenylation site usage. We demonstrate that 3'-Seq and Iso-Seq are able to identify and quantify the usage of polyadenylation sites more reliably than computational tools that take short-read RNA-seq as input. However, we find that running one such tool, QAPA, with a set of polyadenylation site annotations derived from small quantities of 3'-Seq or Iso-Seq can reliably quantify variation in APA across conditions, such as across genotypes, as demonstrated by the successful mapping of alternative polyadenylation quantitative trait loci (apaQTL). CONCLUSIONS: We envisage that our analyses will shed light on the advantages of studying APA with more specialized sequencing protocols, such as 3'-Seq or Iso-Seq, and the limitations of studying APA with short-read RNA-seq. We provide a computational pipeline to aid in the identification of polyadenylation sites and quantification of polyadenylation site usages using Iso-Seq data as input.


Subject(s)
Polyadenylation , RNA-Seq , Software , Benchmarking , Cell Line , Genome, Human , Humans
20.
Genome Biol ; 22(1): 122, 2021 04 29.
Article in English | MEDLINE | ID: mdl-33926512

ABSTRACT

BACKGROUND: The vast majority of trait-associated variants identified using genome-wide association studies (GWAS) are noncoding, and therefore assumed to impact gene regulation. However, the majority of trait-associated loci are unexplained by regulatory quantitative trait loci (QTLs). RESULTS: We perform a comprehensive characterization of the putative mechanisms by which GWAS loci impact human immune traits. By harmonizing four major immune QTL studies, we identify 26,271 expression QTLs (eQTLs) and 23,121 splicing QTLs (sQTLs) spanning 18 immune cell types. Our colocalization analyses between QTLs and trait-associated loci from 72 GWAS reveal that genetic effects on RNA expression and splicing in immune cells colocalize with 40.4% of GWAS loci for immune-related traits, in many cases increasing the fraction of colocalized loci by twofold compared to previous studies. Notably, we find that the largest contributors of this increase are splicing QTLs, which colocalize on average with 14% of all GWAS loci that do not colocalize with eQTLs. By contrast, we find that cell type-specific eQTLs and eQTLs with small effect sizes contribute very few new colocalizations. To investigate the 60% of GWAS loci that remain unexplained, we collect H3K27ac CUT&Tag data from rheumatoid arthritis and healthy controls, and find large-scale differences between immune cells from the different disease contexts, including at regions overlapping unexplained GWAS loci. CONCLUSION: Altogether, our work supports RNA splicing as an important mediator of genetic effects on immune traits, and suggests that we must expand our study of regulatory processes in disease contexts to improve functional interpretation of as yet unexplained GWAS loci.


Subject(s)
Gene Expression Regulation , Genetic Association Studies , Genetic Variation , Immunity/genetics , Quantitative Trait Loci , Quantitative Trait, Heritable , Arthritis, Rheumatoid/etiology , Arthritis, Rheumatoid/metabolism , Arthritis, Rheumatoid/pathology , Chromosome Mapping , Databases, Nucleic Acid , Disease Susceptibility , Gene Expression Profiling , Genetic Association Studies/methods , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Histones/metabolism , Humans , Immunomodulation/genetics , Organ Specificity , Transcriptome