ABSTRACT
Pathogenic autoantibodies arise in many autoimmune diseases, but it is not understood how the cells making them evade immune checkpoints. Here, single-cell multi-omics analysis demonstrates a shared mechanism with lymphoid malignancy in the formation of public rheumatoid factor autoantibodies responsible for mixed cryoglobulinemic vasculitis. By combining single-cell DNA and RNA sequencing with serum antibody peptide sequencing and antibody synthesis, rare circulating B lymphocytes making pathogenic autoantibodies were found to comprise clonal trees accumulating mutations. Lymphoma driver mutations in genes regulating B cell proliferation and V(D)J mutation (CARD11, TNFAIP3, CCND3, ID3, BTG2, and KLHL6) were present in rogue B cells producing the pathogenic autoantibody. Antibody V(D)J mutations conferred pathogenicity by causing the antigen-bound autoantibodies to undergo phase transition to insoluble aggregates at lower temperatures. These results reveal a pre-neoplastic stage in human lymphomagenesis and a cascade of somatic mutations leading to an iconic pathogenic autoantibody.
Subject(s)
Autoantibodies/genetics , Autoimmune Diseases/genetics , B-Lymphocytes/immunology , Lymphoma/genetics , Animals , Autoantibodies/immunology , Autoimmune Diseases/immunology , Autoimmune Diseases/pathology , B-Lymphocytes/pathology , CARD Signaling Adaptor Proteins/genetics , Carrier Proteins/genetics , Clonal Evolution/genetics , Clonal Evolution/immunology , Cyclin D3/genetics , Guanylate Cyclase/genetics , Humans , Immediate-Early Proteins/genetics , Immunoglobulin Variable Region/genetics , Immunoglobulin Variable Region/immunology , Inhibitor of Differentiation Proteins/genetics , Lymphoma/immunology , Lymphoma/pathology , Mice , Mutation/genetics , Mutation/immunology , Neoplasm Proteins/genetics , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Tumor Necrosis Factor alpha-Induced Protein 3/genetics , Tumor Suppressor Proteins/genetics , V(D)J Recombination/genetics
ABSTRACT
Assembly and publication of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genome in January 2020 enabled the immediate development of tests to detect the new virus. This began the largest global testing programme in history, in which hundreds of millions of individuals have been tested to date. The unprecedented scale of testing has driven innovation in the strategies, technologies and concepts that govern testing in public health. This Review describes the changing role of testing during the COVID-19 pandemic, including the use of genomic surveillance to track SARS-CoV-2 transmission around the world, the use of contact tracing to contain disease outbreaks and testing for the presence of the virus circulating in the environment. Despite these efforts, widespread community transmission has become entrenched in many countries and has required the testing of populations to identify and isolate infected individuals, many of whom are asymptomatic. The diagnostic and epidemiological principles that underpin such population-scale testing are also considered, as are the high-throughput and point-of-care technologies that make testing feasible on a massive scale.
Subject(s)
COVID-19 , Pandemics , Public Health , SARS-CoV-2 , COVID-19/diagnosis , COVID-19/epidemiology , COVID-19/genetics , COVID-19/transmission , Humans , SARS-CoV-2/genetics , SARS-CoV-2/pathogenicity
ABSTRACT
The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs, and rRNAs. Here, we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequencing and parallel analysis of RNA ends, we demonstrate wide variation in mitochondrial transcript abundance and precisely resolve transcript processing and maturation events. We identify previously undescribed transcripts, including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high-throughput in vivo DNaseI footprinting, we establish the global profile of DNA-binding protein occupancy across the mitochondrial genome at single-nucleotide resolution, revealing regulatory features at mitochondrial transcription initiation sites and functional insights into disease-associated variants. This integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression, and processing of mitochondrial RNA and provides a resource for future studies of mitochondrial function (accessed at http://mitochondria.matticklab.com).
Subject(s)
Gene Expression Profiling , Mitochondria/genetics , RNA/analysis , Cell Nucleus/metabolism , DNA Footprinting , DNA-Binding Proteins/analysis , Deoxyribonuclease I/metabolism , Gene Expression Regulation , Genome, Mitochondrial , High-Throughput Nucleotide Sequencing , Humans , Locus Control Region , Mitochondrial Proteins/analysis , Nucleic Acid Conformation , RNA/metabolism , RNA, Mitochondrial , Sequence Analysis, RNA
ABSTRACT
Next-generation sequencing (NGS) provides a broad investigation of the genome, and it is being readily applied for the diagnosis of disease-associated genetic features. However, the interpretation of NGS data remains challenging owing to the size and complexity of the genome and the technical errors that are introduced during sample preparation, sequencing and analysis. These errors can be understood and mitigated through the use of reference standards: well-characterized genetic materials or synthetic spike-in controls that help to calibrate NGS measurements and to evaluate diagnostic performance. The informed use of reference standards, and associated statistical principles, ensures rigorous analysis of NGS data and is essential for its future clinical use.
Subject(s)
High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/standards , Animals , Humans , Reference Standards
ABSTRACT
PURPOSE: Branchpoint elements are required for intron removal, and variants at these elements can result in aberrant splicing. We aimed to assess the value of branchpoint annotations generated from recent large-scale studies to select branchpoint-abrogating variants, using hereditary cancer genes as a model. METHODS: We identified branchpoint elements in 119 genes associated with hereditary cancer from 3 genome-wide experimentally inferred and 2 predicted branchpoint data sets. We then identified variants that occur within branchpoint elements from public databases. We compared conservation, unique variant observations, and population frequencies at different nucleotides within branchpoint motifs. Finally, selected minigene assays were performed to assess the splicing effect of variants at branchpoint elements within mismatch repair genes. RESULTS: There was poor overlap between predicted and experimentally inferred branchpoints. Our analysis of cancer genes suggested that variants at the -2 nucleotide, -1 nucleotide, and branchpoint positions in experimentally inferred canonical motifs are more likely to be clinically relevant. Minigene assay data showed the -2 nucleotide to be more important to branchpoint motif integrity but also showed fluidity in branchpoint usage. CONCLUSION: Data from cancer gene analysis suggest that there are few high-risk alleles that severely impact function via branchpoint abrogation. Results of this study inform a general scheme to prioritize branchpoint motif variants for further study.
Subject(s)
Neoplasms , RNA Splicing , Genes, Neoplasm , Humans , Introns/genetics , Neoplasms/genetics , RNA Splicing/genetics
ABSTRACT
The idea that stem cell therapies work only via cell replacement is challenged by the observation of consistent intercellular molecule exchange between the graft and the host. Here we defined a mechanism of cellular signaling by which neural stem/precursor cells (NPCs) communicate with the microenvironment via extracellular vesicles (EVs), and we elucidated its molecular signature and function. We observed cytokine-regulated pathways that sort proteins and mRNAs into EVs. We described induction of interferon gamma (IFN-γ) pathway in NPCs exposed to proinflammatory cytokines that is mirrored in EVs. We showed that IFN-γ bound to EVs through Ifngr1 activates Stat1 in target cells. Finally, we demonstrated that endogenous Stat1 and Ifngr1 in target cells are indispensable to sustain the activation of Stat1 signaling by EV-associated IFN-γ/Ifngr1 complexes. Our study identifies a mechanism of cellular signaling regulated by EV-associated IFN-γ/Ifngr1 complexes, which grafted stem cells may use to communicate with the host immune system.
Subject(s)
Interferon-gamma/metabolism , Neural Stem Cells/cytology , Receptors, Interferon/metabolism , Transport Vesicles/metabolism , 3T3 Cells , Animals , Biological Transport , Cell Communication , Cellular Microenvironment , Inflammation/immunology , Interferon-gamma/biosynthesis , Interferon-gamma/genetics , Mice , Neural Stem Cells/transplantation , RNA, Messenger , Receptors, Interferon/genetics , STAT1 Transcription Factor/biosynthesis , STAT1 Transcription Factor/genetics , STAT1 Transcription Factor/metabolism , Signal Transduction , Th1 Cells/metabolism , Th2 Cells/metabolism , Interferon gamma Receptor
ABSTRACT
The combination of pervasive transcription and prolific alternative splicing produces a mammalian transcriptome of great breadth and diversity. The majority of transcribed genomic bases are intronic, antisense, or intergenic to protein-coding genes, yielding a plethora of short and long non-protein-coding regulatory RNAs. Long noncoding RNAs (lncRNAs) share most aspects of their biogenesis, processing, and regulation with mRNAs. However, lncRNAs are typically expressed in more restricted patterns, frequently from enhancers, and exhibit almost universal alternative splicing. These features are consistent with their role as modular epigenetic regulators. We describe here the key studies and technological advances that have shaped our understanding of the dimensions, dynamics, and biological relevance of the mammalian noncoding transcriptome.
Subject(s)
RNA, Untranslated/genetics , Transcriptome , Alternative Splicing , Animals , Exons , Humans
ABSTRACT
BACKGROUND: Fusion of the androgen-regulated gene TMPRSS2 to the ETS transcription factor gene ERG is the most common genomic alteration acquired during prostate tumorigenesis and is biased toward men of European ancestry. In contrast, African American men present with more advanced disease, yet their tumors are less likely to acquire TMPRSS2-ERG. Data for Africa are scarce. METHODS: RNA was made available for genomic analyses from 181 prostate tissue biopsy cores from Black South African men, 94 with and 87 without pathological evidence of prostate cancer. Reverse transcription polymerase chain reaction was used to screen for the TMPRSS2-ERG fusion, while transcript junction coordinates and isoform frequencies, including novel gene fusions, were determined using targeted RNA sequencing. RESULTS: Here we report a frequency of 13% for TMPRSS2-ERG in tumors from Black South Africans. The fusion was present in 12/94 cancer-positive versus 1/87 cancer-negative prostate tissue cores, which suggests a 92.62% predictivity for a positive cancer diagnosis (P = 0.0031). At a frequency of almost half that reported for African Americans and roughly a quarter of that reported for men of European ancestry, acquisition of TMPRSS2-ERG appears to be inversely associated with aggressive prostate cancer. Further support was provided by linking the presence of TMPRSS2-ERG to low-grade disease in younger patients (P = 0.0466), with higher expressing distal ERG fusion junction coordinates. CONCLUSIONS: In only the second study of its kind for the African continent, we support a link between TMPRSS2-ERG status and prostate cancer racial health disparity beyond the borders of the United States. We call for urgent evaluation of androgen deprivation therapy within Africa.
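The counts reported in this abstract (12 of 94 tumor cores and 1 of 87 cancer-negative cores carrying the fusion) can be turned into a frequency and a 2x2 association test. The following is a minimal sketch only: the abstract does not name the statistical test that was used, so Fisher's exact test is an assumption here, and the result need not reproduce the reported P value. The function and variable names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x fusion-positive cores in row 1
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    return sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Counts taken from the abstract
fusion_tumor, tumor_cores = 12, 94
fusion_benign, benign_cores = 1, 87

frequency = fusion_tumor / tumor_cores
p_value = fisher_exact_two_sided(
    fusion_tumor,
    tumor_cores - fusion_tumor,
    fusion_benign,
    benign_cores - fusion_benign,
)

print(f"fusion frequency in tumors: {frequency:.1%}")
print(f"Fisher exact P (two-sided): {p_value:.4f}")
```

The frequency line corresponds to the "13%" quoted in the RESULTS; the test is shown only to illustrate how such a table is typically evaluated.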
Subject(s)
Oncogene Fusion/genetics , Prostatic Neoplasms/genetics , Serine Endopeptidases/genetics , Adult , Aged , Aged, 80 and over , Black People , Genomic Instability , Health Status Disparities , Humans , Male , Middle Aged , Prostate/pathology , Prostatic Neoplasms/pathology , South Africa , Transcriptional Regulator ERG/genetics , White People
ABSTRACT
Targeted RNA sequencing (CaptureSeq) uses oligonucleotide probes to capture RNAs for sequencing, providing enriched read coverage, accurate measurement of gene expression, and quantitative expression data. We applied CaptureSeq to refine transcript annotations in the current murine GRCm38 assembly. More than 23,000 regions corresponding to putative or annotated long noncoding RNAs (lncRNAs) and 154,281 known splicing junction sites were selected for targeted sequencing across five mouse tissues and three brain subregions. The results illustrate that the mouse transcriptome is considerably more complex than previously thought. We assemble more complete transcript isoforms than GENCODE, expand transcript boundaries, and connect interspersed islands of mapped reads. We describe a novel filtering pipeline that identifies previously unannotated but high-quality transcript isoforms. In this set, 911 GENCODE neighboring genes are condensed into 400 expanded gene models. Additionally, 594 GENCODE lncRNAs acquire an open reading frame (ORF) when their structure is extended with CaptureSeq. Finally, we validate our observations using current FANTOM and Mouse ENCODE resources.
Subject(s)
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA, Long Noncoding/biosynthesis , RNA, Long Noncoding/genetics , Transcriptome , Animals , Mice
ABSTRACT
The identification of genetic variation with next-generation sequencing is confounded by the complexity of the human genome sequence and by biases that arise during library preparation, sequencing and analysis. We have developed a set of synthetic DNA standards, termed 'sequins', that emulate human genetic features and constitute qualitative and quantitative spike-in controls for genome sequencing. Sequencing reads derived from sequins align exclusively to an artificial in silico reference chromosome, rather than the human reference genome, which allows them to be partitioned for parallel analysis. Here we use this approach to represent common and clinically relevant genetic variation, ranging from single nucleotide variants to large structural rearrangements and copy-number variation. We validate the design and performance of sequin standards by comparison to examples in the NA12878 reference genome, and we demonstrate their utility during the detection and quantification of variants. We provide sequins as a standardized, quantitative resource against which human genetic variation can be measured and diagnostic performance assessed.
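The partitioning step described in this abstract, where spike-in reads are separated from sample reads because they align only to the artificial chromosome, can be sketched as follows. This is an illustrative sketch under stated assumptions: the reference name `chrIS`, the tuple representation of an alignment, and the function name are all hypothetical and are not taken from the abstract or from any particular software.

```python
from collections import defaultdict

# Hypothetical name for the artificial in silico reference chromosome
IN_SILICO_CHROM = "chrIS"


def partition_alignments(alignments, spike_in_chrom=IN_SILICO_CHROM):
    """Split (read_name, reference_name) pairs into spike-in and sample reads."""
    bins = defaultdict(list)
    for read_name, reference_name in alignments:
        key = "sequin" if reference_name == spike_in_chrom else "sample"
        bins[key].append(read_name)
    return bins["sequin"], bins["sample"]


# Made-up alignments for illustration
alignments = [
    ("read1", "chr1"),
    ("read2", "chrIS"),
    ("read3", "chrX"),
    ("read4", "chrIS"),
]
sequin_reads, sample_reads = partition_alignments(alignments)
print("spike-in reads:", sequin_reads)
print("sample reads:", sample_reads)
```

In practice the same decision would be made on the reference-name field of each alignment record; the list of tuples here simply stands in for that field.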
Subject(s)
DNA Copy Number Variations , DNA/genetics , Genome, Human , Genomics/methods , Polymorphism, Single Nucleotide , Sequence Analysis, DNA/methods , Chromosomes, Artificial/chemistry , Chromosomes, Artificial/genetics , DNA/chemical synthesis , DNA/chemistry , Humans , Reference Standards , Sequence Analysis, DNA/standards
ABSTRACT
RNA sequencing (RNA-seq) can be used to assemble spliced isoforms, quantify expressed genes and provide a global profile of the transcriptome. However, the size and diversity of the transcriptome, the wide dynamic range in gene expression and inherent technical biases confound RNA-seq analysis. We have developed a set of spike-in RNA standards, termed 'sequins' (sequencing spike-ins), that represent full-length spliced mRNA isoforms. Sequins have an entirely artificial sequence with no homology to natural reference genomes, but they align to gene loci encoded on an artificial in silico chromosome. The combination of multiple sequins across a range of concentrations emulates alternative splicing and differential gene expression, and it provides scaling factors for normalization between samples. We demonstrate the use of sequins in RNA-seq experiments to measure sample-specific biases and determine the limits of reliable transcript assembly and quantification in accompanying human RNA samples. In addition, we have designed a complementary set of sequins that represent fusion genes arising from rearrangements of the in silico chromosome to aid in cancer diagnosis. RNA sequins provide a qualitative and quantitative reference with which to navigate the complexity of the human transcriptome.
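Because spike-ins are added at known concentrations, their measured abundance can be regressed against the expected input to gauge quantitative accuracy. The sketch below fits a straight line in log2 space with ordinary least squares; the concentrations and abundances are made-up values, and the choice of a log-log linear fit is an assumption for illustration, not the procedure described by the authors.

```python
from math import log2


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up values: known input concentration and measured abundance per spike-in
known = [1.0, 4.0, 16.0, 64.0, 256.0]
measured = [2.0, 8.0, 32.0, 128.0, 512.0]

slope, intercept = fit_line(
    [log2(k) for k in known],
    [log2(m) for m in measured],
)
print(f"slope={slope:.3f} intercept={intercept:.3f}")
```

A slope near 1 indicates that measured abundance scales proportionally with input, and the intercept gives a per-sample offset of the kind that could serve as a scaling factor between libraries.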
Subject(s)
Gene Expression Profiling/standards , Genes, Synthetic , RNA Splicing , RNA, Messenger/genetics , Sequence Analysis, RNA/standards , Chromosomes, Artificial , Humans , Quality Control , RNA Splicing/genetics , RNA, Messenger/chemical synthesis , RNA, Messenger/chemistry , Reference Standards , Sequence Analysis, RNA/methods
ABSTRACT
Motivation: The branchpoint element is required for the first lariat-forming reaction in splicing. However, current catalogues of human branchpoints remain incomplete due to the difficulty in experimentally identifying these splicing elements. To address this limitation, we have developed a machine-learning algorithm, branchpointer, to identify branchpoint elements solely from gene annotations and genomic sequence. Results: Using branchpointer, we annotate branchpoint elements in 85% of human gene introns with a sensitivity of 61.8% and a specificity of 97.8%. In addition to annotation, branchpointer can evaluate the impact of SNPs on branchpoint architecture to inform functional interpretation of genetic variants. Branchpointer identifies all published deleterious branchpoint mutations annotated in clinical variant databases, and finds thousands of additional clinical and common genetic variants with similar predicted effects. This genome-wide annotation of branchpoints provides a reference for the genetic analysis of splicing, and the interpretation of noncoding variation. Availability and implementation: Branchpointer is written and implemented in the statistical programming language R and is freely available under a BSD license as a package through Bioconductor. Contact: b.signal@garvan.org.au or t.mercer@garvan.org. Supplementary information: Supplementary data are available at Bioinformatics online.
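The sensitivity and specificity quoted in this abstract follow the standard confusion-matrix definitions, sketched below. The counts are invented purely so that the two ratios come out at the quoted percentages; they are not data from the study, and the function names are illustrative.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of real branchpoints that the classifier recovers."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of non-branchpoint sites that are correctly rejected."""
    return true_neg / (true_neg + false_pos)


# Invented counts, chosen only to reproduce the quoted percentages
true_pos, false_neg = 618, 382
true_neg, false_pos = 978, 22

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)
print(f"sensitivity={sens:.1%} specificity={spec:.1%}")
```

High specificity paired with moderate sensitivity means that few predicted branchpoints are false, while a share of true branchpoints is missed.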
Subject(s)
Genome, Human , Introns , Machine Learning , Molecular Sequence Annotation , RNA Splicing , Sequence Analysis, DNA/methods , Genetic Variation , Humans , Sensitivity and Specificity , Software
ABSTRACT
During the splicing reaction, the 5' intron end is joined to the branchpoint nucleotide, selecting the next exon to incorporate into the mature RNA and forming an intron lariat, which is excised. Despite a critical role in gene splicing, the locations and features of human splicing branchpoints are largely unknown. We use exoribonuclease digestion and targeted RNA-sequencing to enrich for sequences that traverse the lariat junction and, by split and inverted alignment, reveal the branchpoint. We identify 59,359 high-confidence human branchpoints in >10,000 genes, providing a first map of splicing branchpoints in the human genome. Branchpoints are predominantly adenosine, highly conserved, and closely distributed to the 3' splice site. Analysis of human branchpoints reveals numerous novel features, including distinct features of branchpoints for alternatively spliced exons and a family of conserved sequence motifs overlapping branchpoints we term B-boxes, which exhibit maximal nucleotide diversity while maintaining interactions with the keto-rich U2 snRNA. Different B-box motifs exhibit divergent usage in vertebrate lineages and associate with other splicing elements and distinct intron-exon architectures, suggesting integration within a broader regulatory splicing code. Lastly, although branchpoints are refractory to common mutational processes and genetic variation, mutations occurring at branchpoint nucleotides are enriched for disease associations.
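The idea behind "split and inverted alignment" can be illustrated with a toy string-matching example: a read that crosses the lariat junction consists of sequence ending at the branchpoint followed directly by the 5' end of the intron, so its two halves map to the intron in reversed order. The sketch below is a simplification under stated assumptions; real analyses use sequence aligners and handle mismatches, and the intron and read sequences, the `min_anchor` parameter, and the function name are all invented for illustration.

```python
def find_branchpoint(intron, read, min_anchor=4):
    """Return the 0-based intron index of the putative branchpoint, or None."""
    for split in range(min_anchor, len(read) - min_anchor + 1):
        head, tail = read[:split], read[split:]
        # The tail of the read must match the 5' end of the intron
        if not intron.startswith(tail):
            continue
        # The head must map downstream of the tail: the inverted order
        pos = intron.find(head, len(tail))
        if pos != -1:
            # The last base of the head is joined to the 5' intron end
            return pos + len(head) - 1
    return None


# Invented intron sequence containing a CTAAC motif
intron = "GTAAGTATCGGCATTGCCTACTAACTTTTCCTTTCAG"
# Invented lariat-junction read: bases 16..23 followed by bases 0..5
lariat_read = intron[16:24] + intron[0:6]
# A read that lies entirely within the intron and crosses no junction
linear_read = intron[5:19]

branchpoint = find_branchpoint(intron, lariat_read)
print("branchpoint index:", branchpoint)
print("linear read result:", find_branchpoint(intron, linear_read))
```

The `min_anchor` requirement on both halves stands in for the minimum alignment length an aligner would need to place each segment confidently.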
Subject(s)
Consensus Sequence , Genomics , Introns , RNA Splicing , Alternative Splicing , Animals , Computational Biology/methods , Evolution, Molecular , Exons , Genetic Variation , Genomics/methods , Humans , Nucleotide Motifs , Position-Specific Scoring Matrices , RNA Splice Sites
ABSTRACT
We compared quantitative RT-PCR (qRT-PCR), RNA-seq and capture sequencing (CaptureSeq) in terms of their ability to assemble and quantify long noncoding RNAs and novel coding exons across 20 human tissues. CaptureSeq was superior for the detection and quantification of genes with low expression, showed little technical variation and accurately measured differential expression. This approach expands and refines previous annotations and simultaneously generates an expression atlas.
Subject(s)
Gene Expression Profiling , RNA, Long Noncoding/genetics , RNA/genetics , Sequence Analysis/methods , Humans , K562 Cells , Polymerase Chain Reaction , RNA/blood , RNA/chemistry
ABSTRACT
SUMMARY: Spike-in controls are synthetic nucleic-acid sequences that are added to a user's sample and constitute internal standards for subsequent steps in the next-generation sequencing workflow. AVAILABILITY AND IMPLEMENTATION: The software is implemented in C++/R and is freely available under a BSD license. The source code is available from github.com/student-t/Anaquin, binaries and user manual from www.sequin.xyz/software, and the R package from bioconductor.org/packages/Anaquin. CONTACT: anaquin@garvan.org.au or t.mercer@garvan.org.au. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subject(s)
High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Sequence Analysis, RNA/methods , Software , Humans
ABSTRACT
An expansive functionality and complexity has been ascribed to the majority of the human genome that was unanticipated at the outset of the draft sequence and assembly a decade ago. We are now faced with the challenge of integrating and interpreting this complexity in order to achieve a coherent view of genome biology. We argue that the linear representation of the genome exacerbates this complexity and an understanding of its three-dimensional structure is central to interpreting the regulatory and transcriptional architecture of the genome. Chromatin conformation capture techniques and high-resolution microscopy have afforded an emergent global view of genome structure within the nucleus. Chromosomes fold into complex, territorialized three-dimensional domains in concert with specialized subnuclear bodies that harbor concentrations of transcription and splicing machinery. The signature of these folds is retained within the layered regulatory landscapes annotated by chromatin immunoprecipitation, and we propose that genome contacts are reflected in the organization and expression of interweaved networks of overlapping coding and noncoding transcripts. This pervasive impact of genome structure favors a preeminent role for the nucleoskeleton and RNA in regulating gene expression by organizing these folds and contacts. Accordingly, we propose that the local and global three-dimensional structure of the genome provides a consistent, integrated, and intuitive framework for interpreting and understanding the regulatory and transcriptional complexity of the human genome.
Subject(s)
Gene Expression Regulation , Genome , Transcription, Genetic , Animals , Cell Nucleus/genetics , Cell Nucleus/metabolism , Chromatin/metabolism , Genomic Imprinting , Humans , Nuclear Matrix/metabolism , RNA Folding
ABSTRACT
In mammals and other eukaryotes most of the genome is transcribed in a developmentally regulated manner to produce large numbers of long non-coding RNAs (ncRNAs). Here we review the rapidly advancing field of long ncRNAs, describing their conservation, their organization in the genome and their roles in gene regulation. We also consider the medical implications, and the emerging recognition that any transcript, regardless of coding potential, can have an intrinsic function as an RNA.
Subject(s)
RNA, Untranslated/genetics , RNA, Untranslated/metabolism , Animals , Gene Expression Regulation , Humans , RNA, Untranslated/chemistry , Transcription, Genetic
ABSTRACT
The expression of genes encompasses their transcription into mRNA followed by translation into protein. In recent years, next-generation sequencing and mass spectrometry methods have profiled DNA, RNA and protein abundance in cells. However, there are currently no reference standards that are compatible across these genomic, transcriptomic and proteomic methods, and provide an integrated measure of gene expression. Here, we use synthetic biology principles to engineer a multi-omics control, termed pREF, that can act as a universal molecular standard for next-generation sequencing and mass spectrometry methods. The pREF sequence encodes 21 synthetic genes that can be in vitro transcribed into spike-in mRNA controls, and in vitro translated to generate matched protein controls. The synthetic genes provide qualitative controls that can measure sensitivity and quantitative accuracy of DNA, RNA and peptide detection. We demonstrate the use of pREF in metagenome DNA sequencing and RNA sequencing experiments and evaluate the quantification of proteins using mass spectrometry. Unlike previous spike-in controls, pREF can be independently propagated and the synthetic mRNA and protein controls can be sustainably prepared by recipient laboratories using common molecular biology techniques. Together, this provides a universal synthetic standard able to integrate genomic, transcriptomic and proteomic methods.
Subject(s)
DNA , Proteomics , RNA, Messenger/genetics , RNA, Messenger/metabolism , DNA/genetics , Genomics , RNA
ABSTRACT
BACKGROUND: Accurate bacterial genome annotations provide a framework for understanding cellular functions, behavior and pathogenicity and are essential for metabolic engineering. Annotations based only on in silico predictions are inaccurate, particularly for large, high G + C content genomes, due to the lack of similarities in gene length and gene organization to model organisms. RESULTS: Here we describe a 2D systems biology driven re-annotation of the Saccharopolyspora erythraea genome using proteogenomics, a genome-scale metabolic reconstruction, RNA-sequencing and small-RNA-sequencing. We observed transcription of more than 300 intergenic regions, detected 59 peptides in intergenic regions, confirmed 164 open reading frames previously annotated as hypothetical proteins and reassigned function to open reading frames using the genome-scale metabolic reconstruction. Finally, we present a novel way of mapping ribosomal binding sites across the genome by sequencing small RNAs. CONCLUSIONS: The work presented here describes a novel framework for annotation of the Saccharopolyspora erythraea genome. Based on experimental observations, the 2D annotation framework greatly reduces errors that are commonly made when annotating large, high G + C content genomes using computational prediction algorithms.