Results 1 - 20 of 29
1.
Nat Rev Genet ; 19(5): 325, 2018 05.
Article in English | MEDLINE | ID: mdl-29430012

ABSTRACT

This corrects the article DOI: 10.1038/nrg.2017.113.

2.
Nat Rev Genet ; 19(4): 208-219, 2018 04.
Article in English | MEDLINE | ID: mdl-29379135

ABSTRACT

Next-generation sequencing has made major strides in the past decade. Studies based on large sequencing data sets are growing in number, and public archives for raw sequencing data have been doubling in size every 18 months. Leveraging these data requires researchers to use large-scale computational resources. Cloud computing, a model whereby users rent computers and storage from large data centres, is a solution that is gaining traction in genomics research. Here, we describe how cloud computing is used in genomics for research and large-scale collaborations, and argue that its elasticity, reproducibility and privacy features make it ideally suited for the large-scale reanalysis of publicly available archived data, including privacy-protected data.


Subject(s)
Cloud Computing , Genomics , High-Throughput Nucleotide Sequencing , Internet , Computational Biology , Humans
3.
Bioinformatics ; 37(21): 3723-3733, 2021 11 05.
Article in English | MEDLINE | ID: mdl-34478497

ABSTRACT

MOTIVATION: Proteasomal cleavage is a key component in protein turnover, as well as antigen processing and presentation. Although tools for proteasomal cleavage prediction are available, they vary widely in their performance, options and availability. RESULTS: Herein, we present pepsickle, an open-source tool for proteasomal cleavage prediction with better in vivo prediction performance (area under the curve) and computational speed than current models available in the field and with the ability to predict sites based on both constitutive and immunoproteasome profiles. Post hoc filtering of predicted patient neoepitopes using pepsickle significantly enriches for immune-responsive epitopes and may improve current epitope prediction and vaccine development pipelines. AVAILABILITY AND IMPLEMENTATION: pepsickle is open source and available at https://github.com/pdxgx/pepsickle. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Antigens , Proteasome Endopeptidase Complex , Humans , Proteasome Endopeptidase Complex/metabolism , Epitopes , Proteolysis
4.
J Virol ; 94(13)2020 06 16.
Article in English | MEDLINE | ID: mdl-32303592

ABSTRACT

Genetic variability across the three major histocompatibility complex (MHC) class I genes (human leukocyte antigen A [HLA-A], -B, and -C genes) may affect susceptibility to and severity of the disease caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), the virus responsible for coronavirus disease 2019 (COVID-19). We performed a comprehensive in silico analysis of viral peptide-MHC class I binding affinity across 145 HLA-A, -B, and -C genotypes for all SARS-CoV-2 peptides. We further explored the potential for cross-protective immunity conferred by prior exposure to four common human coronaviruses. The SARS-CoV-2 proteome was successfully sampled and was represented by a diversity of HLA alleles. However, we found that HLA-B*46:01 had the fewest predicted binding peptides for SARS-CoV-2, suggesting that individuals with this allele may be particularly vulnerable to COVID-19, as they were previously shown to be for SARS (M. Lin, H.-T. Tseng, J. A. Trejaut, H.-L. Lee, et al., BMC Med Genet 4:9, 2003, https://bmcmedgenet.biomedcentral.com/articles/10.1186/1471-2350-4-9). Conversely, we found that HLA-B*15:03 showed the greatest capacity to present highly conserved SARS-CoV-2 peptides that are shared among common human coronaviruses, suggesting that it could enable cross-protective T-cell-based immunity. Finally, we reported global distributions of HLA types with potential epidemiological ramifications in the setting of the current pandemic. IMPORTANCE: Individual genetic variation may help to explain different immune responses to a virus across a population. In particular, understanding how variation in HLA may affect the course of COVID-19 could help identify individuals at higher risk from the disease. HLA typing can be fast and inexpensive. Pairing HLA typing with COVID-19 testing where feasible could improve assessment of severity of viral disease in the population. Following the development of a vaccine against SARS-CoV-2, the virus that causes COVID-19, individuals with high-risk HLA types could be prioritized for vaccination.
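
The per-allele comparison described in this abstract amounts to counting, for each HLA allele, how many peptides are predicted to bind. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: the 500 nM cutoff is a commonly used convention rather than a value taken from this abstract, the function name `binders_per_allele` is hypothetical, and the allele names and affinities are toy placeholders, not results from the study.

```python
from collections import Counter

def binders_per_allele(predictions, threshold_nm=500.0):
    """Count predicted binders per allele.

    predictions: iterable of (allele, peptide, affinity_nm) tuples.
    threshold_nm: assumed cutoff; lower affinity values mean tighter binding.
    """
    counts = Counter()
    for allele, _peptide, affinity_nm in predictions:
        # Adding 0 still registers the allele, so alleles with no binders appear.
        counts[allele] += 1 if affinity_nm < threshold_nm else 0
    return counts

# Toy placeholder data (hypothetical alleles and affinities, not study results).
toy = [
    ("ALLELE_1", "PEPTIDEAA", 35.0),
    ("ALLELE_1", "PEPTIDEBB", 420.0),
    ("ALLELE_2", "PEPTIDEAA", 9000.0),
]
counts = binders_per_allele(toy)
fewest = min(counts, key=counts.get)  # allele with the fewest predicted binders
```

Ranking alleles by this count is one simple way to flag alleles with few predicted binders; a real analysis would take its affinities from a dedicated binding predictor.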


Subject(s)
Betacoronavirus/immunology , Coronavirus Infections/virology , Histocompatibility Testing/methods , Viral Pneumonia/virology , Amino Acid Sequence , COVID-19 , COVID-19 Testing , Clinical Laboratory Techniques , Coronavirus Infections/diagnosis , Coronavirus Infections/immunology , T-Lymphocyte Epitopes/immunology , Genetic Variation , Genotype , Haplotypes , Histocompatibility Antigens Class I/genetics , Histocompatibility Antigens Class I/immunology , Humans , Innate Immunity/immunology , Pandemics , Viral Pneumonia/immunology , SARS-CoV-2 , T-Lymphocytes/immunology
5.
Bioinformatics ; 36(3): 713-720, 2020 02 01.
Article in English | MEDLINE | ID: mdl-31424527

ABSTRACT

MOTIVATION: The vast majority of tools for neoepitope prediction from DNA sequencing of complementary tumor and normal patient samples do not consider germline context or the potential for the co-occurrence of two or more somatic variants on the same mRNA transcript. Without consideration of these phenomena, existing approaches are likely to produce both false-positive and false-negative results, resulting in an inaccurate and incomplete picture of the cancer neoepitope landscape. We developed neoepiscope chiefly to address this issue for single nucleotide variants (SNVs) and insertions/deletions (indels). RESULTS: Herein, we illustrate how germline and somatic variant phasing affects neoepitope prediction across multiple datasets. We estimate that up to ∼5% of neoepitopes arising from SNVs and indels may require variant phasing for their accurate assessment. neoepiscope is performant, flexible and supports several major histocompatibility complex binding affinity prediction tools. AVAILABILITY AND IMPLEMENTATION: neoepiscope is available on GitHub at https://github.com/pdxgx/neoepiscope under the MIT license. Scripts for reproducing results described in the text are available at https://github.com/pdxgx/neoepiscope-paper under the MIT license. Additional data from this study, including summaries of variant phasing incidence and benchmarking wallclock times, are available in Supplementary Files 1, 2 and 3. Supplementary File 1 contains Supplementary Table 1, Supplementary Figures 1 and 2, and descriptions of Supplementary Tables 2-8. Supplementary File 2 contains Supplementary Tables 2-6 and 8. Supplementary File 3 contains Supplementary Table 7. 
Raw sequencing data used for the analyses in this manuscript are available from the Sequence Read Archive under accessions PRJNA278450, PRJNA312948, PRJNA307199, PRJNA343789, PRJNA357321, PRJNA293912, PRJNA369259, PRJNA305077, PRJNA306070, PRJNA82745 and PRJNA324705; from the European Genome-phenome Archive under accessions EGAD00001004352 and EGAD00001002731; and by direct request to the authors. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
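
To illustrate why phasing matters for neoepitope enumeration, here is a small Python sketch. It is not neoepiscope's implementation: it handles amino acid substitutions only, the function names are hypothetical, the 8 to 11 residue peptide lengths are an assumed typical range for MHC class I, and the protein sequence is a toy example. It compares peptides enumerated when two nearby variants are applied to the same transcript with peptides enumerated when each variant is treated independently.

```python
def peptides(seq, lengths=range(8, 12)):
    """All substrings of the given lengths (assumed 8-11 residues)."""
    return {seq[i:i + k] for k in lengths for i in range(len(seq) - k + 1)}

def apply_substitutions(seq, subs):
    """Apply (0-based position, new residue) substitutions to a sequence."""
    residues = list(seq)
    for pos, alt in subs:
        residues[pos] = alt
    return "".join(residues)

def neoepitopes(normal, subs):
    """Peptides present in the mutated sequence but absent from the normal one."""
    return peptides(apply_substitutions(normal, subs)) - peptides(normal)

normal = "MKTAYIAKQRQISFVKSHFSRQ"   # toy sequence, not from the paper
var_a, var_b = (5, "W"), (8, "D")   # two hypothetical nearby substitutions

phased = neoepitopes(normal, [var_a, var_b])                       # same transcript
unphased = neoepitopes(normal, [var_a]) | neoepitopes(normal, [var_b])

missed = phased - unphased      # peptides found only when phasing is considered
spurious = unphased - phased    # peptides that would not exist if both co-occur
```

In this toy case, peptides spanning both positions appear only in `phased`, which mirrors the false-negative and false-positive concern raised in the abstract.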


Subject(s)
High-Throughput Nucleotide Sequencing , Software , Genome , Humans , INDEL Mutation , DNA Sequence Analysis
6.
Proc Natl Acad Sci U S A ; 114(27): 7130-7135, 2017 07 03.
Article in English | MEDLINE | ID: mdl-28634288

ABSTRACT

RNA sequencing (RNA-seq) is a powerful approach for measuring gene expression levels in cells and tissues, but it relies on high-quality RNA. We demonstrate here that statistical adjustment using existing quality measures largely fails to remove the effects of RNA degradation when RNA quality associates with the outcome of interest. Using RNA-seq data from molecular degradation experiments of human primary tissues, we introduce a method, quality surrogate variable analysis (qSVA), as a framework for estimating and removing the confounding effect of RNA quality in differential expression analysis. We show that this approach results in greatly improved replication rates (>3×) across two large independent postmortem human brain studies of schizophrenia and also removes potential RNA quality biases in earlier published work that compared expression levels of different brain regions and other diagnostic groups. Our approach can therefore improve the interpretation of differential expression analysis of transcriptomic data from human tissue.


Subject(s)
RNA/analysis , RNA Sequence Analysis/methods , Algorithms , Animals , Computational Biology , DNA Replication , Gene Expression Profiling , Gene Expression Regulation , Genotype , Gray Matter , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , RNA/genetics , Schizophrenia/genetics , Schizophrenia/metabolism , Transcriptome
7.
Bioinformatics ; 34(1): 114-116, 2018 01 01.
Article in English | MEDLINE | ID: mdl-28968689

ABSTRACT

Motivation: As more and larger genomics studies appear, there is a growing need for comprehensive and queryable cross-study summaries. These enable researchers to leverage vast datasets that would otherwise be difficult to obtain. Results: Snaptron is a search engine for summarized RNA sequencing data with a query planner that leverages R-tree, B-tree and inverted indexing strategies to rapidly execute queries over 146 million exon-exon splice junctions from over 70 000 human RNA-seq samples. Queries can be tailored by constraining which junctions and samples to consider. Snaptron can score junctions according to tissue specificity or other criteria, and can score samples according to the relative frequency of different splicing patterns. We describe the software and outline biological questions that can be explored with Snaptron queries. Availability and implementation: Documentation is at http://snaptron.cs.jhu.edu. Source code is at https://github.com/ChristopherWilks/snaptron and https://github.com/ChristopherWilks/snaptron-experiments with a CC BY-NC 4.0 license. Contact: chris.wilks@jhu.edu or langmea@cs.jhu.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
High-Throughput Nucleotide Sequencing/methods , RNA Splicing , RNA Sequence Analysis/methods , Software , Exons , Humans
8.
Nucleic Acids Res ; 45(2): e9, 2017 01 25.
Article in English | MEDLINE | ID: mdl-27694310

ABSTRACT

Differential expression analysis of RNA sequencing (RNA-seq) data typically relies on reconstructing transcripts or counting reads that overlap known gene structures. We previously introduced an intermediate statistical approach called differentially expressed region (DER) finder that seeks to identify contiguous regions of the genome showing differential expression signal at single base resolution without relying on existing annotation or potentially inaccurate transcript assembly. We present the derfinder software that improves our annotation-agnostic approach to RNA-seq analysis by: (i) implementing a computationally efficient bump-hunting approach to identify DERs that permits genome-scale analyses in a large number of samples, (ii) introducing a flexible statistical modeling framework, including multi-group and time-course analyses and (iii) introducing a new set of data visualizations for expressed region analysis. We apply this approach to public RNA-seq data from the Genotype-Tissue Expression (GTEx) project and BrainSpan project to show that derfinder permits the analysis of hundreds of samples at base resolution in R, identifies expression outside of known gene boundaries and can be used to visualize expressed regions at base resolution. In simulations, our base resolution approaches enable discovery in the presence of incomplete annotation and are nearly as powerful as feature-level methods when the annotation is complete. derfinder analysis using expressed region-level and single base-level approaches provides a compromise between full transcript reconstruction and feature-level analysis. The package is available from Bioconductor at www.bioconductor.org/packages/derfinder.
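
The expressed region-level idea mentioned in this abstract can be pictured as segmenting a per-base coverage track into contiguous runs above a cutoff. The Python sketch below shows only that segmentation step under stated assumptions. It is not the derfinder R API and does not include its bump-hunting statistics; the function name `expressed_regions`, the cutoff value and the coverage values are hypothetical.

```python
def expressed_regions(mean_coverage, cutoff):
    """Return half-open (start, end) intervals where coverage exceeds cutoff.

    mean_coverage: per-base mean coverage across samples, as a list of numbers.
    """
    regions = []
    start = None
    for pos, cov in enumerate(mean_coverage):
        if cov > cutoff and start is None:
            start = pos                      # a new region opens here
        elif cov <= cutoff and start is not None:
            regions.append((start, pos))     # region closes before this base
            start = None
    if start is not None:
        # The track ended while still inside a region.
        regions.append((start, len(mean_coverage)))
    return regions

# Toy coverage track (hypothetical values).
coverage = [0, 0, 5, 6, 7, 0, 0, 3, 4, 0]
regions = expressed_regions(coverage, cutoff=2)
```

Because no gene annotation enters the calculation, regions found this way can fall outside known gene boundaries, which is the property the abstract highlights.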


Subject(s)
Gene Expression Profiling/methods , Software , Gene Expression Regulation , Genomics/methods , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Organ Specificity/genetics , Transcriptome , Web Browser
9.
Bioinformatics ; 33(24): 4033-4040, 2017 Dec 15.
Article in English | MEDLINE | ID: mdl-27592709

ABSTRACT

MOTIVATION: RNA sequencing (RNA-seq) experiments now span hundreds to thousands of samples. Current spliced alignment software is designed to analyze each sample separately. Consequently, no information is gained from analyzing multiple samples together, and it requires extra work to obtain analysis products that incorporate data from across samples. RESULTS: We describe Rail-RNA, a cloud-enabled spliced aligner that analyzes many samples at once. Rail-RNA eliminates redundant work across samples, making it more efficient as samples are added. For many samples, Rail-RNA is more accurate than annotation-assisted aligners. We use Rail-RNA to align 667 RNA-seq samples from the GEUVADIS project on Amazon Web Services in under 16 h for US$0.91 per sample. Rail-RNA outputs alignments in SAM/BAM format; but it also outputs (i) base-level coverage bigWigs for each sample; (ii) coverage bigWigs encoding normalized mean and median coverages at each base across samples analyzed; and (iii) exon-exon splice junctions and indels (features) in columnar formats that juxtapose coverages in samples in which a given feature is found. Supplementary outputs are ready for use with downstream packages for reproducible statistical analysis. We use Rail-RNA to identify expressed regions in the GEUVADIS samples and show that both annotated and unannotated (novel) expressed regions exhibit consistent patterns of variation across populations and with respect to known confounding variables. AVAILABILITY AND IMPLEMENTATION: Rail-RNA is open-source software available at http://rail.bio. CONTACTS: anellore@gmail.com or langmea@cs.jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
RNA Splicing , Sequence Alignment/methods , RNA Sequence Analysis/methods , Software , Exons , Gene Expression Profiling
10.
BMC Cancer ; 18(1): 414, 2018 04 13.
Article in English | MEDLINE | ID: mdl-29653567

ABSTRACT

BACKGROUND: Tumor neoantigens are drivers of cancer immunotherapy response; however, current prediction tools produce many candidates requiring further prioritization. Additional filtration criteria and population-level understanding may assist with prioritization. Herein, we show neoepitope immunogenicity is related to measures of peptide novelty and report population-level behavior of these and other metrics. METHODS: We propose four peptide novelty metrics to refine predicted neoantigenicity: tumor vs. paired normal peptide binding affinity difference, tumor vs. paired normal peptide sequence similarity, tumor vs. closest human peptide sequence similarity, and tumor vs. closest microbial peptide sequence similarity. We apply these metrics to neoepitopes predicted from somatic missense mutations in The Cancer Genome Atlas (TCGA) and a cohort of melanoma patients, and to a group of peptides with neoepitope-specific immune response data using an extension of pVAC-Seq (Hundal et al., pVAC-Seq: a genome-guided in silico approach to identifying tumor neoantigens. Genome Med 8:11, 2016). RESULTS: We show neoepitope burden varies across TCGA diseases and HLA alleles, with surprisingly low repetition of neoepitope sequences across patients or neoepitope preferences among sets of HLA alleles. Only 20.3% of predicted neoepitopes across TCGA patients displayed novel binding change based on our binding affinity difference criteria. Similarity of amino acid sequence was typically high between paired tumor-normal epitopes, but in 24.6% of cases, neoepitopes were more similar to other human peptides, or bacterial (56.8% of cases) or viral peptides (15.5% of cases), than their paired normal counterparts. 
Applied to peptides with neoepitope-specific immune response, a linear model incorporating neoepitope binding affinity, protein sequence similarity between neoepitopes and their closest viral peptides, and paired binding affinity difference was able to predict immunogenicity (AUROC = 0.66). CONCLUSIONS: Our proposed prioritization criteria emphasize neoepitope novelty and refine patient neoepitope predictions for focus on biologically meaningful candidate neoantigens. We have demonstrated that neoepitopes should be considered not only with respect to their paired normal epitope, but to the entire human proteome, and bacterial and viral peptides, with potential implications for neoepitope immunogenicity and personalized vaccines for cancer treatment. We conclude that putative neoantigens are highly variable across individuals as a function of cancer genetics and personalized HLA repertoire, while the overall behavior of filtration criteria reflects predictable patterns.


Subject(s)
Neoplasm Antigens/immunology , Epitopes/immunology , Neoplasms/immunology , Alleles , Amino Acid Sequence , Neoplasm Antigens/genetics , Epitope Mapping , Epitopes/chemistry , Epitopes/genetics , Genomics/methods , Humans , Immunotherapy , Neoplasms/genetics , Neoplasms/therapy , Peptides/chemistry , Peptides/genetics , Peptides/immunology , ROC Curve
11.
Bioinformatics ; 32(16): 2551-3, 2016 08 15.
Article in English | MEDLINE | ID: mdl-27153614

ABSTRACT

MOTIVATION: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. RESULTS: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise. AVAILABILITY AND IMPLEMENTATION: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/. CONTACTS: anellore@gmail.com or langmea@cs.jhu.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Computational Biology , Genetic Databases , Software , Algorithms , High-Throughput Nucleotide Sequencing , Humans , RNA , Reproducibility of Results
12.
Bioinform Adv ; 3(1): vbad020, 2023.
Article in English | MEDLINE | ID: mdl-36874953

ABSTRACT

Summary: Thousands of DNA methylation (DNAm) array samples from human blood are publicly available on the Gene Expression Omnibus (GEO), but they remain underutilized for experiment planning, replication and cross-study and cross-platform analyses. To facilitate these tasks, we augmented our recountmethylation R/Bioconductor package with 12 537 uniformly processed EPIC and HM450K blood samples on GEO as well as several new features. We subsequently used our updated package in several illustrative analyses, finding (i) study ID bias adjustment increased variation explained by biological and demographic variables, (ii) most variation in autosomal DNAm was explained by genetic ancestry and CD4+ T-cell fractions and (iii) the dependence of power to detect differential methylation on sample size was similar for each of peripheral blood mononuclear cells (PBMC), whole blood and umbilical cord blood. Finally, we used PBMC and whole blood to perform independent validations, and we recovered 38-46% of differentially methylated probes between sexes from two previously published epigenome-wide association studies. Availability and implementation: Source code to reproduce the main results is available on GitHub (repo: recountmethylation_flexible-blood-analysis_manuscript; url: https://github.com/metamaden/recountmethylation_flexible-blood-analysis_manuscript). All data were publicly available and downloaded from the Gene Expression Omnibus (https://www.ncbi.nlm.nih.gov/geo/). Compilations of the analyzed public data can be accessed from the website recount.bio/data (preprocessed HM450K array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/; preprocessed EPIC array data: https://recount.bio/data/remethdb_h5se-gm_epic_0-0-2_1589820348/). Supplementary information: Supplementary data are available at Bioinformatics Advances online.

13.
HLA ; 99(6): 607-613, 2022 06.
Article in English | MEDLINE | ID: mdl-35118818

ABSTRACT

HLA is a critical component of the viral antigen presentation pathway. We investigated the relationship between the severity of SARS-CoV-2 disease and HLA type in 3235 individuals with confirmed SARS-CoV-2 infection. We found only the DPB1 locus to be associated with the binary outcome of whether an individual developed any COVID-19 symptoms. The number of peptides predicted to bind to an HLA allele had no significant relationship with disease severity both when stratifying individuals by ancestry or age and in a pooled analysis. Overall, at the population level, we found HLA type is significantly less predictive of COVID-19 disease severity than certain demographic factors and clinical comorbidities.


Subject(s)
COVID-19 , Alleles , Genotype , Hospitalization , Humans , SARS-CoV-2
14.
Genome Biol ; 23(1): 240, 2022 11 11.
Article in English | MEDLINE | ID: mdl-36369064

ABSTRACT

BACKGROUND: There is growing interest in retained introns in a variety of disease contexts including cancer and aging. Many software tools have been developed to detect retained introns from short RNA-seq reads, but reliable detection is complicated by overlapping genes and transcripts as well as the presence of unprocessed or partially processed RNAs. RESULTS: We compared introns detected by 8 tools using short RNA-seq reads with introns observed in long RNA-seq reads from the same biological specimens. We found significant disagreement among tools (Fleiss' [Formula: see text]) such that 47.7% of all detected intron retentions were not called by more than one tool. We also observed poor performance of all tools, with none achieving an F1-score greater than 0.26, and qualitatively different behaviors between general-purpose alternative splicing detection tools and tools confined to retained intron detection. CONCLUSIONS: Short-read tools detect intron retention with poor recall and precision, calling into question the completeness and validity of a large percentage of putatively retained introns called by commonly used methods.
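
For readers less familiar with the metrics named in this abstract, the sketch below shows how precision, recall and F1-score follow from comparing a tool's calls against a reference set. It is an illustration in Python, not the study's evaluation code: the function name is hypothetical, introns are represented as arbitrary hashable identifiers, and the example identifiers are toy data.

```python
def precision_recall_f1(called, truth):
    """Compare called retained introns against a reference (truth) set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy identifiers (hypothetical): short-read calls vs. introns seen in long reads.
short_read_calls = {"intron_a", "intron_b", "intron_c", "intron_d"}
long_read_truth = {"intron_a", "intron_e"}
precision, recall, f1 = precision_recall_f1(short_read_calls, long_read_truth)
```

Because F1 is a harmonic mean, it stays low whenever either precision or recall is low, so an F1-score no greater than 0.26 means at least one of the two is poor.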


Subject(s)
Alternative Splicing , Software , Introns , RNA-Seq , RNA Sequence Analysis/methods
15.
NAR Genom Bioinform ; 3(2): lqab025, 2021 Jun.
Article in English | MEDLINE | ID: mdl-33937763

ABSTRACT

While DNA methylation (DNAm) is the most-studied epigenetic mark, few recent studies probe the breadth of publicly available DNAm array samples. We collectively analyzed 35 360 Illumina Infinium HumanMethylation450K DNAm array samples published on the Gene Expression Omnibus. We learned a controlled vocabulary of sample labels by applying regular expressions to metadata and used existing models to predict various sample properties including epigenetic age. We found approximately two-thirds of samples were from blood, one-quarter were from brain and one-third were from cancer patients. About 19% of samples failed at least one of Illumina's 17 prescribed quality assessments; signal distributions across samples suggest modifying manufacturer-recommended thresholds for failure would make these assessments more informative. We further analyzed DNAm variances in seven tissues (adipose, nasal, blood, brain, buccal, sperm and liver) and characterized specific probes distinguishing them. Finally, we compiled DNAm array data and metadata, including our learned and predicted sample labels, into database files accessible via the recountmethylation R/Bioconductor companion package. Its vignettes walk the user through some analyses contained in this paper.
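
The abstract mentions deriving a controlled vocabulary of sample labels by applying regular expressions to metadata. A minimal Python sketch of that general technique follows. The patterns, label names and example strings are hypothetical stand-ins chosen for illustration and are not the expressions used in the study.

```python
import re

# Hypothetical label -> pattern mapping; a real vocabulary would be far larger.
TISSUE_PATTERNS = {
    "blood": re.compile(r"\b(whole blood|pbmc|peripheral blood)\b", re.IGNORECASE),
    "brain": re.compile(r"\b(brain|cortex|cerebellum)\b", re.IGNORECASE),
}

def label_sample(metadata_text):
    """Return the sorted controlled-vocabulary labels whose pattern matches."""
    return sorted(
        label
        for label, pattern in TISSUE_PATTERNS.items()
        if pattern.search(metadata_text)
    )

labels_a = label_sample("Peripheral blood mononuclear cells, donor 3")
labels_b = label_sample("prefrontal cortex, male")
labels_c = label_sample("liver biopsy")
```

Free-text metadata is inconsistent across submitters, so mapping many spellings onto a few labels is what allows counts such as the tissue fractions quoted in the abstract to be tabulated.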

16.
Genome Biol ; 22(1): 323, 2021 11 29.
Article in English | MEDLINE | ID: mdl-34844637

ABSTRACT

We present recount3, a resource consisting of over 750,000 publicly available human and mouse RNA sequencing (RNA-seq) samples uniformly processed by our new Monorail analysis pipeline. To facilitate access to the data, we provide the recount3 and snapcount R/Bioconductor packages as well as complementary web resources. Using these tools, data can be downloaded as study-level summaries or queried for specific exon-exon junctions, genes, samples, or other features. Monorail can be used to process local and/or private data, allowing results to be directly compared to any study in recount3. Taken together, our tools help biologists maximize the utility of publicly available RNA-seq data, especially to improve their understanding of newly collected data. recount3 is available from http://rna.recount.bio.


Subject(s)
RNA Splicing , RNA-Seq/methods , RNA/genetics , Animals , Base Sequence , Computational Biology/methods , Exons , Gene Expression Regulation , High-Throughput Nucleotide Sequencing , Humans , Mice , RNA Sequence Analysis/methods , Software
17.
NAR Cancer ; 2(1): zcaa001, 2020 Mar.
Article in English | MEDLINE | ID: mdl-34316681

ABSTRACT

This study probes the distribution of putatively cancer-specific junctions across a broad set of publicly available non-cancer human RNA sequencing (RNA-seq) datasets. We compared cancer and non-cancer RNA-seq data from The Cancer Genome Atlas (TCGA), the Genotype-Tissue Expression (GTEx) Project and the Sequence Read Archive. We found that (i) averaging across cancer types, 80.6% of exon-exon junctions thought to be cancer-specific based on comparison with tissue-matched samples (σ = 13.0%) are in fact present in other adult non-cancer tissues throughout the body; (ii) 30.8% of junctions not present in any GTEx or TCGA normal tissues are shared by multiple samples within at least one cancer type cohort, and 87.4% of these distinguish between different cancer types; and (iii) many of these junctions not found in GTEx or TCGA normal tissues (15.4% on average, σ = 2.4%) are also found in embryological and other developmentally associated cells. These findings refine the meaning of RNA splicing event novelty, particularly with respect to the human neoepitope repertoire. Ultimately, cancer-specific exon-exon junctions may have a substantial causal relationship with the biology of disease.
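
The comparison in finding (i) can be read as a pair of set operations: junctions that look cancer-specific against tissue-matched normal samples, and the share of those that turn up in other non-cancer tissues. The Python sketch below is one way to express that logic under stated assumptions. The function name, the tuple representation of a junction and all coordinates are hypothetical toy values, not data from the study.

```python
def classify_junctions(cancer, matched_normal, other_normal):
    """Split cancer junctions by where else they are observed.

    Each junction is a hashable tuple such as (chrom, start, end, strand).
    Returns (apparently_specific, found_elsewhere, fraction_found_elsewhere).
    """
    apparently_specific = set(cancer) - set(matched_normal)
    found_elsewhere = apparently_specific & set(other_normal)
    fraction = (
        len(found_elsewhere) / len(apparently_specific)
        if apparently_specific
        else 0.0
    )
    return apparently_specific, found_elsewhere, fraction

# Toy junctions with made-up coordinates.
j1 = ("chr1", 100, 200, "+")
j2 = ("chr1", 300, 450, "+")
j3 = ("chr2", 50, 90, "-")
j4 = ("chr3", 10, 75, "+")

specific, elsewhere, fraction = classify_junctions(
    cancer={j1, j2, j3, j4},
    matched_normal={j1},
    other_normal={j2, j3},
)
```

A high fraction here would indicate that comparison against matched normal tissue alone overstates how many junctions are truly cancer-specific.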

18.
Genome Med ; 12(1): 33, 2020 03 30.
Article in English | MEDLINE | ID: mdl-32228719

ABSTRACT

BACKGROUND: Tumor mutational burden (TMB; the quantity of aberrant nucleotide sequences a given tumor may harbor) has been associated with response to immune checkpoint inhibitor therapy and is gaining broad acceptance as a result. However, TMB harbors intrinsic variability across cancer types, and its assessment and interpretation are poorly standardized. METHODS: Using a standardized approach, we quantify the robustness of TMB as a metric and its potential as a predictor of immunotherapy response and survival among a diverse cohort of cancer patients. We also explore the additive predictive potential of RNA-derived variants and neoepitope burden, incorporating several novel metrics of immunogenic potential. RESULTS: We find that TMB is a partial predictor of immunotherapy response in melanoma and non-small cell lung cancer, but not renal cell carcinoma. We find that TMB is predictive of overall survival in melanoma patients receiving immunotherapy, but not in an immunotherapy-naive population. We also find that it is an unstable metric with potentially problematic repercussions for clinical cohort classification. We finally note minimal additional predictive benefit to assessing neoepitope burden or its bulk derivatives, including RNA-derived sources of neoepitopes. CONCLUSIONS: We find sufficient cause to suggest that the predictive clinical value of TMB should not be overstated or oversimplified. While it is readily quantified, TMB is at best a limited surrogate biomarker of immunotherapy response. The data do not support isolated use of TMB in renal cell carcinoma.
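
To make the instability concern concrete, the Python sketch below computes TMB as mutations per megabase and applies a fixed cutoff. Both choices are assumptions made for illustration: normalising by callable megabases is a common convention that this abstract does not specify, and the cutoff of 10, the function names and the counts are hypothetical.

```python
def tmb_per_megabase(n_somatic_mutations, callable_bases):
    """Mutations per megabase of callable sequence (assumed convention)."""
    if callable_bases <= 0:
        raise ValueError("callable_bases must be positive")
    return n_somatic_mutations / (callable_bases / 1_000_000)

def classify_tmb(tmb, cutoff=10.0):
    """Label a sample 'high' or 'low' against a hypothetical cutoff."""
    return "high" if tmb >= cutoff else "low"

# Toy numbers: a single mutation call changes the class near the cutoff.
tmb_a = tmb_per_megabase(300, 30_000_000)
tmb_b = tmb_per_megabase(299, 30_000_000)
class_a = classify_tmb(tmb_a)
class_b = classify_tmb(tmb_b)
```

Samples close to a cutoff can switch class on a one-mutation difference in variant calling, which is one way an unstable metric can affect cohort classification.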


Subject(s)
Non-Small-Cell Lung Carcinoma/genetics , Immune Checkpoint Inhibitors/therapeutic use , Lung Neoplasms/genetics , Melanoma/genetics , Mutation Accumulation , Neoplasm Antigens/genetics , Neoplasm Antigens/immunology , Tumor Biomarkers/genetics , Tumor Biomarkers/immunology , Non-Small-Cell Lung Carcinoma/drug therapy , Epitopes/genetics , Epitopes/immunology , Humans , Lung Neoplasms/drug therapy , Melanoma/drug therapy
19.
Sci Rep ; 10(1): 15429, 2020 09 22.
Article in English | MEDLINE | ID: mdl-32963314

ABSTRACT

Mucosal Associated Invariant T (MAIT) cells can sense intracellular infection by a broad array of pathogens. These cells are activated upon encountering microbial antigen(s) displayed by MR1 on the surface of an infected cell. Human MR1 undergoes alternative splicing. The full-length isoform, MR1A, can activate MAIT cells, while the functions of the isoforms MR1B and MR1C are incompletely understood. In this report, we sought to characterize the expression and function of these splice variants. Using a transcriptomic analysis in conjunction with qPCR, we find that MR1A and MR1B transcripts are widely expressed. However, only MR1A can present mycobacterial antigen to MAIT cells. Coexpression of MR1B with MR1A decreases MAIT cell activation following bacterial infection. Additionally, expression of MR1B prior to MR1A lowers total MR1A abundance, suggesting competition between MR1A and MR1B for either ligands or chaperones required for folding and/or trafficking. Finally, we evaluated CD4/CD8 double positive thymocytes expressing surface MR1. Here, we find that relative expression of MR1A/MR1B transcript is associated with the prevalence of MR1+ CD4/CD8 cells in the thymus. Our results suggest alternative splicing of MR1 represents a means of regulating MAIT activation in response to microbial ligand(s).


Subject(s)
Alternative Splicing/genetics , Alternative Splicing/immunology , Antigen Presentation/genetics , Antigen Presentation/immunology , Histocompatibility Antigens Class I/genetics , Minor Histocompatibility Antigens/genetics , Mucosal-Associated Invariant T Cells/immunology , A549 Cells , CD4-Positive T-Lymphocytes/immunology , CD8-Positive T-Lymphocytes/immunology , Cell Line , Tumor Cell Line , HEK293 Cells , Histocompatibility Antigens Class I/immunology , Humans , Ligands , Lymphocyte Activation/genetics , Lymphocyte Activation/immunology , Minor Histocompatibility Antigens/immunology , Protein Isoforms/genetics , Protein Isoforms/immunology , Protein Transport/genetics , Protein Transport/immunology , Thymocytes/immunology , Transcriptome/genetics , Transcriptome/immunology
20.
Nat Commun ; 11(1): 137, 2020 01 09.
Article in English | MEDLINE | ID: mdl-31919425

ABSTRACT

Public archives of next-generation sequencing data are growing exponentially, but the difficulty of marshaling this data has led to its underutilization by scientists. Here, we present ASCOT, a resource that uses annotation-free methods to rapidly analyze and visualize splice variants across tens of thousands of bulk and single-cell data sets in the public archive. To demonstrate the utility of ASCOT, we identify novel cell type-specific alternative exons across the nervous system and leverage ENCODE and GTEx data sets to study the unique splicing of photoreceptors. We find that PTBP1 knockdown and MSI1 and PCBP2 overexpression are sufficient to activate many photoreceptor-specific exons in HepG2 liver cancer cells. This work demonstrates how large-scale analysis of public RNA-Seq data sets can yield key insights into cell type-specific control of RNA splicing and underscores the importance of considering both annotated and unannotated splicing events.


Subject(s)
Alternative Splicing/genetics , Computational Biology/methods , Data Analysis , Photoreceptor Cells/cytology , RNA Splice Sites/genetics , Animals , Tumor Cell Line , Gene Expression/genetics , Hep G2 Cells , Heterogeneous-Nuclear Ribonucleoproteins/genetics , High-Throughput Nucleotide Sequencing , Humans , Liver Neoplasms/genetics , Mice , Nerve Tissue Proteins/biosynthesis , Nerve Tissue Proteins/genetics , Neurons/cytology , Polypyrimidine Tract-Binding Protein/genetics , RNA-Binding Proteins/biosynthesis , RNA-Binding Proteins/genetics , Retina/cytology , RNA Sequence Analysis/methods