Results 1 - 20 of 28
1.
Cell ; 164(4): 805-17, 2016 02 11.
Article in English | MEDLINE | ID: mdl-26871637

ABSTRACT

While alternative splicing is known to diversify the functional characteristics of some genes, the extent to which protein isoforms globally contribute to functional complexity on a proteomic scale remains unknown. To address this systematically, we cloned full-length open reading frames of alternatively spliced transcripts for a large number of human genes and used protein-protein interaction profiling to functionally compare hundreds of protein isoform pairs. The majority of isoform pairs share less than 50% of their interactions. In the global context of interactome network maps, alternative isoforms tend to behave like distinct proteins rather than minor variants of each other. Interaction partners specific to alternative isoforms tend to be expressed in a highly tissue-specific manner and belong to distinct functional modules. Our strategy, applicable to other functional characteristics, reveals a widespread expansion of protein interaction capabilities through alternative splicing and suggests that many alternative "isoforms" are functionally divergent (i.e., "functional alloforms").
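
The abstract does not say how the share of common interactions between two isoforms was computed. As a minimal sketch, assuming a Jaccard index over the sets of interaction partners (one common choice, not necessarily the authors' metric), the comparison could look like this. The partner names are hypothetical.

```python
def shared_fraction(partners_a, partners_b):
    """Jaccard index: partners shared by both isoforms / partners seen for either."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        # Neither isoform has any detected partner; define the overlap as 0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical interaction partners for two isoforms of the same gene.
isoform_1 = {"PARTNER_A", "PARTNER_B", "PARTNER_C"}
isoform_2 = {"PARTNER_C", "PARTNER_D"}

print(shared_fraction(isoform_1, isoform_2))  # one shared partner out of four
```

Under this definition, a pair scoring below 0.5 would count among the isoform pairs that share less than half of their interactions.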


Subject(s)
Alternative Splicing; Protein Isoforms/metabolism; Proteome/metabolism; Animals; Cloning, Molecular; Evolution, Molecular; Humans; Models, Molecular; Open Reading Frames; Protein Interaction Domains and Motifs; Protein Interaction Maps; Proteome/analysis
2.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36752347

ABSTRACT

Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms, and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited owing to patient sample size and selection bias. There is a lack of comprehensive research aimed at systematically identifying AD-associated risk mutations. To address this challenge, we hereby construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI configured with fully connected layers that are capable of predicting cognitive impairment of subjects effectively based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.


Subject(s)
Alzheimer Disease; Cognitive Dysfunction; Deep Learning; Humans; Alzheimer Disease/genetics; Magnetic Resonance Imaging; Cognitive Dysfunction/genetics; Mutation
3.
Circ Res ; 132(3): 323-338, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36597873

ABSTRACT

BACKGROUND: Coronary artery disease (CAD) is the leading cause of death worldwide. Recent meta-analyses of genome-wide association studies have identified over 175 loci associated with CAD. The majority of these loci are in noncoding regions and are predicted to regulate gene expression. Given that vascular smooth muscle cells (SMCs) play critical roles in the development and progression of CAD, we aimed to identify the subset of the CAD loci associated with the regulation of transcription in distinct SMC phenotypes. METHODS: We measured gene expression in SMCs isolated from the ascending aortas of 151 heart transplant donors of various genetic ancestries in quiescent or proliferative conditions and calculated the association of their expression and splicing with ~6.3 million imputed single-nucleotide polymorphism markers across the genome. RESULTS: We identified 4910 expression and 4412 splicing quantitative trait loci (sQTLs) representing regions of the genome associated with transcript abundance and splicing. A total of 3660 expression quantitative trait loci (eQTLs) had not been observed in the publicly available Genotype-Tissue Expression dataset. Further, 29 and 880 eQTLs were SMC-specific and sex-biased, respectively. We made these results available for public query on a user-friendly website. To identify the effector transcript(s) regulated by CAD loci, we used 4 distinct colocalization approaches. We identified 84 eQTL and 164 sQTL that colocalized with CAD loci, highlighting the importance of genetic regulation of mRNA splicing as a molecular mechanism for CAD genetic risk. Notably, 20% and 35% of the eQTLs were unique to quiescent or proliferative SMCs, respectively. One CAD locus colocalized with a sex-specific eQTL (TERF2IP), and another locus colocalized with SMC-specific eQTL (ALKBH8). The most significantly associated CAD locus, 9p21, was an sQTL for the long noncoding RNA CDKN2B-AS1, also known as ANRIL, in proliferative SMCs. 
CONCLUSIONS: Collectively, our results provide evidence for the molecular mechanisms of genetic susceptibility to CAD in distinct SMC phenotypes.


Subject(s)
Coronary Artery Disease; Male; Female; Humans; Coronary Artery Disease/genetics; Coronary Artery Disease/metabolism; Genome-Wide Association Study/methods; Gene Expression Regulation; Quantitative Trait Loci; Genetic Predisposition to Disease; Gene Expression; Polymorphism, Single Nucleotide; AlkB Homolog 8, tRNA Methyltransferase/genetics; AlkB Homolog 8, tRNA Methyltransferase/metabolism
4.
Hum Mol Genet ; 31(R1): R123-R136, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35960994

ABSTRACT

Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.


Subject(s)
Human Genetics; RNA Isoforms; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA Isoforms/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA
5.
RNA Biol ; 19(1): 1228-1243, 2022 01.
Article in English | MEDLINE | ID: mdl-36457147

ABSTRACT

Endothelial cells (ECs) comprise the lumenal lining of all blood vessels and are critical for the functioning of the cardiovascular system. Their phenotypes can be modulated by alternative splicing of RNA to produce distinct protein isoforms. To characterize the RNA and protein isoform landscape within ECs, we applied a long read proteogenomics approach to analyse human umbilical vein endothelial cells (HUVECs). Transcripts delineated from PacBio sequencing serve as the basis for a sample-specific protein database used for downstream mass-spectrometry (MS) analysis to infer protein isoform expression. We detected 53,863 transcript isoforms from 10,426 genes, with 22,195 of those transcripts being novel. Furthermore, the predominant isoform in HUVECs does not correspond with the accepted "reference isoform" 25% of the time, with vascular pathway-related genes among this group. We found 2,597 protein isoforms supported through unique peptides, with an additional 2,280 isoforms nominated upon incorporation of long-read transcript evidence. We characterized a novel alternative acceptor for endothelial-related gene CDH5, suggesting potential changes in its associated signalling pathways. Finally, we identified novel protein isoforms arising from a diversity of RNA splicing mechanisms supported by uniquely mapped novel peptides. Our results represent a high-resolution atlas of known and novel isoforms of potential relevance to endothelial phenotypes and function.


Subject(s)
Proteogenomics; Humans; Human Umbilical Vein Endothelial Cells; Protein Isoforms/genetics; Alternative Splicing; RNA
6.
Trends Biochem Sci ; 42(5): 342-354, 2017 05.
Article in English | MEDLINE | ID: mdl-28284537

ABSTRACT

Cellular functions are mediated by complex interactome networks of physical, biochemical, and functional interactions between DNA sequences, RNA molecules, proteins, lipids, and small metabolites. A thorough understanding of cellular organization requires accurate and relatively complete models of interactome networks at proteome scale. The recent publication of four human protein-protein interaction (PPI) maps represents a technological breakthrough and an unprecedented resource for the scientific community, heralding a new era of proteome-scale human interactomics. Our knowledge gained from these and complementary studies provides fresh insights into the opportunities and challenges when analyzing systematically generated interactome data, defines a clear roadmap towards the generation of a first reference interactome, and reveals new perspectives on the organization of cellular life.


Subject(s)
Protein Interaction Mapping; Protein Interaction Maps; Proteins/metabolism; Proteome/metabolism; Humans; Protein Binding; Proteins/chemistry; Proteomics
7.
J Proteome Res ; 18(9): 3429-3438, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31378069

ABSTRACT

Peptides detected by tandem mass spectrometry (MS/MS) in bottom-up proteomics serve as proxies for the proteins expressed in the sample. Protein inference is a process routinely applied to these peptides to generate a plausible list of candidate protein identifications. The use of multiple proteases for parallel protein digestions expands sequence coverage, provides additional peptide identifications, and increases the probability of identifying peptides that are unique to a single protein, which are all valuable for protein inference. We have developed and implemented a multi-protease protein inference algorithm in MetaMorpheus, a bottom-up search software program, which incorporates the calculation of protease-specific q-values and preserves the association of peptide sequences and their protease of origin. This integrated multi-protease protein inference algorithm provides more accurate results than either the aggregation of results from the separate analysis of the peptide identifications produced by each protease (separate approach) in MetaMorpheus, or results that are obtained using Fido, ProteinProphet, or DTASelect2. MetaMorpheus' integrated multi-protease data analysis decreases the ambiguity of the protein group list, reduces the frequency of erroneous identifications, and increases the number of post-translational modifications identified, while combining multi-protease search and protein inference into a single software program.
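
The protease-specific q-values mentioned above can be illustrated with a generic target-decoy calculation run separately on each protease's identifications. This is a minimal sketch under stated assumptions: it is not MetaMorpheus code, the tuple layout `(protease, score, is_decoy)` is hypothetical, and the FDR estimate is the simple decoys/targets ratio.

```python
from collections import defaultdict


def q_values(psms):
    """psms: list of (score, is_decoy). Returns one q-value per PSM, in input order."""
    # Rank PSMs from best to worst score.
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value: the lowest FDR at which this PSM would still be accepted.
    result = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        result[order[rank]] = running_min
    return result


def q_values_by_protease(psms):
    """psms: list of (protease, score, is_decoy). q-values are computed per protease."""
    groups = defaultdict(list)
    for idx, (protease, score, is_decoy) in enumerate(psms):
        groups[protease].append((idx, score, is_decoy))
    out = [None] * len(psms)
    for members in groups.values():
        qs = q_values([(score, is_decoy) for _, score, is_decoy in members])
        for (idx, _, _), q in zip(members, qs):
            out[idx] = q
    return out


example = [
    ("trypsin", 10.0, False),
    ("trypsin", 8.0, True),
    ("trypsin", 6.0, False),
    ("LysC", 9.0, False),
]
print(q_values_by_protease(example))
```

Grouping before ranking keeps each identification tied to its protease of origin, so a score distribution that differs between digests does not distort the error estimate of the others.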


Subject(s)
Proteins/isolation & purification; Proteomics; Software; Tandem Mass Spectrometry/methods; Algorithms; Amino Acid Sequence/genetics; Databases, Protein; Peptide Hydrolases/chemistry; Peptide Hydrolases/isolation & purification; Peptides/chemistry; Peptides/isolation & purification; Proteins/chemistry
8.
PLoS Genet ; 12(12): e1006466, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27935966

ABSTRACT

Human genome-wide association studies (GWAS) have shown that genetic variation at >130 gene loci is associated with type 2 diabetes (T2D). We asked if the expression of the candidate T2D-associated genes within these loci is regulated by a common locus in pancreatic islets. Using an obese F2 mouse intercross segregating for T2D, we show that the expression of ~40% of the T2D-associated genes is linked to a broad region on mouse chromosome (Chr) 2. As all but 9 of these genes are not physically located on Chr 2, linkage to Chr 2 suggests a genomic factor(s) located on Chr 2 regulates their expression in trans. The transcription factor Nfatc2 is physically located on Chr 2 and its expression demonstrates cis linkage; i.e., its expression maps to itself. When conditioned on the expression of Nfatc2, linkage for the T2D-associated genes was greatly diminished, supporting Nfatc2 as a driver of their expression. Plasma insulin also showed linkage to the same broad region on Chr 2. Overexpression of a constitutively active (ca) form of Nfatc2 induced β-cell proliferation in mouse and human islets, and transcriptionally regulated more than half of the T2D-associated genes. Overexpression of either ca-Nfatc2 or ca-Nfatc1 in mouse islets enhanced insulin secretion, whereas only ca-Nfatc2 was able to promote β-cell proliferation, suggesting distinct molecular pathways mediating insulin secretion vs. β-cell proliferation are regulated by NFAT. Our results suggest that many of the T2D-associated genes are downstream transcriptional targets of NFAT, and may act coordinately in a pathway through which NFAT regulates β-cell proliferation in both mouse and human islets.


Subject(s)
Diabetes Mellitus, Type 2/genetics; Insulin/genetics; NFATC Transcription Factors/genetics; Animals; Cell Proliferation/genetics; Chromosome Mapping; Diabetes Mellitus, Type 2/metabolism; Diabetes Mellitus, Type 2/pathology; Gene Expression Regulation; Genetic Linkage; Genome; Genome-Wide Association Study; Humans; Insulin-Secreting Cells/metabolism; Insulin-Secreting Cells/pathology; Islets of Langerhans/metabolism; Islets of Langerhans/pathology; Mice; Mice, Obese; NFATC Transcription Factors/biosynthesis; Promoter Regions, Genetic
9.
J Proteome Res ; 15(3): 800-8, 2016 Mar 04.
Article in English | MEDLINE | ID: mdl-26704769

ABSTRACT

Mass-spectrometry-based proteomic analysis underestimates proteomic variation due to the absence of variant peptides and posttranslational modifications (PTMs) from standard protein databases. Each individual carries thousands of missense mutations that lead to single amino acid variants, but these are missed because they are absent from generic proteomic search databases. Myriad types of protein PTMs play essential roles in biological processes but remain undetected because of increased false discovery rates in variable modification searches. We address these two fundamental shortcomings of bottom-up proteomics with two recently developed software tools. The first consists of workflows in Galaxy that mine RNA sequencing data to generate sample-specific databases containing variant peptides and products of alternative splicing events. The second tool applies a new strategy that alters the variable modification approach to consider only curated PTMs at specific positions, thereby avoiding the combinatorial explosion that traditionally leads to high false discovery rates. Using RNA-sequencing-derived databases with this Global Post-Translational Modification (G-PTM) search strategy revealed hundreds of single amino acid variant peptides, tens of novel splice junction peptides, and several hundred posttranslationally modified peptides in each of ten human cell lines.


Subject(s)
Base Sequence/genetics; Protein Processing, Post-Translational/genetics; Proteomics/methods; Software; Cell Line; Databases, Protein; Humans; Proteogenomics/methods
10.
J Proteome Res ; 14(11): 4714-20, 2015 Nov 06.
Article in English | MEDLINE | ID: mdl-26418581

ABSTRACT

Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.
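
The FDR penalty of a variable modification search comes from the growth of the search space. A rough way to see this is to count candidate modified forms of a single peptide under each strategy. The sketch below is only an illustration under simplifying assumptions (one modification type, each site either modified or not); the function names, the example peptide and the curated position are hypothetical and are not taken from the G-PTM software.

```python
def variable_mod_forms(peptide, modifiable_residues="STY"):
    """Candidate forms when every S, T or Y may or may not carry the modification."""
    n_sites = sum(peptide.count(residue) for residue in modifiable_residues)
    return 2 ** n_sites


def curated_mod_forms(peptide, peptide_start, curated_positions):
    """Candidate forms when only curated protein positions (1-based) may be modified."""
    peptide_end = peptide_start + len(peptide) - 1
    n_sites = sum(1 for pos in curated_positions if peptide_start <= pos <= peptide_end)
    return 2 ** n_sites


peptide = "STSYK"  # hypothetical peptide starting at protein position 1
print(variable_mod_forms(peptide))            # 4 candidate sites
print(curated_mod_forms(peptide, 1, {3}))     # 1 curated site inside the peptide
```

With several modification types searched at once, the variable-modification count grows faster still, while the curated count only grows where annotations exist.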


Subject(s)
Algorithms; Peptide Fragments/chemistry; Protein Processing, Post-Translational; Proteins/chemistry; Proteomics/methods; Software; Acetylation; Animals; Data Mining/statistics & numerical data; Databases, Protein; Humans; Hydroxylation; Islets of Langerhans/chemistry; Islets of Langerhans/metabolism; Jurkat Cells; Male; Methylation; Mice; Mice, Inbred C57BL; Molecular Sequence Annotation; Peptide Fragments/isolation & purification; Peptide Fragments/metabolism; Peptide Mapping; Phosphorylation; Proteins/isolation & purification; Proteins/metabolism
11.
Bioinformatics ; 30(21): 3136-8, 2014 Nov 01.
Article in English | MEDLINE | ID: mdl-25053745

ABSTRACT

Single nucleotide variations (SNVs) located within a reading frame can result in single amino acid polymorphisms (SAPs), leading to alteration of the corresponding amino acid sequence as well as function of a protein. Accurate detection of SAPs is an important issue in proteomic analysis at the experimental and bioinformatic level. Herein, we present sapFinder, an R software package, for detection of variant peptides from tandem mass spectrometry (MS/MS)-based proteomics data. This package automates the construction of variation-associated databases from public SNV repositories or sample-specific next-generation sequencing (NGS) data and the identification of SAPs through database searching, post-processing and generation of an HTML-based report with a visual interface. AVAILABILITY AND IMPLEMENTATION: sapFinder is implemented as a Bioconductor package in R. The package and the vignette can be downloaded at http://bioconductor.org/packages/devel/bioc/html/sapFinder.html and are provided under a GPL-2 license.


Subject(s)
Amino Acid Substitution; Peptides/genetics; Proteomics/methods; Software; Genetic Variation; High-Throughput Nucleotide Sequencing; Peptides/chemistry; Tandem Mass Spectrometry
12.
Mol Cell Proteomics ; 12(8): 2341-53, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23629695

ABSTRACT

Human proteomic databases required for MS peptide identification are frequently updated and carefully curated, yet are still incomplete because it has been challenging to acquire every protein sequence from the diverse assemblage of proteoforms expressed in every tissue and cell type. In particular, alternative splicing has been shown to be a major source of this cell-specific proteomic variation. Many new alternative splice forms have been detected at the transcript level using next generation sequencing methods, especially RNA-Seq, but it is not known how many of these transcripts are being translated. Leveraging the unprecedented capabilities of next generation sequencing methods, we collected RNA-Seq and proteomics data from the same cell population (Jurkat cells) and created a bioinformatics pipeline that builds customized databases for the discovery of novel splice-junction peptides. Eighty million paired-end Illumina reads and ∼500,000 tandem mass spectra were used to identify 12,873 transcripts (19,320 including isoforms) and 6810 proteins. We developed a bioinformatics workflow to retrieve high-confidence, novel splice junction sequences from the RNA data, translate these sequences into the analogous polypeptide sequence, and create a customized splice junction database for MS searching. Based on the RefSeq gene models, we detected 136,123 annotated and 144,818 unannotated transcript junctions. Of those, 24,834 unannotated junctions passed various quality filters (e.g. minimum read depth) and these entries were translated into 33,589 polypeptide sequences and used for database searching. We discovered 57 splice junction peptides not present in the Uniprot-Trembl proteomic database comprising an array of different splicing events, including skipped exons, alternative donors and acceptors, and noncanonical transcriptional start sites. 
To our knowledge this is the first example of using sample-specific RNA-Seq data to create a splice-junction database and discover new peptides resulting from alternative splicing.
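
The step of translating splice junction sequences into polypeptides can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the flank length, the three-frame forward translation and the rule of discarding frames that contain a stop codon are all choices made for the example, and the exon sequences are invented.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate complete codons of seq; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def junction_peptides(upstream_exon, downstream_exon, flank=30):
    """Join the exon ends across the junction and translate in three forward frames.

    Returns {frame: peptide} for frames without a stop codon (an assumption
    made for this sketch).
    """
    joined = upstream_exon[-flank:] + downstream_exon[:flank]
    peptides = {}
    for frame in range(3):
        peptide = translate(joined[frame:])
        if peptide and "*" not in peptide:
            peptides[frame] = peptide
    return peptides


print(junction_peptides("ATGGCC", "AAAGGG", flank=6))
```

Translating each junction in more than one frame is consistent with the abstract reporting more polypeptide sequences (33,589) than filtered junctions (24,834).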


Subject(s)
Alternative Splicing; Peptides/analysis; Databases, Protein; Humans; Jurkat Cells; Mass Spectrometry; Peptides/genetics; Proteomics; Sequence Analysis, RNA
13.
J Proteome Res ; 13(1): 228-40, 2014 Jan 03.
Article in English | MEDLINE | ID: mdl-24175627

ABSTRACT

Each individual carries thousands of nonsynonymous single nucleotide variants (nsSNVs) in their genome, each corresponding to a single amino acid polymorphism (SAP) in the encoded proteins. It is important to be able to directly detect and quantify these variations at the protein level to study post-transcriptional regulation, differential allelic expression, and other important biological processes. However, such variant peptides are not generally detected in standard proteomic analyses due to their absence from the generic databases that are employed for mass spectrometry searching. Here we extend previous work that demonstrated the use of customized SAP databases constructed from sample-matched RNA-Seq data. We collected deep-coverage RNA-Seq data from the Jurkat cell line, compiled the set of nsSNVs that are expressed, used this information to construct a customized SAP database, and searched it against deep-coverage shotgun MS data obtained from the same sample. This approach enabled the detection of 421 SAP peptides mapping to 395 nsSNVs. We compared these peptides to peptides identified from a large generic search database containing all known nsSNVs (dbSNP) and found that more than 70% of the SAP peptides from this dbSNP-derived search were not supported by the RNA-Seq data and thus are likely false positives. Next, we increased the SAP coverage from the RNA-Seq derived database by utilizing multiple protease digestions, thereby increasing variant detection to 695 SAP peptides mapping to 504 nsSNV sites. These detected SAP peptides corresponded to moderate to high abundance transcripts (30+ transcripts per million, TPM). The SAP peptides included 192 allelic pairs; the relative expression levels of the two alleles were evaluated for 51 of those pairs and were found to be comparable in all cases.
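
Building a customized SAP database amounts to applying each expressed substitution to its protein sequence and keeping the peptides that a generic database would not contain. The sketch below assumes a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) and uses an invented protein sequence; it is not the workflow used in the study.

```python
import re


def tryptic_peptides(protein):
    """In silico tryptic digest: cleave after K or R, but not before P."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def sap_peptides(protein, position, ref, alt):
    """Peptides created by substituting ref -> alt at a 1-based protein position."""
    if protein[position - 1] != ref:
        raise ValueError("reference residue does not match the protein sequence")
    variant = protein[:position - 1] + alt + protein[position:]
    reference_peptides = set(tryptic_peptides(protein))
    # Keep only peptides absent from the digest of the reference protein.
    return [p for p in tryptic_peptides(variant) if p not in reference_peptides]


protein = "MAKRPLKSTR"  # hypothetical sequence
print(tryptic_peptides(protein))
print(sap_peptides(protein, 6, "L", "V"))
print(sap_peptides(protein, 7, "K", "E"))  # substitution removes a cleavage site
```

The second substitution shows why multiple proteases help: a variant that removes a trypsin site yields a longer peptide that may be easier to observe after digestion with a different enzyme.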


Subject(s)
Mass Spectrometry/methods; Nucleotides/chemistry; Peptides/analysis; Amino Acid Sequence; Base Sequence; Humans; Jurkat Cells; Molecular Sequence Data; Peptides/chemistry
14.
BMC Genomics ; 15: 703, 2014 Aug 22.
Article in English | MEDLINE | ID: mdl-25149441

ABSTRACT

BACKGROUND: Current practice in mass spectrometry (MS)-based proteomics is to identify peptides by comparison of experimental mass spectra with theoretical mass spectra derived from a reference protein database; however, this strategy necessarily fails to detect peptide and protein sequences that are absent from the database. We and others have recently shown that customized proteomic databases derived from RNA-Seq data can be employed for MS-searching to both improve MS analysis and identify novel peptides. While this general strategy constitutes a significant advance for the discovery of novel protein variations, it has not been readily transferable to other laboratories due to the need for many specialized software tools. To address this problem, we have implemented readily accessible, modifiable, and extensible workflows within Galaxy-P, short for Galaxy for Proteomics, a web-based bioinformatic extension of the Galaxy framework for the analysis of multi-omics (e.g. genomics, transcriptomics, proteomics) data. RESULTS: We present three bioinformatic workflows that allow the user to upload raw RNA sequencing reads and convert the data into high-quality customized proteomic databases suitable for MS searching. We show the utility of these workflows on human and mouse samples, identifying 544 peptides containing single amino acid polymorphisms (SAPs) and 187 peptides corresponding to unannotated splice junction peptides, correlating protein and transcript expression levels, and providing the option to incorporate transcript abundance measures within the MS database search process (reduced databases, incorporation of transcript abundance for protein identification score calculations, etc.). CONCLUSIONS: Using RNA-Seq data to enhance MS analysis is a promising strategy to discover novel peptides specific to a sample and, more generally, to improve proteomics results. 
The main bottleneck for widespread adoption of this strategy has been the lack of easily used and modifiable computational tools. We provide a solution to this problem by introducing a set of workflows within the Galaxy-P framework that converts raw RNA-Seq data into customized proteomic databases.


Subject(s)
Sequence Analysis, RNA; Software; Alternative Splicing; Amino Acid Substitution; Animals; Databases, Nucleic Acid; High-Throughput Nucleotide Sequencing; Humans; Jurkat Cells; Mice; Polymorphism, Single Nucleotide; Protein Isoforms/chemistry; Protein Isoforms/genetics; Proteome/chemistry; Proteome/genetics
15.
bioRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38915658

ABSTRACT

Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is via bottom-up mass spectrometry proteomics, which returns peptide identifications that are indirectly used to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample if a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of such selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we: i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations, and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances highly correlate with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package, and is accompanied by an example usage vignette.
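
The second layer, allocating the abundance of shared peptides across compatible isoforms, can be illustrated with a much simpler stand-in for the Bayesian sampler. The sketch below uses an expectation-maximization loop instead of posterior sampling, ignores the peptide-detection layer and the transcriptomics prior, and is written in Python rather than R; it is not IsoBayes code, and all names and numbers are hypothetical.

```python
def allocate_peptides(peptides, isoforms, n_iter=100):
    """Split peptide abundances across compatible isoforms by EM.

    peptides: list of (abundance, list_of_compatible_isoforms).
    Returns {isoform: allocated abundance} after n_iter iterations.
    """
    # Start from equal relative abundances.
    theta = {iso: 1.0 / len(isoforms) for iso in isoforms}
    totals = {iso: 0.0 for iso in isoforms}
    for _ in range(n_iter):
        totals = {iso: 0.0 for iso in isoforms}
        # E-step: share each peptide in proportion to current isoform abundances.
        for abundance, compatible in peptides:
            norm = sum(theta[iso] for iso in compatible)
            if norm == 0:
                continue
            for iso in compatible:
                totals[iso] += abundance * theta[iso] / norm
        # M-step: update relative abundances from the allocated totals.
        grand_total = sum(totals.values())
        if grand_total == 0:
            break
        theta = {iso: totals[iso] / grand_total for iso in isoforms}
    return totals


# Isoform A has a unique peptide; isoform B is supported only by a shared one.
peptides = [(10.0, ["A"]), (10.0, ["A", "B"])]
print(allocate_peptides(peptides, ["A", "B"]))
```

In this toy case the shared peptide drifts entirely to isoform A, because nothing uniquely supports B. A point estimate of this kind carries no uncertainty, which is the gap that posterior probabilities and credible intervals are meant to fill.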

16.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617311

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), have demonstrated multiplexed, sensitive detection of pre-defined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms - a critical first step in designing functional studies and eventually clinical assays.

17.
Sci Transl Med ; 16(730): eade2886, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38232136

ABSTRACT

Immunotherapy has emerged as a crucial strategy to combat cancer by "reprogramming" a patient's own immune system. Although immunotherapy is typically reserved for patients with a high mutational burden, neoantigens produced from posttranscriptional regulation may provide an untapped reservoir of common immunogenic targets for new targeted therapies. To comprehensively define tumor-specific and likely immunogenic neoantigens from patient RNA-Seq, we developed Splicing Neo Antigen Finder (SNAF), an easy-to-use and open-source computational workflow to predict splicing-derived immunogenic MHC-bound peptides (T cell antigen) and unannotated transmembrane proteins with altered extracellular epitopes (B cell antigen). This workflow uses a highly accurate deep learning strategy for immunogenicity prediction (DeepImmuno) in conjunction with new algorithms to rank the tumor specificity of neoantigens (BayesTS) and to predict regulators of mis-splicing (RNA-SPRINT). T cell antigens from SNAF were frequently evidenced as HLA-presented peptides from mass spectrometry (MS) and predict response to immunotherapy in melanoma. Splicing neoantigen burden was attributed to coordinated splicing factor dysregulation. Shared splicing neoantigens were found in up to 90% of patients with melanoma, correlated to overall survival in multiple cancer cohorts, induced T cell reactivity, and were characterized by distinct cells of origin and amino acid preferences. In addition to T cell neoantigens, our B cell focused pipeline (SNAF-B) identified a new class of tumor-specific extracellular neoepitopes, which we termed ExNeoEpitopes. ExNeoEpitope full-length mRNA predictions were tumor specific and were validated using long-read isoform sequencing and in vitro transmembrane localization assays. Therefore, our systematic identification of splicing neoantigens revealed potential shared targets for therapy in heterogeneous cancers.


Subject(s)
Melanoma , Neoplasms , Humans , Antigens, Neoplasm/metabolism , Neoplasms/genetics , Neoplasms/therapy , T-Lymphocytes , Peptides/chemistry , Immunotherapy/methods
18.
Curr Stem Cell Rep ; 9(2): 31-41, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38939410

ABSTRACT

Purpose of review: The underlying molecular mechanisms that direct stem cell differentiation into fully functional, mature cells remain an area of ongoing investigation. Cell state is the product of the combinatorial effect of individual factors operating within a coordinated regulatory network. Here, we discuss the contribution of both gene regulatory and splicing regulatory networks in defining stem cell fate during differentiation and the critical role of protein isoforms in this process. Recent findings: We review recent experimental and computational approaches that characterize gene regulatory networks, splice regulatory networks, and the resulting transcriptome and proteome they mediate during differentiation. Such approaches include long-read RNA sequencing, which has demonstrated high-resolution profiling of mRNA isoforms, and Cas13-based CRISPR, which could make possible high-throughput isoform screening. Collectively, these developments enable systems-level profiling of factors contributing to cell state. Summary: Overall, gene and splice regulatory networks are important in defining cell state. The emerging high-throughput systems-level approaches will characterize the gene regulatory network components necessary in driving stem cell differentiation.

19.
Methods Mol Biol ; 2660: 357-372, 2023.
Article in English | MEDLINE | ID: mdl-37191809

ABSTRACT

Traditionally, disease-causing mutations were thought to disrupt gene function. However, it has become clearer that many deleterious mutations can exhibit "gain-of-function" (GOF) behavior. Such mutations have been largely overlooked, and their systematic investigation has been lacking. Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, contributing to diverse phenotypic consequences in disease. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities. In distinct cell types (with varying genotypes), precise signal transduction controls cell decisions, including gene regulation and phenotypic output. When signal transduction goes awry due to GOF mutations, it can give rise to various disease types. Quantitative and molecular understanding of network perturbations by GOF mutations may provide explanations for "missing heritability" in previous genome-wide association studies. We envision that it will be instrumental to push the current paradigm toward a thorough functional and quantitative modeling of all GOF mutations and the mechanistic molecular events involved in disease development and progression. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at various regulation levels? How do interaction networks undergo rewiring upon GOF mutations? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, aiming to cure disease? To begin to address these questions, we cover a wide range of topics regarding GOF disease mutations and their characterization by multi-omic networks. We highlight the fundamental function of GOF mutations and discuss their potential mechanistic effects in the context of signaling networks. We also discuss advances in bioinformatic and computational resources, which will greatly help studies on the functional and phenotypic consequences of GOF mutations.


Subject(s)
Multiomics , Precision Medicine , Genome-Wide Association Study , Mutation , Gain of Function Mutation
20.
bioRxiv ; 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36993769

ABSTRACT

A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpretation of how such alterations impact proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq is a powerful tool to define and quantify transcript isoforms and, more recently, to infer the existence of protein isoforms. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep-coverage PacBio long-read RNA-seq data (N ≈ 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found that 74 sQTLs influenced isoforms likely subject to nonsense-mediated decay (NMD) and 190 potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs in TPM2 for splice junctions between two mutually exclusive exons and two different transcript termination sites, a configuration that would be impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in osteoblasts showed that two TPM2 isoforms had opposing effects on mineralization. We expect our approach to be widely generalizable across diverse clinical traits and to accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.
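
The filtering step and the reported novel-isoform fraction in this abstract can be illustrated with a minimal sketch. It assumes that "H4 PP" denotes a per-sQTL colocalization posterior probability compared against the 0.75 cutoff. The record type, field names, and toy values below are hypothetical and are not taken from the study's data or code.

```python
from dataclasses import dataclass

H4_THRESHOLD = 0.75  # cutoff quoted in the abstract (H4 PP >= 0.75)


@dataclass(frozen=True)
class ColocResult:
    """Hypothetical record for one sQTL tested against a GWAS trait."""
    sqtl_id: str   # illustrative identifier, not a real sQTL
    gene: str      # illustrative gene label
    h4_pp: float   # assumed colocalization posterior probability


def colocalized(results, threshold=H4_THRESHOLD):
    """Keep results whose posterior probability meets the cutoff."""
    return [r for r in results if r.h4_pp >= threshold]


def novel_fraction(novel, total):
    """Fraction of isoforms that are novel."""
    return novel / total


# Toy input, invented for illustration only.
toy = [
    ColocResult("sqtl_a", "GENE1", 0.91),
    ColocResult("sqtl_b", "GENE1", 0.40),
    ColocResult("sqtl_c", "GENE2", 0.75),
]

kept = colocalized(toy)
print([r.sqtl_id for r in kept])

# Check of the abstract's arithmetic: 17,375 novel of 68,326 isoforms.
print(round(novel_fraction(17375, 68326) * 100))
```

The cutoff is inclusive here because the abstract states "≥ 0.75"; a stricter or looser threshold would change only the `threshold` argument.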
