1.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617311

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of pre-defined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms, a critical first step in designing functional studies and eventually clinical assays.
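
The idea of a tryptic isoform-specific peptide in the abstract above can be illustrated with a minimal sketch. It is not the authors' pipeline: the cleavage rule is a simplified trypsin model (cut after K or R unless the next residue is P, no missed cleavages), uniqueness is checked only among the isoforms supplied rather than against a full proteome, and the function names and toy sequences are hypothetical.

```python
from collections import defaultdict


def tryptic_digest(protein, min_len=6):
    """Cleave after K or R unless the next residue is P (simplified trypsin rule)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])  # C-terminal peptide without K/R
    return [p for p in peptides if len(p) >= min_len]


def isoform_specific_peptides(isoforms):
    """Map each isoform name to the tryptic peptides produced by that isoform only.

    `isoforms` is a dict of name -> amino acid sequence (hypothetical input format).
    """
    owners = defaultdict(set)
    for name, seq in isoforms.items():
        for pep in tryptic_digest(seq):
            owners[pep].add(name)
    specific = defaultdict(list)
    for pep, names in owners.items():
        if len(names) == 1:
            specific[next(iter(names))].append(pep)
    return {name: sorted(peps) for name, peps in specific.items()}


# Toy sequences invented for illustration; they differ in one internal segment.
toy = {
    "iso_A": "MSTAGLKVEEDLPQRAAGTWSYKLLNDEFGHR",
    "iso_B": "MSTAGLKVEEDLPQRGGCCWSYKLLNDEFGHR",
}
print(isoform_specific_peptides(toy))
```

In this toy case only the peptide spanning the differing segment distinguishes each isoform; the shared N- and C-terminal peptides are excluded because both isoforms produce them.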

2.
Sci Transl Med ; 16(730): eade2886, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38232136

ABSTRACT

Immunotherapy has emerged as a crucial strategy to combat cancer by "reprogramming" a patient's own immune system. Although immunotherapy is typically reserved for patients with a high mutational burden, neoantigens produced from posttranscriptional regulation may provide an untapped reservoir of common immunogenic targets for new targeted therapies. To comprehensively define tumor-specific and likely immunogenic neoantigens from patient RNA-Seq, we developed Splicing Neo Antigen Finder (SNAF), an easy-to-use and open-source computational workflow to predict splicing-derived immunogenic MHC-bound peptides (T cell antigen) and unannotated transmembrane proteins with altered extracellular epitopes (B cell antigen). This workflow uses a highly accurate deep learning strategy for immunogenicity prediction (DeepImmuno) in conjunction with new algorithms to rank the tumor specificity of neoantigens (BayesTS) and to predict regulators of mis-splicing (RNA-SPRINT). T cell antigens from SNAF were frequently evidenced as HLA-presented peptides from mass spectrometry (MS) and predict response to immunotherapy in melanoma. Splicing neoantigen burden was attributed to coordinated splicing factor dysregulation. Shared splicing neoantigens were found in up to 90% of patients with melanoma, correlated to overall survival in multiple cancer cohorts, induced T cell reactivity, and were characterized by distinct cells of origin and amino acid preferences. In addition to T cell neoantigens, our B cell focused pipeline (SNAF-B) identified a new class of tumor-specific extracellular neoepitopes, which we termed ExNeoEpitopes. ExNeoEpitope full-length mRNA predictions were tumor specific and were validated using long-read isoform sequencing and in vitro transmembrane localization assays. Therefore, our systematic identification of splicing neoantigens revealed potential shared targets for therapy in heterogeneous cancers.


Subjects
Melanoma , Neoplasms , Humans , Antigens, Neoplasm/metabolism , Neoplasms/genetics , Neoplasms/therapy , T-Lymphocytes , Peptides/chemistry , Immunotherapy/methods
3.
Methods Mol Biol ; 2660: 357-372, 2023.
Article in English | MEDLINE | ID: mdl-37191809

ABSTRACT

Traditionally, disease-causing mutations were thought to disrupt gene function. However, it is becoming clearer that many deleterious mutations could exhibit a "gain-of-function" (GOF) behavior. Systematic investigation of such mutations has been lacking and largely overlooked. Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, further contributing to diverse phenotypic consequences in disease. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities. In distinct cell types (with varying genotypes), precise signal transduction controls cell decision, including gene regulation and phenotypic output. When signal transduction goes awry due to GOF mutations, it can give rise to various disease types. Quantitative and molecular understanding of network perturbations by GOF mutations may provide explanations for "missing heritability" in previous genome-wide association studies. We envision that it will be instrumental to push the current paradigm toward a thorough functional and quantitative modeling of all GOF mutations and their mechanistic molecular events involved in disease development and progression. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at various regulation levels? How do interaction networks undergo rewiring upon GOF mutations? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, aiming to cure disease? To begin to address these questions, we will cover a wide range of topics regarding GOF disease mutations and their characterization by multi-omic networks. We highlight the fundamental function of GOF mutations and discuss the potential mechanistic effects in the context of signaling networks. We also discuss advances in bioinformatic and computational resources, which will dramatically help with studies on the functional and phenotypic consequences of GOF mutations.


Subjects
Multiomics , Precision Medicine , Genome-Wide Association Study , Mutation , Gain of Function Mutation
4.
Life Sci Alliance ; 6(8)2023 08.
Article in English | MEDLINE | ID: mdl-37197981

ABSTRACT

Connexin37-mediated regulation of cell cycle modulators and, consequently, growth arrest lack mechanistic understanding. We previously showed that arterial shear stress up-regulates Cx37 in endothelial cells and activates a Notch/Cx37/p27 signaling axis to promote G1 cell cycle arrest, and this is required to enable arterial gene expression. However, how induced expression of a gap junction protein, Cx37, up-regulates cyclin-dependent kinase inhibitor p27 to enable endothelial growth suppression and arterial specification is unclear. Herein, we fill this knowledge gap by expressing wild-type and regulatory domain mutants of Cx37 in cultured endothelial cells expressing the Fucci cell cycle reporter. We determined that both the channel-forming and cytoplasmic tail domains of Cx37 are required for p27 up-regulation and late G1 arrest. Mechanistically, the cytoplasmic tail domain of Cx37 interacts with, and sequesters, activated ERK in the cytoplasm. This then stabilizes pERK nuclear target Foxo3a, which up-regulates p27 transcription. Consistent with previous studies, we found this Cx37/pERK/Foxo3a/p27 signaling axis functions downstream of arterial shear stress to promote endothelial late G1 state and enable up-regulation of arterial genes.


Subjects
Connexins , Endothelial Cells , Endothelial Cells/metabolism , Cell Cycle Checkpoints/genetics , Connexins/genetics , Connexins/metabolism , G1 Phase Cell Cycle Checkpoints , Cell Nucleus/metabolism , Gap Junction alpha-4 Protein
5.
bioRxiv ; 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36993769

ABSTRACT

A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpretation of how such alterations impact proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq represents a powerful tool to define and quantify transcript isoforms and, more recently, to infer protein isoform existence. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep coverage PacBio long-read RNA-seq data (N ≈ 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found that 74 sQTLs influenced isoforms likely impacted by nonsense mediated decay (NMD) and 190 that potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs in TPM2 for splice junctions between two mutually exclusive exons, and two different transcript termination sites, making it impossible to interpret without long-read RNA-seq data. siRNA mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization. We expect our approach to be widely generalizable across diverse clinical traits and accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.

6.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36752347

ABSTRACT

Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms, and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited owing to patient sample size and selection bias. There is a lack of comprehensive research aimed at identifying AD-associated risk mutations systematically. To address this challenge, we hereby construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI configured with fully connected layers that are capable of predicting cognitive impairment of subjects effectively based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.


Subjects
Alzheimer Disease , Cognitive Dysfunction , Deep Learning , Humans , Alzheimer Disease/genetics , Magnetic Resonance Imaging , Cognitive Dysfunction/genetics , Mutation
7.
Circ Res ; 132(3): 323-338, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36597873

ABSTRACT

BACKGROUND: Coronary artery disease (CAD) is the leading cause of death worldwide. Recent meta-analyses of genome-wide association studies have identified over 175 loci associated with CAD. The majority of these loci are in noncoding regions and are predicted to regulate gene expression. Given that vascular smooth muscle cells (SMCs) play critical roles in the development and progression of CAD, we aimed to identify the subset of the CAD loci associated with the regulation of transcription in distinct SMC phenotypes. METHODS: We measured gene expression in SMCs isolated from the ascending aortas of 151 heart transplant donors of various genetic ancestries in quiescent or proliferative conditions and calculated the association of their expression and splicing with ~6.3 million imputed single-nucleotide polymorphism markers across the genome. RESULTS: We identified 4910 expression and 4412 splicing quantitative trait loci (sQTLs) representing regions of the genome associated with transcript abundance and splicing. A total of 3660 expression quantitative trait loci (eQTLs) had not been observed in the publicly available Genotype-Tissue Expression dataset. Further, 29 and 880 eQTLs were SMC-specific and sex-biased, respectively. We made these results available for public query on a user-friendly website. To identify the effector transcript(s) regulated by CAD loci, we used 4 distinct colocalization approaches. We identified 84 eQTL and 164 sQTL that colocalized with CAD loci, highlighting the importance of genetic regulation of mRNA splicing as a molecular mechanism for CAD genetic risk. Notably, 20% and 35% of the eQTLs were unique to quiescent or proliferative SMCs, respectively. One CAD locus colocalized with a sex-specific eQTL (TERF2IP), and another locus colocalized with SMC-specific eQTL (ALKBH8). The most significantly associated CAD locus, 9p21, was an sQTL for the long noncoding RNA CDKN2B-AS1, also known as ANRIL, in proliferative SMCs. CONCLUSIONS: Collectively, our results provide evidence for the molecular mechanisms of genetic susceptibility to CAD in distinct SMC phenotypes.


Subjects
Coronary Artery Disease , Male , Female , Humans , Coronary Artery Disease/genetics , Coronary Artery Disease/metabolism , Genome-Wide Association Study/methods , Gene Expression Regulation , Quantitative Trait Loci , Genetic Predisposition to Disease , Gene Expression , Polymorphism, Single Nucleotide , AlkB Homolog 8, tRNA Methyltransferase/genetics , AlkB Homolog 8, tRNA Methyltransferase/metabolism
8.
RNA Biol ; 19(1): 1228-1243, 2022 01.
Article in English | MEDLINE | ID: mdl-36457147

ABSTRACT

Endothelial cells (ECs) comprise the lumenal lining of all blood vessels and are critical for the functioning of the cardiovascular system. Their phenotypes can be modulated by alternative splicing of RNA to produce distinct protein isoforms. To characterize the RNA and protein isoform landscape within ECs, we applied a long read proteogenomics approach to analyse human umbilical vein endothelial cells (HUVECs). Transcripts delineated from PacBio sequencing serve as the basis for a sample-specific protein database used for downstream mass-spectrometry (MS) analysis to infer protein isoform expression. We detected 53,863 transcript isoforms from 10,426 genes, with 22,195 of those transcripts being novel. Furthermore, the predominant isoform in HUVECs does not correspond with the accepted "reference isoform" 25% of the time, with vascular pathway-related genes among this group. We found 2,597 protein isoforms supported through unique peptides, with an additional 2,280 isoforms nominated upon incorporation of long-read transcript evidence. We characterized a novel alternative acceptor for endothelial-related gene CDH5, suggesting potential changes in its associated signalling pathways. Finally, we identified novel protein isoforms arising from a diversity of RNA splicing mechanisms supported by uniquely mapped novel peptides. Our results represent a high-resolution atlas of known and novel isoforms of potential relevance to endothelial phenotypes and function.


Subjects
Proteogenomics , Humans , Human Umbilical Vein Endothelial Cells , Protein Isoforms/genetics , Alternative Splicing , RNA
9.
Nat Commun ; 13(1): 5891, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36202789

ABSTRACT

During blood vessel development, endothelial cells become specified toward arterial or venous fates to generate a circulatory network that provides nutrients and oxygen to, and removes metabolic waste from, all tissues. Arterial-venous specification occurs in conjunction with suppression of endothelial cell cycle progression; however, the mechanistic role of cell cycle state is unknown. Herein, using Cdh5-CreERT2;R26FUCCI2aR reporter mice, we find that venous endothelial cells are enriched for the FUCCI-Negative state (early G1) and BMP signaling, while arterial endothelial cells are enriched for the FUCCI-Red state (late G1) and TGF-β signaling. Furthermore, early G1 state is essential for BMP4-induced venous gene expression, whereas late G1 state is essential for TGF-β1-induced arterial gene expression. Pharmacologically induced cell cycle arrest prevents arterial-venous specification defects in mice with endothelial hyperproliferation. Collectively, our results show that distinct endothelial cell cycle states provide distinct windows of opportunity for the molecular induction of arterial vs. venous fate.


Subjects
Endothelial Cells , Transforming Growth Factor beta1 , Animals , Arteries/metabolism , Cell Cycle , Endothelial Cells/metabolism , Mice , Oxygen/metabolism , Transforming Growth Factor beta1/metabolism , Veins
10.
Hum Mol Genet ; 31(R1): R123-R136, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35960994

ABSTRACT

Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.


Subjects
Human Genetics , RNA Isoforms , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA Isoforms/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA
11.
Genome Biol ; 23(1): 69, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35241129

ABSTRACT

BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.


Subjects
Proteogenomics , Alternative Splicing , Humans , Protein Isoforms/genetics , Proteomics , Sequence Analysis, RNA/methods , Transcriptome
12.
Cell Rep ; 37(7): 110022, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34788620

ABSTRACT

Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.


Subjects
Cerebral Cortex/metabolism , Protein Isoforms/genetics , Transcriptome/genetics , Alternative Splicing/genetics , Animals , Brain/metabolism , Cerebral Cortex/physiology , Exons/genetics , Gene Expression/genetics , Gene Expression Profiling/methods , Genome , High-Throughput Nucleotide Sequencing/methods , Humans , Mice , Protein Isoforms/metabolism , RNA Precursors/genetics , RNA Splice Sites/genetics , RNA, Messenger/genetics , Sequence Analysis, RNA/methods
13.
medRxiv ; 2020 Nov 03.
Article in English | MEDLINE | ID: mdl-33173926

ABSTRACT

Chronic obstructive pulmonary disease (COPD) is a leading cause of death worldwide. Genome-wide association studies (GWAS) have identified over 80 loci that are associated with COPD and emphysema; however, for most of these loci the causal variant and gene are unknown. Here, we utilize lung splice quantitative trait loci (sQTL) data from the Genotype-Tissue Expression project (GTEx) and short read sequencing data from the Lung Tissue Research Consortium (LTRC) to characterize a locus in nephronectin (NPNT) associated with COPD case-control status and lung function. We found that the rs34712979 variant is associated with alternative splice junction use in NPNT, specifically for the junction connecting the 2nd and 4th exons (chr4:105898001-105927336) (p = 4.02×10^-38). This association colocalized with GWAS data for COPD and lung spirometry measures with a posterior probability of 94%, indicating that the same causal genetic variants in NPNT underlie the associations with COPD risk, spirometric measures of lung function, and splicing. Investigation of NPNT short read sequencing revealed that rs34712979 creates a cryptic splice acceptor site which results in the inclusion of a 3-nucleotide exon extension, coding for a serine residue near the N-terminus of the protein. Using Oxford Nanopore Technologies (ONT) long read sequencing we identified 13 NPNT isoforms, 6 of which are predicted to be protein coding. Two of these are full length isoforms which differ only in the 3-nucleotide exon extension whose occurrence differs by genotype. Overall, our data indicate that rs34712979 modulates COPD risk and lung function by creating a novel splice acceptor which results in the inclusion of a 3-nucleotide sequence coding for a serine in the nephronectin protein sequence. Our findings implicate NPNT splicing in contributing to COPD risk, and identify a novel serine insertion in the nephronectin protein that warrants further study.

14.
Nat Commun ; 11(1): 2326, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32393825

ABSTRACT

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every coding gene, the majority of alternative isoforms remains uncharacterized due to (i) vast differences of overall levels between different isoforms expressed from common genes, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As a proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude when compared to unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.


Subjects
Open Reading Frames/genetics , Sequence Analysis, RNA/methods , Humans , Protein Isoforms/genetics , Protein Isoforms/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reference Standards , Transcription Factors/genetics
15.
J Proteome Res ; 18(9): 3429-3438, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31378069

ABSTRACT

Peptides detected by tandem mass spectrometry (MS/MS) in bottom-up proteomics serve as proxies for the proteins expressed in the sample. Protein inference is a process routinely applied to these peptides to generate a plausible list of candidate protein identifications. The use of multiple proteases for parallel protein digestions expands sequence coverage, provides additional peptide identifications, and increases the probability of identifying peptides that are unique to a single protein, which are all valuable for protein inference. We have developed and implemented a multi-protease protein inference algorithm in MetaMorpheus, a bottom-up search software program, which incorporates the calculation of protease-specific q-values and preserves the association of peptide sequences and their protease of origin. This integrated multi-protease protein inference algorithm provides more accurate results than either the aggregation of results from the separate analysis of the peptide identifications produced by each protease (separate approach) in MetaMorpheus, or results that are obtained using Fido, ProteinProphet, or DTASelect2. MetaMorpheus' integrated multi-protease data analysis decreases the ambiguity of the protein group list, reduces the frequency of erroneous identifications, and increases the number of post-translational modifications identified, while combining multi-protease search and protein inference into a single software program.


Subjects
Proteins/isolation & purification , Proteomics , Software , Tandem Mass Spectrometry/methods , Algorithms , Amino Acid Sequence/genetics , Databases, Protein , Peptide Hydrolases/chemistry , Peptide Hydrolases/isolation & purification , Peptides/chemistry , Peptides/isolation & purification , Proteins/chemistry
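
The protease-specific q-values mentioned in the abstract above can be illustrated with a generic target-decoy sketch. This is not MetaMorpheus code or its exact algorithm: the decoys/targets FDR estimate is one common convention, ties in score are not treated specially, and the record layout (`protease`, `score`, `is_decoy`) is a hypothetical choice made for the example.

```python
from collections import defaultdict


def q_values(psms):
    """Return q-values for a list of (score, is_decoy) pairs, in input order.

    FDR at each rank is estimated as decoys / targets among all matches with
    an equal or better score; the q-value is the smallest FDR at which the
    match would still be accepted.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdrs, targets, decoys = [], 0, 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    q = [0.0] * len(psms)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):  # walk from worst score to best
        running_min = min(running_min, fdrs[rank])
        q[order[rank]] = running_min
    return q


def protease_specific_q_values(psms):
    """Compute q-values separately within each protease group.

    `psms` is a list of dicts with keys 'protease', 'score', 'is_decoy'.
    """
    groups = defaultdict(list)
    for idx, psm in enumerate(psms):
        groups[psm["protease"]].append(idx)
    result = [None] * len(psms)
    for idxs in groups.values():
        qs = q_values([(psms[i]["score"], psms[i]["is_decoy"]) for i in idxs])
        for i, qv in zip(idxs, qs):
            result[i] = qv
    return result
```

Grouping before estimating the false discovery rate keeps each protease's score distribution separate, so a protease that produces systematically lower scores is not penalized by thresholds set by another.
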
16.
Genome Biol ; 19(1): 46, 2018 03 29.
Article in English | MEDLINE | ID: mdl-29598823

ABSTRACT

BACKGROUND: The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. RESULTS: In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. CONCLUSIONS: Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.


Subjects
Polyadenylation , RNA Splicing , RNA, Messenger/metabolism , Transcription Initiation, Genetic , Humans , MCF-7 Cells , Nucleotide Motifs , Poly A/metabolism , Proteome/genetics , RNA, Messenger/chemistry , RNA-Binding Proteins/metabolism , Sequence Analysis, RNA , Transcriptome
17.
Trends Biochem Sci ; 42(5): 342-354, 2017 05.
Article in English | MEDLINE | ID: mdl-28284537

ABSTRACT

Cellular functions are mediated by complex interactome networks of physical, biochemical, and functional interactions between DNA sequences, RNA molecules, proteins, lipids, and small metabolites. A thorough understanding of cellular organization requires accurate and relatively complete models of interactome networks at proteome scale. The recent publication of four human protein-protein interaction (PPI) maps represents a technological breakthrough and an unprecedented resource for the scientific community, heralding a new era of proteome-scale human interactomics. Our knowledge gained from these and complementary studies provides fresh insights into the opportunities and challenges when analyzing systematically generated interactome data, defines a clear roadmap towards the generation of a first reference interactome, and reveals new perspectives on the organization of cellular life.


Subjects
Protein Interaction Mapping , Protein Interaction Maps , Proteins/metabolism , Proteome/metabolism , Humans , Protein Binding , Proteins/chemistry , Proteomics
18.
PLoS Genet ; 12(12): e1006466, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27935966

ABSTRACT

Human genome-wide association studies (GWAS) have shown that genetic variation at >130 gene loci is associated with type 2 diabetes (T2D). We asked if the expression of the candidate T2D-associated genes within these loci is regulated by a common locus in pancreatic islets. Using an obese F2 mouse intercross segregating for T2D, we show that the expression of ~40% of the T2D-associated genes is linked to a broad region on mouse chromosome (Chr) 2. As all but 9 of these genes are not physically located on Chr 2, linkage to Chr 2 suggests a genomic factor(s) located on Chr 2 regulates their expression in trans. The transcription factor Nfatc2 is physically located on Chr 2 and its expression demonstrates cis linkage; i.e., its expression maps to itself. When conditioned on the expression of Nfatc2, linkage for the T2D-associated genes was greatly diminished, supporting Nfatc2 as a driver of their expression. Plasma insulin also showed linkage to the same broad region on Chr 2. Overexpression of a constitutively active (ca) form of Nfatc2 induced β-cell proliferation in mouse and human islets, and transcriptionally regulated more than half of the T2D-associated genes. Overexpression of either ca-Nfatc2 or ca-Nfatc1 in mouse islets enhanced insulin secretion, whereas only ca-Nfatc2 was able to promote β-cell proliferation, suggesting distinct molecular pathways mediating insulin secretion vs. β-cell proliferation are regulated by NFAT. Our results suggest that many of the T2D-associated genes are downstream transcriptional targets of NFAT, and may act coordinately in a pathway through which NFAT regulates β-cell proliferation in both mouse and human islets.


Subjects
Diabetes Mellitus, Type 2/genetics , Insulin/genetics , NFATC Transcription Factors/genetics , Animals , Cell Proliferation/genetics , Chromosome Mapping , Diabetes Mellitus, Type 2/metabolism , Diabetes Mellitus, Type 2/pathology , Gene Expression Regulation , Genetic Linkage , Genome , Genome-Wide Association Study , Humans , Insulin-Secreting Cells/metabolism , Insulin-Secreting Cells/pathology , Islets of Langerhans/metabolism , Islets of Langerhans/pathology , Mice , Mice, Obese , NFATC Transcription Factors/biosynthesis , Promoter Regions, Genetic
19.
Annu Rev Anal Chem (Palo Alto Calif) ; 9(1): 521-45, 2016 06 12.
Article in English | MEDLINE | ID: mdl-27049631

ABSTRACT

Mass spectrometry-based proteomics has emerged as the leading method for detection, quantification, and characterization of proteins. Nearly all proteomic workflows rely on proteomic databases to identify peptides and proteins, but these databases typically contain a generic set of proteins that lack variations unique to a given sample, precluding their detection. Fortunately, proteogenomics enables the detection of such proteomic variations and can be defined, broadly, as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching. Proteogenomics is experiencing heightened significance due to two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.


Subjects
Genetic Variation/genetics , Mass Spectrometry , Proteins/chemistry , Proteins/genetics , Proteogenomics , Base Sequence/genetics , Humans
20.
Cell ; 164(4): 805-17, 2016 02 11.
Article in English | MEDLINE | ID: mdl-26871637

ABSTRACT

While alternative splicing is known to diversify the functional characteristics of some genes, the extent to which protein isoforms globally contribute to functional complexity on a proteomic scale remains unknown. To address this systematically, we cloned full-length open reading frames of alternatively spliced transcripts for a large number of human genes and used protein-protein interaction profiling to functionally compare hundreds of protein isoform pairs. The majority of isoform pairs share less than 50% of their interactions. In the global context of interactome network maps, alternative isoforms tend to behave like distinct proteins rather than minor variants of each other. Interaction partners specific to alternative isoforms tend to be expressed in a highly tissue-specific manner and belong to distinct functional modules. Our strategy, applicable to other functional characteristics, reveals a widespread expansion of protein interaction capabilities through alternative splicing and suggests that many alternative "isoforms" are functionally divergent (i.e., "functional alloforms").


Subjects
Alternative Splicing , Protein Isoforms/metabolism , Proteome/metabolism , Animals , Cloning, Molecular , Evolution, Molecular , Humans , Models, Molecular , Open Reading Frames , Protein Interaction Domains and Motifs , Protein Interaction Maps , Proteome/analysis