Results 1 - 20 of 28
1.
bioRxiv ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38915658

ABSTRACT

Studying protein isoforms is an essential step in biomedical research; at present, the main approach for analyzing proteins is bottom-up mass spectrometry proteomics, which returns peptide identifications that are indirectly used to infer the presence of protein isoforms. However, the detection and quantification processes are noisy; in particular, peptides may be erroneously detected, and most peptides, known as shared peptides, are associated with multiple protein isoforms. As a consequence, studying individual protein isoforms is challenging, and inferred protein results are often abstracted to the gene level or to groups of protein isoforms. Here, we introduce IsoBayes, a novel statistical method to perform inference at the isoform level. Our method enhances the information available by integrating mass spectrometry proteomics and transcriptomics data in a Bayesian probabilistic framework. To account for the uncertainty in the measurement process, we propose a two-layer latent variable approach: first, we sample whether a peptide has been correctly detected (or, alternatively, filter peptides); second, we allocate the abundance of such selected peptides across the protein(s) they are compatible with. This enables us, starting from peptide-level data, to recover protein-level data; in particular, we: i) infer the presence/absence of each protein isoform (via a posterior probability), ii) estimate its abundance (and credible interval), and iii) target isoforms where transcript and protein relative abundances significantly differ. We benchmarked our approach in simulations and in two multi-protease real datasets: our method displays good sensitivity and specificity when detecting protein isoforms, its estimated abundances correlate highly with the ground truth, and it can detect changes between protein and transcript relative abundances. IsoBayes is freely distributed as a Bioconductor R package and is accompanied by an example usage vignette.
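
The two layers described in this abstract (select reliably detected peptides, then allocate the abundance of shared peptides across compatible isoforms) can be illustrated with a small sketch. This is not the IsoBayes implementation, which is an R package based on Bayesian sampling; it is a simplified expectation-maximization stand-in written in Python. The function name `allocate_abundance`, the `min_prob` threshold, and the toy peptide table are all assumptions made for illustration.

```python
def allocate_abundance(peptides, min_prob=0.9, n_iter=50):
    """Toy two-layer allocation (hypothetical helper, not the IsoBayes API).

    peptides: list of (abundance, detection_probability, [isoform names]).
    Returns a dict mapping each isoform to its allocated abundance.
    """
    # Layer 1 (filter variant): keep peptides that are likely to be real.
    kept = [(a, isos) for a, p, isos in peptides if p >= min_prob]
    isoforms = sorted({i for _, isos in kept for i in isos})
    if not isoforms:
        return {}

    # Layer 2: start from equal isoform weights and iterate.
    weights = {i: 1.0 / len(isoforms) for i in isoforms}
    totals = dict.fromkeys(isoforms, 0.0)
    for _ in range(n_iter):
        totals = dict.fromkeys(isoforms, 0.0)
        for abundance, isos in kept:
            norm = sum(weights[i] for i in isos)
            for i in isos:
                # Split each peptide in proportion to the current weights.
                share = weights[i] / norm if norm > 0 else 1.0 / len(isos)
                totals[i] += abundance * share
        grand = sum(totals.values())
        if grand == 0:
            break
        weights = {i: totals[i] / grand for i in isoforms}
    return totals


# Toy data: two unique peptides, one shared peptide, one doubtful peptide.
toy = [
    (10.0, 0.99, ["A"]),
    (30.0, 0.95, ["B"]),
    (40.0, 0.97, ["A", "B"]),
    (500.0, 0.20, ["A"]),  # dropped by the detection filter
]
result = allocate_abundance(toy)
print(result)
```

In this toy case the shared peptide is split according to the evidence from the unique peptides, so isoform B receives the larger share. A full Bayesian treatment would additionally return posterior probabilities and credible intervals, which this sketch does not attempt.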

2.
bioRxiv ; 2024 Apr 01.
Article in English | MEDLINE | ID: mdl-38617311

ABSTRACT

Alternative splicing is a major contributor to transcriptomic complexity, but the extent to which transcript isoforms are translated into stable, functional protein isoforms is unclear. Furthermore, detection of relatively scarce isoform-specific peptides is challenging, with many protein isoforms remaining uncharted due to technical limitations. Recently, a family of advanced targeted MS strategies, termed internal standard parallel reaction monitoring (IS-PRM), has demonstrated multiplexed, sensitive detection of pre-defined peptides of interest. Such approaches have not yet been used to confirm the existence of novel peptides. Here, we present a targeted proteogenomic approach that leverages sample-matched long-read RNA sequencing (LR RNAseq) data to predict potential protein isoforms with prior transcript evidence. Predicted tryptic isoform-specific peptides, which are specific to individual gene product isoforms, serve as "triggers" and "targets" in the IS-PRM method, Tomahto. Using the model human stem cell line WTC11, LR RNAseq data were generated and used to inform the generation of synthetic standards for 192 isoform-specific peptides (114 isoforms from 55 genes). These synthetic "trigger" peptides were labeled with super heavy tandem mass tags (TMT) and spiked into TMT-labeled WTC11 tryptic digest, predicted to contain corresponding endogenous "target" peptides. Compared to DDA mode, Tomahto increased detectability of isoforms by 3.6-fold, resulting in the identification of five previously unannotated isoforms. Our method detected protein isoform expression for 43 out of 55 genes corresponding to 54 resolved isoforms. This LR RNAseq-informed Tomahto targeted approach, called LRP-IS-PRM, is a new modality for generating protein-level evidence of alternative isoforms - a critical first step in designing functional studies and eventually clinical assays.
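
The notion of a tryptic isoform-specific peptide can be made concrete with a short sketch. The code below is an illustration only, not the authors' pipeline: it assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P), allows no missed cleavages, and uses made-up sequences. The names `tryptic_peptides` and `isoform_specific` are hypothetical.

```python
def tryptic_peptides(protein, min_len=6):
    """In-silico digest: cleave after K/R, but not before P (simplified rule)."""
    peptides, start = set(), 0
    for pos, residue in enumerate(protein):
        next_residue = protein[pos + 1] if pos + 1 < len(protein) else ""
        if residue in "KR" and next_residue != "P":
            peptide = protein[start:pos + 1]
            if len(peptide) >= min_len:
                peptides.add(peptide)
            start = pos + 1
    tail = protein[start:]
    if len(tail) >= min_len:
        peptides.add(tail)
    return peptides


def isoform_specific(isoforms):
    """Keep, for each isoform, the peptides that no other isoform produces."""
    digests = {name: tryptic_peptides(seq) for name, seq in isoforms.items()}
    specific = {}
    for name, peptides in digests.items():
        others = set().union(*(d for n, d in digests.items() if n != name))
        specific[name] = peptides - others
    return specific


# Made-up sequences that differ only in their last tryptic peptide.
toy_isoforms = {
    "iso1": "MAAAAAKGGGGGGRLLLLLLK",
    "iso2": "MAAAAAKGGGGGGRTTTTTTK",
}
specific = isoform_specific(toy_isoforms)
print(specific)
```

Peptides shared by both toy isoforms are discarded, leaving one candidate per isoform. A real design step would also need to check uniqueness against the whole proteome, not just sibling isoforms of the same gene.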

3.
Sci Transl Med ; 16(730): eade2886, 2024 Jan 17.
Article in English | MEDLINE | ID: mdl-38232136

ABSTRACT

Immunotherapy has emerged as a crucial strategy to combat cancer by "reprogramming" a patient's own immune system. Although immunotherapy is typically reserved for patients with a high mutational burden, neoantigens produced from posttranscriptional regulation may provide an untapped reservoir of common immunogenic targets for new targeted therapies. To comprehensively define tumor-specific and likely immunogenic neoantigens from patient RNA-Seq, we developed Splicing Neo Antigen Finder (SNAF), an easy-to-use and open-source computational workflow to predict splicing-derived immunogenic MHC-bound peptides (T cell antigen) and unannotated transmembrane proteins with altered extracellular epitopes (B cell antigen). This workflow uses a highly accurate deep learning strategy for immunogenicity prediction (DeepImmuno) in conjunction with new algorithms to rank the tumor specificity of neoantigens (BayesTS) and to predict regulators of mis-splicing (RNA-SPRINT). T cell antigens from SNAF were frequently evidenced as HLA-presented peptides from mass spectrometry (MS) and predicted response to immunotherapy in melanoma. Splicing neoantigen burden was attributed to coordinated splicing factor dysregulation. Shared splicing neoantigens were found in up to 90% of patients with melanoma, correlated with overall survival in multiple cancer cohorts, induced T cell reactivity, and were characterized by distinct cells of origin and amino acid preferences. In addition to T cell neoantigens, our B cell-focused pipeline (SNAF-B) identified a new class of tumor-specific extracellular neoepitopes, which we termed ExNeoEpitopes. ExNeoEpitope full-length mRNA predictions were tumor specific and were validated using long-read isoform sequencing and in vitro transmembrane localization assays. Therefore, our systematic identification of splicing neoantigens revealed potential shared targets for therapy in heterogeneous cancers.


Subject(s)
Melanoma; Neoplasms; Humans; Antigens, Neoplasm/metabolism; Neoplasms/genetics; Neoplasms/therapy; T-Lymphocytes; Peptides/chemistry; Immunotherapy/methods
4.
Life Sci Alliance ; 6(8)2023 08.
Article in English | MEDLINE | ID: mdl-37197981

ABSTRACT

Connexin37-mediated regulation of cell cycle modulators and, consequently, growth arrest lack mechanistic understanding. We previously showed that arterial shear stress up-regulates Cx37 in endothelial cells and activates a Notch/Cx37/p27 signaling axis to promote G1 cell cycle arrest, and this is required to enable arterial gene expression. However, how induced expression of a gap junction protein, Cx37, up-regulates cyclin-dependent kinase inhibitor p27 to enable endothelial growth suppression and arterial specification is unclear. Herein, we fill this knowledge gap by expressing wild-type and regulatory domain mutants of Cx37 in cultured endothelial cells expressing the Fucci cell cycle reporter. We determined that both the channel-forming and cytoplasmic tail domains of Cx37 are required for p27 up-regulation and late G1 arrest. Mechanistically, the cytoplasmic tail domain of Cx37 interacts with, and sequesters, activated ERK in the cytoplasm. This then stabilizes pERK nuclear target Foxo3a, which up-regulates p27 transcription. Consistent with previous studies, we found this Cx37/pERK/Foxo3a/p27 signaling axis functions downstream of arterial shear stress to promote endothelial late G1 state and enable up-regulation of arterial genes.


Subject(s)
Connexins; Endothelial Cells; Endothelial Cells/metabolism; Cell Cycle Checkpoints/genetics; Connexins/genetics; Connexins/metabolism; G1 Phase Cell Cycle Checkpoints; Cell Nucleus/metabolism; Gap Junction alpha-4 Protein
5.
Methods Mol Biol ; 2660: 357-372, 2023.
Article in English | MEDLINE | ID: mdl-37191809

ABSTRACT

Traditionally, disease-causal mutations were thought to disrupt gene function. However, it is becoming clearer that many deleterious mutations could exhibit a "gain-of-function" (GOF) behavior. Systematic investigation of such mutations has been lacking and largely overlooked. Advances in next-generation sequencing have identified thousands of genomic variants that perturb the normal functions of proteins, further contributing to diverse phenotypic consequences in disease. Elucidating the functional pathways rewired by GOF mutations will be crucial for prioritizing disease-causing variants and their resultant therapeutic liabilities. In distinct cell types (with varying genotypes), precise signal transduction controls cell decision, including gene regulation and phenotypic output. When signal transduction goes awry due to GOF mutations, it can give rise to various disease types. Quantitative and molecular understanding of network perturbations by GOF mutations may provide explanations for "missing heritability" in previous genome-wide association studies. We envision that it will be instrumental to push the current paradigm toward a thorough functional and quantitative modeling of all GOF mutations and their mechanistic molecular events involved in disease development and progression. Many fundamental questions pertaining to genotype-phenotype relationships remain unresolved. For example, which GOF mutations are key for gene regulation and cellular decisions? What are the GOF mechanisms at various regulation levels? How do interaction networks undergo rewiring upon GOF mutations? Is it possible to leverage GOF mutations to reprogram signal transduction in cells, aiming to cure disease? To begin to address these questions, we will cover a wide range of topics regarding GOF disease mutations and their characterization by multi-omic networks. We highlight the fundamental function of GOF mutations and discuss the potential mechanistic effects in the context of signaling networks. We also discuss advances in bioinformatic and computational resources, which will dramatically help with studies on the functional and phenotypic consequences of GOF mutations.


Subject(s)
Multiomics; Precision Medicine; Genome-Wide Association Study; Mutation; Gain of Function Mutation
6.
bioRxiv ; 2023 Mar 21.
Article in English | MEDLINE | ID: mdl-36993769

ABSTRACT

A major fraction of loci identified by genome-wide association studies (GWASs) lead to alterations in alternative splicing, but interpretation of how such alterations impact proteins is hindered by the technical limitations of short-read RNA-seq, which cannot directly link splicing events to full-length transcript or protein isoforms. Long-read RNA-seq represents a powerful tool to define and quantify transcript isoforms and, more recently, to infer protein isoform existence. Here we present a novel approach that integrates information from GWAS, splicing QTL (sQTL), and PacBio long-read RNA-seq in a disease-relevant model to infer the effects of sQTLs on the ultimate protein isoform products they encode. We demonstrate the utility of our approach using bone mineral density (BMD) GWAS data. We identified 1,863 sQTLs from the Genotype-Tissue Expression (GTEx) project in 732 protein-coding genes which colocalized with BMD associations (H4 PP ≥ 0.75). We generated deep coverage PacBio long-read RNA-seq data (N ≈ 22 million full-length reads) on human osteoblasts, identifying 68,326 protein-coding isoforms, of which 17,375 (25%) were novel. By casting the colocalized sQTLs directly onto protein isoforms, we connected 809 sQTLs to 2,029 protein isoforms from 441 genes expressed in osteoblasts. Using these data, we created one of the first proteome-scale resources defining full-length isoforms impacted by colocalized sQTLs. Overall, we found 74 sQTLs that influenced isoforms likely impacted by nonsense-mediated decay (NMD) and 190 that potentially resulted in the expression of new protein isoforms. Finally, we identified colocalizing sQTLs in TPM2 for splice junctions between two mutually exclusive exons, and two different transcript termination sites, making them impossible to interpret without long-read RNA-seq data. siRNA-mediated knockdown in osteoblasts showed two TPM2 isoforms with opposing effects on mineralization. We expect our approach to be widely generalizable across diverse clinical traits and accelerate system-scale analyses of protein isoform activities modulated by GWAS loci.
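
The colocalization threshold quoted in this abstract (H4 PP ≥ 0.75) can be unpacked with a sketch. The abstract does not say which software produced the posterior, so the code below assumes the widely used approximate-Bayes-factor colocalization framework, in which H4 is the hypothesis that both traits share one causal variant. The function name `coloc_posteriors`, the prior values, and the toy log Bayes factors are assumptions for illustration.

```python
import math


def coloc_posteriors(labf1, labf2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities for hypotheses H0..H4 (illustrative sketch).

    labf1, labf2: per-variant log approximate Bayes factors for trait 1
    (e.g. an sQTL) and trait 2 (e.g. a GWAS trait) over the same variants.
    p1, p2, p12: assumed prior probabilities that a variant is causal for
    trait 1 only, trait 2 only, or both.
    """
    bf1 = [math.exp(x) for x in labf1]
    bf2 = [math.exp(x) for x in labf2]
    s1, s2 = sum(bf1), sum(bf2)
    s12 = sum(a * b for a, b in zip(bf1, bf2))
    support = [
        1.0,                          # H0: no association
        p1 * s1,                      # H1: trait 1 only
        p2 * s2,                      # H2: trait 2 only
        p1 * p2 * (s1 * s2 - s12),    # H3: two distinct causal variants
        p12 * s12,                    # H4: one shared causal variant
    ]
    total = sum(support)
    return [s / total for s in support]


# Toy region of three variants: same peak variant vs. different peaks.
shared = coloc_posteriors([0.0, 0.0, 12.0], [0.0, 0.0, 12.0])
distinct = coloc_posteriors([12.0, 0.0, 0.0], [0.0, 0.0, 12.0])
print("H4 PP, same peak:", round(shared[4], 4))
print("H4 PP, different peaks:", round(distinct[4], 4))
```

When both traits peak at the same variant, nearly all posterior mass falls on H4 and the locus would pass a 0.75 cutoff; when the peaks differ, the mass shifts to H3. Exponentiating very large log Bayes factors can overflow, so production code would work in log space.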

7.
Brief Bioinform ; 24(2)2023 03 19.
Article in English | MEDLINE | ID: mdl-36752347

ABSTRACT

Alzheimer's disease (AD) is one of the most challenging neurodegenerative diseases because of its complicated and progressive mechanisms, and multiple risk factors. Increasing research evidence demonstrates that genetics may be a key factor responsible for the occurrence of the disease. Although previous reports identified quite a few AD-associated genes, they were mostly limited owing to patient sample size and selection bias. There is a lack of comprehensive research aimed at identifying AD-associated risk mutations systematically. To address this challenge, we hereby construct a large-scale AD mutation and co-mutation framework ('AD-Syn-Net'), and propose deep learning models named Deep-SMCI and Deep-CMCI configured with fully connected layers that are capable of predicting cognitive impairment of subjects effectively based on genetic mutation and co-mutation profiles. Next, we apply the customized frameworks to data sets to evaluate the importance scores of the mutations and identify mutation effectors and co-mutation combination vulnerabilities contributing to cognitive impairment. Furthermore, we evaluate the influence of mutation pairs on the network architecture to dissect the genetic organization of AD and identify novel co-mutations that could be responsible for dementia, laying a solid foundation for proposing future targeted therapy for AD precision medicine. Our deep learning model codes are available open access here: https://github.com/Pan-Bio/AD-mutation-effectors.


Subject(s)
Alzheimer Disease; Cognitive Dysfunction; Deep Learning; Humans; Alzheimer Disease/genetics; Magnetic Resonance Imaging; Cognitive Dysfunction/genetics; Mutation
8.
Circ Res ; 132(3): 323-338, 2023 02 03.
Article in English | MEDLINE | ID: mdl-36597873

ABSTRACT

BACKGROUND: Coronary artery disease (CAD) is the leading cause of death worldwide. Recent meta-analyses of genome-wide association studies have identified over 175 loci associated with CAD. The majority of these loci are in noncoding regions and are predicted to regulate gene expression. Given that vascular smooth muscle cells (SMCs) play critical roles in the development and progression of CAD, we aimed to identify the subset of the CAD loci associated with the regulation of transcription in distinct SMC phenotypes. METHODS: We measured gene expression in SMCs isolated from the ascending aortas of 151 heart transplant donors of various genetic ancestries in quiescent or proliferative conditions and calculated the association of their expression and splicing with ~6.3 million imputed single-nucleotide polymorphism markers across the genome. RESULTS: We identified 4910 expression and 4412 splicing quantitative trait loci (sQTLs) representing regions of the genome associated with transcript abundance and splicing. A total of 3660 expression quantitative trait loci (eQTLs) had not been observed in the publicly available Genotype-Tissue Expression dataset. Further, 29 and 880 eQTLs were SMC-specific and sex-biased, respectively. We made these results available for public query on a user-friendly website. To identify the effector transcript(s) regulated by CAD loci, we used 4 distinct colocalization approaches. We identified 84 eQTL and 164 sQTL that colocalized with CAD loci, highlighting the importance of genetic regulation of mRNA splicing as a molecular mechanism for CAD genetic risk. Notably, 20% and 35% of the eQTLs were unique to quiescent or proliferative SMCs, respectively. One CAD locus colocalized with a sex-specific eQTL (TERF2IP), and another locus colocalized with SMC-specific eQTL (ALKBH8). The most significantly associated CAD locus, 9p21, was an sQTL for the long noncoding RNA CDKN2B-AS1, also known as ANRIL, in proliferative SMCs. 
CONCLUSIONS: Collectively, our results provide evidence for the molecular mechanisms of genetic susceptibility to CAD in distinct SMC phenotypes.


Subject(s)
Coronary Artery Disease; Male; Female; Humans; Coronary Artery Disease/genetics; Coronary Artery Disease/metabolism; Genome-Wide Association Study/methods; Gene Expression Regulation; Quantitative Trait Loci; Genetic Predisposition to Disease; Gene Expression; Polymorphism, Single Nucleotide; AlkB Homolog 8, tRNA Methyltransferase/genetics; AlkB Homolog 8, tRNA Methyltransferase/metabolism
9.
Curr Stem Cell Rep ; 9(2): 31-41, 2023 Jun.
Article in English | MEDLINE | ID: mdl-38939410

ABSTRACT

Purpose of review: The underlying molecular mechanisms that direct stem cell differentiation into fully functional, mature cells remain an area of ongoing investigation. Cell state is the product of the combinatorial effect of individual factors operating within a coordinated regulatory network. Here, we discuss the contribution of both gene regulatory and splicing regulatory networks in defining stem cell fate during differentiation and the critical role of protein isoforms in this process. Recent findings: We review recent experimental and computational approaches that characterize gene regulatory networks, splice regulatory networks, and the resulting transcriptome and proteome they mediate during differentiation. Such approaches include long-read RNA sequencing, which has demonstrated high-resolution profiling of mRNA isoforms, and Cas13-based CRISPR, which could make possible high-throughput isoform screening. Collectively, these developments enable systems-level profiling of factors contributing to cell state. Summary: Overall, gene and splice regulatory networks are important in defining cell state. The emerging high-throughput systems-level approaches will characterize the gene regulatory network components necessary in driving stem cell differentiation.

10.
RNA Biol ; 19(1): 1228-1243, 2022 01.
Article in English | MEDLINE | ID: mdl-36457147

ABSTRACT

Endothelial cells (ECs) comprise the lumenal lining of all blood vessels and are critical for the functioning of the cardiovascular system. Their phenotypes can be modulated by alternative splicing of RNA to produce distinct protein isoforms. To characterize the RNA and protein isoform landscape within ECs, we applied a long read proteogenomics approach to analyse human umbilical vein endothelial cells (HUVECs). Transcripts delineated from PacBio sequencing serve as the basis for a sample-specific protein database used for downstream mass-spectrometry (MS) analysis to infer protein isoform expression. We detected 53,863 transcript isoforms from 10,426 genes, with 22,195 of those transcripts being novel. Furthermore, the predominant isoform in HUVECs does not correspond with the accepted "reference isoform" 25% of the time, with vascular pathway-related genes among this group. We found 2,597 protein isoforms supported through unique peptides, with an additional 2,280 isoforms nominated upon incorporation of long-read transcript evidence. We characterized a novel alternative acceptor for endothelial-related gene CDH5, suggesting potential changes in its associated signalling pathways. Finally, we identified novel protein isoforms arising from a diversity of RNA splicing mechanisms supported by uniquely mapped novel peptides. Our results represent a high-resolution atlas of known and novel isoforms of potential relevance to endothelial phenotypes and function.


Subject(s)
Proteogenomics; Humans; Human Umbilical Vein Endothelial Cells; Protein Isoforms/genetics; Alternative Splicing; RNA
11.
Nat Commun ; 13(1): 5891, 2022 10 06.
Article in English | MEDLINE | ID: mdl-36202789

ABSTRACT

During blood vessel development, endothelial cells become specified toward arterial or venous fates to generate a circulatory network that provides nutrients and oxygen to, and removes metabolic waste from, all tissues. Arterial-venous specification occurs in conjunction with suppression of endothelial cell cycle progression; however, the mechanistic role of cell cycle state is unknown. Herein, using Cdh5-CreERT2;R26FUCCI2aR reporter mice, we find that venous endothelial cells are enriched for the FUCCI-Negative state (early G1) and BMP signaling, while arterial endothelial cells are enriched for the FUCCI-Red state (late G1) and TGF-β signaling. Furthermore, early G1 state is essential for BMP4-induced venous gene expression, whereas late G1 state is essential for TGF-β1-induced arterial gene expression. Pharmacologically induced cell cycle arrest prevents arterial-venous specification defects in mice with endothelial hyperproliferation. Collectively, our results show that distinct endothelial cell cycle states provide distinct windows of opportunity for the molecular induction of arterial vs. venous fate.


Subject(s)
Endothelial Cells; Transforming Growth Factor beta1; Animals; Arteries/metabolism; Cell Cycle; Endothelial Cells/metabolism; Mice; Oxygen/metabolism; Transforming Growth Factor beta1/metabolism; Veins
12.
Hum Mol Genet ; 31(R1): R123-R136, 2022 10 20.
Article in English | MEDLINE | ID: mdl-35960994

ABSTRACT

Aberrant splicing underlies many human diseases, including cancer, cardiovascular diseases and neurological disorders. Genome-wide mapping of splicing quantitative trait loci (sQTLs) has shown that genetic regulation of alternative splicing is widespread. However, identification of the corresponding isoform or protein products associated with disease-associated sQTLs is challenging with short-read RNA-seq, which cannot precisely characterize full-length transcript isoforms. Furthermore, contemporary sQTL interpretation often relies on reference transcript annotations, which are incomplete. Solutions to these issues may be found through integration of newly emerging long-read sequencing technologies. Long-read sequencing offers the capability to sequence full-length mRNA transcripts and, in some cases, to link sQTLs to transcript isoforms containing disease-relevant protein alterations. Here, we provide an overview of sQTL mapping approaches, the use of long-read sequencing to characterize sQTL effects on isoforms, the linkage of RNA isoforms to protein-level functions and comment on future directions in the field. Based on recent progress, long-read RNA sequencing promises to be part of the human disease genetics toolkit to discover and treat protein isoforms causing rare and complex diseases.


Subject(s)
Human Genetics; RNA Isoforms; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA Isoforms/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA
13.
Genome Biol ; 23(1): 69, 2022 03 03.
Article in English | MEDLINE | ID: mdl-35241129

ABSTRACT

BACKGROUND: The detection of physiologically relevant protein isoforms encoded by the human genome is critical to biomedicine. Mass spectrometry (MS)-based proteomics is the preeminent method for protein detection, but isoform-resolved proteomic analysis relies on accurate reference databases that match the sample; neither a subset nor a superset database is ideal. Long-read RNA sequencing (e.g., PacBio or Oxford Nanopore) provides full-length transcripts which can be used to predict full-length protein isoforms. RESULTS: We describe here a long-read proteogenomics approach for integrating sample-matched long-read RNA-seq and MS-based proteomics data to enhance isoform characterization. We introduce a classification scheme for protein isoforms, discover novel protein isoforms, and present the first protein inference algorithm for the direct incorporation of long-read transcriptome data to enable detection of protein isoforms previously intractable to MS-based detection. We have released an open-source Nextflow pipeline that integrates long-read sequencing in a proteomic workflow for isoform-resolved analysis. CONCLUSIONS: Our work suggests that the incorporation of long-read sequencing and proteomic data can facilitate improved characterization of human protein isoform diversity. Our first-generation pipeline provides a strong foundation for future development of long-read proteogenomics and its adoption for both basic and translational research.


Subject(s)
Proteogenomics; Alternative Splicing; Humans; Protein Isoforms/genetics; Proteomics; Sequence Analysis, RNA/methods; Transcriptome
14.
Cell Rep ; 37(7): 110022, 2021 11 16.
Article in English | MEDLINE | ID: mdl-34788620

ABSTRACT

Alternative splicing is a post-transcriptional regulatory mechanism producing distinct mRNA molecules from a single pre-mRNA with a prominent role in the development and function of the central nervous system. We used long-read isoform sequencing to generate full-length transcript sequences in the human and mouse cortex. We identify novel transcripts not present in existing genome annotations, including transcripts mapping to putative novel (unannotated) genes and fusion transcripts incorporating exons from multiple genes. Global patterns of transcript diversity are similar between human and mouse cortex, although certain genes are characterized by striking differences between species. We also identify developmental changes in alternative splicing, with differential transcript usage between human fetal and adult cortex. Our data confirm the importance of alternative splicing in the cortex, dramatically increasing transcriptional diversity and representing an important mechanism underpinning gene regulation in the brain. We provide transcript-level data for human and mouse cortex as a resource to the scientific community.


Subject(s)
Cerebral Cortex/metabolism; Protein Isoforms/genetics; Transcriptome/genetics; Alternative Splicing/genetics; Animals; Brain/metabolism; Cerebral Cortex/physiology; Exons/genetics; Gene Expression/genetics; Gene Expression Profiling/methods; Genome; High-Throughput Nucleotide Sequencing/methods; Humans; Mice; Protein Isoforms/metabolism; RNA Precursors/genetics; RNA Splice Sites/genetics; RNA, Messenger/genetics; Sequence Analysis, RNA/methods
15.
medRxiv ; 2020 Nov 03.
Article in English | MEDLINE | ID: mdl-33173926

ABSTRACT

Chronic obstructive pulmonary disease (COPD) is a leading cause of death worldwide. Genome-wide association studies (GWAS) have identified over 80 loci that are associated with COPD and emphysema; however, for most of these loci the causal variant and gene are unknown. Here, we utilize lung splice quantitative trait loci (sQTL) data from the Genotype-Tissue Expression project (GTEx) and short read sequencing data from the Lung Tissue Research Consortium (LTRC) to characterize a locus in nephronectin (NPNT) associated with COPD case-control status and lung function. We found that the rs34712979 variant is associated with alternative splice junction use in NPNT, specifically for the junction connecting the 2nd and 4th exons (chr4:105898001-105927336) (p = 4.02 × 10^-38). This association colocalized with GWAS data for COPD and lung spirometry measures with a posterior probability of 94%, indicating that the same causal genetic variants in NPNT underlie the associations with COPD risk, spirometric measures of lung function, and splicing. Investigation of NPNT short read sequencing revealed that rs34712979 creates a cryptic splice acceptor site which results in the inclusion of a 3-nucleotide exon extension, coding for a serine residue near the N-terminus of the protein. Using Oxford Nanopore Technologies (ONT) long read sequencing we identified 13 NPNT isoforms, 6 of which are predicted to be protein coding. Two of these are full-length isoforms which differ only in the 3-nucleotide exon extension whose occurrence differs by genotype. Overall, our data indicate that rs34712979 modulates COPD risk and lung function by creating a novel splice acceptor which results in the inclusion of a 3-nucleotide sequence coding for a serine in the nephronectin protein sequence. Our findings implicate NPNT splicing in contributing to COPD risk, and identify a novel serine insertion in the nephronectin protein that warrants further study.
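
The reasoning behind the serine insertion (a 3-nucleotide exon extension adds exactly one codon, so the reading frame downstream is unchanged) can be checked with a toy translation. The sequences below are invented and are not NPNT; the serine codon `AGC`, the insertion position, and the helper names `translate` and `extend_exon` are assumptions chosen only to illustrate the in-frame effect.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate a coding sequence until the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def extend_exon(cds, position, extension):
    """Insert an exon extension at a given nucleotide offset (toy model)."""
    return cds[:position] + extension + cds[position:]


reference = "ATGGCTGAAGTTTGA"                 # invented CDS, not NPNT
extended = extend_exon(reference, 6, "AGC")   # AGC is one serine codon
print(translate(reference), "->", translate(extended))
```

Because the extension length is a multiple of three, every residue after the insertion is preserved and the two products differ by a single serine. An extension of one or two nucleotides would instead shift the frame and change everything downstream.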

16.
Nat Commun ; 11(1): 2326, 2020 05 11.
Article in English | MEDLINE | ID: mdl-32393825

ABSTRACT

Most human protein-coding genes are expressed as multiple isoforms, which greatly expands the functional repertoire of the encoded proteome. While at least one reliable open reading frame (ORF) model has been assigned for every coding gene, the majority of alternative isoforms remains uncharacterized due to (i) vast differences of overall levels between different isoforms expressed from common genes, and (ii) the difficulty of obtaining full-length transcript sequences. Here, we present ORF Capture-Seq (OCS), a flexible method that addresses both challenges for targeted full-length isoform sequencing applications using collections of cloned ORFs as probes. As a proof-of-concept, we show that an OCS pipeline focused on genes coding for transcription factors increases isoform detection by an order of magnitude when compared to unenriched samples. In short, OCS enables rapid discovery of isoforms from custom-selected genes and will accelerate mapping of the human transcriptome.


Subject(s)
Open Reading Frames/genetics; Sequence Analysis, RNA/methods; Humans; Protein Isoforms/genetics; Protein Isoforms/metabolism; RNA, Messenger/genetics; RNA, Messenger/metabolism; Reference Standards; Transcription Factors/genetics
17.
J Proteome Res ; 18(9): 3429-3438, 2019 09 06.
Article in English | MEDLINE | ID: mdl-31378069

ABSTRACT

Peptides detected by tandem mass spectrometry (MS/MS) in bottom-up proteomics serve as proxies for the proteins expressed in the sample. Protein inference is a process routinely applied to these peptides to generate a plausible list of candidate protein identifications. The use of multiple proteases for parallel protein digestions expands sequence coverage, provides additional peptide identifications, and increases the probability of identifying peptides that are unique to a single protein, which are all valuable for protein inference. We have developed and implemented a multi-protease protein inference algorithm in MetaMorpheus, a bottom-up search software program, which incorporates the calculation of protease-specific q-values and preserves the association of peptide sequences and their protease of origin. This integrated multi-protease protein inference algorithm provides more accurate results than either the aggregation of results from the separate analysis of the peptide identifications produced by each protease (separate approach) in MetaMorpheus, or results that are obtained using Fido, ProteinProphet, or DTASelect2. MetaMorpheus' integrated multi-protease data analysis decreases the ambiguity of the protein group list, reduces the frequency of erroneous identifications, and increases the number of post-translational modifications identified, while combining multi-protease search and protein inference into a single software program.


Subject(s)
Proteins/isolation & purification; Proteomics; Software; Tandem Mass Spectrometry/methods; Algorithms; Amino Acid Sequence/genetics; Databases, Protein; Peptide Hydrolases/chemistry; Peptide Hydrolases/isolation & purification; Peptides/chemistry; Peptides/isolation & purification; Proteins/chemistry
18.
Genome Biol ; 19(1): 46, 2018 03 29.
Article in English | MEDLINE | ID: mdl-29598823

ABSTRACT

BACKGROUND: The multifaceted control of gene expression requires tight coordination of regulatory mechanisms at the transcriptional and post-transcriptional levels. Here, we studied the interdependence of transcription initiation, splicing and polyadenylation events on single mRNA molecules by full-length mRNA sequencing. RESULTS: In MCF-7 breast cancer cells, we find 2700 genes with interdependent alternative transcription initiation, splicing and polyadenylation events, both in proximal and distant parts of mRNA molecules, including examples of coupling between transcription start sites and polyadenylation sites. The analysis of three human primary tissues (brain, heart and liver) reveals similar patterns of interdependency between transcription initiation and mRNA processing events. We predict thousands of novel open reading frames from full-length mRNA sequences and obtain evidence for their translation by shotgun proteomics. The mapping database rescues 358 previously unassigned peptides and improves the assignment of others. By recognizing sample-specific amino-acid changes and novel splicing patterns, full-length mRNA sequencing improves proteogenomics analysis of MCF-7 cells. CONCLUSIONS: Our findings demonstrate that our understanding of transcriptome complexity is far from complete and provide a basis to reveal largely unresolved mechanisms that coordinate transcription initiation and mRNA processing.


Subject(s)
Polyadenylation , RNA Splicing , Messenger RNA/metabolism , Genetic Transcription Initiation , Humans , MCF-7 Cells , Nucleotide Motifs , Poly A/metabolism , Proteome/genetics , Messenger RNA/chemistry , RNA-Binding Proteins/metabolism , RNA Sequence Analysis , Transcriptome
19.
Trends Biochem Sci ; 42(5): 342-354, 2017 05.
Article in English | MEDLINE | ID: mdl-28284537

ABSTRACT

Cellular functions are mediated by complex interactome networks of physical, biochemical, and functional interactions between DNA sequences, RNA molecules, proteins, lipids, and small metabolites. A thorough understanding of cellular organization requires accurate and relatively complete models of interactome networks at proteome scale. The recent publication of four human protein-protein interaction (PPI) maps represents a technological breakthrough and an unprecedented resource for the scientific community, heralding a new era of proteome-scale human interactomics. Our knowledge gained from these and complementary studies provides fresh insights into the opportunities and challenges when analyzing systematically generated interactome data, defines a clear roadmap towards the generation of a first reference interactome, and reveals new perspectives on the organization of cellular life.


Subject(s)
Protein Interaction Mapping , Protein Interaction Maps , Proteins/metabolism , Proteome/metabolism , Humans , Protein Binding , Proteins/chemistry , Proteomics
20.
PLoS Genet ; 12(12): e1006466, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27935966

ABSTRACT

Human genome-wide association studies (GWAS) have shown that genetic variation at >130 gene loci is associated with type 2 diabetes (T2D). We asked if the expression of the candidate T2D-associated genes within these loci is regulated by a common locus in pancreatic islets. Using an obese F2 mouse intercross segregating for T2D, we show that the expression of ~40% of the T2D-associated genes is linked to a broad region on mouse chromosome (Chr) 2. As all but 9 of these genes are not physically located on Chr 2, linkage to Chr 2 suggests a genomic factor(s) located on Chr 2 regulates their expression in trans. The transcription factor Nfatc2 is physically located on Chr 2 and its expression demonstrates cis linkage; i.e., its expression maps to itself. When conditioned on the expression of Nfatc2, linkage for the T2D-associated genes was greatly diminished, supporting Nfatc2 as a driver of their expression. Plasma insulin also showed linkage to the same broad region on Chr 2. Overexpression of a constitutively active (ca) form of Nfatc2 induced β-cell proliferation in mouse and human islets, and transcriptionally regulated more than half of the T2D-associated genes. Overexpression of either ca-Nfatc2 or ca-Nfatc1 in mouse islets enhanced insulin secretion, whereas only ca-Nfatc2 was able to promote β-cell proliferation, suggesting distinct molecular pathways mediating insulin secretion vs. β-cell proliferation are regulated by NFAT. Our results suggest that many of the T2D-associated genes are downstream transcriptional targets of NFAT, and may act coordinately in a pathway through which NFAT regulates β-cell proliferation in both mouse and human islets.


Subject(s)
Type 2 Diabetes Mellitus/genetics , Insulin/genetics , NFATC Transcription Factors/genetics , Animals , Cell Proliferation/genetics , Chromosome Mapping , Type 2 Diabetes Mellitus/metabolism , Type 2 Diabetes Mellitus/pathology , Gene Expression Regulation , Genetic Linkage , Genome , Genome-Wide Association Study , Humans , Insulin-Secreting Cells/metabolism , Insulin-Secreting Cells/pathology , Islets of Langerhans/metabolism , Islets of Langerhans/pathology , Mice , Obese Mice , NFATC Transcription Factors/biosynthesis , Genetic Promoter Regions