Results 1 - 13 of 13
1.
Genome Res ; 31(3): 359-371, 2021 03.
Article in English | MEDLINE | ID: mdl-33452016

ABSTRACT

Alternative splicing is an RNA processing mechanism that affects most genes in humans, contributing to disease mechanisms and phenotypic diversity. The regulation of splicing involves an intricate network of cis-regulatory elements and trans-acting factors. Due to their high sequence specificity, cis-regulation of splicing can be altered by genetic variants, significantly affecting splicing outcomes. Recently, multiple methods have been applied to understanding the regulatory effects of genetic variants on splicing. However, it is still challenging to go beyond apparent association to pinpoint functional variants. To fill in this gap, we utilized large-scale data sets of the Genotype-Tissue Expression (GTEx) project to study genetically modulated alternative splicing (GMAS) via identification of allele-specific splicing events. We demonstrate that GMAS events are shared across tissues and individuals more often than expected by chance, consistent with their genetically driven nature. Moreover, although the allelic bias of GMAS exons varies across samples, the degree of variation is similar across tissues versus individuals. Thus, genetic background drives the GMAS pattern to a similar degree as tissue-specific splicing mechanisms. Leveraging the genetically driven nature of GMAS, we developed a new method to predict functional splicing-altering variants, built upon a genotype-phenotype concordance model across samples. Complemented by experimental validations, this method predicted >1000 functional variants, many of which may alter RNA-protein interactions. Lastly, 72% of GMAS-associated SNPs were in linkage disequilibrium with GWAS-reported SNPs, and such association was enriched in tissues of relevance for specific traits/diseases. Our study enables a comprehensive view of genetically driven splicing variations in human tissues.
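
A common first step when looking for allele-specific events of the kind described above is to ask whether the reads covering a heterozygous SNP split evenly between the two alleles. The sketch below illustrates that idea with an exact two-sided binomial test. It is not the authors' GMAS pipeline; the function names and the read counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely
    than the observed count k out of n trials.
    """
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to rounding.
    total = sum(x for x in pmf if x <= observed * (1 + 1e-9))
    return min(1.0, total)


def allelic_ratio(ref_reads: int, alt_reads: int) -> float:
    """Fraction of reads that carry the reference allele."""
    return ref_reads / (ref_reads + alt_reads)


# Hypothetical counts at one heterozygous SNP inside an exon.
ref_reads, alt_reads = 18, 4
ratio = allelic_ratio(ref_reads, alt_reads)
p_value = binomial_two_sided_p(ref_reads, ref_reads + alt_reads)
print(f"allelic ratio = {ratio:.2f}, p = {p_value:.4f}")
```

In practice such a test would be applied per SNP and per sample, followed by multiple-testing correction; those steps are left out here.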


Subject(s)
Alleles, Alternative Splicing/genetics, Genetic Variation, Cell Line, Exons, Female, Genome-Wide Association Study, Humans, Linkage Disequilibrium, Male, Organ Specificity/genetics, Single Nucleotide Polymorphism/genetics
2.
Endocrinology ; 161(2)2020 02 01.
Article in English | MEDLINE | ID: mdl-31912136

ABSTRACT

Soybean oil consumption has increased greatly in the past half-century and is linked to obesity and diabetes. To test the hypothesis that soybean oil diet alters hypothalamic gene expression in conjunction with metabolic phenotype, we performed RNA sequencing analysis using male mice fed isocaloric, high-fat diets based on conventional soybean oil (high in linoleic acid, LA), a genetically modified, low-LA soybean oil (Plenish), and coconut oil (high in saturated fat, containing no LA). The 2 soybean oil diets had similar but nonidentical effects on the hypothalamic transcriptome, whereas the coconut oil diet had a negligible effect compared to a low-fat control diet. Dysregulated genes were associated with inflammation, neuroendocrine, neurochemical, and insulin signaling. Oxt was the only gene with metabolic, inflammation, and neurological relevance upregulated by both soybean oil diets compared to both control diets. Oxytocin immunoreactivity in the supraoptic and paraventricular nuclei of the hypothalamus was reduced, whereas plasma oxytocin and hypothalamic Oxt were increased. These central and peripheral effects of soybean oil diets were correlated with glucose intolerance but not body weight. Alterations in hypothalamic Oxt and plasma oxytocin were not observed in the coconut oil diet enriched in stigmasterol, a phytosterol found in soybean oil. We postulate that neither stigmasterol nor LA is responsible for effects of soybean oil diets on oxytocin and that Oxt messenger RNA levels could be associated with the diabetic state. Given the ubiquitous presence of soybean oil in the American diet, its observed effects on hypothalamic gene expression could have important public health ramifications.


Subject(s)
Diabetes Mellitus/etiology, Gene Expression/drug effects, Hypothalamus/drug effects, Oxytocin/blood, Soybean Oil/adverse effects, Animals, Inflammation/etiology, Linoleic Acid/adverse effects, Male, Mice, Nervous System Diseases/etiology, Obesity/etiology, Stigmasterol/adverse effects
3.
Nat Commun ; 10(1): 1338, 2019 03 22.
Article in English | MEDLINE | ID: mdl-30902979

ABSTRACT

Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants (GVs) mediating post-transcriptional regulation. Recently, genome-wide detection of in vivo binding of RNA-binding proteins is greatly facilitated by the enhanced crosslinking and immunoprecipitation (eCLIP) method. We developed a new computational approach, called BEAPR, to identify allele-specific binding (ASB) events in eCLIP-Seq data. BEAPR takes into account crosslinking-induced sequence propensity and variations between replicated experiments. Using simulated and actual data, we show that BEAPR largely outperforms often-used count analysis methods. Importantly, BEAPR overcomes the inherent overdispersion problem of these methods. Complemented by experimental validations, we demonstrate that the application of BEAPR to ENCODE eCLIP-Seq data of 154 proteins helps to predict functional GVs that alter splicing or mRNA abundance. Moreover, many GVs with ASB patterns have known disease relevance. Overall, BEAPR is an effective method that helps to address the outstanding challenge of functional interpretation of GVs.


Subject(s)
Alleles, Genetic Variation, RNA-Binding Proteins/metabolism, RNA/genetics, 3' Untranslated Regions/genetics, Amino Acid Motifs, Base Sequence, Computational Biology, Computer Simulation, Disease/genetics, Genetic Predisposition to Disease, Hep G2 Cells, Humans, K562 Cells, Single Nucleotide Polymorphism/genetics, Protein Binding, Quantitative Trait Loci/genetics, RNA Helicases/metabolism, RNA Splicing/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, Reproducibility of Results, Trans-Activators/metabolism
4.
Commun Biol ; 2: 19, 2019.
Article in English | MEDLINE | ID: mdl-30652130

ABSTRACT

Adenosine-to-inosine (A-to-I) editing, mediated by the ADAR enzymes, diversifies the transcriptome by altering RNA sequences. Recent studies reported global changes in RNA editing in disease and development. Such widespread editing variations necessitate an improved understanding of the regulatory mechanisms of RNA editing. Here, we study the roles of >200 RNA-binding proteins (RBPs) in mediating RNA editing in two human cell lines. Using RNA-sequencing and global protein-RNA binding data, we identify a number of RBPs as key regulators of A-to-I editing. These RBPs, such as TDP-43, DROSHA, NF45/90 and Ro60, mediate editing through various mechanisms including regulation of ADAR1 expression, interaction with ADAR1, and binding to Alu elements. We highlight that editing regulation by Ro60 is consistent with the global up-regulation of RNA editing in systemic lupus erythematosus. Additionally, most key editing regulators act in a cell type-specific manner. Together, our work provides insights for the regulatory mechanisms of RNA editing.


Subject(s)
Adenosine Deaminase/genetics, Adenosine Deaminase/metabolism, Neoplastic Gene Expression Regulation, RNA Editing/genetics, RNA-Binding Proteins/genetics, RNA-Binding Proteins/metabolism, Adenosine/genetics, Alu Elements, Autoantigens/genetics, Gene Knockdown Techniques, Hep G2 Cells, Humans, Inosine/genetics, K562 Cells, Systemic Lupus Erythematosus/genetics, Small Cytoplasmic RNA/genetics, Ribonucleoproteins/genetics, RNA Sequence Analysis, Genetic Transcription, Transfection
5.
Genome Res ; 28(6): 812-823, 2018 06.
Article in English | MEDLINE | ID: mdl-29724793

ABSTRACT

In eukaryotes, nascent RNA transcripts undergo an intricate series of RNA processing steps to achieve mRNA maturation. RNA editing and alternative splicing are two major RNA processing steps that can introduce significant modifications to the final gene products. By tackling these processes in isolation, recent studies have enabled substantial progress in understanding their global RNA targets and regulatory pathways. However, the interplay between individual steps of RNA processing, an essential aspect of gene regulation, remains poorly understood. By sequencing the RNA of different subcellular fractions, we examined the timing of adenosine-to-inosine (A-to-I) RNA editing and its impact on alternative splicing. We observed that >95% A-to-I RNA editing events occurred in the chromatin-associated RNA prior to polyadenylation. We report about 500 editing sites in the 3' acceptor sequences that can alter splicing of the associated exons. These exons are highly conserved during evolution and reside in genes with important cellular function. Furthermore, we identified a second class of exons whose splicing is likely modulated by RNA secondary structures that are recognized by the RNA editing machinery. The genome-wide analyses, supported by experimental validations, revealed remarkable interplay between RNA editing and splicing and expanded the repertoire of functional RNA editing sites.


Subject(s)
Gene Expression Regulation/genetics, RNA Editing/genetics, RNA Precursors/genetics, RNA Splicing/genetics, Adenosine/genetics, Animals, Chromatin/genetics, Exons/genetics, Humans, Inosine/genetics, Mammals/genetics, Nucleic Acid Conformation, Polyadenylation/genetics
6.
Bioinformatics ; 32(23): 3593-3602, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27522083

ABSTRACT

MOTIVATION: Differential transcript expression (DTE) analysis without predefined conditions is critical to biological studies. For example, it can be used to discover biomarkers to classify cancer samples into previously unknown subtypes such that better diagnosis and therapy methods can be developed for the subtypes. Although several DTE tools for population data, i.e. data without known biological conditions, have been published, these tools either assume binary conditions in the input population or require the number of conditions as a part of the input. Fixing the number of conditions to binary is unrealistic and may distort the results of a DTE analysis. Estimating the correct number of conditions in a population could also be challenging for a routine user. Moreover, the existing tools only provide differential usages of exons, which may be insufficient to interpret the patterns of alternative splicing across samples and restricts the applicability of the tools in many biological studies. RESULTS: We propose a novel DTE analysis algorithm, called SDEAP, that estimates the number of conditions directly from the input samples using a Dirichlet mixture model and discovers alternative splicing events using a new graph modular decomposition algorithm. By taking advantage of the above technical improvement, SDEAP was able to outperform the other DTE analysis methods in our extensive experiments on simulated data and real data with qPCR validation. The prediction of SDEAP also allowed us to classify the samples of cancer subtypes and cell-cycle phases more accurately. AVAILABILITY AND IMPLEMENTATION: SDEAP is publicly available for free at https://github.com/ewyang089/SDEAP/wiki CONTACT: yyang027@cs.ucr.edu; jiang@cs.ucr.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Gene Expression Profiling/methods, Alternative Splicing, Cell Cycle, Exons, Humans, Neoplasms/classification, Neoplasms/genetics, RNA Sequence Analysis, Software
7.
BMC Genomics ; 16 Suppl 2: S15, 2015.
Article in English | MEDLINE | ID: mdl-25708199

ABSTRACT

BACKGROUND: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and is serving as a basis to analyze the structural and quantitative differences of expressed isoforms between samples. However, the current transcriptome assembly algorithms are not specifically designed to handle large amounts of errors that are inherent in real RNA-Seq datasets, especially those involving multiple samples, making downstream differential analysis applications difficult. On the other hand, multiple sample RNA-Seq datasets may provide more information than single sample datasets that can be utilized to improve the performance of transcriptome assembly and abundance estimation, but such information remains overlooked by the existing assembly tools. RESULTS: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is an NP-hard problem. Instead, we develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets and comparison with the existing assembly tools demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with a greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.


Subject(s)
Algorithms, Computational Biology/methods, RNA Sequence Analysis/statistics & numerical data, Transcriptome/genetics, Alternative Splicing, Animals, Computer Simulation, Gene Expression Profiling/methods, Gene Expression Profiling/statistics & numerical data, Humans, Internet, Genetic Models, Protein Isoforms/genetics, Reproducibility of Results, RNA Sequence Analysis/methods, Software
8.
Mol Pharmacol ; 87(2): 218-30, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25403678

ABSTRACT

Tyrosinase, a key copper-containing enzyme involved in melanin biosynthesis, is closely associated with hyperpigmentation disorders, cancer, and neurodegenerative diseases, and as such, it is an essential target in medicine and cosmetics. Known tyrosinase inhibitors possess adverse side effects, and there are no safety regulations; therefore, it is necessary to develop new inhibitors with fewer side effects and less toxicity. Peptides are exquisitely specific to their in vivo targets, with high potencies and relatively few off-target side effects. Thus, we systematically and comprehensively investigated the tyrosinase-inhibitory abilities of N- and C-terminal cysteine/tyrosine-containing tetrapeptides by constructing a phage-display random tetrapeptide library and conducting computational molecular docking studies on novel tyrosinase tetrapeptide inhibitors. We found that N-terminal cysteine-containing tetrapeptides exhibited the most potent tyrosinase-inhibitory abilities. The positional preference of cysteine residues at the N terminus in the tetrapeptides significantly contributed to their tyrosinase-inhibitory function. The sulfur atom in cysteine moieties of N- and C-terminal cysteine-containing tetrapeptides coordinated with copper ions, which then tightly blocked substrate-binding sites. N- and C-terminal tyrosine-containing tetrapeptides functioned as competitive inhibitors against mushroom tyrosinase by using the phenol ring of tyrosine to stack with the imidazole ring of His263, thus competing for the substrate-binding site. The N-terminal cysteine-containing tetrapeptide CRVI exhibited the strongest tyrosinase-inhibitory potency (with an IC50 of 2.7 ± 0.5 µM), which was superior to those of the known tyrosinase inhibitors (arbutin and kojic acid) and outperformed kojic acid-tripeptides, mimosine-FFY, and short-sequence oligopeptides at inhibiting mushroom tyrosinase.


Subject(s)
Cysteine/metabolism, Monophenol Monooxygenase/metabolism, Oligopeptides/metabolism, Peptide Library, Sulfur/metabolism, Agaricales/enzymology, Agaricales/genetics, Base Sequence, Cysteine/genetics, Enzyme Inhibitors/administration & dosage, Enzyme Inhibitors/metabolism, Molecular Sequence Data, Monophenol Monooxygenase/antagonists & inhibitors, Oligopeptides/administration & dosage, Oligopeptides/genetics, Protein Binding/physiology, Secondary Protein Structure, Tertiary Protein Structure
9.
Bioinformatics ; 29(17): 2153-61, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23793751

ABSTRACT

MOTIVATION: RNA-Seq is increasingly being used for differential gene expression analysis, which was dominated by the microarray technology in the past decade. However, inferring differential gene expression based on the observed difference of RNA-Seq read counts has unique challenges that were not present in microarray-based analysis. The differential expression estimation may be biased against low read count values such that the differential expression of genes with high read counts is more easily detected. The estimation bias may further propagate in downstream analyses at the systems biology level if it is not corrected. RESULTS: To obtain a better inference of differential gene expression, we propose a new efficient algorithm based on a Markov random field (MRF) model, called MRFSeq, that uses additional gene coexpression data to enhance the prediction power. Our main technical contribution is the careful selection of the clique potential functions in the MRF so its maximum a posteriori estimation can be reduced to the well-known maximum flow problem and thus solved in polynomial time. Our extensive experiments on simulated and real RNA-Seq datasets demonstrate that MRFSeq is more accurate and less biased against genes with low read counts than the existing methods based on RNA-Seq data alone. For example, on the well-studied MAQC dataset, MRFSeq improved the sensitivity from 11.6 to 38.8% for genes with low read counts. AVAILABILITY: MRFSeq is implemented in C and available at http://www.cs.ucr.edu/~yyang027/mrfseq.htm
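
The reduction mentioned in the abstract, from maximum a posteriori estimation in a binary MRF to maximum flow, can be illustrated with a generic textbook graph-cut construction. The sketch below assumes per-gene unary costs for the labels "not differentially expressed" (0) and "differentially expressed" (1), plus a non-negative penalty whenever two coexpressed genes receive different labels. It does not reproduce MRFSeq's actual clique potentials or its C implementation; all names and numbers are hypothetical.

```python
from collections import deque


def min_cut_labels(cost0, cost1, edges):
    """Minimise sum_i cost(label_i) + sum_(i,j,w) w * [label_i != label_j].

    cost0[i], cost1[i]: non-negative cost of giving gene i label 0 or 1.
    edges: list of (i, j, w) coexpression links with penalty w >= 0.
    Returns (labels, minimum_energy), computed as an s-t minimum cut.
    """
    n = len(cost0)
    s, t = n, n + 1
    cap = [dict() for _ in range(n + 2)]  # residual capacities

    def add(u, v, c):
        cap[u][v] = cap[u].get(v, 0.0) + c
        cap[v].setdefault(u, 0.0)  # reverse residual edge

    for i in range(n):
        add(s, i, cost0[i])  # cut if gene i lands on the sink side (label 0)
        add(i, t, cost1[i])  # cut if gene i lands on the source side (label 1)
    for i, j, w in edges:
        add(i, j, w)
        add(j, i, w)

    flow = 0.0
    while True:
        # Breadth-first search for an augmenting path (Edmonds-Karp).
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            break
        # Find the bottleneck capacity, then push flow along the path.
        bottleneck, v = float("inf"), t
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

    # Genes still reachable from the source after the last search get label 1.
    labels = [1 if i in parent else 0 for i in range(n)]
    return labels, flow


# Gene 1 has weak evidence on its own but is coexpressed with gene 0.
labels, energy = min_cut_labels(
    cost0=[5.0, 1.0, 0.0], cost1=[0.0, 2.0, 5.0], edges=[(0, 1, 3.0)]
)
print(labels, energy)
```

In this toy input the coexpression link pulls the weakly supported gene 1 toward the label of its strongly supported neighbour, which is the intuition behind using coexpression data to help genes with low read counts.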


Subject(s)
Gene Expression Profiling/methods, High-Throughput Nucleotide Sequencing/methods, RNA Sequence Analysis/methods, Algorithms, Markov Chains
10.
PLoS One ; 7(7): e40846, 2012.
Article in English | MEDLINE | ID: mdl-22848404

ABSTRACT

Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. 
The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.
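
The residue-level figures quoted above (MCC, precision, sensitivity) all derive from the four cells of a binary confusion matrix. The following minimal sketch shows the standard formulas; the counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Precision, recall (sensitivity) and Matthews correlation coefficient."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "mcc": mcc}


# Hypothetical residue counts: binding residues are the positive class.
print(binary_metrics(tp=6, fp=2, fn=3, tn=9))
```

Unlike accuracy, MCC stays informative when binding residues are a small minority of all surface residues, which is presumably why it is the headline metric here.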


Subject(s)
Artificial Intelligence, Carbohydrates/chemistry, Protein Databases, Molecular Models, Proteins/chemistry, Protein Sequence Analysis, Binding Sites, Proteins/genetics
11.
PLoS One ; 7(6): e37706, 2012.
Article in English | MEDLINE | ID: mdl-22701576

ABSTRACT

Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.


Subject(s)
Amino Acids/chemistry, Computational Biology/methods, Chemical Models, Molecular Models, Protein Interaction Mapping/methods, Proteins/chemistry, Algorithms, Artificial Intelligence, Computer Simulation, Neural Networks (Computer), Probability, Statistical Distributions, Nonparametric Statistics
12.
PLoS One ; 7(3): e33340, 2012.
Article in English | MEDLINE | ID: mdl-22457753

ABSTRACT

Protein-protein interactions are critical determinants in biological systems. Engineered proteins binding to specific areas on protein surfaces could lead to therapeutics or diagnostics for treating diseases in humans. But designing epitope-specific protein-protein interactions with computational atomistic interaction free energy remains a difficult challenge. Here we show that, with the antibody-VEGF (vascular endothelial growth factor) interaction as a model system, the experimentally observed amino acid preferences in the antibody-antigen interface can be rationalized with 3-dimensional distributions of interacting atoms derived from the database of protein structures. Machine learning models established on the rationalization can be generalized to design amino acid preferences in antibody-antigen interfaces, for which the experimental validations are tractable with current high throughput synthetic antibody display technologies. Leave-one-out cross validation on the benchmark system yielded the accuracy, precision, recall (sensitivity) and specificity of the overall binary predictions to be 0.69, 0.45, 0.63, and 0.71 respectively, and the overall Matthews correlation coefficient of the 20 amino acid types in the 24 interface CDR positions was 0.312. The structure-based computational antibody design methodology was further tested with other antibodies binding to VEGF. The results indicate that the methodology could provide alternatives to the current antibody technologies based on animal immune systems in engineering therapeutic and diagnostic antibodies against predetermined antigen epitopes.
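
Leave-one-out cross validation, the evaluation scheme named above, holds out each example in turn, fits on the rest, and scores the held-out prediction. The sketch below shows the loop with a deliberately simple 1-nearest-neighbour predictor on a single made-up numeric feature. The predictor, the feature and the data are assumptions for illustration only and bear no relation to the paper's structure-based models.

```python
def nearest_neighbor_predict(train, x):
    """Return the label of the training point whose feature is closest to x."""
    return min(train, key=lambda item: abs(item[0] - x))[1]


def leave_one_out_accuracy(data, predict=nearest_neighbor_predict):
    """Fraction of examples predicted correctly when each is held out once.

    data: list of (feature, label) pairs; needs at least two examples.
    """
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]  # everything except example i
        if predict(train, x) == y:
            correct += 1
    return correct / len(data)


# Hypothetical (score, preferred-or-not) pairs for seven candidates.
data = [(1.0, 0), (1.2, 0), (0.9, 0), (3.0, 1), (3.2, 1), (2.9, 1), (2.0, 1)]
print(f"LOOCV accuracy = {leave_one_out_accuracy(data):.3f}")
```

The borderline example at 2.0 is the only one misclassified when held out, since its nearest remaining neighbour carries the other label.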


Subject(s)
Antigen-Antibody Reactions, Complementarity Determining Regions, Artificial Intelligence, Antibody Binding Sites, X-Ray Crystallography, Humans, Molecular Models, Reproducibility of Results, Single-Chain Antibodies/chemistry, Single-Chain Antibodies/immunology, Vascular Endothelial Growth Factor A/immunology
13.
Bioinformatics ; 24(23): 2691-7, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-18974075

ABSTRACT

MOTIVATION: Regulatory proteases modulate proteomic dynamics with a spectrum of specificities against substrate proteins. Predictions of the substrate sites in a proteome for the proteases would facilitate understanding the biological functions of the proteases. High-throughput experiments could generate suitable datasets for machine learning to grasp complex relationships between the substrate sequences and the enzymatic specificities. But the capability in predicting protease substrate sites by integrating the machine learning algorithms with the experimental methodology has yet to be demonstrated. RESULTS: Factor Xa, a key regulatory protease in the blood coagulation system, was used as model system, for which effective substrate site predictors were developed and benchmarked. The predictors were derived from bootstrap aggregation (machine learning) algorithms trained with data obtained from multilevel substrate phage display experiments. The experimental sampling and computational learning on substrate specificities can be generalized to proteases for which the active forms are available for the in vitro experiments. AVAILABILITY: http://asqa.iis.sinica.edu.tw/fXaWeb/


Subject(s)
Artificial Intelligence, Computational Biology/methods, Peptide Hydrolases/chemistry, Peptide Library, Algorithms, Animals, Binding Sites, Computer Simulation, Protein Databases, Humans, Kinetics, Biological Models, Substrate Specificity