Results 1 - 13 of 13
1.
Genome Res ; 31(3): 359-371, 2021 03.
Article in English | MEDLINE | ID: mdl-33452016

ABSTRACT

Alternative splicing is an RNA processing mechanism that affects most genes in human, contributing to disease mechanisms and phenotypic diversity. The regulation of splicing involves an intricate network of cis-regulatory elements and trans-acting factors. Due to their high sequence specificity, cis-regulation of splicing can be altered by genetic variants, significantly affecting splicing outcomes. Recently, multiple methods have been applied to understanding the regulatory effects of genetic variants on splicing. However, it is still challenging to go beyond apparent association to pinpoint functional variants. To fill in this gap, we utilized large-scale data sets of the Genotype-Tissue Expression (GTEx) project to study genetically modulated alternative splicing (GMAS) via identification of allele-specific splicing events. We demonstrate that GMAS events are shared across tissues and individuals more often than expected by chance, consistent with their genetically driven nature. Moreover, although the allelic bias of GMAS exons varies across samples, the degree of variation is similar across tissues versus individuals. Thus, genetic background drives the GMAS pattern to a similar degree as tissue-specific splicing mechanisms. Leveraging the genetically driven nature of GMAS, we developed a new method to predict functional splicing-altering variants, built upon a genotype-phenotype concordance model across samples. Complemented by experimental validations, this method predicted >1000 functional variants, many of which may alter RNA-protein interactions. Lastly, 72% of GMAS-associated SNPs were in linkage disequilibrium with GWAS-reported SNPs, and such association was enriched in tissues of relevance for specific traits/diseases. Our study enables a comprehensive view of genetically driven splicing variations in human tissues.


Subjects
Alleles , Alternative Splicing/genetics , Genetic Variation , Cell Line , Exons , Female , Genome-Wide Association Study , Humans , Linkage Disequilibrium , Male , Organ Specificity/genetics , Single Nucleotide Polymorphism/genetics
2.
Genome Res ; 28(6): 812-823, 2018 06.
Article in English | MEDLINE | ID: mdl-29724793

ABSTRACT

In eukaryotes, nascent RNA transcripts undergo an intricate series of RNA processing steps to achieve mRNA maturation. RNA editing and alternative splicing are two major RNA processing steps that can introduce significant modifications to the final gene products. By tackling these processes in isolation, recent studies have enabled substantial progress in understanding their global RNA targets and regulatory pathways. However, the interplay between individual steps of RNA processing, an essential aspect of gene regulation, remains poorly understood. By sequencing the RNA of different subcellular fractions, we examined the timing of adenosine-to-inosine (A-to-I) RNA editing and its impact on alternative splicing. We observed that >95% A-to-I RNA editing events occurred in the chromatin-associated RNA prior to polyadenylation. We report about 500 editing sites in the 3' acceptor sequences that can alter splicing of the associated exons. These exons are highly conserved during evolution and reside in genes with important cellular function. Furthermore, we identified a second class of exons whose splicing is likely modulated by RNA secondary structures that are recognized by the RNA editing machinery. The genome-wide analyses, supported by experimental validations, revealed remarkable interplay between RNA editing and splicing and expanded the repertoire of functional RNA editing sites.


Subjects
Gene Expression Regulation/genetics , RNA Editing/genetics , RNA Precursors/genetics , RNA Splicing/genetics , Adenosine/genetics , Animals , Chromatin/genetics , Exons/genetics , Humans , Inosine/genetics , Mammals/genetics , Nucleic Acid Conformation , Polyadenylation/genetics
3.
Bioinformatics ; 32(23): 3593-3602, 2016 12 01.
Article in English | MEDLINE | ID: mdl-27522083

ABSTRACT

MOTIVATION: Differential transcript expression (DTE) analysis without predefined conditions is critical to biological studies. For example, it can be used to discover biomarkers to classify cancer samples into previously unknown subtypes such that better diagnosis and therapy methods can be developed for the subtypes. Although several DTE tools for population data, i.e. data without known biological conditions, have been published, these tools either assume binary conditions in the input population or require the number of conditions as a part of the input. Fixing the number of conditions to binary is unrealistic and may distort the results of a DTE analysis. Estimating the correct number of conditions in a population could also be challenging for a routine user. Moreover, the existing tools only provide differential usages of exons, which may be insufficient to interpret the patterns of alternative splicing across samples and restrains the applications of the tools from many biology studies. RESULTS: We propose a novel DTE analysis algorithm, called SDEAP, that estimates the number of conditions directly from the input samples using a Dirichlet mixture model and discovers alternative splicing events using a new graph modular decomposition algorithm. By taking advantage of the above technical improvement, SDEAP was able to outperform the other DTE analysis methods in our extensive experiments on simulated data and real data with qPCR validation. The prediction of SDEAP also allowed us to classify the samples of cancer subtypes and cell-cycle phases more accurately. AVAILABILITY AND IMPLEMENTATION: SDEAP is publicly available for free at https://github.com/ewyang089/SDEAP/wiki CONTACT: yyang027@cs.ucr.edu; jiang@cs.ucr.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Profiling/methods , Alternative Splicing , Cell Cycle , Exons , Humans , Neoplasms/classification , Neoplasms/genetics , RNA Sequence Analysis , Software
4.
Mol Pharmacol ; 87(2): 218-30, 2015 Feb.
Article in English | MEDLINE | ID: mdl-25403678

ABSTRACT

Tyrosinase, a key copper-containing enzyme involved in melanin biosynthesis, is closely associated with hyperpigmentation disorders, cancer, and neurodegenerative diseases, and as such, it is an essential target in medicine and cosmetics. Known tyrosinase inhibitors possess adverse side effects, and there are no safety regulations; therefore, it is necessary to develop new inhibitors with fewer side effects and less toxicity. Peptides are exquisitely specific to their in vivo targets, with high potencies and relatively few off-target side effects. Thus, we systematically and comprehensively investigated the tyrosinase-inhibitory abilities of N- and C-terminal cysteine/tyrosine-containing tetrapeptides by constructing a phage-display random tetrapeptide library and conducting computational molecular docking studies on novel tyrosinase tetrapeptide inhibitors. We found that N-terminal cysteine-containing tetrapeptides exhibited the most potent tyrosinase-inhibitory abilities. The positional preference of cysteine residues at the N terminus in the tetrapeptides significantly contributed to their tyrosinase-inhibitory function. The sulfur atom in cysteine moieties of N- and C-terminal cysteine-containing tetrapeptides coordinated with copper ions, which then tightly blocked substrate-binding sites. N- and C-terminal tyrosine-containing tetrapeptides functioned as competitive inhibitors against mushroom tyrosinase by using the phenol ring of tyrosine to stack with the imidazole ring of His263, thus competing for the substrate-binding site. The N-terminal cysteine-containing tetrapeptide CRVI exhibited the strongest tyrosinase-inhibitory potency (with an IC50 of 2.7 ± 0.5 µM), which was superior to those of the known tyrosinase inhibitors (arbutin and kojic acid) and outperformed kojic acid-tripeptides, mimosine-FFY, and short-sequence oligopeptides at inhibiting mushroom tyrosinase.


Subjects
Cysteine/metabolism , Monophenol Monooxygenase/metabolism , Oligopeptides/metabolism , Peptide Library , Sulfur/metabolism , Agaricales/enzymology , Agaricales/genetics , Base Sequence , Cysteine/genetics , Enzyme Inhibitors/administration & dosage , Enzyme Inhibitors/metabolism , Molecular Sequence Data , Monophenol Monooxygenase/antagonists & inhibitors , Oligopeptides/administration & dosage , Oligopeptides/genetics , Protein Binding/physiology , Protein Secondary Structure , Protein Tertiary Structure
5.
BMC Genomics ; 16 Suppl 2: S15, 2015.
Article in English | MEDLINE | ID: mdl-25708199

ABSTRACT

BACKGROUND: RNA-Seq based transcriptome assembly has become a fundamental technique for studying expressed mRNAs (i.e., transcripts or isoforms) in a cell using high-throughput sequencing technologies, and is serving as a basis to analyze the structural and quantitative differences of expressed isoforms between samples. However, the current transcriptome assembly algorithms are not specifically designed to handle large amounts of errors that are inherent in real RNA-Seq datasets, especially those involving multiple samples, making downstream differential analysis applications difficult. On the other hand, multiple sample RNA-Seq datasets may provide more information than single sample datasets that can be utilized to improve the performance of transcriptome assembly and abundance estimation, but such information remains overlooked by the existing assembly tools. RESULTS: We formulate a computational framework of transcriptome assembly that is capable of handling noisy RNA-Seq reads and multiple sample RNA-Seq datasets efficiently. We show that finding an optimal solution under this framework is an NP-hard problem. Instead, we develop an efficient heuristic algorithm, called Iterative Shortest Path (ISP), based on linear programming (LP) and integer linear programming (ILP). Our preliminary experimental results on both simulated and real datasets and comparison with the existing assembly tools demonstrate that (i) the ISP algorithm is able to assemble transcriptomes with a greatly increased precision while keeping the same level of sensitivity, especially when many samples are involved, and (ii) its assembly results help improve downstream differential analysis. The source code of ISP is freely available at http://alumni.cs.ucr.edu/~liw/isp.html.


Subjects
Algorithms , Computational Biology/methods , RNA Sequence Analysis/statistics & numerical data , Transcriptome/genetics , Alternative Splicing , Animals , Computer Simulation , Gene Expression Profiling/methods , Gene Expression Profiling/statistics & numerical data , Humans , Internet , Genetic Models , Protein Isoforms/genetics , Reproducibility of Results , RNA Sequence Analysis/methods , Software
6.
Bioinformatics ; 29(17): 2153-61, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23793751

ABSTRACT

MOTIVATION: RNA-Seq is increasingly being used for differential gene expression analysis, which was dominated by the microarray technology in the past decade. However, inferring differential gene expression based on the observed difference of RNA-Seq read counts has unique challenges that were not present in microarray-based analysis. The differential expression estimation may be biased against low read count values such that the differential expression of genes with high read counts is more easily detected. The estimation bias may further propagate in downstream analyses at the systems biology level if it is not corrected. RESULTS: To obtain a better inference of differential gene expression, we propose a new efficient algorithm based on a Markov random field (MRF) model, called MRFSeq, that uses additional gene coexpression data to enhance the prediction power. Our main technical contribution is the careful selection of the clique potential functions in the MRF so its maximum a posteriori estimation can be reduced to the well-known maximum flow problem and thus solved in polynomial time. Our extensive experiments on simulated and real RNA-Seq datasets demonstrate that MRFSeq is more accurate and less biased against genes with low read counts than the existing methods based on RNA-Seq data alone. For example, on the well-studied MAQC dataset, MRFSeq improved the sensitivity from 11.6 to 38.8% for genes with low read counts. AVAILABILITY: MRFSeq is implemented in C and available at http://www.cs.ucr.edu/~yyang027/mrfseq.htm
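
The reduction mentioned in this abstract, from maximum a posteriori estimation in a binary Markov random field to a maximum flow problem, can be illustrated with a generic textbook construction. The sketch below is not the MRFSeq implementation and does not use its clique potentials: the function name, the integer costs, and the three-gene example are all hypothetical. It assumes each gene has a cost for label 0 (not differentially expressed) and label 1 (differentially expressed), and that each coexpression edge adds a non-negative penalty when its two genes receive different labels.

```python
from collections import defaultdict, deque


def map_labels_by_min_cut(unary, pairwise):
    """Minimise sum of unary costs plus disagreement penalties via min cut.

    unary:    {gene: (cost_if_label_0, cost_if_label_1)}, non-negative costs
    pairwise: {(gene_a, gene_b): penalty >= 0 if the two labels differ}
    Returns (labels, energy). Hypothetical helper, not part of MRFSeq.
    """
    source, sink = "_source", "_sink"
    cap = defaultdict(lambda: defaultdict(int))
    for gene, (cost0, cost1) in unary.items():
        cap[source][gene] += cost0  # this edge is cut when gene takes label 0
        cap[gene][sink] += cost1    # this edge is cut when gene takes label 1
    for (a, b), weight in pairwise.items():
        cap[a][b] += weight
        cap[b][a] += weight

    def bfs_parents():
        # Breadth-first search over edges with remaining capacity.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in list(cap[u].items()):
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    # Edmonds-Karp: augment along shortest paths until the sink is cut off.
    while True:
        parent = bfs_parents()
        if sink not in parent:
            break
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck

    # Genes still reachable from the source sit on the label-1 side of the cut.
    labels = {g: 1 if g in parent else 0 for g in unary}
    energy = sum(unary[g][labels[g]] for g in unary)
    energy += sum(w for (a, b), w in pairwise.items() if labels[a] != labels[b])
    return labels, energy


# Hypothetical example: gene B weakly prefers label 0 on its own,
# but is strongly coexpressed with gene A, which clearly prefers label 1.
unary = {"A": (5, 1), "B": (2, 3), "C": (1, 5)}
pairwise = {("A", "B"): 3, ("B", "C"): 1}
labels, energy = map_labels_by_min_cut(unary, pairwise)
```

In this toy input the coexpression edge pulls gene B to the same label as gene A, which is the kind of borrowing of evidence for low-count genes that the abstract describes in general terms.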


Subjects
Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , RNA Sequence Analysis/methods , Algorithms , Markov Chains
7.
Endocrinology ; 161(2)2020 02 01.
Article in English | MEDLINE | ID: mdl-31912136

ABSTRACT

Soybean oil consumption has increased greatly in the past half-century and is linked to obesity and diabetes. To test the hypothesis that soybean oil diet alters hypothalamic gene expression in conjunction with metabolic phenotype, we performed RNA sequencing analysis using male mice fed isocaloric, high-fat diets based on conventional soybean oil (high in linoleic acid, LA), a genetically modified, low-LA soybean oil (Plenish), and coconut oil (high in saturated fat, containing no LA). The 2 soybean oil diets had similar but nonidentical effects on the hypothalamic transcriptome, whereas the coconut oil diet had a negligible effect compared to a low-fat control diet. Dysregulated genes were associated with inflammation, neuroendocrine, neurochemical, and insulin signaling. Oxt was the only gene with metabolic, inflammation, and neurological relevance upregulated by both soybean oil diets compared to both control diets. Oxytocin immunoreactivity in the supraoptic and paraventricular nuclei of the hypothalamus was reduced, whereas plasma oxytocin and hypothalamic Oxt were increased. These central and peripheral effects of soybean oil diets were correlated with glucose intolerance but not body weight. Alterations in hypothalamic Oxt and plasma oxytocin were not observed in the coconut oil diet enriched in stigmasterol, a phytosterol found in soybean oil. We postulate that neither stigmasterol nor LA is responsible for effects of soybean oil diets on oxytocin and that Oxt messenger RNA levels could be associated with the diabetic state. Given the ubiquitous presence of soybean oil in the American diet, its observed effects on hypothalamic gene expression could have important public health ramifications.


Subjects
Diabetes Mellitus/etiology , Gene Expression/drug effects , Hypothalamus/drug effects , Oxytocin/blood , Soybean Oil/adverse effects , Animals , Inflammation/etiology , Linoleic Acid/adverse effects , Male , Mice , Nervous System Diseases/etiology , Obesity/etiology , Stigmasterol/adverse effects
8.
Bioinformatics ; 24(23): 2691-7, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-18974075

ABSTRACT

MOTIVATION: Regulatory proteases modulate proteomic dynamics with a spectrum of specificities against substrate proteins. Predictions of the substrate sites in a proteome for the proteases would facilitate understanding the biological functions of the proteases. High-throughput experiments could generate suitable datasets for machine learning to grasp complex relationships between the substrate sequences and the enzymatic specificities. But the capability in predicting protease substrate sites by integrating the machine learning algorithms with the experimental methodology has yet to be demonstrated. RESULTS: Factor Xa, a key regulatory protease in the blood coagulation system, was used as model system, for which effective substrate site predictors were developed and benchmarked. The predictors were derived from bootstrap aggregation (machine learning) algorithms trained with data obtained from multilevel substrate phage display experiments. The experimental sampling and computational learning on substrate specificities can be generalized to proteases for which the active forms are available for the in vitro experiments. AVAILABILITY: http://asqa.iis.sinica.edu.tw/fXaWeb/


Subjects
Artificial Intelligence , Computational Biology/methods , Peptide Hydrolases/chemistry , Peptide Library , Algorithms , Animals , Binding Sites , Computer Simulation , Protein Databases , Humans , Kinetics , Biological Models , Substrate Specificity
9.
Commun Biol ; 2: 19, 2019.
Article in English | MEDLINE | ID: mdl-30652130

ABSTRACT

Adenosine-to-inosine (A-to-I) editing, mediated by the ADAR enzymes, diversifies the transcriptome by altering RNA sequences. Recent studies reported global changes in RNA editing in disease and development. Such widespread editing variations necessitate an improved understanding of the regulatory mechanisms of RNA editing. Here, we study the roles of >200 RNA-binding proteins (RBPs) in mediating RNA editing in two human cell lines. Using RNA-sequencing and global protein-RNA binding data, we identify a number of RBPs as key regulators of A-to-I editing. These RBPs, such as TDP-43, DROSHA, NF45/90 and Ro60, mediate editing through various mechanisms including regulation of ADAR1 expression, interaction with ADAR1, and binding to Alu elements. We highlight that editing regulation by Ro60 is consistent with the global up-regulation of RNA editing in systemic lupus erythematosus. Additionally, most key editing regulators act in a cell type-specific manner. Together, our work provides insights for the regulatory mechanisms of RNA editing.


Subjects
Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Neoplastic Gene Expression Regulation , RNA Editing/genetics , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Adenosine/genetics , Alu Elements , Autoantigens/genetics , Gene Knockdown Techniques , Hep G2 Cells , Humans , Inosine/genetics , K562 Cells , Systemic Lupus Erythematosus/genetics , Small Cytoplasmic RNA/genetics , Ribonucleoproteins/genetics , RNA Sequence Analysis , Genetic Transcription , Transfection
10.
Nat Commun ; 10(1): 1338, 2019 03 22.
Article in English | MEDLINE | ID: mdl-30902979

ABSTRACT

Allele-specific protein-RNA binding is an essential aspect that may reveal functional genetic variants (GVs) mediating post-transcriptional regulation. Recently, genome-wide detection of in vivo binding of RNA-binding proteins is greatly facilitated by the enhanced crosslinking and immunoprecipitation (eCLIP) method. We developed a new computational approach, called BEAPR, to identify allele-specific binding (ASB) events in eCLIP-Seq data. BEAPR takes into account crosslinking-induced sequence propensity and variations between replicated experiments. Using simulated and actual data, we show that BEAPR largely outperforms often-used count analysis methods. Importantly, BEAPR overcomes the inherent overdispersion problem of these methods. Complemented by experimental validations, we demonstrate that the application of BEAPR to ENCODE eCLIP-Seq data of 154 proteins helps to predict functional GVs that alter splicing or mRNA abundance. Moreover, many GVs with ASB patterns have known disease relevance. Overall, BEAPR is an effective method that helps to address the outstanding challenge of functional interpretation of GVs.


Subjects
Alleles , Genetic Variation , RNA-Binding Proteins/metabolism , RNA/genetics , 3' Untranslated Regions/genetics , Amino Acid Motifs , Base Sequence , Computational Biology , Computer Simulation , Disease/genetics , Genetic Predisposition to Disease , Hep G2 Cells , Humans , K562 Cells , Single Nucleotide Polymorphism/genetics , Protein Binding , Quantitative Trait Loci/genetics , RNA Helicases/metabolism , RNA Splicing/genetics , Messenger RNA/genetics , Messenger RNA/metabolism , Reproducibility of Results , Trans-Activators/metabolism
11.
PLoS One ; 7(7): e40846, 2012.
Article in English | MEDLINE | ID: mdl-22848404

ABSTRACT

Non-covalent protein-carbohydrate interactions mediate molecular targeting in many biological processes. Prediction of non-covalent carbohydrate binding sites on protein surfaces not only provides insights into the functions of the query proteins; information on key carbohydrate-binding residues could suggest site-directed mutagenesis experiments, design therapeutics targeting carbohydrate-binding proteins, and provide guidance in engineering protein-carbohydrate interactions. In this work, we show that non-covalent carbohydrate binding sites on protein surfaces can be predicted with relatively high accuracy when the query protein structures are known. The prediction capabilities were based on a novel encoding scheme of the three-dimensional probability density maps describing the distributions of 36 non-covalent interacting atom types around protein surfaces. One machine learning model was trained for each of the 30 protein atom types. The machine learning algorithms predicted tentative carbohydrate binding sites on query proteins by recognizing the characteristic interacting atom distribution patterns specific for carbohydrate binding sites from known protein structures. The prediction results for all protein atom types were integrated into surface patches as tentative carbohydrate binding sites based on normalized prediction confidence level. The prediction capabilities of the predictors were benchmarked by a 10-fold cross validation on 497 non-redundant proteins with known carbohydrate binding sites. The predictors were further tested on an independent test set with 108 proteins. The residue-based Matthews correlation coefficient (MCC) for the independent test was 0.45, with prediction precision and sensitivity (or recall) of 0.45 and 0.49 respectively. In addition, 111 unbound carbohydrate-binding protein structures for which the structures were determined in the absence of the carbohydrate ligands were predicted with the trained predictors. 
The overall prediction MCC was 0.49. Independent tests on anti-carbohydrate antibodies showed that the carbohydrate antigen binding sites were predicted with comparable accuracy. These results demonstrate that the predictors are among the best in carbohydrate binding site predictions to date.


Subjects
Artificial Intelligence , Carbohydrates/chemistry , Protein Databases , Molecular Models , Proteins/chemistry , Protein Sequence Analysis , Binding Sites , Proteins/genetics
12.
PLoS One ; 7(6): e37706, 2012.
Article in English | MEDLINE | ID: mdl-22701576

ABSTRACT

Protein-protein interactions are key to many biological processes. Computational methodologies devised to predict protein-protein interaction (PPI) sites on protein surfaces are important tools in providing insights into the biological functions of proteins and in developing therapeutics targeting the protein-protein interaction sites. One of the general features of PPI sites is that the core regions from the two interacting protein surfaces are complementary to each other, similar to the interior of proteins in packing density and in the physicochemical nature of the amino acid composition. In this work, we simulated the physicochemical complementarities by constructing three-dimensional probability density maps of non-covalent interacting atoms on the protein surfaces. The interacting probabilities were derived from the interior of known structures. Machine learning algorithms were applied to learn the characteristic patterns of the probability density maps specific to the PPI sites. The trained predictors for PPI sites were cross-validated with the training cases (consisting of 432 proteins) and were tested on an independent dataset (consisting of 142 proteins). The residue-based Matthews correlation coefficient for the independent test set was 0.423; the accuracy, precision, sensitivity, specificity were 0.753, 0.519, 0.677, and 0.779 respectively. The benchmark results indicate that the optimized machine learning models are among the best predictors in identifying PPI sites on protein surfaces. In particular, the PPI site prediction accuracy increases with increasing size of the PPI site and with increasing hydrophobicity in amino acid composition of the PPI interface; the core interface regions are more likely to be recognized with high prediction confidence. 
The results indicate that the physicochemical complementarity patterns on protein surfaces are important determinants in PPIs, and a substantial portion of the PPI sites can be predicted correctly with the physicochemical complementarity features based on the non-covalent interaction data derived from protein interiors.
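
The benchmark figures quoted in this abstract (accuracy, precision, sensitivity, specificity, and the Matthews correlation coefficient) are all functions of a binary confusion matrix. The sketch below shows the standard definitions for residue-level predictions; the function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/tn/fn: true positives, false positives, true negatives,
    false negatives (for example, interface vs non-interface residues).
    Undefined ratios (zero denominator) are reported as 0.0 by convention.
    """
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),  # also called recall
        "specificity": ratio(tn, tn + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = binary_metrics(tp=50, fp=10, tn=30, fn=10)
```

Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it stays informative when binding-site residues are a small minority of the surface, which is one reason it is a common headline metric for site prediction.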


Subjects
Amino Acids/chemistry , Computational Biology/methods , Chemical Models , Molecular Models , Protein Interaction Mapping/methods , Proteins/chemistry , Algorithms , Artificial Intelligence , Computer Simulation , Neural Networks (Computer) , Probability , Statistical Distributions , Nonparametric Statistics
13.
PLoS One ; 7(3): e33340, 2012.
Article in English | MEDLINE | ID: mdl-22457753

ABSTRACT

Protein-protein interactions are critical determinants in biological systems. Engineered proteins binding to specific areas on protein surfaces could lead to therapeutics or diagnostics for treating diseases in humans. But designing epitope-specific protein-protein interactions with computational atomistic interaction free energy remains a difficult challenge. Here we show that, with the antibody-VEGF (vascular endothelial growth factor) interaction as a model system, the experimentally observed amino acid preferences in the antibody-antigen interface can be rationalized with 3-dimensional distributions of interacting atoms derived from the database of protein structures. Machine learning models established on the rationalization can be generalized to design amino acid preferences in antibody-antigen interfaces, for which the experimental validations are tractable with current high throughput synthetic antibody display technologies. Leave-one-out cross validation on the benchmark system yielded the accuracy, precision, recall (sensitivity) and specificity of the overall binary predictions to be 0.69, 0.45, 0.63, and 0.71 respectively, and the overall Matthews correlation coefficient of the 20 amino acid types in the 24 interface CDR positions was 0.312. The structure-based computational antibody design methodology was further tested with other antibodies binding to VEGF. The results indicate that the methodology could provide alternatives to the current antibody technologies based on animal immune systems in engineering therapeutic and diagnostic antibodies against predetermined antigen epitopes.


Subjects
Antigen-Antibody Reactions , Complementarity Determining Regions , Artificial Intelligence , Antibody Binding Sites , X-Ray Crystallography , Humans , Molecular Models , Reproducibility of Results , Single-Chain Antibodies/chemistry , Single-Chain Antibodies/immunology , Vascular Endothelial Growth Factor A/immunology