Search | Secretaria de Estado da Saúde

1.

InterPro in 2022.

Paysan-Lafosse, Typhaine; Blum, Matthias; Chuguransky, Sara; Grego, Tiago; Pinto, Beatriz Lázaro; Salazar, Gustavo A; Bileschi, Maxwell L; Bork, Peer; Bridge, Alan; Colwell, Lucy; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex.

Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.

Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.

Subjects

Databases, Protein; Humans; Amino Acid Sequence; Artificial Intelligence; Internet; Proteins/chemistry; Software

2.

A crowdsourcing open platform for literature curation in UniProt.

Wang, Yuqi; Wang, Qinghua; Huang, Hongzhan; Huang, Wei; Chen, Yongxing; McGarvey, Peter B; Wu, Cathy H; Arighi, Cecilia N.

PLoS Biol ; 19(12): e3001464, 2021 12.

Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.

Subjects

Crowdsourcing/methods; Data Curation/methods; Molecular Sequence Annotation/methods; Amino Acid Sequence/genetics; Computational Biology/methods; Databases, Protein/trends; Humans; Literature; Proteins/metabolism; Stakeholder Participation

3.

The InterPro protein families and domains database: 20 years on.

Blum, Matthias; Chang, Hsin-Yu; Chuguransky, Sara; Grego, Tiago; Kandasaamy, Swaathi; Mitchell, Alex; Nuka, Gift; Paysan-Lafosse, Typhaine; Qureshi, Matloob; Raj, Shriya; Richardson, Lorna; Salazar, Gustavo A; Williams, Lowri; Bork, Peer; Bridge, Alan; Gough, Julian; Haft, Daniel H; Letunic, Ivica; Marchler-Bauer, Aron; Mi, Huaiyu; Natale, Darren A; Necci, Marco; Orengo, Christine A; Pandurangan, Arun P; Rivoire, Catherine; Sigrist, Christian J A; Sillitoe, Ian; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Bateman, Alex; Finn, Robert D.

Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.

Subjects

Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment

4.

Global analysis of switchgrass (Panicum virgatum L.) transcriptomes in response to interactive effects of drought and heat stresses.

Hayford, Rita K; Serba, Desalegn D; Xie, Shaojun; Ayyappan, Vasudevan; Thimmapuram, Jyothi; Saha, Malay C; Wu, Cathy H; Kalavacharla, Venu Kal.

BMC Plant Biol ; 22(1): 107, 2022 Mar 08.

Article in English | MEDLINE | ID: mdl-35260072

ABSTRACT

BACKGROUND: Sustainable production of high-quality feedstock has been of great interest in bioenergy research. Despite the economic importance of switchgrass, high temperatures and water deficit are limiting factors for its successful cultivation in semi-arid areas. There are limited reports on the molecular basis of combined abiotic stress tolerance in switchgrass, particularly the combination of drought and heat stress. We used transcriptomic approaches to elucidate the changes in the response of switchgrass to simultaneous drought and high temperature. RESULTS: We imposed a drought-only treatment on switchgrass plants (Alamo AP13) by withholding water after 45 days of growth. For the combined drought and heat treatment, heat (35 °C/25 °C day/night) was imposed 72 h after the initiation of drought. Samples were collected at 0 h, 72 h, 96 h, 120 h, 144 h, and 168 h after treatment imposition, total RNA was extracted, and RNA-Seq was conducted. Out of a total of 32,190 genes, we identified 3912 drought (DT)-responsive genes, 2339 heat (HT)-responsive genes, and 4635 combined drought and heat (DTHT)-responsive genes. There were 209, 106, and 220 transcription factors (TFs) differentially expressed under DT, HT and DTHT, respectively. Gene Ontology annotation identified metabolic process as the significantly enriched term in DTHT-responsive genes. Other biological processes identified in DTHT-responsive genes included response to water, photosynthesis, oxidation-reduction processes, and response to stress. KEGG pathway enrichment analysis of DT- and DTHT-responsive genes revealed that TFs and genes controlling phenylpropanoid pathways were important for individual as well as combined stress responses. For example, hydroxycinnamoyl-CoA shikimate/quinate hydroxycinnamoyl transferase (HCT) from the phenylpropanoid pathway was induced by DT alone and by combined DTHT stress.
CONCLUSION: Through RNA-Seq analysis, we have identified unique and overlapping genes in response to DT and combined DTHT stress in switchgrass. The combination of DT and HT stress may affect the photosynthetic machinery and phenylpropanoid pathway of switchgrass, which negatively impacts lignin synthesis and biomass production. The biological function of the genes identified, particularly those responding to DTHT stress, could be further confirmed by techniques such as single point mutation or RNAi.

Subjects

Adaptation, Physiological/genetics; Dehydration/genetics; Heat-Shock Response/genetics; Panicum/genetics; Transcriptome; Gene Expression Profiling; Gene Expression Regulation, Plant; Genes, Plant

5.

COVID-19 Knowledge Graph from semantic integration of biomedical literature and databases.

Chen, Chuming; Ross, Karen E; Gavali, Sachin; Cowart, Julie E; Wu, Cathy H.

Bioinformatics ; 37(23): 4597-4598, 2021 12 07.

Article in English | MEDLINE | ID: mdl-34613368

ABSTRACT

SUMMARY: The global response to the COVID-19 pandemic has led to a rapid increase in scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis and treatment. We used the Semantic Web technology RDF to integrate COVID-19 knowledge mined from literature by iTextMine, PubTator and SemRep with relevant biological databases, and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download. AVAILABILITY AND IMPLEMENTATION: The COVID-19 Knowledge Graph is publicly available under a CC-BY 4.0 license at https://research.bioinformatics.udel.edu/covid19kg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

COVID-19; Semantics; Humans; Pandemics; Pattern Recognition, Automated; Databases, Factual

6.

piNET: a versatile web platform for downstream analysis and visualization of proteomics data.

Shamsaei, Behrouz; Chojnacki, Szymon; Pilarczyk, Marcin; Najafabadi, Mehdi; Niu, Wen; Chen, Chuming; Ross, Karen; Matlock, Andrea; Muhlich, Jeremy; Chutipongtanate, Somchai; Zheng, Jie; Turner, John; Vidovic, Dusica; Jaffe, Jake; MacCoss, Michael; Wu, Cathy; Pillai, Ajay; Ma'ayan, Avi; Schürer, Stephan; Kouril, Michal; Medvedovic, Mario; Meller, Jarek.

Nucleic Acids Res ; 48(W1): W85-W93, 2020 07 02.

Article in English | MEDLINE | ID: mdl-32469073

ABSTRACT

Rapid progress in proteomics and large-scale profiling of biological systems at the protein level necessitates the continued development of efficient computational tools for the analysis and interpretation of proteomics data. Here, we present the piNET server, which facilitates integrated annotation, analysis and visualization of quantitative proteomics data, with emphasis on PTM networks and integration with the LINCS library of chemical and genetic perturbation signatures in order to provide further mechanistic and functional insights. The primary input for the server consists of a set of peptides or proteins, optionally with PTM sites, and their corresponding abundance values. Several interconnected workflows can be used to generate: (i) interactive graphs and tables providing comprehensive annotation and mapping between peptides and proteins with PTM sites; (ii) high-resolution and interactive visualization of enzyme-substrate networks, including kinases and their phospho-peptide targets; (iii) mapping and visualization of LINCS signature connectivity for chemical inhibitors or genetic knockdown of enzymes upstream of their target PTM sites. piNET has been built on a modular Spring Boot Java platform as a fast, versatile and easy-to-use tool. Apache Lucene indexing is used for fast mapping of peptides to UniProt entries for the human, mouse and other commonly used model organism proteomes. PTM-centric network analyses combine the PhosphoSitePlus, iPTMnet and SIGNOR databases of validated enzyme-substrate relationships, with kinase networks augmented by DeepPhos predictions and sequence-based mapping of PhosphoSitePlus consensus motifs. Concordant LINCS signatures are mapped using iLINCS. For each workflow, a RESTful API counterpart can be used to generate the results programmatically in JSON format. The server is available at http://pinet-server.org, and it is free and open to all users without a login requirement.
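To illustrate the idea behind indexed peptide-to-protein mapping, here is a minimal Python sketch of an index-then-verify lookup. It is not piNET's implementation: it swaps Apache Lucene for a plain in-memory k-mer dictionary, and the value of K, the function names and the toy accessions (TOY1, TOY2) are hypothetical.

```python
from collections import defaultdict

K = 4  # k-mer length used as the index key (hypothetical value, not from piNET)


def build_kmer_index(proteins, k=K):
    """Map each k-mer to the set of protein accessions whose sequence contains it."""
    index = defaultdict(set)
    for accession, sequence in proteins.items():
        for i in range(len(sequence) - k + 1):
            index[sequence[i:i + k]].add(accession)
    return index


def map_peptide(peptide, proteins, index, k=K):
    """Return sorted (accession, 1-based start) pairs where the peptide occurs exactly."""
    if len(peptide) < k:
        # Too short to use the index: fall back to scanning every sequence.
        candidates = proteins.keys()
    else:
        # Any protein containing the peptide must also contain its first k-mer.
        candidates = index.get(peptide[:k], set())
    hits = []
    for accession in candidates:
        sequence = proteins[accession]
        start = sequence.find(peptide)
        while start != -1:  # also catches overlapping occurrences
            hits.append((accession, start + 1))
            start = sequence.find(peptide, start + 1)
    return sorted(hits)


proteins = {"TOY1": "MKTAYIAKQRQISFVK", "TOY2": "MSTAYIAKQLLK"}  # hypothetical sequences
index = build_kmer_index(proteins)
print(map_peptide("TAYIAKQ", proteins, index))  # [('TOY1', 3), ('TOY2', 3)]
```

The index only narrows the candidate proteins; the exact substring search confirms each hit, so false index matches cannot leak into the result. A production system would also need to handle isoforms and isobaric residues, which this sketch ignores.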

Subjects

Protein Processing, Post-Translational; Proteomics/methods; Software; Animals; Computer Graphics; Enzymes/metabolism; Humans; Internet; Mice; Peptides/chemistry; Peptides/metabolism; Proteins/chemistry; Proteins/metabolism; Workflow

7.

Characterization of metabolic responses, genetic variations, and microsatellite instability in ammonia-stressed CHO cells grown in fed-batch cultures.

Chitwood, Dylan G; Wang, Qinghua; Elliott, Kathryn; Bullock, Aiyana; Jordana, Dwon; Li, Zhigang; Wu, Cathy; Harcum, Sarah W; Saski, Christopher A.

BMC Biotechnol ; 21(1): 4, 2021 01 08.

Article in English | MEDLINE | ID: mdl-33419422

ABSTRACT

BACKGROUND: As bioprocess intensification has increased over the last 30 years, yields from mammalian cell processes have increased from tens of milligrams to over tens of grams per liter. Most of these gains in productivity can be attributed to increasing cell densities within bioreactors. As such, strategies have been developed to minimize the accumulation of metabolic wastes, such as lactate and ammonia. Unfortunately, neither cell growth nor biopharmaceutical production can occur without some waste metabolite accumulation, and this accumulation inevitably leads to the decline and termination of the culture. While it is understood that the accumulation of these unwanted compounds imparts a suboptimal culture environment, little is known about their genotoxic properties that may lead to global genome instability. In this study, we examined the effects of high and moderate extracellular ammonia on the physiology and genomic integrity of Chinese hamster ovary (CHO) cells. RESULTS: Through whole genome sequencing, we discovered 2394 variant sites within functional genes, comprising both single nucleotide polymorphisms and insertion/deletion mutations, that resulted from ammonia stress and have high or moderate impact on functional genes. Furthermore, several of these de novo mutations were found in genes whose functions are to maintain genome stability, such as Tp53, Tnfsf11, Brca1 and Nfkb1. In addition, we characterized the microsatellite content of the cultures using the CriGri-PICR Chinese hamster genome assembly and discovered an abundance of microsatellite loci that are not replicated faithfully in the ammonia-stressed cultures. Unfaithful replication of these loci is a signature of microsatellite instability. With rigorous filtering, we found 124 candidate microsatellite loci that may be suitable for further investigation to determine whether they could serve as reliable biomarkers to predict genome instability in CHO cultures.
CONCLUSION: This study advances our knowledge with regard to the effects of ammonia accumulation on CHO cell culture performance by identifying ammonia-sensitive genes linked to genome stability and lays the foundation for the development of a new diagnostic tool for assessing genome stability.

Subjects

Ammonia/metabolism; Batch Cell Culture Techniques/methods; Genetic Variation; Microsatellite Instability; Animals; BRCA1 Protein/metabolism; Biomarkers; Bioreactors; CHO Cells; Cell Count; Cricetulus; Culture Media; Female; Genes, p53; Genetic Variation/genetics; Lactic Acid/metabolism; Mutation; NF-kappa B p50 Subunit/metabolism; Ovary/metabolism; RANK Ligand/metabolism

8.

iPTMnet: an integrated resource for protein post-translational modification network discovery.

Huang, Hongzhan; Arighi, Cecilia N; Ross, Karen E; Ren, Jia; Li, Gang; Chen, Sheng-Chih; Wang, Qinghua; Cowart, Julie; Vijay-Shanker, K; Wu, Cathy H.

Nucleic Acids Res ; 46(D1): D542-D550, 2018 01 04.

Article in English | MEDLINE | ID: mdl-29145615

ABSTRACT

Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating the regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery, employing an integrative bioinformatics approach that combines text mining, data mining and ontological representation to capture rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation across species. iPTMnet encompasses data from (i) our PTM-focused text mining tools, RLIMS-P and eFIP, which extract phosphorylation information from full-scale mining of PubMed abstracts and full-length articles; (ii) a set of curated databases with experimentally observed PTMs; and (iii) the Protein Ontology, which organizes proteins and PTM proteoforms, enabling their representation, annotation and comparison within and across species. Presently covering eight major PTM types (phosphorylation, ubiquitination, acetylation, methylation, glycosylation, S-nitrosylation, sumoylation and myristoylation), the iPTMnet knowledgebase contains more than 654 500 unique PTM sites in over 62 100 proteins, along with more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations. The website supports online search, browsing, retrieval and visual analysis for scientific queries. Several examples, including functional interpretation of phosphoproteomic data, demonstrate iPTMnet as a gateway for visual exploration and systematic analysis of PTM networks and conservation, thereby enabling PTM discovery and hypothesis generation.

Subjects

Databases, Protein; Knowledge Bases; Protein Processing, Post-Translational; Animals; Computational Biology; Data Mining; Enzymes/metabolism; Humans; Internet; Phosphorylation; Protein Interaction Maps; Sequence Alignment

9.

Beyond communication training: The MaRIS model for developing medical students' human capabilities and personal resilience.

Chan, Kwong D; Humphreys, Linda; Mey, Amary; Holland, Carissa; Wu, Cathy; Rogers, Gary D.

Med Teach ; 42(2): 187-195, 2020 02.

Article in English | MEDLINE | ID: mdl-31608726

ABSTRACT

Purpose: Human capabilities in medicine, including communication skills, are increasingly important within the complex, challenging and dynamic landscape of healthcare. Supporting medical students in managing unavoidable role-related stressors adaptively may help mitigate the anguish that is too commonly reported within the profession. We developed a model, "MaRIS", underpinned by contemplative pedagogy, to help medical students enhance their human capabilities, across all three domains of Bloom's taxonomy, and their personal resilience. It is the first model to integrate Mindfulness, affective Reflection, Impactive experiences and a Supportive environment into medical curriculum design. Here, we describe the theoretical basis underpinning MaRIS and present a preliminary study evaluating its impact on students' subjectively rated capabilities. Materials and Methods: A questionnaire capturing self-ratings of competence, empathy and resilience, as well as impressions of their experiences, was administered to foundation-year medical students before (T0), during (T1) and after delivery (T2). Results: Fifty-five students completed the survey at all time points. Mean scores for all domains increased significantly from T0 to T1 and from T0 to T2. Free-text comments suggest learning impact across the cognitive, psychomotor and affective domains. Conclusions: MaRIS appears to help medical students establish the foundations for building the human capabilities and personal resilience required for professional practice.

Subjects

Education, Medical, Undergraduate/methods; Interprofessional Relations; Physician-Patient Relations; Resilience, Psychological; Students, Medical/psychology; Adult; Clinical Competence; Communication; Curriculum; Empathy; Female; Humans; Male; Mindfulness; Surveys and Questionnaires; Young Adult

10.

UniProt genomic mapping for deciphering functional effects of missense variants.

McGarvey, Peter B; Nightingale, Andrew; Luo, Jie; Huang, Hongzhan; Martin, Maria J; Wu, Cathy.

Hum Mutat ; 40(6): 694-705, 2019 06.

Article in English | MEDLINE | ID: mdl-30840782

ABSTRACT

Understanding the association of genetic variation with its functional consequences in proteins is essential for interpreting genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can help in rapidly comprehending genetic variation within complex biological processes. Here, we describe the mapping of UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants, to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for the functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.
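The two percentages in the colocation result are computed against different denominators (all UniProtKB variants versus all ClinVar SNPs), so the same set of shared sites yields different fractions. A minimal Python sketch of that bookkeeping, assuming that "colocate" means an exact match on chromosome and GRCh38 position (the paper's actual matching criteria are not stated here); the function name and toy coordinates are hypothetical.

```python
def colocation_fractions(uniprot_variants, clinvar_snps):
    """Return the fraction of each variant set that shares a genomic site with the other.

    Inputs are iterables of (chromosome, position) tuples on the same assembly
    (GRCh38 assumed). Colocation is assumed here to mean an exact positional match.
    """
    uniprot_sites = set(uniprot_variants)
    clinvar_sites = set(clinvar_snps)
    shared = uniprot_sites & clinvar_sites  # sites present in both sets
    uniprot_fraction = len(shared) / len(uniprot_sites) if uniprot_sites else 0.0
    clinvar_fraction = len(shared) / len(clinvar_sites) if clinvar_sites else 0.0
    return uniprot_fraction, clinvar_fraction


# Toy data: 4 UniProtKB sites, 20 ClinVar sites, 2 of them shared.
uniprot = [("1", 100), ("1", 200), ("2", 50), ("X", 10)]
clinvar = [("1", 100), ("2", 50)] + [("3", pos) for pos in range(18)]
print(colocation_fractions(uniprot, clinvar))  # (0.5, 0.1)
```

In the toy data the same two shared sites are half of the UniProtKB set but only a tenth of the larger ClinVar set, which is the same kind of asymmetry behind the 32% versus 8% figures.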

Subjects

Chromosome Mapping/methods; Databases, Genetic; Mutation, Missense; Proteins/chemistry; Binding Sites; Databases, Protein; Genetic Predisposition to Disease; Humans; Molecular Sequence Annotation; Polymorphism, Single Nucleotide; Protein Binding; Proteins/genetics; Proteins/metabolism; Software; Web Browser

11.

Protein Ontology (PRO): enhancing and scaling up the representation of protein entities.

Natale, Darren A; Arighi, Cecilia N; Blake, Judith A; Bona, Jonathan; Chen, Chuming; Chen, Sheng-Chih; Christie, Karen R; Cowart, Julie; D'Eustachio, Peter; Diehl, Alexander D; Drabkin, Harold J; Duncan, William D; Huang, Hongzhan; Ren, Jia; Ross, Karen; Ruttenberg, Alan; Shamovsky, Veronica; Smith, Barry; Wang, Qinghua; Zhang, Jian; El-Sayed, Abdelrahman; Wu, Cathy H.

Nucleic Acids Res ; 45(D1): D339-D346, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899649

ABSTRACT

The Protein Ontology (PRO; http://purl.obolibrary.org/obo/pr) formally defines and describes taxon-specific and taxon-neutral protein-related entities in three major areas: proteins related by evolution; proteins produced from a given gene; and protein-containing complexes. PRO thus serves as a tool for referencing protein entities at any level of specificity. To enhance this ability, and to facilitate the comparison of such entities described in different resources, we developed a standardized representation of proteoforms using UniProtKB as a sequence reference and PSI-MOD as a post-translational modification reference. We illustrate its use in facilitating an alignment between PRO and Reactome protein entities. We also address issues of scalability, describing our first steps in using text mining to identify protein-related entities, the large-scale import of proteoform information from expert-curated resources, and our ability to dynamically generate PRO terms. Web views for individual terms are now more informative about closely related terms, including, for example, an interactive multiple sequence alignment. Finally, we describe recent improvements in semantic utility, with PRO now represented in OWL and available through a SPARQL endpoint. These developments will further support the anticipated growth of PRO and facilitate the discovery and aggregation of data relating to protein entities.

Subjects

Computational Biology/methods; Databases, Genetic; Proteins; Animals; Humans; Proteins/chemistry; Proteins/genetics; Web Browser

12.

InterPro in 2017: beyond protein family and domain annotations.

Finn, Robert D; Attwood, Teresa K; Babbitt, Patricia C; Bateman, Alex; Bork, Peer; Bridge, Alan J; Chang, Hsin-Yu; Dosztányi, Zsuzsanna; El-Gebali, Sara; Fraser, Matthew; Gough, Julian; Haft, David; Holliday, Gemma L; Huang, Hongzhan; Huang, Xiaosong; Letunic, Ivica; Lopez, Rodrigo; Lu, Shennan; Marchler-Bauer, Aron; Mi, Huaiyu; Mistry, Jaina; Natale, Darren A; Necci, Marco; Nuka, Gift; Orengo, Christine A; Park, Youngmi; Pesseat, Sebastien; Piovesan, Damiano; Potter, Simon C; Rawlings, Neil D; Redaschi, Nicole; Richardson, Lorna; Rivoire, Catherine; Sangrador-Vegas, Amaia; Sigrist, Christian; Sillitoe, Ian; Smithers, Ben; Squizzato, Silvano; Sutton, Granger; Thanki, Narmada; Thomas, Paul D; Tosatto, Silvio C E; Wu, Cathy H; Xenarios, Ioannis; Yeh, Lai-Su; Young, Siew-Yit; Mitchell, Alex L.

Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.

Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.

Subjects

Computational Biology/methods; Databases, Protein; Protein Interaction Domains and Motifs; Software; Humans; Molecular Sequence Annotation; Phylogeny

13.

Completing sparse and disconnected protein-protein network by deep learning.

Huang, Lei; Liao, Li; Wu, Cathy H.

BMC Bioinformatics ; 19(1): 103, 2018 03 22.

Article in English | MEDLINE | ID: mdl-29566671

ABSTRACT

BACKGROUND: Protein-protein interaction (PPI) prediction remains a central task in systems biology for achieving a better and more holistic understanding of cellular and intracellular processes. Recently, an increasing number of computational methods have shifted from pairwise prediction to network-level prediction. Many of the existing network-level methods predict PPIs under the assumption that the training network is connected. However, this assumption greatly affects prediction power and limits the application area, because current gold-standard PPI networks are usually very sparse and disconnected. Therefore, how to effectively predict PPIs from a sparse and disconnected training network remains a challenge. RESULTS: In this work, we developed a novel PPI prediction method based on a deep learning neural network and the regularized Laplacian kernel. We use a neural network with an autoencoder-like architecture to implicitly simulate the evolutionary processes of a PPI network. Neurons of the output layer correspond to proteins and are labeled with values (1 for interaction and 0 otherwise) from the adjacency matrix of a sparse, disconnected training PPI network. Unlike in an autoencoder, neurons at the input layer are given all-zero input, reflecting an assumption of no a priori knowledge about PPIs, and hidden layers of smaller sizes mimic the ancient interactome at different times during evolution. After the training step, an evolved PPI network whose rows are the outputs of the neural network can be obtained. We then predict PPIs by applying the regularized Laplacian kernel to the transition matrix built upon the evolved PPI network. Cross-validation experiments show that PPI prediction accuracy, measured as AUC, increased by up to 8.4% for yeast data and 14.9% for human data compared with the baseline.
Moreover, the evolved PPI network can also help us leverage complementary information from the disconnected training network and multiple heterogeneous data sources. Tested by the yeast data with six heterogeneous feature kernels, the results show our method can further improve the prediction performance by up to 2%, which is very close to an upper bound that is obtained by an Approximate Bayesian Computation based sampling method. CONCLUSIONS: The proposed evolution deep neural network, coupled with regularized Laplacian kernel, is an effective tool in completing sparse and disconnected PPI networks and in facilitating integration of heterogeneous data sources.
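The final prediction step ranks protein pairs with a regularized Laplacian kernel. The abstract does not give the exact formula or how the transition matrix is built; a common definition of the kernel is K = (I + αL)^-1 with graph Laplacian L = D - A. The pure-Python sketch below applies that definition directly to a toy adjacency matrix rather than to the paper's evolved transition matrix, and the function names and the α value are hypothetical. It also shows why a disconnected training network is a problem: K is zero between proteins in different components, so no interaction across components can be predicted.

```python
def laplacian(adjacency):
    """Graph Laplacian L = D - A for a symmetric 0/1 adjacency matrix (list of lists)."""
    n = len(adjacency)
    return [[(sum(adjacency[i]) if i == j else 0) - adjacency[i][j] for j in range(n)]
            for i in range(n)]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0.0:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def regularized_laplacian_kernel(adjacency, alpha=0.1):
    """K = (I + alpha * L)^-1; I + alpha * L is positive definite for alpha > 0."""
    n = len(adjacency)
    lap = laplacian(adjacency)
    system = [[(1.0 if i == j else 0.0) + alpha * lap[i][j] for j in range(n)]
              for i in range(n)]
    return invert(system)


def rank_candidate_pairs(adjacency, alpha=0.1):
    """Score every non-interacting protein pair by its kernel value, highest first."""
    kernel = regularized_laplacian_kernel(adjacency, alpha)
    n = len(adjacency)
    pairs = [(kernel[i][j], i, j)
             for i in range(n) for j in range(i + 1, n) if adjacency[i][j] == 0]
    return sorted(pairs, reverse=True)


# Toy network with two components: a path 0-1-2-3 and a single edge 4-5.
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
adj = [[0] * 6 for _ in range(6)]
for a, b in edges:
    adj[a][b] = adj[b][a] = 1
for score, i, j in rank_candidate_pairs(adj)[:3]:
    print(i, j, round(score, 4))
```

Pairs two hops apart on the path outscore the pair three hops apart, while every pair spanning the two components scores zero; filling in those cross-component scores is what the evolved network in the abstract is meant to make possible.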

Subjects

Algorithms; Machine Learning; Protein Interaction Maps; Area Under Curve; Bayes Theorem; Humans; Neural Networks, Computer; Proteins/metabolism; ROC Curve; Saccharomyces cerevisiae/metabolism

14.

Transcriptional profiling of liver during the critical embryo-to-hatchling transition period in the chicken (Gallus gallus).

Cogburn, Larry A; Trakooljul, Nares; Chen, Chuming; Huang, Hongzhan; Wu, Cathy H; Carré, Wilfrid; Wang, Xiaofei; White, Harold B.

BMC Genomics ; 19(1): 695, 2018 Sep 21.

Article in English | MEDLINE | ID: mdl-30241500

ABSTRACT

BACKGROUND: Although hatching is perhaps the most abrupt and profound metabolic challenge that a chicken must undergo, there have been no attempts to functionally map the metabolic pathways induced in liver during the embryo-to-hatchling transition. Furthermore, we know very little about the metabolic and regulatory factors that regulate lipid metabolism in late embryos or newly hatched chicks. In the present study, we examined hepatic transcriptomes of 12 embryos and 12 hatchling chicks during the peri-hatch period, that is, the metabolic switch from chorioallantoic to pulmonary respiration. RESULTS: Initial hierarchical clustering revealed two distinct, albeit opposing, patterns of hepatic gene expression. Cluster A genes are largely lipolytic and highly expressed in embryos, whereas Cluster B genes are lipogenic/thermogenic and mainly controlled by the lipogenic transcription factor THRSPA. Using pairwise comparisons of embryo and hatchling ages, we found 1272 genes that were differentially expressed between embryos and hatchling chicks, including 24 transcription factors and 284 genes that regulate lipid metabolism. The three most differentially expressed transcripts in the liver of embryos were MOGAT1, DIO3 and PDK4, whereas THRSPA, FASN and DIO2 were highest in hatchlings. An unusual finding was the "ectopic" and extremely high differential expression of seven feather keratin transcripts in the liver of 16-day embryos, which coincides with engorgement of the liver with yolk lipids. Gene interaction networks show several transcription factors, transcriptional co-activators/co-inhibitors and their downstream genes that exert a 'yin-yang' action on lipid metabolism during the embryo-to-hatchling transition. These upstream regulators include ligand-activated transcription factors, sirtuins and Kruppel-like factors.
CONCLUSIONS: Our genome-wide transcriptional analysis has greatly expanded the hepatic repertoire of regulatory and metabolic genes involved in the embryo-to-hatchling transition. New knowledge was gained on interactive transcriptional networks and metabolic pathways that enable the abrupt switch from ectothermy (embryo) to endothermy (hatchling) in the chicken. Several transcription factors and their co-activators/co-inhibitors appear to exert opposing actions on lipid metabolism, leading to the predominance of lipolysis in embryos and lipogenesis in hatchlings. Our analysis of hepatic transcriptomes has enabled the discovery of opposing, interconnected and interdependent transcriptional regulators that provide precise yin-yang or homeorhetic regulation of lipid metabolism during the critical embryo-to-hatchling transition.

Subjects

Chickens/growth & development; Chickens/metabolism; Gene Expression Regulation, Developmental; Liver/metabolism; Animals; Breeding; Chick Embryo/growth & development; Chick Embryo/metabolism; Embryonic Development; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Liver/embryology; Liver/growth & development; Transcriptome

15.

Oncogenic fusion protein EWS-FLI1 is a network hub that regulates alternative splicing.

Selvanathan, Saravana P; Graham, Garrett T; Erkizan, Hayriye V; Dirksen, Uta; Natarajan, Thanemozhi G; Dakic, Aleksandra; Yu, Songtao; Liu, Xuefeng; Paulsen, Michelle T; Ljungman, Mats E; Wu, Cathy H; Lawlor, Elizabeth R; Üren, Aykut; Toretsky, Jeffrey A.

Proc Natl Acad Sci U S A ; 112(11): E1307-16, 2015 Mar 17.

Article in English | MEDLINE | ID: mdl-25737553

Abstract

The synthesis and processing of mRNA, from transcription to translation initiation, often requires splicing of intragenic material. The final mRNA composition varies based on proteins that modulate splice site selection. EWS-FLI1 is an Ewing sarcoma (ES) oncoprotein with an interactome that we demonstrate to have multiple partners in spliceosomal complexes. We evaluate the effect of EWS-FLI1 on posttranscriptional gene regulation using both exon array and RNA-seq. Genes that potentially regulate oncogenesis, including CLK1, CASP3, PPFIBP1, and TERT, are validated as alternatively spliced by EWS-FLI1. In a CLIP-seq experiment, we find that EWS-FLI1 RNA-binding motifs most frequently occur adjacent to intron-exon boundaries. EWS-FLI1 also alters splicing by directly binding to known splicing factors including DDX5, hnRNP K, and PRPF6. Reduction of EWS-FLI1 produces an isoform of γ-TERT that has increased telomerase activity compared with wild-type (WT) TERT. The small molecule YK-4-279 is an inhibitor of EWS-FLI1 oncogenic function that disrupts specific protein interactions, including those with the helicases DDX5 and RNA helicase A (RHA), and thereby alters RNA-splicing ratios. As such, YK-4-279 validates the splicing mechanism of EWS-FLI1, producing alternatively spliced gene patterns that significantly overlap with those of EWS-FLI1 reduction and of WT human mesenchymal stem cells (hMSC). Exon array analysis of 75 ES patient samples shows isoform expression patterns similar to those of cell line models expressing EWS-FLI1, supporting the clinical relevance of our findings. These experiments establish systemic alternative splicing as an oncogenic process modulated by EWS-FLI1. EWS-FLI1 modulation of mRNA splicing may provide insight into the contribution of splicing toward oncogenesis, and, reciprocally, EWS-FLI1 interactions with splicing proteins may inform the splicing code.

Subjects

Alternative Splicing/genetics, Oncogenic Fusion Proteins/metabolism, Proto-Oncogene Protein c-fli-1/metabolism, RNA-Binding Protein EWS/metabolism, Signal Transduction/genetics, Alternative Splicing/drug effects, Base Sequence, Tumor Cell Line, Exons/genetics, Humans, Indoles, Introns/genetics, Oncogenic Fusion Proteins/genetics, Protein Binding/drug effects, Protein Isoforms/metabolism, Proto-Oncogene Protein c-fli-1/genetics, Post-Transcriptional RNA Processing/drug effects, Post-Transcriptional RNA Processing/genetics, Messenger RNA/genetics, Messenger RNA/metabolism, RNA-Binding Protein EWS/genetics, Ewing Sarcoma/genetics, Ewing Sarcoma/pathology, Signal Transduction/drug effects, Spliceosomes/drug effects, Spliceosomes/metabolism, Telomerase/metabolism

16.

Automatic gene annotation using GO terms from cellular component domain.

Ding, Ruoyao; Qu, Yingying; Wu, Cathy H; Vijay-Shanker, K.

BMC Med Inform Decis Mak ; 18(Suppl 5): 119, 2018 12 07.

Article in English | MEDLINE | ID: mdl-30526566

Abstract

BACKGROUND: The Gene Ontology (GO) is a resource that supplies information about gene product function, using ontologies to represent biological knowledge. These ontologies cover three domains: Cellular Component (CC), Molecular Function (MF), and Biological Process (BP). GO annotation is the process of assigning functional information, in the form of GO terms, to genes discussed in the literature. It is a common task among Model Organism Database (MOD) groups. Manual GO annotation relies on human curators who read the biomedical literature and assign GO terms to genes. This process is very time-consuming and labor-intensive; as a result, many MODs can afford to curate only a fraction of relevant articles. METHODS: GO terms from the CC domain can essentially be divided into two sub-hierarchies: subcellular location terms and protein complex terms. We cast the task of gene annotation using GO terms from the CC domain as relation extraction between genes and other entities: (1) extract cases where a protein is found to be in a subcellular location, and (2) extract cases where a protein is a subunit of a protein complex. For each relation extraction task, we use an approach based on triggers and syntactic dependencies to extract the desired relations among entities. RESULTS: We tested our approach on the BC4GO test set, a publicly available corpus for GO annotation. Our approach obtains an F1-score of 71%, a precision of 91% and a recall of 58% for predicting GO terms from the CC domain for given genes. CONCLUSIONS: We have described a novel approach that treats gene annotation with GO terms from the CC domain as two relation extraction subtasks. Evaluation results show that our approach achieves an F1-score of 71% for predicting GO terms for given genes. Our approach can therefore be used to accelerate the process of GO annotation for bio-annotators.
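As a quick consistency check on the reported figures: the F1-score is the harmonic mean of precision and recall, so it can be recomputed from the stated precision (91%) and recall (58%). The short Python sketch is our own illustration, not code from the paper; the function name `f1_score` is hypothetical here.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the F1-score, the harmonic mean of precision and recall."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract: precision 91%, recall 58%.
f1 = f1_score(0.91, 0.58)
print(f"F1 = {f1:.2f}")  # prints "F1 = 0.71"
```

The result (about 0.708) rounds to the 71% given in the abstract, and the gap between precision and recall shows that the approach favors precise extractions over coverage.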

Subjects

Computational Biology, Gene Ontology, Molecular Sequence Annotation, Natural Language Processing, Humans

17.

Computational clustering for viral reference proteomes.

Chen, Chuming; Huang, Hongzhan; Mazumder, Raja; Natale, Darren A; McGarvey, Peter B; Zhang, Jian; Polson, Shawn W; Wang, Yuqi; Wu, Cathy H.

Bioinformatics ; 32(13): 2041-3, 2016 07 01.

Article in English | MEDLINE | ID: mdl-27153712

Abstract

MOTIVATION: The enormous number of redundant sequenced genomes has hindered efforts to analyze and functionally annotate proteins. Because the taxonomy of viruses is not uniformly defined, viral proteomes pose special challenges in this regard. Grouping viruses based on the similarity of their proteins at proteome scale can normalize against potential taxonomic nomenclature anomalies. RESULTS: We present Viral Reference Proteomes (Viral RPs), which are computed from complete virus proteomes within UniProtKB. Viral RPs are provided at 95, 75, 55, 35 and 15% co-membership levels, based on proteome-similarity clusters. Comparison of our computational Viral RPs with UniProt's curator-selected Reference Proteomes indicates that the two sets are consistent and complementary. Furthermore, each Viral RP represents a cluster of virus proteomes that is consistent with virus or host taxonomy. We provide BLASTP search and FTP download of Viral RP protein sequences, and a browser to facilitate the visualization of Viral RPs. AVAILABILITY AND IMPLEMENTATION: http://proteininformationresource.org/rps/viruses/ CONTACT: chenc@udel.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
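The abstract does not spell out how co-membership is computed or how clusters are formed. The Python sketch illustrates the general idea of threshold-based proteome grouping under two assumptions of ours: co-membership is taken as the fraction of shared protein-cluster IDs relative to the smaller proteome, and groups are formed greedily around the largest proteomes. Neither is claimed to be the Viral RP algorithm; the function names, cluster IDs and toy proteomes are hypothetical.

```python
from typing import Dict, List, Set


def co_membership(a: Set[str], b: Set[str]) -> float:
    """Shared cluster IDs as a fraction of the smaller proteome (assumed definition)."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def group_proteomes(proteomes: Dict[str, Set[str]], threshold: float) -> Dict[str, List[str]]:
    """Greedily assign each proteome to the first representative it matches at `threshold`."""
    # Larger proteomes are considered first, so they tend to become representatives;
    # ties are broken by name so the result is deterministic.
    order = sorted(proteomes, key=lambda name: (-len(proteomes[name]), name))
    groups: Dict[str, List[str]] = {}
    for name in order:
        for rep in groups:
            if co_membership(proteomes[name], proteomes[rep]) >= threshold:
                groups[rep].append(name)
                break
        else:
            # No existing representative is similar enough: start a new group.
            groups[name] = [name]
    return groups


# Toy proteomes: each is the set of (hypothetical) protein-cluster IDs its proteins fall into.
toy = {
    "virusA": {"c1", "c2", "c3", "c4"},
    "virusB": {"c1", "c2", "c5"},
    "virusC": {"c7", "c8"},
}
# Thresholds mirror the five co-membership levels named in the abstract.
for level in (0.95, 0.75, 0.55, 0.35, 0.15):
    print(level, group_proteomes(toy, level))
```

In this sketch, virusB shares two of its three cluster IDs with virusA (co-membership 2/3), so it forms its own group at the 95% and 75% levels and joins virusA's group at 55% and below, illustrating how lower co-membership levels yield fewer, broader groups.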

Subjects

Protein Databases, Proteome/analysis, Viral Proteins/analysis, Amino Acid Sequence, Cluster Analysis, Computational Biology, Knowledge Bases

18.

Prediction of residue-residue contact matrix for protein-protein interaction with Fisher score features and deep learning.

Du, Tianchuan; Liao, Li; Wu, Cathy H; Sun, Bilin.

Methods ; 110: 97-105, 2016 11 01.

Article in English | MEDLINE | ID: mdl-27282356

Abstract

Protein-protein interactions play essential roles in many biological processes. Knowledge of the residue-residue contacts between two interacting proteins is not only helpful in annotating protein functions but also critical for structure-based drug design. Predicting the residue-residue contact matrix of the interfacial regions is challenging. In this work, we introduced deep learning techniques (specifically, stacked autoencoders) to build deep neural network models that tackle the residue-residue contact prediction problem. Interaction profile Hidden Markov Models were first used to extract Fisher score features from protein sequences; stacked autoencoders were then deployed to extract and learn hidden abstract features. The deep learning model showed significant improvement over a traditional machine learning model, Support Vector Machines (SVM), with overall accuracy increasing by about 15 percentage points, from 65.40% to 80.82%. We showed that stacked autoencoders could extract novel features from the Fisher score features, which deep neural networks and other classifiers can use to enhance learning. We further showed that deep neural networks have significant advantages over SVM in making use of the newly extracted features.
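The reported accuracy rises from 65.40% (SVM) to 80.82% (deep learning model). Because both figures are percentages, the gain can be read either as an absolute difference in percentage points or as a relative improvement over the baseline. The Python sketch, our own illustration with a hypothetical function name, computes both from the numbers in the abstract.

```python
from typing import Tuple


def accuracy_gain(baseline_pct: float, improved_pct: float) -> Tuple[float, float]:
    """Return (absolute gain in percentage points, relative gain as a fraction)."""
    absolute = improved_pct - baseline_pct
    relative = absolute / baseline_pct
    return absolute, relative


# SVM baseline vs. deep learning model, as reported in the abstract.
points, fraction = accuracy_gain(65.40, 80.82)
print(f"+{points:.2f} percentage points ({fraction:.1%} relative)")
# prints "+15.42 percentage points (23.6% relative)"
```

The "15%" figure thus corresponds to the absolute difference (about 15.4 percentage points); measured relative to the SVM baseline, the improvement is roughly 23.6%.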

Subjects

Amino Acid Sequence/genetics, Computational Biology/methods, Protein Interaction Mapping/methods, Protein Interaction Maps/genetics, Machine Learning, Software

19.

The InterPro protein families database: the classification resource after 15 years.

Mitchell, Alex; Chang, Hsin-Yu; Daugherty, Louise; Fraser, Matthew; Hunter, Sarah; Lopez, Rodrigo; McAnulla, Craig; McMenamin, Conor; Nuka, Gift; Pesseat, Sebastien; Sangrador-Vegas, Amaia; Scheremetjew, Maxim; Rato, Claudia; Yong, Siew-Yit; Bateman, Alex; Punta, Marco; Attwood, Teresa K; Sigrist, Christian J A; Redaschi, Nicole; Rivoire, Catherine; Xenarios, Ioannis; Kahn, Daniel; Guyot, Dominique; Bork, Peer; Letunic, Ivica; Gough, Julian; Oates, Matt; Haft, Daniel; Huang, Hongzhan; Natale, Darren A; Wu, Cathy H; Orengo, Christine; Sillitoe, Ian; Mi, Huaiyu; Thomas, Paul D; Finn, Robert D.

Nucleic Acids Res ; 43(Database issue): D213-21, 2015 Jan.

Article in English | MEDLINE | ID: mdl-25428371

Abstract

The InterPro database (http://www.ebi.ac.uk/interpro/) is a freely available resource that can be used to classify sequences into protein families and to predict the presence of important domains and sites. Central to the InterPro database are predictive models, known as signatures, from a range of different protein family databases that have different biological focuses and use different methodological approaches to classify protein families and domains. InterPro integrates these signatures, capitalizing on the respective strengths of the individual databases, to produce a powerful protein classification resource. Here, we report on the status of InterPro as it enters its 15th year of operation, and give an overview of new developments with the database and its associated Web interfaces and software. In particular, the new domain architecture search tool is described and the process of mapping Gene Ontology terms to InterPro is outlined. We also discuss the challenges faced by the resource given the explosive growth in sequence data in recent years. InterPro (version 48.0) contains 36,766 member database signatures integrated into 26,238 InterPro entries, an increase of over 3993 entries (5081 signatures) since 2012.

Subjects

Protein Databases, Proteins/classification, Bacteria/metabolism, Gene Ontology, Tertiary Protein Structure, Proteins/genetics, Protein Sequence Analysis, Software

20.

Elevated FGF21 secretion, PGC-1α and ketogenic enzyme expression are hallmarks of iron-sulfur cluster depletion in human skeletal muscle.

Crooks, Daniel R; Natarajan, Thanemozhi G; Jeong, Suh Young; Chen, Chuming; Park, Sun Young; Huang, Hongzhan; Ghosh, Manik C; Tong, Wing-Hang; Haller, Ronald G; Wu, Cathy; Rouault, Tracey A.

Hum Mol Genet ; 23(1): 24-39, 2014 Jan 01.

Article in English | MEDLINE | ID: mdl-23943793

Abstract

Iron-sulfur (Fe-S) clusters are ancient enzyme cofactors found in virtually all life forms. We evaluated the physiological effects of chronic Fe-S cluster deficiency in human skeletal muscle, a tissue that relies heavily on Fe-S cluster-mediated aerobic energy metabolism. Despite greatly decreased oxidative capacity, muscle tissue from patients deficient in the Fe-S cluster scaffold protein ISCU showed a predominance of type I oxidative muscle fibers and higher capillary density, enhanced expression of transcriptional co-activator PGC-1α and increased mitochondrial fatty acid oxidation genes. These Fe-S cluster-deficient muscles showed a dramatic up-regulation of the ketogenic enzyme HMGCS2 and the secreted protein FGF21 (fibroblast growth factor 21). Enhanced muscle FGF21 expression was reflected by elevated circulating FGF21 levels in the patients, and robust FGF21 secretion could be recapitulated by respiratory chain inhibition in cultured myotubes. Our findings reveal that mitochondrial energy starvation elicits a coordinated response in Fe-S-deficient skeletal muscle that is reflected systemically by increased plasma FGF21 levels.

Subjects

Lactic Acidosis/congenital, Fibroblast Growth Factors/metabolism, Hydroxymethylglutaryl-CoA Synthase/metabolism, Iron-Sulfur Proteins/metabolism, Skeletal Muscle/metabolism, Muscular Diseases/congenital, Transcription Factors/genetics, Lactic Acidosis/genetics, Lactic Acidosis/metabolism, Lactic Acidosis/pathology, Adult, Aged, Case-Control Studies, Cultured Cells, Energy Metabolism, Female, Fibroblast Growth Factors/genetics, Gene Expression Regulation, Humans, Hydroxymethylglutaryl-CoA Synthase/genetics, Iron-Sulfur Proteins/genetics, Male, Middle Aged, Muscle Mitochondria/metabolism, Muscle Mitochondria/pathology, Muscular Diseases/genetics, Muscular Diseases/metabolism, Muscular Diseases/pathology, Peroxisome Proliferator-Activated Receptor Gamma Coactivator 1-alpha, Transcription Factors/metabolism
