1.
Bioinform Adv ; 4(1): vbae057, 2024.
Article in English | MEDLINE | ID: mdl-38721398

ABSTRACT

Motivation: Data reuse is a common and vital practice in molecular biology and enables the knowledge gathered over recent decades to drive discovery and innovation in the life sciences. Much of this knowledge has been collated into molecular biology databases, such as UniProtKB, and these resources derive enormous value from sharing data among themselves. However, quantifying and documenting this kind of data reuse remains a challenge. Results: The article reports on a one-day virtual workshop hosted by the UniProt Consortium in March 2023, attended by representatives from biodata resources, experts in data management, and NIH program managers. Workshop discussions focused on strategies for tracking data reuse, best practices for reusing data, and the challenges associated with data reuse and tracking. Surveys and discussions showed that data reuse is widespread, but critical information for reproducibility is sometimes lacking. Challenges include costs of tracking data reuse, tensions between tracking data and open sharing, restrictive licenses, and difficulties in tracking commercial data use. Recommendations that emerged from the discussion include: development of standardized formats for documenting data reuse, education about the obstacles posed by restrictive licenses, and continued recognition by funding agencies that data management is a critical activity that requires dedicated resources. Availability and implementation: Summaries of survey results are available at: https://docs.google.com/forms/d/1j-VU2ifEKb9C-sW6l3ATB79dgHdRk5v_lESv2hawnso/viewanalytics (survey of data providers) and https://docs.google.com/forms/d/18WbJFutUd7qiZoEzbOytFYXSfWFT61hVce0vjvIwIjk/viewanalytics (survey of users).

2.
PeerJ ; 11: e16164, 2023.
Article in English | MEDLINE | ID: mdl-37818330

ABSTRACT

Background: Aberrant protein kinase regulation leading to abnormal substrate phosphorylation is associated with several human diseases. Despite the promise of therapies targeting kinases, many human kinases remain understudied. Most existing computational tools predicting phosphorylation cover fewer than 50% of known human kinases. They utilize local feature selection based on protein sequences, motifs, domains, structures, and/or functions, and do not consider the heterogeneous relationships of the proteins. In this work, we present KSFinder, a tool that predicts kinase-substrate links by capturing the inherent association of proteins in a network comprising 85% of the known human kinases. We also postulate the potential role of two understudied kinases based on their substrate predictions from KSFinder. Methods: KSFinder learns the semantic relationships in a phosphoproteome knowledge graph using a knowledge graph embedding algorithm and represents the nodes as low-dimensional vectors. A multilayer perceptron (MLP) classifier is trained to discern kinase-substrate links using the embedded vectors. KSFinder uses a strategic negative generation approach that eliminates biases in entity representation and combines data from experimentally validated non-interacting protein pairs, proteins from different subcellular locations, and random sampling. We assess KSFinder's generalization capability on four different datasets and compare its performance with other state-of-the-art prediction models. We employ KSFinder to predict substrates of 68 "dark" kinases considered understudied by the Illuminating the Druggable Genome program and use our text-mining tool, RLIMS-P, along with manual curation, to search for literature evidence for the predictions. In a case study, we performed functional enrichment analysis for two dark kinases, HIPK3 and CAMKK1, using their predicted substrates.
Results: KSFinder shows improved performance over other kinase-substrate prediction models and generalized prediction ability on different datasets. We identified literature evidence for 17 novel predictions involving an understudied kinase. All of these 17 predictions had a probability score ≥0.7 (nine at >0.9, six at 0.8-0.9, and two at 0.7-0.8). The evaluation of 93,593 negative predictions (probability ≤0.3) identified four false negatives. The top enriched biological processes of HIPK3 substrates relate to the regulation of extracellular matrix and epigenetic gene expression, while CAMKK1 substrates include lipid storage regulation and glucose homeostasis. Conclusions: KSFinder outperforms the current kinase-substrate prediction tools with higher kinase coverage. The strategically developed negatives provide a superior generalization ability for KSFinder. We predicted substrates of 432 kinases, 68 of which are understudied, and hypothesized the potential functions of two dark kinases using their predicted substrates.
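
The classification step described in the Methods can be pictured with a minimal sketch. It assumes, purely for illustration, that the kinase and substrate embeddings are concatenated into one feature vector and passed through a single ReLU hidden layer with a sigmoid output. The abstract does not state KSFinder's actual architecture, so `mlp_score` and every weight below are hypothetical.

```python
import math


def mlp_score(kinase_vec, substrate_vec, w_hidden, b_hidden, w_out, b_out):
    """Score one kinase-substrate pair with a one-hidden-layer perceptron.

    Hypothetical helper: the concatenation of the two embeddings and the
    layer layout are assumptions made for illustration only.
    """
    # Assumed feature construction: concatenate the two embedded vectors.
    features = list(kinase_vec) + list(substrate_vec)

    # Hidden layer with ReLU activation.
    hidden = []
    for weights, bias in zip(w_hidden, b_hidden):
        activation = sum(w * x for w, x in zip(weights, features)) + bias
        hidden.append(max(0.0, activation))

    # Sigmoid output, read as the probability of a kinase-substrate link.
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Toy 2-dimensional embeddings and hand-picked weights (not trained values).
score = mlp_score(
    kinase_vec=[1.0, 0.0],
    substrate_vec=[0.0, 1.0],
    w_hidden=[[1.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, -1.0]],
    b_hidden=[0.0, 0.0],
    w_out=[1.0, 1.0],
    b_out=0.0,
)
```

A score produced this way could then be compared against cut-offs such as the ≥0.7 and ≤0.3 thresholds quoted in the Results.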


Subjects
Pattern Recognition, Automated; Protein Kinases; Humans; Protein Kinases/genetics; Phosphorylation; Algorithms; Proteome/chemistry
3.
PLoS Comput Biol ; 19(3): e1010690, 2023 03.
Article in English | MEDLINE | ID: mdl-36996232

ABSTRACT

We analyzed large-scale post-translational modification (PTM) data to outline cell signaling pathways affected by tyrosine kinase inhibitors (TKIs) in ten lung cancer cell lines. Tyrosine phosphorylated, lysine ubiquitinated, and lysine acetylated proteins were concomitantly identified using sequential enrichment of post-translational modification (SEPTM) proteomics. Machine learning was used to identify PTM clusters that represent functional modules that respond to TKIs. To model lung cancer signaling at the protein level, PTM clusters were used to create a co-cluster correlation network (CCCN) and select protein-protein interactions (PPIs) from a large network of curated PPIs to create a cluster-filtered network (CFN). Next, we constructed a Pathway Crosstalk Network (PCN) by connecting pathways from NCATS BioPlanet whose member proteins have PTMs that co-cluster. Interrogating the CCCN, CFN, and PCN individually and in combination yields insights into the response of lung cancer cells to TKIs. We highlight examples where cell signaling pathways involving EGFR and ALK exhibit crosstalk with BioPlanet pathways: Transmembrane transport of small molecules; and Glycolysis and gluconeogenesis. These data identify known and previously unappreciated connections between receptor tyrosine kinase (RTK) signal transduction and oncogenic metabolic reprogramming in lung cancer. Comparison to a CFN generated from a previous multi-PTM analysis of lung cancer cell lines reveals a common core of PPIs involving heat shock/chaperone proteins, metabolic enzymes, cytoskeletal components, and RNA-binding proteins. Elucidation of points of crosstalk among signaling pathways employing different PTMs reveals new potential drug targets and candidates for synergistic attack through combination drug therapy.
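
The cluster-filtering idea can be illustrated with a small sketch. It assumes a simplified rule, namely that a curated PPI is retained when at least one PTM on each partner falls in the same cluster; the criteria used in the study may differ, and the function name, site labels, and toy data are hypothetical.

```python
def cluster_filtered_network(ptm_clusters, ptm_to_protein, curated_ppis):
    """Keep curated PPIs whose two proteins have PTMs in a shared cluster.

    ptm_clusters:   dict mapping PTM site label -> cluster id
    ptm_to_protein: dict mapping PTM site label -> protein name
    curated_ppis:   iterable of (protein_a, protein_b) pairs
    """
    # Collect, for every protein, the set of clusters its PTMs belong to.
    clusters_by_protein = {}
    for site, cluster_id in ptm_clusters.items():
        protein = ptm_to_protein[site]
        clusters_by_protein.setdefault(protein, set()).add(cluster_id)

    # Retain an interaction only if the partners share at least one cluster.
    kept = []
    for a, b in curated_ppis:
        shared = clusters_by_protein.get(a, set()) & clusters_by_protein.get(b, set())
        if shared:
            kept.append((a, b))
    return kept


# Toy input: proteins, sites, and cluster ids are invented placeholders.
ptm_clusters = {"P1 phospho": 1, "P2 phospho": 2, "P3 ubiquitin": 1, "P4 acetyl": 3}
ptm_to_protein = {"P1 phospho": "P1", "P2 phospho": "P2",
                  "P3 ubiquitin": "P3", "P4 acetyl": "P4"}
curated_ppis = [("P1", "P3"), ("P1", "P2"), ("P2", "P4")]

cfn_edges = cluster_filtered_network(ptm_clusters, ptm_to_protein, curated_ppis)
```

In this toy case only the P1/P3 interaction survives, because those are the only partners with PTMs in a common cluster.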


Subjects
Lung Neoplasms; Lysine; Humans; Phosphorylation; Lysine/metabolism; Acetylation; Protein Processing, Post-Translational; Lung Neoplasms/metabolism; Ubiquitination; Signal Transduction
4.
Database (Oxford) ; 2022, 2022 10 05.
Article in English | MEDLINE | ID: mdl-36197453

ABSTRACT

The coronavirus disease 2019 (COVID-19) pandemic has compelled biomedical researchers to communicate data in real time to establish more effective medical treatments and public health policies. Nontraditional sources such as preprint publications, i.e. articles not yet validated by peer review, have become crucial hubs for the dissemination of scientific results. Natural language processing (NLP) systems have been recently developed to extract and organize COVID-19 data in reasoning systems. Given this scenario, the BioCreative COVID-19 text mining tool interactive demonstration track was created to assess the landscape of the available tools and to gauge user interest, thereby providing a two-way communication channel between NLP system developers and potential end users. The goal was to inform system designers about the performance and usability of their products and to suggest new additional features. Considering the exploratory nature of this track, the call for participation solicited teams to apply for the track, based on their system's ability to perform COVID-19-related tasks and interest in receiving user feedback. We also recruited volunteer users to test systems. Seven teams registered systems for the track, and >30 individuals volunteered as test users; these volunteer users covered a broad range of specialties, including bench scientists, bioinformaticians and biocurators. The users, who had the option to participate anonymously, were provided with written and video documentation to familiarize themselves with the NLP tools and completed a survey to record their evaluation. Additional feedback was also provided by NLP system developers. The track was well received as shown by the overall positive feedback from the participating teams and the users. Database URL: https://biocreative.bioinformatics.udel.edu/tasks/biocreative-vii/track-4/.


Subjects
COVID-19; COVID-19/epidemiology; Data Mining/methods; Databases, Factual; Documentation; Humans; Natural Language Processing
5.
Methods Mol Biol ; 2499: 187-204, 2022.
Article in English | MEDLINE | ID: mdl-35696082

ABSTRACT

iPTMnet is a resource that combines rich information about protein post-translational modifications (PTMs) from curated databases as well as text mining tools. Researchers can use the iPTMnet website to query, analyze and download the PTM data. In this chapter we describe the iPTMnet RESTful API, which provides a way to streamline the integration of iPTMnet data into an automated data analysis workflow. In the first section, we give an overview of the architecture of the API. In the second section, we describe the various functions defined by the API and provide detailed examples of using these functions.
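
As a rough picture of how a RESTful API slots into an automated workflow, the sketch below builds a request URL and decodes a JSON response. The base URL, the `proteins` route, the `format` parameter, and the accession are all placeholders, since the abstract does not list the real iPTMnet routes; the chapter itself is the authority on those.

```python
import json
from urllib.parse import quote, urlencode

# Placeholder base URL; the real iPTMnet API address and routes are not
# given in the abstract, so everything below is hypothetical.
BASE_URL = "https://example.org/api"


def build_url(resource, identifier, **params):
    """Build a REST-style URL of the form <base>/<resource>/<identifier>?key=value."""
    url = f"{BASE_URL}/{quote(resource)}/{quote(identifier)}"
    if params:
        # Sort parameters so the same call always yields the same URL.
        url += "?" + urlencode(sorted(params.items()))
    return url


def parse_records(body):
    """Decode a JSON response body into a list of dictionaries."""
    data = json.loads(body)
    return data if isinstance(data, list) else [data]


# Hypothetical route, parameter, and accession, used only to show the shape.
url = build_url("proteins", "ACCESSION_1", format="json")

# Stand-in for a response body that a real request would return.
records = parse_records('{"id": "ACCESSION_1"}')
```

In a real pipeline the URL would be fetched with an HTTP client such as `urllib.request.urlopen`, and the decoded records passed on to the next analysis step.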


Subjects
Data Mining; Protein Processing, Post-Translational; Databases, Protein; Proteins/metabolism; Workflow
6.
Bioinformatics ; 37(23): 4597-4598, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34613368

ABSTRACT

SUMMARY: The global response to the COVID-19 pandemic has led to a rapid increase of scientific literature on this deadly disease. Extracting knowledge from biomedical literature and integrating it with relevant information from curated biological databases is essential to gain insight into COVID-19 etiology, diagnosis and treatment. We used Semantic Web technology RDF to integrate COVID-19 knowledge mined from literature by iTextMine, PubTator and SemRep with relevant biological databases and formalized the knowledge in a standardized and computable COVID-19 Knowledge Graph (KG). We published the COVID-19 KG via a SPARQL endpoint to support federated queries on the Semantic Web and developed a knowledge portal with browsing and searching interfaces. We also developed a RESTful API to support programmatic access and provided RDF dumps for download. AVAILABILITY AND IMPLEMENTATION: The COVID-19 Knowledge Graph is publicly available under CC-BY 4.0 license at https://research.bioinformatics.udel.edu/covid19kg/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
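
Querying a SPARQL endpoint of this kind generally follows the SPARQL 1.1 Protocol, in which a query can be sent as a URL-encoded `query` parameter of an HTTP GET request. The sketch below only builds such a request URL; the endpoint address is a placeholder and the query is a generic triple pattern, not one taken from the COVID-19 KG schema.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; the real SPARQL endpoint address is an assumption.
ENDPOINT = "https://example.org/sparql"

# Generic query that lists ten arbitrary triples; it uses no KG-specific terms.
QUERY = """
SELECT ?subject ?predicate ?object
WHERE { ?subject ?predicate ?object }
LIMIT 10
""".strip()


def sparql_get_url(endpoint, query):
    """Encode a SPARQL query as the `query` parameter of a GET request."""
    return endpoint + "?" + urlencode({"query": query})


request_url = sparql_get_url(ENDPOINT, QUERY)

# Decoding the URL again recovers the original query text unchanged.
round_trip = parse_qs(urlsplit(request_url).query)["query"][0]
```

When the request is actually sent, an `Accept: application/sparql-results+json` header asks the endpoint for results in the standard JSON format.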


Subjects
COVID-19; Semantics; Humans; Pandemics; Pattern Recognition, Automated; Databases, Factual
7.
Cancer Res ; 81(11): 3051-3066, 2021 06 01.
Article in English | MEDLINE | ID: mdl-33727228

ABSTRACT

Lung cancer is the leading cause of cancer mortality worldwide. The treatment of patients with lung cancer harboring mutant EGFR with orally administered EGFR tyrosine kinase inhibitors (TKI) has been a paradigm shift. Osimertinib and rociletinib are third-generation irreversible EGFR TKIs targeting the EGFR T790M mutation. Osimertinib is the current standard of care for patients with EGFR mutations due to increased efficacy, lower side effects, and enhanced brain penetrance. Unfortunately, all patients develop resistance. Genomic approaches have primarily been used to interrogate resistance mechanisms. Here we characterized the proteome and phosphoproteome of a series of isogenic EGFR-mutant lung adenocarcinoma cell lines that are either sensitive or resistant to these drugs, comprising the most comprehensive proteomic dataset resource to date to investigate third generation EGFR TKI resistance in lung adenocarcinoma. Unbiased global quantitative mass spectrometry uncovered alterations in signaling pathways, revealed a proteomic signature of epithelial-mesenchymal transition, and identified kinases and phosphatases with altered expression and phosphorylation in TKI-resistant cells. Decreased tyrosine phosphorylation of key sites in the phosphatase SHP2 suggests its inhibition, resulting in subsequent inhibition of RAS/MAPK and activation of PI3K/AKT pathways. Anticorrelation analyses of this phosphoproteomic dataset with published drug-induced P100 phosphoproteomic datasets from the Library of Integrated Network-Based Cellular Signatures program predicted drugs with the potential to overcome EGFR TKI resistance. The PI3K/MTOR inhibitor dactolisib in combination with osimertinib overcame resistance both in vitro and in vivo. Taken together, this study reveals global proteomic alterations upon third generation EGFR TKI resistance and highlights potential novel approaches to overcome resistance. 
SIGNIFICANCE: Global quantitative proteomics reveals changes in the proteome and phosphoproteome in lung cancer cells resistant to third generation EGFR TKIs, identifying the PI3K/mTOR inhibitor dactolisib as a potential approach to overcome resistance.
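
The anticorrelation idea can be sketched as follows: a drug whose induced phosphoproteomic profile moves in the opposite direction to the resistance signature receives a strongly negative correlation and ranks first. Pearson correlation is assumed here for illustration; the abstract does not name the statistic used, and the drug names and profile values are invented placeholders.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def rank_by_anticorrelation(resistance_signature, drug_signatures):
    """Return (drug, r) pairs sorted from most negative to most positive r."""
    scored = [(name, pearson(resistance_signature, profile))
              for name, profile in drug_signatures.items()]
    return sorted(scored, key=lambda item: item[1])


# Invented toy profiles over four phosphosites (not data from the study).
resistance = [1.0, 2.0, 3.0, 4.0]
drugs = {
    "drug_A": [4.0, 3.0, 2.0, 1.0],  # exact reversal of the signature
    "drug_B": [1.0, 2.0, 3.0, 4.0],  # same direction as the signature
    "drug_C": [1.0, 3.0, 2.0, 4.0],  # mostly the same direction
}

ranking = rank_by_anticorrelation(resistance, drugs)
```

Candidates at the top of such a ranking would still need experimental follow-up, as was done for dactolisib in the study.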


Subjects
Adenocarcinoma of Lung/drug therapy; Drug Resistance, Neoplasm; Imidazoles/pharmacology; Phosphoproteins/metabolism; Protein Kinase Inhibitors/pharmacology; Proteome/metabolism; Quinolines/pharmacology; Adenocarcinoma of Lung/metabolism; Adenocarcinoma of Lung/pathology; Antineoplastic Agents/pharmacology; Apoptosis; Cell Proliferation; ErbB Receptors/antagonists & inhibitors; Humans; Lung Neoplasms/drug therapy; Lung Neoplasms/metabolism; Lung Neoplasms/pathology; Phosphatidylinositol 3-Kinases/chemistry; Phosphoproteins/analysis; Proteome/analysis; TOR Serine-Threonine Kinases/antagonists & inhibitors; Tumor Cells, Cultured
8.
Sci Data ; 7(1): 337, 2020 10 12.
Article in English | MEDLINE | ID: mdl-33046717

ABSTRACT

The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying and downloading data in various formats, and provide documentation both for scientists and programmers. PRO Linked Open Data can be browsed via a faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide a W3C HCLS specification-compliant metadata description for our data. The PRO Linked Open Data is available at https://lod.proconsortium.org/.
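
Results from a SPARQL query are commonly returned in the W3C SPARQL 1.1 Query Results JSON Format, with variable names under `head.vars` and one object per row under `results.bindings`. The sketch below flattens such a document into plain dictionaries; the sample response, including its `example.org` URIs and labels, is invented for illustration and is not PRO data.

```python
import json


def rows_from_sparql_json(body):
    """Flatten a SPARQL JSON results document into a list of {var: value} rows."""
    doc = json.loads(body)
    variables = doc["head"]["vars"]
    rows = []
    for binding in doc["results"]["bindings"]:
        # A variable may be unbound in a given row, so skip missing keys.
        rows.append({var: binding[var]["value"]
                     for var in variables if var in binding})
    return rows


# Invented sample response in the standard results format.
SAMPLE = """
{
  "head": {"vars": ["protein", "label"]},
  "results": {"bindings": [
    {"protein": {"type": "uri", "value": "https://example.org/protein/1"},
     "label": {"type": "literal", "value": "example protein"}},
    {"protein": {"type": "uri", "value": "https://example.org/protein/2"}}
  ]}
}
"""

rows = rows_from_sparql_json(SAMPLE)
```

Rows in this shape are easy to load into a table or pass to downstream analysis code.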


Subjects
Knowledge Discovery; Proteins/chemistry; Semantic Web; Datasets as Topic; Software
9.
Adv Biosyst ; 4(9): e2000119, 2020 09.
Article in English | MEDLINE | ID: mdl-32603024

ABSTRACT

Late recurrences of breast cancer are hypothesized to originate from disseminated tumor cells that re-activate after a long period of dormancy, ≥5 years for estrogen-receptor positive (ER+) tumors. An outstanding question remains as to what the key microenvironment interactions are that regulate this complex process, and well-defined human model systems are needed for probing this. Here, a robust, bioinspired 3D ER+ dormancy culture model is established and utilized to probe the effects of matrix properties for common sites of late recurrence on breast cancer cell dormancy. Formation of dormant micrometastases over several weeks is examined for ER+ cells (T47D, BT474), where the timing of entry into dormancy versus persistent growth depends on matrix composition and cell type. In contrast, triple negative cells (MDA-MB-231), associated with early recurrence, are not observed to undergo long-term dormancy. Bioinformatic analyses quantitatively support an increased "dormancy score" gene signature for ER+ cells (T47D) and reveal differential expression of genes associated with different biological processes based on matrix composition. Further, these analyses support a link between dormancy and autophagy, a potential survival mechanism. This robust model system will allow systematic investigations of other cell-microenvironment interactions in dormancy and evaluation of therapeutics for preventing late recurrence.


Subjects
Breast Neoplasms; Cell Culture Techniques/methods; Models, Biological; Receptors, Estrogen/metabolism; Tumor Microenvironment/physiology; Autophagy; Breast Neoplasms/chemistry; Breast Neoplasms/metabolism; Breast Neoplasms/physiopathology; Cell Line, Tumor; Extracellular Matrix/metabolism; Female; Humans; Synthetic Biology
10.
Database (Oxford) ; 2020, 2020 01 01.
Article in English | MEDLINE | ID: mdl-32395768

ABSTRACT

iPTMnet is a bioinformatics resource that integrates protein post-translational modification (PTM) data from text mining and curated databases and ontologies to aid in knowledge discovery and scientific study. The current iPTMnet website can be used for querying and browsing rich PTM information but does not support automated iPTMnet data integration with other tools. Hence, we have developed a RESTful API utilizing the latest developments in cloud technologies to facilitate the integration of iPTMnet into existing tools and pipelines. We have packaged iPTMnet API software in Docker containers and published it on DockerHub for easy redistribution. We have also developed Python and R packages that allow users to integrate iPTMnet for scientific discovery, as demonstrated in a use case that connects PTM sites to kinase signaling pathways.


Subjects
Computational Biology; Software; Data Mining; Protein Processing, Post-Translational; Proteins/genetics
11.
APL Bioeng ; 3(1): 016101, 2019 Mar.
Article in English | MEDLINE | ID: mdl-31069334

ABSTRACT

The extracellular matrix (ECM) is thought to play a critical role in the progression of breast cancer. In this work, we have designed a photopolymerizable, biomimetic synthetic matrix for the controlled, 3D culture of breast cancer cells and, in combination with imaging and bioinformatics tools, utilized this system to investigate the breast cancer cell response to different matrix cues. Specifically, hydrogel-based matrices of different densities and modified with receptor-binding peptides derived from ECM proteins [fibronectin/vitronectin (RGDS), collagen (GFOGER), and laminin (IKVAV)] were synthesized to mimic key aspects of the ECM of different soft tissue sites. To assess the breast cancer cell response, the morphology and growth of breast cancer cells (MDA-MB-231 and T47D) were monitored in three dimensions over time, and differences in their transcriptome were assayed using next generation sequencing. We observed increased growth in response to GFOGER and RGDS, whether individually or in combination with IKVAV, where binding of integrin β1 was key. Importantly, in matrices with GFOGER, increased growth was observed with increasing matrix density for MDA-MB-231s. Further, transcriptomic analyses revealed increased gene expression and enrichment of biological processes associated with cell-matrix interactions, proliferation, and motility in matrices rich in GFOGER relative to IKVAV. In sum, a new approach for investigating breast cancer cell-matrix interactions was established with insights into how microenvironments rich in collagen promote breast cancer growth, a hallmark of disease progression in vivo, with opportunities for future investigations that harness the multidimensional property control afforded by this photopolymerizable system.

12.
Database (Oxford) ; 2018, 2018 01 01.
Article in English | MEDLINE | ID: mdl-29860481

ABSTRACT

Gene expression levels affect biological processes and play a key role in many diseases. Characterizing expression profiles is useful for clinical research, and diagnostics and prognostics of diseases. There are currently several high-quality databases that capture gene expression information, obtained mostly from large-scale studies, such as microarray and next-generation sequencing technologies, in the context of disease. The scientific literature is another rich source of information on gene expression-disease relationships that not only have been captured from large-scale studies but have also been observed in thousands of small-scale studies. Expression information obtained from literature through manual curation can extend expression databases. While many of the existing databases include information from literature, they are limited by the time-consuming nature of manual curation and have difficulty keeping up with the explosion of publications in the biomedical field. In this work, we describe an automated text-mining tool, Disease-Expression Relation Extraction from Text (DEXTER) to extract information from literature on gene and microRNA expression in the context of disease. One of the motivations in developing DEXTER was to extend the BioXpress database, a cancer-focused gene expression database that includes data derived from large-scale experiments and manual curation of publications. The literature-based portion of BioXpress lags behind significantly compared to expression information obtained from large-scale studies and can benefit from our text-mined results. We have conducted two different evaluations to measure the accuracy of our text-mining tool and achieved average F-scores of 88.51 and 81.81% for the two evaluations, respectively. 
Also, to demonstrate the ability to extract rich expression information in different disease-related scenarios, we used DEXTER to extract differential expression information for 2024 genes in lung cancer, 115 glycosyltransferases in 62 cancers and 826 microRNAs in 171 cancers. All extractions using DEXTER are integrated in the literature-based portion of BioXpress. Database URL: http://biotm.cis.udel.edu/DEXTER.
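
For readers unfamiliar with the metric, an F-score combines precision and recall into one number. The sketch below assumes the balanced F1 variant (beta = 1), which is the usual reading when no beta is stated; the abstract does not say which variant was used, and the precision and recall values in the usage line are hypothetical.

```python
def f_score(precision, recall, beta=1.0):
    """Return the F-beta score; beta = 1 gives the harmonic mean (F1)."""
    if precision == 0.0 and recall == 0.0:
        # Avoid dividing by zero when nothing was extracted correctly.
        return 0.0
    beta_sq = beta * beta
    return (1.0 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical precision and recall, not figures reported for DEXTER.
balanced = f_score(precision=0.9, recall=0.8)
```

Because the harmonic mean is pulled towards the smaller of its two inputs, a tool cannot reach a high F-score by excelling at precision or recall alone.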


Subjects
Data Mining; Databases, Bibliographic; Databases, Genetic; Gene Expression Regulation, Neoplastic; Glycosyltransferases; Lung Neoplasms; MicroRNAs; Neoplasm Proteins; RNA, Neoplasm; Glycosyltransferases/genetics; Glycosyltransferases/metabolism; Humans; Lung Neoplasms/genetics; Lung Neoplasms/metabolism; MicroRNAs/biosynthesis; MicroRNAs/genetics; Neoplasm Proteins/genetics; Neoplasm Proteins/metabolism; RNA, Neoplasm/biosynthesis; RNA, Neoplasm/genetics
13.
Sci Rep ; 8(1): 6518, 2018 04 25.
Article in English | MEDLINE | ID: mdl-29695735

ABSTRACT

Many bioinformatics resources with unique perspectives on the protein landscape are currently available. However, generating new knowledge from these resources requires interoperable workflows that support cross-resource queries. In this study, we employ federated queries linking information from the Protein Kinase Ontology, iPTMnet, Protein Ontology, neXtProt, and the Mouse Genome Informatics to identify key knowledge gaps in the functional coverage of the human kinome and prioritize understudied kinases, cancer variants and post-translational modifications (PTMs) for functional studies. We identify 32 functional domains enriched in cancer variants and PTMs and generate mechanistic hypotheses on overlapping variant and PTM sites by aggregating information at the residue, protein, pathway and species level from these resources. We experimentally test the hypothesis that S768 phosphorylation in the C-helix of EGFR is inhibitory by showing that oncogenic variants altering S768 phosphorylation increase basal EGFR activity. In contrast, oncogenic variants altering conserved phosphorylation sites in the 'hydrophobic motif' of PKCβII (S660F and S660C) are loss-of-function in that they reduce kinase activity and enhance membrane translocation. Our studies provide a framework for integrative, consistent, and reproducible annotation of the cancer kinomes.


Subjects
Mutation/genetics; Neoplasms/genetics; Protein Kinases/genetics; Protein Processing, Post-Translational/genetics; Proteins/genetics; Animals; CHO Cells; COS Cells; Cell Line; Chlorocebus aethiops; Computational Biology/methods; Cricetulus; Gene Ontology; Genetic Variation/genetics; Humans; Mice; Phosphorylation/genetics
14.
Nucleic Acids Res ; 46(D1): D542-D550, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29145615

ABSTRACT

Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery, employing an integrative bioinformatics approach that combines text mining, data mining, and ontological representation to capture rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation across species. iPTMnet encompasses data from (i) our PTM-focused text mining tools, RLIMS-P and eFIP, which extract phosphorylation information from full-scale mining of PubMed abstracts and full-length articles; (ii) a set of curated databases with experimentally observed PTMs; and (iii) Protein Ontology that organizes proteins and PTM proteoforms, enabling their representation, annotation and comparison within and across species. Presently covering eight major PTM types (phosphorylation, ubiquitination, acetylation, methylation, glycosylation, S-nitrosylation, sumoylation and myristoylation), the iPTMnet knowledgebase contains more than 654 500 unique PTM sites in over 62 100 proteins, along with more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations. The website supports online search, browsing, retrieval and visual analysis for scientific queries. Several examples, including functional interpretation of phosphoproteomic data, demonstrate iPTMnet as a gateway for visual exploration and systematic analysis of PTM networks and conservation, thereby enabling PTM discovery and hypothesis generation.


Subjects
Databases, Protein; Knowledge Bases; Protein Processing, Post-Translational; Animals; Computational Biology; Data Mining; Enzymes/metabolism; Humans; Internet; Phosphorylation; Protein Interaction Maps; Sequence Alignment
15.
Methods Mol Biol ; 1558: 213-232, 2017.
Article in English | MEDLINE | ID: mdl-28150240

ABSTRACT

Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictates downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.


Subjects
Computational Biology/methods; Data Mining/methods; Phosphoproteins/metabolism; Proteins/metabolism; Proteomics/methods; Software; Databases, Protein; Phosphorylation; Protein Binding; Protein Interaction Mapping; Protein Processing, Post-Translational; Search Engine; User-Computer Interface; Web Browser
16.
Methods Mol Biol ; 1558: 333-353, 2017.
Article in English | MEDLINE | ID: mdl-28150246

ABSTRACT

Protein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet ( http://proteininformationresource.org/iPTMnet/ ), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.


Subjects
Computational Biology/methods; Databases, Protein; Protein Processing, Post-Translational; Software; Web Browser; Animals; Data Mining/methods; Humans; Mice; Phosphotransferases; Plant Proteins; Protein Binding; Protein Interaction Mapping/methods; Protein Interaction Maps; Rats; Search Engine; User-Computer Interface
17.
Methods Mol Biol ; 1558: 57-78, 2017.
Article in English | MEDLINE | ID: mdl-28150233

ABSTRACT

The Protein Ontology (PRO) is the reference ontology for proteins in the Open Biomedical Ontologies (OBO) foundry and consists of three sub-ontologies representing protein classes of homologous genes, proteoforms (e.g., splice isoforms, sequence variants, and post-translationally modified forms), and protein complexes. PRO defines classes of proteins and protein complexes, both species-specific and species nonspecific, and indicates their relationships in a hierarchical framework, supporting accurate protein annotation at the appropriate level of granularity, analyses of protein conservation across species, and semantic reasoning. In the first section of this chapter, we describe the PRO framework including categories of PRO terms and the relationship of PRO to other ontologies and protein resources. Next, we provide a tutorial about the PRO website ( proconsortium.org ) where users can browse and search the PRO hierarchy, view reports on individual PRO terms, and visualize relationships among PRO terms in a hierarchical table view, a multiple sequence alignment view, and a Cytoscape network view. Finally, we describe several examples illustrating the unique and rich information available in PRO.


Subjects
Biological Ontologies , Computational Biology/methods , Databases, Genetic , Proteins/genetics , Proteins/metabolism , Software , Web Browser , Animals , Humans , Molecular Sequence Annotation , Proteins/chemistry , User-Computer Interface
18.
J Biomed Semantics ; 7(1): 9, 2016 04 29.
Article in English | MEDLINE | ID: mdl-27216254

ABSTRACT

BACKGROUND: MicroRNAs are increasingly being appreciated as critical players in human diseases, and questions concerning the role of microRNAs arise in many areas of biomedical research. There are several manually curated databases of microRNA-disease associations gathered from the biomedical literature; however, it is difficult for curators of these databases to keep up with the explosion of publications in the microRNA-disease field. Moreover, automated literature mining tools that assist manual curation of microRNA-disease associations currently capture only one microRNA property (expression) in the context of one disease (cancer). Thus, there is a clear need to develop more sophisticated automated literature mining tools that capture a variety of microRNA properties and relations in the context of multiple diseases to provide researchers with fast access to the most recent published information and to streamline and accelerate manual curation. METHODS: We have developed miRiaD (microRNAs in association with Disease), a text-mining tool that automatically extracts associations between microRNAs and diseases from the literature. These associations are often not directly linked, and the intermediate relations are often highly informative for the biomedical researcher. Thus, miRiaD extracts the miR-disease pairs together with an explanation for their association. We also developed a procedure that assigns scores to sentences, marking their informativeness, based on the microRNA-disease relation observed within the sentence. RESULTS: miRiaD was applied to the entire Medline corpus, identifying 8301 PMIDs with miR-disease associations. These abstracts and the miR-disease associations are available for browsing at http://biotm.cis.udel.edu/miRiaD . We evaluated the recall and precision of miRiaD with respect to information of high interest to public microRNA-disease database curators (expression and target gene associations), obtaining a recall of 88.46-90.78. 
When we expanded the evaluation to include sentences with a wide range of microRNA-disease information that may be of interest to biomedical researchers, miRiaD also performed very well, with an F-score of 89.4. The informativeness ranking of sentences was evaluated in terms of nDCG (0.977) and correlation metrics (0.678-0.727) when compared to an annotator's ranked list. CONCLUSIONS: miRiaD, a high-performance system that can capture a wide variety of microRNA-disease related information, extends beyond the scope of existing microRNA-disease resources. It can be incorporated into manual curation pipelines and serve as a resource for biomedical researchers interested in the role of microRNAs in disease. In our ongoing work we are developing an improved miRiaD web interface that will facilitate complex queries about microRNA-disease relationships, such as "In what diseases does microRNA regulation of apoptosis play a role?" or "Is there overlap in the sets of genes targeted by microRNAs in different types of dementia?"


Subjects
Biological Ontologies , Data Mining/methods , Disease/genetics , MicroRNAs/genetics , Biomedical Research , Internet , Natural Language Processing , Semantics
19.
CEUR Workshop Proc ; 1747, 2016 Aug.
Article in English | MEDLINE | ID: mdl-28706471

ABSTRACT

The Protein Ontology (PRO) defines protein classes and their interrelationships from the family to the protein form (proteoform) level within and across species. One of the unique contributions of PRO is its representation of post-translationally modified (PTM) proteoforms. However, progress in adding PTM proteoform classes to PRO has been relatively slow due to the extensive manual curation effort required. Here we report an automated pipeline for creation of PTM proteoform classes that leverages two phosphorylation-focused text mining tools (RLIMS-P, which detects mentions of kinases, substrates, and phosphorylation sites, and eFIP, which detects phosphorylation-dependent protein-protein interactions (PPIs)) and our integrated PTM database, iPTMnet. By applying this pipeline, we obtained a set of ~820 substrate-site pairs that are suitable for automated PRO term generation with literature-based evidence attribution. Inclusion of these terms in PRO will increase PRO coverage of species-specific PTM proteoforms by 50%. Many of these new proteoforms also have associated kinase and/or PPI information. Finally, we show a phosphorylation network for the human and mouse peptidyl-prolyl cis-trans isomerase (PIN1/Pin1) derived from our dataset that demonstrates the biological complexity of the information we have extracted. Our approach addresses scalability in PRO curation and will be further expanded to advance PRO representation of phosphorylated proteoforms.

20.
PLoS One ; 10(10): e0141773, 2015.
Article in English | MEDLINE | ID: mdl-26509276

ABSTRACT

Given the wealth of bioinformatics resources and the growing complexity of biological information, it is valuable to integrate data from disparate sources to gain insight into the role of genes/proteins in health and disease. We have developed a bioinformatics framework that combines literature mining with information from biomedical ontologies and curated databases to create knowledge "maps" of genes/proteins of interest. We applied this approach to the study of beta-catenin, a cell adhesion molecule and transcriptional regulator implicated in cancer. The knowledge map includes post-translational modifications (PTMs), protein-protein interactions, disease-associated mutations, and transcription factors co-activated by beta-catenin and their targets and captures the major processes in which beta-catenin is known to participate. Using the map, we generated testable hypotheses about beta-catenin biology in normal and cancer cells. By focusing on proteins participating in multiple relation types, we identified proteins that may participate in feedback loops regulating beta-catenin transcriptional activity. By combining multiple network relations with PTM proteoform-specific functional information, we proposed a mechanism to explain the observation that the cyclin dependent kinase CDK5 positively regulates beta-catenin co-activator activity. Finally, by overlaying cancer-associated mutation data with sequence features, we observed mutation patterns in several beta-catenin PTM sites and PTM enzyme binding sites that varied by tissue type, suggesting multiple mechanisms by which beta-catenin mutations can contribute to cancer. The approach described, which captures rich information for molecular species from genes and proteins to PTM proteoforms, is extensible to other proteins and their involvement in disease.


Subjects
Computational Biology , Models, Biological , Neoplasms/metabolism , beta Catenin/metabolism , Cluster Analysis , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Gene Regulatory Networks , Humans , Mutation , Neoplasms/genetics , Protein Binding , Protein Interaction Mapping , Protein Interaction Maps , Signal Transduction , Transcriptional Activation , beta Catenin/genetics