1.
Res Sq ; 2023 Jun 14.
Article in English | MEDLINE | ID: mdl-37398341

ABSTRACT

miR-31 is a highly conserved microRNA that plays critical roles in cell proliferation, migration, and differentiation. We discovered that miR-31 and some of its validated targets are enriched on the mitotic spindle of the dividing sea urchin embryo and mammalian cells. Using the sea urchin embryo, we found that miR-31 inhibition led to developmental delay correlated with increased cytoskeleton and chromosomal defects. We identified miR-31 as a direct suppressor of several actin remodeling transcripts, β-actin, Gelsolin, Rab35 and Fascin, which were localized to the mitotic spindle. miR-31 inhibition leads to increased newly translated Fascin at the spindles. Forced ectopic localization of Fascin transcripts to the cell membrane and translation led to significant developmental and chromosomal segregation defects, leading to our hypothesis that miR-31 regulates local translation at the mitotic spindle to ensure proper cell division. Furthermore, miR-31-mediated post-transcriptional regulation at the mitotic spindle may be an evolutionarily conserved regulatory paradigm of mitosis.

2.
Hum Genet ; 142(7): 927-947, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37191732

ABSTRACT

To expedite gene discovery in eye development and its associated defects, we previously developed a bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery). However, iSyTE is presently limited to lens tissue and is predominantly based on transcriptomics datasets. Therefore, to extend iSyTE to other eye tissues on the proteome level, we performed high-throughput tandem mass spectrometry (MS/MS) on mouse embryonic day (E)14.5 retina and retinal pigment epithelium combined tissue and identified an average of 3300 proteins per sample (n = 5). High-throughput expression profiling-based gene discovery approaches, involving either transcriptomics or proteomics, pose a key challenge of prioritizing candidates from thousands of RNA/proteins expressed. To address this, we used MS/MS proteome data from mouse whole embryonic body (WB) as a reference dataset and performed comparative analysis, termed "in silico WB-subtraction", with the retina proteome dataset. In silico WB-subtraction identified 90 high-priority proteins with retina-enriched expression at stringency criteria of ≥ 2.5 average spectral counts, ≥ 2.0 fold-enrichment, false discovery rate < 0.01. These top candidates represent a pool of retina-enriched proteins, several of which are associated with retinal biology and/or defects (e.g., Aldh1a1, Ank2, Ank3, Dcn, Dync2h1, Egfr, Ephb2, Fbln5, Fbn2, Hras, Igf2bp1, Msi1, Rbp1, Rlbp1, Tenm3, Yap1, etc.), indicating the effectiveness of this approach. Importantly, in silico WB-subtraction also identified several new high-priority candidates with potential regulatory function in retina development. Finally, proteins exhibiting expression or enriched-expression in the retina are made accessible in a user-friendly manner at iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), to allow effective visualization of this information and facilitate eye gene discovery.


Subject(s)
Eye Diseases , Retinal Pigment Epithelium , Animals , Mice , Retinal Pigment Epithelium/metabolism , Tandem Mass Spectrometry , Proteome/genetics , Proteome/metabolism , Proteomics , Retina/metabolism , Gene Expression Profiling , Genetic Association Studies
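
The threshold filter described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline. The record layout, the function names `fold_enrichment` and `wb_subtract`, the pseudocount, and the example values are all hypothetical. Only the three cut-offs (≥ 2.5 average spectral counts, ≥ 2.0 fold-enrichment, false discovery rate < 0.01) come from the abstract, and the FDR is assumed to have been computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ProteinStats:
    name: str                 # placeholder identifier, not a real gene
    tissue_avg_counts: float  # average spectral counts in the tissue of interest
    wb_avg_counts: float      # average spectral counts in whole embryonic body (WB)
    fdr: float                # false discovery rate, assumed precomputed


def fold_enrichment(p: ProteinStats, pseudocount: float = 0.5) -> float:
    """Tissue-over-WB ratio; the pseudocount (an assumption) avoids division by zero."""
    return (p.tissue_avg_counts + pseudocount) / (p.wb_avg_counts + pseudocount)


def wb_subtract(proteins, min_counts=2.5, min_fold=2.0, max_fdr=0.01):
    """Return names of proteins passing all three cut-offs quoted in the abstract."""
    return [
        p.name
        for p in proteins
        if p.tissue_avg_counts >= min_counts
        and fold_enrichment(p) >= min_fold
        and p.fdr < max_fdr
    ]


# Made-up example values, chosen so that each cut-off rejects one entry.
EXAMPLE = [
    ProteinStats("prot_A", 10.0, 2.0, 0.001),  # passes all cut-offs
    ProteinStats("prot_B", 2.0, 0.1, 0.001),   # too few spectral counts
    ProteinStats("prot_C", 8.0, 6.0, 0.001),   # fold-enrichment below 2.0
    ProteinStats("prot_D", 12.0, 1.0, 0.05),   # FDR too high
]

if __name__ == "__main__":
    print(wb_subtract(EXAMPLE))
```

Keeping the cut-offs as keyword arguments makes it easy to rerun the same filter under other stringency criteria.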
3.
Res Sq ; 2023 Mar 17.
Article in English | MEDLINE | ID: mdl-36993571

ABSTRACT

To expedite gene discovery in eye development and its associated defects, we previously developed a bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery). However, iSyTE is presently limited to lens tissue and is predominantly based on transcriptomics datasets. Therefore, to extend iSyTE to other eye tissues on the proteome level, we performed high-throughput tandem mass spectrometry (MS/MS) on mouse embryonic day (E)14.5 retina and retinal pigment epithelium combined tissue and identified an average of 3,300 proteins per sample (n=5). High-throughput expression profiling-based gene discovery approaches, involving either transcriptomics or proteomics, pose a key challenge of prioritizing candidates from thousands of RNA/proteins expressed. To address this, we used MS/MS proteome data from mouse whole embryonic body (WB) as a reference dataset and performed comparative analysis, termed "in silico WB-subtraction", with the retina proteome dataset. In silico WB-subtraction identified 90 high-priority proteins with retina-enriched expression at stringency criteria of ≥2.5 average spectral counts, ≥2.0 fold-enrichment, False Discovery Rate <0.01. These top candidates represent a pool of retina-enriched proteins, several of which are associated with retinal biology and/or defects (e.g., Aldh1a1, Ank2, Ank3, Dcn, Dync2h1, Egfr, Ephb2, Fbln5, Fbn2, Hras, Igf2bp1, Msi1, Rbp1, Rlbp1, Tenm3, Yap1, etc.), indicating the effectiveness of this approach. Importantly, in silico WB-subtraction also identified several new high-priority candidates with potential regulatory function in retina development. Finally, proteins exhibiting expression or enriched-expression in the retina are made accessible in a user-friendly manner at iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), to allow effective visualization of this information and facilitate eye gene discovery.

4.
PLoS Biol ; 19(12): e3001464, 2021 12.
Article in English | MEDLINE | ID: mdl-34871295

ABSTRACT

The UniProt knowledgebase is a public database for protein sequence and function, covering the tree of life and over 220 million protein entries. Now, the whole community can use a new crowdsourcing annotation system to help scale up UniProt curation and receive proper attribution for their biocuration work.


Subject(s)
Crowdsourcing/methods , Data Curation/methods , Molecular Sequence Annotation/methods , Amino Acid Sequence/genetics , Computational Biology/methods , Databases, Protein/trends , Humans , Literature , Proteins/metabolism , Stakeholder Participation
5.
mBio ; 12(5): e0206021, 2021 10 26.
Article in English | MEDLINE | ID: mdl-34517763

ABSTRACT

We describe here the structure and organization of TnCentral (https://tncentral.proteininformationresource.org/ [or the mirror link at https://tncentral.ncc.unesp.br/]), a web resource for prokaryotic transposable elements (TE). TnCentral currently contains ∼400 carefully annotated TE, including transposons from the Tn3, Tn7, Tn402, and Tn554 families; compound transposons; integrons; and associated insertion sequences (IS). These TE carry passenger genes, including genes conferring resistance to over 25 classes of antibiotics and nine types of heavy metal, as well as genes responsible for pathogenesis in plants, toxin/antitoxin gene pairs, transcription factors, and genes involved in metabolism. Each TE has its own entry page, providing details about its transposition genes, passenger genes, and other sequence features required for transposition, as well as a graphical map of all features. TnCentral content can be browsed and queried through text- and sequence-based searches with a graphic output. We describe three use cases, which illustrate how the search interface, results tables, and entry pages can be used to explore and compare TE. TnCentral also includes downloadable software to facilitate user-driven identification, with manual annotation, of certain types of TE in genomic sequences. Through the TnCentral homepage, users can also access TnPedia, which provides comprehensive reviews of the major TE families, including an extensive general section and specialized sections with descriptions of insertion sequence and transposon families. TnCentral and TnPedia are intuitive resources that can be used by clinicians and scientists to assess TE diversity in clinical, veterinary, and environmental samples. 
IMPORTANCE The ability of bacteria to undergo rapid evolution and adapt to changing environmental circumstances drives the public health crisis of multiple antibiotic resistance, as well as outbreaks of disease in economically important agricultural crops and animal husbandry. Prokaryotic transposable elements (TE) play a critical role in this. Many carry "passenger genes" (not required for the transposition process) conferring resistance to antibiotics or heavy metals or causing disease in plants and animals. Passenger genes are spread by normal TE transposition activities and by insertion into plasmids, which then spread via conjugation within and across bacterial populations. Thus, an understanding of TE composition and transposition mechanisms is key to developing strategies to combat bacterial pathogenesis. Toward this end, we have developed TnCentral, a bioinformatics resource dedicated to describing and exploring the structural and functional features of prokaryotic TE whose use is intuitive and accessible to users with or without bioinformatics expertise.


Subject(s)
Bacteria/genetics , Computational Biology/methods , DNA Transposable Elements , Databases, Genetic , Computational Biology/instrumentation , Internet , Software , Web Browser
7.
Sci Data ; 7(1): 337, 2020 10 12.
Article in English | MEDLINE | ID: mdl-33046717

ABSTRACT

The Protein Ontology (PRO) provides an ontological representation of protein-related entities, ranging from protein families to proteoforms to complexes. Protein Ontology Linked Open Data (LOD) exposes, shares, and connects knowledge about protein-related entities on the Semantic Web using Resource Description Framework (RDF), thus enabling integration with other Linked Open Data for biological knowledge discovery. For example, proteins (or variants thereof) can be retrieved on the basis of specific disease associations. As a community resource, we strive to follow the Findability, Accessibility, Interoperability, and Reusability (FAIR) principles, disseminate regular updates of our data, support multiple methods for accessing, querying and downloading data in various formats, and provide documentation both for scientists and programmers. PRO Linked Open Data can be browsed via a faceted browser interface and queried using SPARQL via YASGUI. RDF data dumps are also available for download. Additionally, we developed RESTful APIs to support programmatic data access. We also provide a W3C HCLS specification-compliant metadata description for our data. The PRO Linked Open Data is available at https://lod.proconsortium.org/.


Subject(s)
Knowledge Discovery , Proteins/chemistry , Semantic Web , Datasets as Topic , Software
8.
Bioinformatics ; 36(17): 4643-4648, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32399560

ABSTRACT

MOTIVATION: The number of protein records in the UniProt Knowledgebase (UniProtKB: https://www.uniprot.org) continues to grow rapidly as a result of genome sequencing and the prediction of protein-coding genes. Providing functional annotation for these proteins presents a significant and continuing challenge. RESULTS: In response to this challenge, UniProt has developed a method of annotation, known as UniRule, based on expertly curated rules, which integrates related systems (RuleBase, HAMAP, PIRSR, PIRNR) developed by the members of the UniProt consortium. UniRule uses protein family signatures from InterPro, combined with taxonomic and other constraints, to select sets of reviewed proteins which have common functional properties supported by experimental evidence. This annotation is propagated to unreviewed records in UniProtKB that meet the same selection criteria, most of which do not have (and are never likely to have) experimentally verified functional annotation. Release 2020_01 of UniProtKB contains 6496 UniRule rules which provide annotation for 53 million proteins, accounting for 30% of the 178 million records in UniProtKB. UniRule provides scalable enrichment of annotation in UniProtKB. AVAILABILITY AND IMPLEMENTATION: UniRule rules are integrated into UniProtKB and can be viewed at https://www.uniprot.org/unirule/. UniRule rules and the code required to run the rules, are publicly available for researchers who wish to annotate their own sequences. The implementation used to run the rules is known as UniFIRE and is available at https://gitlab.ebi.ac.uk/uniprot-public/unifire.


Subject(s)
Knowledge Bases , Proteins , Chromosome Mapping , Databases, Protein , Molecular Sequence Annotation , Proteins/genetics
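
The rule-based propagation described in the abstract above (signatures plus taxonomic constraints select matching records, and the rule's annotation is copied to unreviewed entries) can be illustrated with a toy sketch. Everything here is hypothetical: the `Rule` and `Protein` classes, the signature and taxon labels, and the annotation text are invented for illustration and do not reflect the actual UniRule format or the UniFIRE implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    rule_id: str
    required_signatures: frozenset  # all of these must be present on the protein
    allowed_taxa: frozenset         # the lineage must include at least one of these
    annotation: str                 # annotation text to propagate


@dataclass
class Protein:
    accession: str
    signatures: set
    lineage: set
    reviewed: bool = False
    annotations: list = field(default_factory=list)


def rule_matches(rule: Rule, protein: Protein) -> bool:
    """True when the protein carries every required signature and fits the taxonomic constraint."""
    has_signatures = rule.required_signatures <= protein.signatures
    in_scope = bool(rule.allowed_taxa & protein.lineage)
    return has_signatures and in_scope


def propagate(rules, proteins) -> int:
    """Annotate unreviewed proteins in place; return how many received an annotation."""
    annotated = 0
    for protein in proteins:
        if protein.reviewed:
            continue  # reviewed records already carry curated annotation
        hits = [r.annotation for r in rules if rule_matches(r, protein)]
        if hits:
            protein.annotations.extend(hits)
            annotated += 1
    return annotated


# Made-up rule and records.
RULES = [Rule("RULE_0001", frozenset({"SIG_A"}), frozenset({"Bacteria"}), "hypothetical function X")]
P1 = Protein("ACC_1", {"SIG_A", "SIG_B"}, {"Bacteria", "Proteobacteria"})
P2 = Protein("ACC_2", {"SIG_A"}, {"Eukaryota"})          # wrong taxon
P3 = Protein("ACC_3", {"SIG_A"}, {"Bacteria"}, reviewed=True)  # skipped: reviewed

if __name__ == "__main__":
    print(propagate(RULES, [P1, P2, P3]), P1.annotations)
```

As a check on the coverage figure quoted in the abstract, 53 million annotated proteins out of 178 million records is 53/178 ≈ 0.298, or about 30%.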
9.
Hum Genet ; 139(2): 151-184, 2020 Feb.
Article in English | MEDLINE | ID: mdl-31797049

ABSTRACT

While the bioinformatics resource-tool iSyTE (integrated Systems Tool for Eye gene discovery) effectively identifies human cataract-associated genes, it is currently based on just transcriptome data, and thus, it is necessary to include protein-level information to gain greater confidence in gene prioritization. Here, we expand iSyTE through development of a novel proteome-based resource on the lens and demonstrate its utility in cataract gene discovery. We applied high-throughput tandem mass spectrometry (MS/MS) to generate a global protein expression profile of mouse lens at embryonic day (E)14.5, which identified 2371 lens-expressed proteins. A major challenge of high-throughput expression profiling is identification of high-priority candidates among the thousands of expressed proteins. To address this problem, we generated new MS/MS proteome data on mouse whole embryonic body (WB). WB proteome was then used as a reference dataset for performing "in silico WB-subtraction" comparative analysis with the lens proteome, which effectively identified 422 proteins with lens-enriched expression at ≥ 2.5 average spectral counts, ≥ 2.0 fold enrichment (FDR < 0.01) cut-off. These top 20% candidates represent a rich pool of high-priority proteins in the lens including known human cataract-linked genes and many new potential regulators of lens development and homeostasis. This rich information is made publicly accessible through iSyTE (https://research.bioinformatics.udel.edu/iSyTE/), which enables user-friendly visualization of promising candidates, thus making iSyTE a comprehensive tool for cataract gene discovery.


Subject(s)
Biomarkers/metabolism , Cataract/metabolism , Computer Simulation , Eye Proteins/metabolism , Lens, Crystalline/metabolism , Proteome/metabolism , Tandem Mass Spectrometry/methods , Animals , Cataract/genetics , Cataract/pathology , Computational Biology , Eye Proteins/genetics , Gene Expression Profiling , Humans , Lens, Crystalline/embryology , Mice , Mice, Inbred C57BL , Proteome/analysis , Transcriptome
10.
Hum Mutat ; 40(6): 694-705, 2019 06.
Article in English | MEDLINE | ID: mdl-30840782

ABSTRACT

Understanding the association of genetic variation with its functional consequences in proteins is essential for the interpretation of genomic data and identifying causal variants in diseases. Integration of protein function knowledge with genome annotation can assist in rapidly comprehending genetic variation within complex biological processes. Here, we describe mapping UniProtKB human sequences and positional annotations, such as active sites, binding sites, and variants to the human genome (GRCh38) and the release of a public genome track hub for genome browsers. To demonstrate the power of combining protein annotations with genome annotations for functional interpretation of variants, we present specific biological examples in disease-related genes and proteins. Computational comparisons of UniProtKB annotations and protein variants with ClinVar clinically annotated single nucleotide polymorphism (SNP) data show that 32% of UniProtKB variants colocate with 8% of ClinVar SNPs. The majority of colocated UniProtKB disease-associated variants (86%) map to 'pathogenic' ClinVar SNPs. UniProt and ClinVar are collaborating to provide a unified clinical variant annotation for genomic, protein, and clinical researchers. The genome track hubs, and related UniProtKB files, are downloadable from the UniProt FTP site and discoverable as public track hubs at the UCSC and Ensembl genome browsers.


Subject(s)
Chromosome Mapping/methods , Databases, Genetic , Mutation, Missense , Proteins/chemistry , Binding Sites , Databases, Protein , Genetic Predisposition to Disease , Humans , Molecular Sequence Annotation , Polymorphism, Single Nucleotide , Protein Binding , Proteins/genetics , Proteins/metabolism , Software , Web Browser
11.
Database (Oxford) ; 2019, 2019 01 01.
Article in English | MEDLINE | ID: mdl-30805646

ABSTRACT

Methods focused on predicting 'global' annotations for proteins (such as molecular function, biological process and presence of domains or membership in a family) have reached a relatively mature stage. Methods to provide fine-grained 'local' annotation of functional sites (at the level of individual amino acid) are now coming to the forefront, especially in light of the rapid accumulation of genetic variant data. We have developed a computational method and workflow that predicts functional sites within proteins using position-specific conditional template annotation rules (namely PIR Site Rules or PIRSRs for short). Such rules are curated through review of known protein structural and other experimental data by structural biologists and are used to generate high-quality annotations for the UniProt Knowledgebase (UniProtKB) unreviewed section. To share the PIRSR functional site prediction method with the broader scientific community, we have streamlined our workflow and developed a stand-alone Java software package named PIRSitePredict. We demonstrate the use of PIRSitePredict for functional annotation of de novo assembled genome/transcriptome by annotating uncharacterized proteins from Trinity RNA-seq assembly of embryonic transcriptomes of the following three cartilaginous fishes: Leucoraja erinacea (Little Skate), Scyliorhinus canicula (Small-spotted Catshark) and Callorhinchus milii (Elephant Shark). On average about 1200 lines of annotations were predicted for each species.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Amino Acid Sequence , Animals , Embryo, Nonmammalian/metabolism , Fishes/embryology , Fishes/genetics , Genome , Software , Transcriptome/genetics
12.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subject(s)
Databases, Protein , Molecular Sequence Annotation , Animals , Databases, Genetic , Gene Ontology , Humans , Internet , Multigene Family , Protein Domains/genetics , Sequence Homology, Amino Acid , Software , User-Computer Interface
13.
Hum Genet ; 137(11-12): 941-954, 2018 Dec.
Article in English | MEDLINE | ID: mdl-30417254

ABSTRACT

Isolated or syndromic congenital cataracts are heterogeneous developmental defects, making the identification of the associated genes challenging. In the past, mouse lens expression microarrays have been successfully applied in bioinformatics tools (e.g., iSyTE) to facilitate human cataract-associated gene discovery. To develop a new resource for geneticists, we report high-throughput RNA sequencing (RNA-seq) profiles of mouse lens at key embryonic stages (E)10.5 (lens pit), E12.5 (primary fiber cell differentiation), E14.5 and E16.5 (secondary fiber cell differentiation). These stages capture important events as the lens develops from an invaginating placode into a transparent tissue. Previously, in silico whole-embryo body (WB)-subtraction-based "lens-enriched" expression has been effective in prioritizing cataract-linked genes. To apply an analogous approach, we generated new mouse WB RNA-seq datasets and show that in silico WB subtraction of lens RNA-seq datasets successfully identifies key genes based on lens-enriched expression. At ≥2 counts-per-million expression, ≥1.5 log2 fold-enrichment (p < 0.05) cutoff, E10.5 lens exhibits 1401 enriched genes (17% lens-expressed genes), E12.5 lens exhibits 1937 enriched genes (22% lens-expressed genes), E14.5 lens exhibits 2514 enriched genes (31% lens-expressed genes), and E16.5 lens exhibits 2745 enriched genes (34% lens-expressed genes). Biological pathway analysis identified genes associated with lens development, transcription regulation and signaling pathways, among other functional groups. Furthermore, these new RNA-seq data confirmed high expression of established cataract-linked genes and identified new potential regulators in the lens. 
Finally, we developed new lens stage-specific UCSC Genome Browser annotation tracks and made these publicly accessible through iSyTE (https://research.bioinformatics.udel.edu/iSyTE/) for user-friendly visualization of lens gene expression/enrichment to prioritize genes from high-throughput data from cataract cases.


Subject(s)
Cataract/genetics , Cell Differentiation/genetics , Embryonic Development/genetics , Gene Expression Regulation/genetics , Animals , Cataract/pathology , Computational Biology , Genetic Association Studies , Genetic Predisposition to Disease , High-Throughput Nucleotide Sequencing , Humans , Lens, Crystalline/pathology , Mice , Sequence Analysis, RNA
14.
BMC Genomics ; 19(1): 695, 2018 Sep 21.
Article in English | MEDLINE | ID: mdl-30241500

ABSTRACT

BACKGROUND: Although hatching is perhaps the most abrupt and profound metabolic challenge that a chicken must undergo, there have been no attempts to functionally map the metabolic pathways induced in liver during the embryo-to-hatchling transition. Furthermore, we know very little about the metabolic and regulatory factors that regulate lipid metabolism in late embryos or newly hatched chicks. In the present study, we examined hepatic transcriptomes of 12 embryos and 12 hatchling chicks during the peri-hatch period, that is, the metabolic switch from chorioallantoic to pulmonary respiration. RESULTS: Initial hierarchical clustering revealed two distinct, albeit opposing, patterns of hepatic gene expression. Cluster A genes are largely lipolytic and highly expressed in embryos, whereas Cluster B genes are lipogenic/thermogenic and mainly controlled by the lipogenic transcription factor THRSPA. Using pairwise comparisons of embryo and hatchling ages, we found 1272 genes that were differentially expressed between embryos and hatchling chicks, including 24 transcription factors and 284 genes that regulate lipid metabolism. The three most differentially expressed transcripts found in liver of embryos were MOGAT1, DIO3 and PDK4, whereas THRSPA, FASN and DIO2 were highest in hatchlings. An unusual finding was the "ectopic" and extremely high differential expression of seven feather keratin transcripts in liver of 16-day embryos, which coincides with engorgement of liver with yolk lipids. Gene interaction networks show several transcription factors, transcriptional co-activators/co-inhibitors and their downstream genes that exert a 'yin-yang' action on lipid metabolism during the embryo-to-hatchling transition. These upstream regulators include ligand-activated transcription factors, sirtuins and Kruppel-like factors. 
CONCLUSIONS: Our genome-wide transcriptional analysis has greatly expanded the hepatic repertoire of regulatory and metabolic genes involved in the embryo-to-hatchling transition. New knowledge was gained on interactive transcriptional networks and metabolic pathways that enable the abrupt switch from ectothermy (embryo) to endothermy (hatchling) in the chicken. Several transcription factors and their co-activators/co-inhibitors appear to exert opposing actions on lipid metabolism, leading to the predominance of lipolysis in embryos and lipogenesis in hatchlings. Our analysis of hepatic transcriptomes has enabled discovery of opposing, interconnected and interdependent transcriptional regulators that provide precise yin-yang or homeorhetic regulation of lipid metabolism during the critical embryo-to-hatchling transition.


Subject(s)
Chickens/growth & development , Chickens/metabolism , Gene Expression Regulation, Developmental , Liver/metabolism , Animals , Breeding , Chick Embryo/growth & development , Chick Embryo/metabolism , Embryonic Development , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Liver/embryology , Liver/growth & development , Transcriptome
15.
Nucleic Acids Res ; 46(D1): D875-D885, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29036527

ABSTRACT

Although successful in identifying new cataract-linked genes, the previous version of the database iSyTE (integrated Systems Tool for Eye gene discovery) was based on expression information on just three mouse lens stages and was functionally limited to visualization by only UCSC-Genome Browser tracks. To increase its efficacy, here we provide an enhanced iSyTE version 2.0 (URL: http://research.bioinformatics.udel.edu/iSyTE) based on well-curated, comprehensive genome-level lens expression data as a one-stop portal for the effective visualization and analysis of candidate genes in lens development and disease. iSyTE 2.0 includes all publicly available lens Affymetrix and Illumina microarray datasets representing a broad range of embryonic and postnatal stages from wild-type and specific gene-perturbation mouse mutants with eye defects. Further, we developed a new user-friendly web interface for direct access and cogent visualization of the curated expression data, which supports convenient searches and a range of downstream analyses. The utility of these new iSyTE 2.0 features is illustrated through examples of established genes associated with lens development and pathobiology, which serve as tutorials for its application by the end-user. iSyTE 2.0 will facilitate the prioritization of eye development and disease-linked candidate genes in studies involving transcriptomics or next-generation sequencing data, linkage analysis and GWAS approaches.


Subject(s)
Cataract/genetics , Databases, Genetic , Eye Proteins/genetics , Gene Expression , Genetic Association Studies/methods , Animals , Cataract/embryology , Cataract/metabolism , Datasets as Topic , Disease Models, Animal , Eye Proteins/biosynthesis , Forecasting , Gene Expression Profiling , Gene Regulatory Networks , Genome-Wide Association Study , Humans , Lens, Crystalline/embryology , Lens, Crystalline/growth & development , Lens, Crystalline/metabolism , Mice , Mice, Mutant Strains , Oligonucleotide Array Sequence Analysis , User-Computer Interface
16.
Nucleic Acids Res ; 46(D1): D542-D550, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29145615

ABSTRACT

Protein post-translational modifications (PTMs) play a pivotal role in numerous biological processes by modulating regulation of protein function. We have developed iPTMnet (http://proteininformationresource.org/iPTMnet) for PTM knowledge discovery, employing an integrative bioinformatics approach that combines text mining, data mining, and ontological representation to capture rich PTM information, including PTM enzyme-substrate-site relationships, PTM-specific protein-protein interactions (PPIs) and PTM conservation across species. iPTMnet encompasses data from (i) our PTM-focused text mining tools, RLIMS-P and eFIP, which extract phosphorylation information from full-scale mining of PubMed abstracts and full-length articles; (ii) a set of curated databases with experimentally observed PTMs; and (iii) Protein Ontology that organizes proteins and PTM proteoforms, enabling their representation, annotation and comparison within and across species. Presently covering eight major PTM types (phosphorylation, ubiquitination, acetylation, methylation, glycosylation, S-nitrosylation, sumoylation and myristoylation), iPTMnet knowledgebase contains more than 654 500 unique PTM sites in over 62 100 proteins, along with more than 1200 PTM enzymes and over 24 300 PTM enzyme-substrate-site relations. The website supports online search, browsing, retrieval and visual analysis for scientific queries. Several examples, including functional interpretation of phosphoproteomic data, demonstrate iPTMnet as a gateway for visual exploration and systematic analysis of PTM networks and conservation, thereby enabling PTM discovery and hypothesis generation.


Subject(s)
Databases, Protein , Knowledge Bases , Protein Processing, Post-Translational , Animals , Computational Biology , Data Mining , Enzymes/metabolism , Humans , Internet , Phosphorylation , Protein Interaction Maps , Sequence Alignment
17.
Methods Mol Biol ; 1558: 213-232, 2017.
Article in English | MEDLINE | ID: mdl-28150240

ABSTRACT

Post-translational modifications (PTMs) are one of the main contributors to the diversity of proteoforms in the proteomic landscape. In particular, protein phosphorylation represents an essential regulatory mechanism that plays a role in many biological processes. Protein kinases, the enzymes catalyzing this reaction, are key participants in metabolic and signaling pathways. Their activation or inactivation dictate downstream events: what substrates are modified and their subsequent impact (e.g., activation state, localization, protein-protein interactions (PPIs)). The biomedical literature continues to be the main source of evidence for experimental information about protein phosphorylation. Automatic methods to bring together phosphorylation events and phosphorylation-dependent PPIs can help to summarize the current knowledge and to expose hidden connections. In this chapter, we demonstrate two text mining tools, RLIMS-P and eFIP, for the retrieval and extraction of kinase-substrate-site data and phosphorylation-dependent PPIs from the literature. These tools offer several advantages over a literature search in PubMed as their results are specific for phosphorylation. RLIMS-P and eFIP results can be sorted, organized, and viewed in multiple ways to answer relevant biological questions, and the protein mentions are linked to UniProt identifiers.


Subject(s)
Computational Biology/methods , Data Mining/methods , Phosphoproteins/metabolism , Proteins/metabolism , Proteomics/methods , Software , Databases, Protein , Phosphorylation , Protein Binding , Protein Interaction Mapping , Protein Processing, Post-Translational , Search Engine , User-Computer Interface , Web Browser
18.
Methods Mol Biol ; 1558: 333-353, 2017.
Article in English | MEDLINE | ID: mdl-28150246

ABSTRACT

Protein post-translational modification (PTM) is an essential cellular regulatory mechanism, and disruptions in PTM have been implicated in disease. PTMs are an active area of study in many fields, leading to a wealth of PTM information in the scientific literature. There is a need for user-friendly bioinformatics resources that capture PTM information from the literature and support analyses of PTMs and their functional consequences. This chapter describes the use of iPTMnet ( http://proteininformationresource.org/iPTMnet/ ), a resource that integrates PTM information from text mining, curated databases, and ontologies and provides visualization tools for exploring PTM networks, PTM crosstalk, and PTM conservation across species. We present several PTM-related queries and demonstrate how they can be addressed using iPTMnet.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Processing, Post-Translational , Software , Web Browser , Animals , Data Mining/methods , Humans , Mice , Phosphotransferases , Plant Proteins , Protein Binding , Protein Interaction Mapping/methods , Protein Interaction Maps , Rats , Search Engine , User-Computer Interface
19.
Methods Mol Biol ; 1558: 3-39, 2017.
Article in English | MEDLINE | ID: mdl-28150231

ABSTRACT

Many publicly available data repositories and resources have been developed to support protein-related information management, data-driven hypothesis generation, and biological knowledge discovery. To help researchers quickly find the appropriate protein-related informatics resources, we present a comprehensive review (with categorization and description) of major protein bioinformatics databases in this chapter. We also discuss the challenges and opportunities for developing next-generation protein bioinformatics databases and resources to support data integration and data analytics in the Big Data era.


Subject(s)
Computational Biology/methods , Databases, Genetic , Proteins/genetics , Proteins/metabolism , Proteomics/methods , Software , Web Browser , Animals , Genomics/methods , Humans
20.
Nucleic Acids Res ; 45(D1): D190-D199, 2017 01 04.
Article in English | MEDLINE | ID: mdl-27899635

ABSTRACT

InterPro (http://www.ebi.ac.uk/interpro/) is a freely available database used to classify protein sequences into families and to predict the presence of important domains and sites. InterProScan is the underlying software that allows both protein and nucleic acid sequences to be searched against InterPro's predictive models, which are provided by its member databases. Here, we report recent developments with InterPro and its associated software, including the addition of two new databases (SFLD and CDD), and the functionality to include residue-level annotation and prediction of intrinsic disorder. These developments enrich the annotations provided by InterPro, increase the overall number of residues annotated and allow more specific functional inferences.


Subject(s)
Computational Biology/methods , Databases, Protein , Protein Interaction Domains and Motifs , Software , Humans , Molecular Sequence Annotation , Phylogeny