1.
PLoS Comput Biol ; 19(11): e1011498, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934729

ABSTRACT

Public-domain availability for bioinformatics software resources is a key requirement that ensures long-term permanence and methodological reproducibility for research and development across the life sciences. These issues are particularly critical for widely used, efficient, and well-proven methods, especially those developed in research settings that often face funding discontinuities. We re-launch a range of established software components for computational genomics, as legacy version 1.0.1, suitable for sequence matching, masking, searching, clustering and visualization for protein family discovery, annotation and functional characterization on a genome scale. These applications are made available online as open source and include MagicMatch, GeneCAST, support scripts for CoGenT-like sequence collections, GeneRAGE and DifFuse, supported by centrally administered bioinformatics infrastructure funding. The toolkit may also be conceived as a flexible genome comparison software pipeline that supports research in this domain. We illustrate basic use by examples and pictorial representations of the registered tools, which are further described with appropriate documentation files in the corresponding GitHub release.


Subject(s)
Genomics, Software, Reproducibility of Results, Genomics/methods, Computational Biology/methods, Genome
2.
Nucleic Acids Res ; 50(D1): D480-D487, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850135

ABSTRACT

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.


Subject(s)
Protein Databases, Intrinsically Disordered Proteins/metabolism, Molecular Sequence Annotation, Software, Amino Acid Sequence, DNA/genetics, DNA/metabolism, Datasets as Topic, Gene Ontology, Humans, Internet, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/genetics, Protein Binding, RNA/genetics, RNA/metabolism
3.
J Med Virol ; 95(12): e29264, 2023 12.
Article in English | MEDLINE | ID: mdl-38054553

ABSTRACT

The Octamer-binding transcription factor-4 (Oct4) is upregulated in different malignancies, yet the mechanisms of Oct4 post-embryonic re-expression are inadequately understood. In cervical cancer, Oct4 expression is higher in human papillomavirus (HPV)-related than HPV-unrelated cervical cancers and this upregulation correlates with the expression of the E7 oncogene. We have reported that E7 affects the Oct4-transcriptional output and Oct4-related phenotypes in cervical cancer; however, the underlying mechanism remains elusive. Here, we characterize the Oct4-protein interactions in cervical cancer cells via computational analyses and mass spectrometry and reveal that methyl-binding proteins (MBD2 and MBD3) are determinants of Oct4-driven transcription. E7 triggers MBD2 downregulation and TET1 upregulation, thereby disrupting the methylation status of the Oct4 gene. This coincides with an increase in the total DNA hydroxymethylation leading to the re-expression of Oct4 in cervical cancer and likely affecting broader transcriptional patterns. Our findings reveal a previously unreported mechanism by which the E7 oncogene can regulate Oct4 re-expression and global transcriptional patterns by increasing DNA hydroxymethylation and lowering the barrier to cellular plasticity during carcinogenesis.


Subject(s)
Octamer Transcription Factor-3, Viral Oncogene Proteins, Papillomavirus Infections, Uterine Cervical Neoplasms, Female, Humans, DNA, DNA-Binding Proteins/genetics, DNA-Binding Proteins/metabolism, Mixed Function Oxygenases, Viral Oncogene Proteins/genetics, Papillomavirus E7 Proteins/genetics, Proto-Oncogene Proteins, Uterine Cervical Neoplasms/genetics, Uterine Cervical Neoplasms/virology, Octamer Transcription Factor-3/genetics
4.
Brief Bioinform ; 21(2): 458-472, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30698641

ABSTRACT

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. SHORT ABSTRACT: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.


Subject(s)
Proteins/chemistry, Algorithms, Amino Acid Sequence, Protein Databases, Molecular Evolution, Protein Conformation, Protein Domains
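
As an illustration of the kind of statistical measure of low complexity that the review above discusses, the following is a minimal Python sketch of a sliding-window Shannon entropy scan. It is not taken from the paper: the function names, the window length of 12 and the threshold of 2.2 bits are assumptions chosen only for the example.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    n = len(segment)
    return sum(-(c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq: str, window: int = 12, threshold: float = 2.2):
    """Return (start, end, entropy) for each window with entropy <= threshold.

    Coordinates are 0-based with an exclusive end. The default window and
    threshold are illustrative assumptions, not values from the review.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        h = shannon_entropy(seq[start:start + window])
        if h <= threshold:
            hits.append((start, start + window, h))
    return hits


if __name__ == "__main__":
    # A made-up sequence with a glutamine-rich stretch in the middle.
    toy = "MKVLAAGIVRTE" + "QQQQQQPQQQQQ" + "DWSFHNCYKLTG"
    for start, end, h in low_complexity_windows(toy):
        print(start, end, round(h, 2))
```

A purely compositional score like this one flags biased windows but says nothing about structure, which is the limitation the review argues for addressing by combining several tools and measurements.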
5.
Nucleic Acids Res ; 48(W1): W77-W84, 2020 07 02.
Article in English | MEDLINE | ID: mdl-32421769

ABSTRACT

Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them as stand-alone applications and currently there is no web-based tool which allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToLoCo (PLAtform of TOols for LOw COmplexity), a meta-server that integrates and collects the output of five different state-of-the-art tools for discovering LCRs and provides functional annotations such as domain detection, transmembrane segment prediction, and calculation of amino acid frequencies. In addition, the union or intersection of the results of the search on a query sequence can be obtained. By developing the PlaToLoCo meta-server, we provide the community with a fast and easily accessible tool for the analysis of LCRs with additional information included to aid the interpretation of the results. The PlaToLoCo platform is available at: http://platoloco.aei.polsl.pl/.


Subject(s)
Proteins/chemistry, Software, Amino Acids/analysis, Computer Graphics, Humans, Membrane Proteins/chemistry, Molecular Sequence Annotation, Protein Domains, Protein Sequence Analysis
6.
Nucleic Acids Res ; 48(D1): D269-D276, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31713636

ABSTRACT

The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the 'dark' proteome.


Subject(s)
Protein Databases, Intrinsically Disordered Proteins/chemistry, Biological Ontologies, Data Curation, Molecular Sequence Annotation
7.
Nucleic Acids Res ; 47(21): 10994-11006, 2019 12 02.
Article in English | MEDLINE | ID: mdl-31584084

ABSTRACT

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with 'ready-to-use' deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotation-deposition workflow, and that may proliferate in public database repositories affecting all downstream analyses. As a case study, we provide examples of the Atlantic cod genome, whose sequencing and assembly were hindered by a particularly high prevalence of tandem repeats. We complement this case study with examples from other species, where mis-annotations and sequencing errors have propagated into protein databases. With this review, we aim to raise the awareness level within the community of database users, and alert scientists working in the underlying workflow of database creation that the data they omit or improperly assemble may well contain important biological information valuable to others.


Subject(s)
DNA/genetics, Nucleic Acid Databases, Protein Databases, Scientific Experimental Error, Tandem Repeat Sequences/genetics, Animals, Gadus morhua/genetics, DNA Sequence Analysis
8.
Proteins ; 2020 Aug 10.
Article in English | MEDLINE | ID: mdl-32776636

ABSTRACT

The focal adhesion kinase (FAK) and the proline-rich tyrosine kinase 2-beta (PYK2) are implicated in cancer progression and metastasis and represent promising biomarkers and targets for cancer therapy. FAK and PYK2 are recruited to focal adhesions (FAs) via interactions between their FA targeting (FAT) domains and conserved segments (LD motifs) on the proteins Paxillin, Leupaxin, and Hic-5. A promising new approach for the inhibition of FAK and PYK2 targets interactions of the FAK domains with proteins that promote localization at FAs. Advances toward this goal include the development of surface plasmon resonance, heteronuclear single quantum coherence nuclear magnetic resonance (HSQC-NMR) and fluorescence polarization assays for the identification of fragments or compounds interfering with the FAK-Paxillin interaction. We have recently validated this strategy, showing that Paxillin mimicking polypeptides with 2 to 3 LD motifs displace FAK from FAs and block kinase-dependent and independent functions of FAK, including downstream integrin signaling and FA localization of the protein p130Cas. In the present work we study by all-atom molecular dynamics simulations the recognition of peptides with the Paxillin and Leupaxin LD motifs by the FAK-FAT and PYK2-FAT domains. Our simulations and free-energy analysis interpret experimental data on binding of Paxillin and Leupaxin LD motifs at FAK-FAT and PYK2-FAT binding sites, and assess the roles of consensus LD regions and flanking residues. Our results can assist in the design of effective inhibitory peptides of the FAK-FAT: Paxillin and PYK2-FAT:Leupaxin complexes and the construction of pharmacophore models for the discovery of potential small-molecule inhibitors of the FAK-FAT and PYK2-FAT focal adhesion based functions.

9.
Glycobiology ; 29(5): 385-396, 2019 05 01.
Article in English | MEDLINE | ID: mdl-30835280

ABSTRACT

Despite the controversy regarding the importance of protein N-linked glycosylation in species of the genus Plasmodium, genes potentially encoding core subunits of the oligosaccharyltransferase (OST) complex have already been characterized in completely sequenced genomes of malaria parasites. Nevertheless, the currently established notion is that only four out of eight subunits of the OST complex, which is considered conserved across eukaryotes, are present in Plasmodium species. In this study, we carefully conduct computational analysis to provide unequivocal evidence that all components of the OST complex, with the exception of Swp1/Ribophorin II, can be reliably identified within completely sequenced plasmodial genomes. In fact, most of the subunits currently considered as absent from Plasmodium refer to uncharacterized protein sequences already existing in sequence databases. Interestingly, the main reason why the unusually short Ost4 subunit (36 residues long in yeast) has not been identified so far in plasmodia (and possibly other species) is the failure of gene-prediction pipelines to detect such a short coding sequence. We further identify elusive OST subunits in select protist species with completely sequenced genomes. Thus, our work highlights the necessity of a systematic approach towards the characterization of OST subunits across eukaryotes. This is necessary both for obtaining a concrete picture of the evolution of the OST complex and for elucidating its possible role in eukaryotic pathogens.


Subject(s)
Computational Biology, Hexosyltransferases/metabolism, Membrane Proteins/metabolism, Plasmodium/enzymology, Animals, Protein Databases, Drosophila melanogaster, Eukaryota/metabolism, Humans, Mice
10.
BMC Biol ; 14(1): 106, 2016 12 07.
Article in English | MEDLINE | ID: mdl-27927215

ABSTRACT

BACKGROUND: Transcriptome studies have revealed that many eukaryotic genomes are pervasively transcribed, producing numerous long non-coding RNAs (lncRNAs). However, only a few lncRNAs have been ascribed a cellular role thus far, with most regulating the expression of adjacent genes. Even fewer lncRNAs have been annotated as essential, implying that the majority may be functionally redundant. Therefore, the function of lncRNAs could be illuminated through systematic analysis of their synthetic genetic interactions (GIs). RESULTS: Here, we employ synthetic genetic array (SGA) in Saccharomyces cerevisiae to identify GIs between long intergenic non-coding RNAs (lincRNAs) and protein-coding genes. We first validate this approach by demonstrating that the telomerase RNA TLC1 displays a GI network that corresponds to its well-described function in telomere length maintenance. We subsequently performed SGA screens on a set of uncharacterised lincRNAs and uncover their connection to diverse cellular processes. One of these lincRNAs, SUT457, exhibits a GI profile associating it with telomere organisation and we consistently demonstrate that SUT457 is required for telomeric overhang homeostasis through an Exo1-dependent pathway. Furthermore, the GI profile of SUT457 is distinct from that of its neighbouring genes, suggesting a function independent of its genomic location. Accordingly, we show that ectopic expression of this lincRNA suppresses telomeric overhang accumulation in sut457Δ cells, assigning a trans-acting role for SUT457 in telomere biology. CONCLUSIONS: Overall, our work proposes that systematic application of this genetic approach could determine the functional significance of individual lncRNAs in yeast and other complex organisms.


Subject(s)
Fungal Genome, Long Noncoding RNA/genetics, Saccharomyces cerevisiae Proteins/metabolism, Saccharomyces cerevisiae/genetics, Telomere/genetics, Fungal DNA/genetics, Exodeoxyribonucleases/genetics, Exodeoxyribonucleases/metabolism, Gene Expression Profiling, Gene Ontology, Genomics, Saccharomyces cerevisiae Proteins/genetics, Telomerase/genetics, Telomerase/metabolism
11.
Brief Bioinform ; 15(3): 443-54, 2014 May.
Article in English | MEDLINE | ID: mdl-23220349

ABSTRACT

More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility to detect functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected from citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.


Subject(s)
Computational Biology/methods, Gene Fusion, Animals, Fungal Genes, Human Genome, Genomics, Humans, Phylogeny, Protein Interaction Mapping/statistics & numerical data, Proteomics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
12.
Bioinformatics ; 31(13): 2208-10, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25712690

ABSTRACT

MOTIVATION: Local compositionally biased and low complexity regions (LCRs) in amino acid sequences have initially attracted the interest of researchers due to their implication in generating artifacts in sequence database searches. There is accumulating evidence of the biological significance of LCRs both in physiological and in pathological situations. Nonetheless, LCR-related algorithms and tools have not gained wide appreciation across the research community, partly due to the fact that only a handful of user-friendly software is currently freely available. RESULTS: We developed LCR-eXXXplorer, an extensible online platform attempting to fill this gap. LCR-eXXXplorer offers tools for displaying LCRs from the UniProt/SwissProt knowledgebase, in combination with other relevant protein features, predicted or experimentally verified. Moreover, users may perform powerful queries against a custom designed sequence/LCR-centric database. We anticipate that LCR-eXXXplorer will be a useful starting point in research efforts for the elucidation of the structure, function and evolution of proteins with LCRs. AVAILABILITY AND IMPLEMENTATION: LCR-eXXXplorer is freely available at the URL http://repeat.biol.ucy.ac.cy/lcr-exxxplorer. CONTACT: vprobon@ucy.ac.cy SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Computer Graphics, Protein Databases, Internet, Proteins/chemistry, Protein Sequence Analysis/methods, Software, Humans
13.
Bioinformatics ; 36(9): 2963-2965, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32129821
14.
Bioinformatics ; 30(22): 3249-56, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25100685

ABSTRACT

SUMMARY: The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest+, a web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest+ enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest+ addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest+ through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY: The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. CONTACT: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Mining/methods, Software, Authorship, Cluster Analysis, Disease/genetics, Genes, Humans, Internet, Medical Subject Headings, Proteins, PubMed, Publications
16.
Bioinformatics ; 28(4): 591-2, 2012 Feb 15.
Article in English | MEDLINE | ID: mdl-22199385

ABSTRACT

UNLABELLED: We present LaTcOm, a new web tool, which offers several alternative methods for 'rare codon cluster' (RCC) identification from a single and simple graphical user interface. In the current version, three RCC detection schemes are implemented: the recently described %MinMax algorithm and a simplified sliding window approach, along with a novel modification of a linear-time algorithm for the detection of maximally scoring subsequences tailored to the RCC detection problem. Among a number of user tunable parameters, several codon-based scales relevant for RCC detection are available, including tRNA abundance values from Escherichia coli and several codon usage tables from a selection of genomes. Furthermore, useful scale transformations may be performed upon user request (e.g. linear, sigmoid). Users may choose to visualize RCC positions within the submitted sequences either with graphical representations or in textual form for further processing. AVAILABILITY: LaTcOm is freely available online at the URL http://troodos.biol.ucy.ac.cy/latcom.html.


Subject(s)
Codon, Software, Algorithms, Cluster Analysis, Escherichia coli/genetics, Internet, Transfer RNA/metabolism
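
The LaTcOm abstract mentions a linear-time algorithm for maximally scoring subsequences adapted to rare codon cluster detection. The paper's modification is not described in the abstract, so the Python sketch below shows only the classic single-best-segment variant (Kadane's algorithm) as a stand-in. The function name and the per-codon scores are hypothetical, with the assumption that positive values mark rare codons and negative values mark common ones.

```python
from typing import List, Tuple


def max_scoring_segment(scores: List[float]) -> Tuple[int, int, float]:
    """Return (start, end, total) of the highest-scoring contiguous segment.

    Coordinates are 0-based with an exclusive end. Runs in linear time
    (Kadane's algorithm). This finds one best segment only; it is not the
    modified algorithm implemented in LaTcOm.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    best_start, best_end, best_total = 0, 1, scores[0]
    cur_start, cur_total = 0, 0.0
    for i, s in enumerate(scores):
        if cur_total <= 0:
            # A non-positive running total cannot help; restart here.
            cur_start, cur_total = i, s
        else:
            cur_total += s
        if cur_total > best_total:
            best_start, best_end, best_total = cur_start, i + 1, cur_total
    return best_start, best_end, best_total


if __name__ == "__main__":
    # Hypothetical per-codon scores: positive = rare, negative = common.
    codon_scores = [-1.0, 2.0, 3.0, -4.0, 1.0]
    print(max_scoring_segment(codon_scores))
```

In practice the scores would come from one of the codon-based scales the abstract lists, such as tRNA abundance or codon usage tables, after whatever transformation the user selects.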
17.
NAR Genom Bioinform ; 5(1): lqad025, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36968432

ABSTRACT

The nuclear pore complex exhibits different manifestations across eukaryotes, with certain components being restricted to specific clades. Several studies have been conducted to delineate the nuclear pore complex composition in various model organisms. Due to its pivotal role in cell viability, traditional lab experiments, such as gene knockdowns, can prove inconclusive and need to be complemented by a high-quality computational process. Here, using an extensive data collection, we create a robust library of nucleoporin protein sequences and their respective family-specific position-specific scoring matrices. By extensively validating each profile in different settings, we propose that the created profiles can be used to detect nucleoporins in proteomes with high sensitivity and specificity compared to existing methods. This library of profiles and the underlying sequence data can be used for the detection of nucleoporins in target proteomes.
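
To make the idea of scanning a proteome with a position-specific scoring matrix concrete, here is a minimal Python sketch that slides a toy matrix along a sequence and reports the best-scoring window. It does not reproduce the profiles or software used in the paper: the two-column matrix, the fallback score of -4.0 for unlisted residues and the function name are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def best_pssm_hit(
    seq: str,
    pssm: List[Dict[str, float]],
    missing: float = -4.0,
) -> Optional[Tuple[int, float]]:
    """Return (start, score) of the best ungapped match of pssm in seq.

    pssm is a list of columns, each mapping a residue to its score.
    Residues absent from a column get the assumed penalty `missing`.
    Returns None if the sequence is shorter than the matrix.
    """
    width = len(pssm)
    best: Optional[Tuple[int, float]] = None
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        score = sum(col.get(res, missing) for col, res in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


if __name__ == "__main__":
    # Toy two-column matrix favouring the dipeptide "FG" (illustrative only).
    toy_pssm = [{"F": 2.0, "G": -1.0}, {"G": 3.0, "F": -1.0}]
    print(best_pssm_hit("AAFGA", toy_pssm))
```

Real family-specific profiles are much wider, are derived from curated alignments, and are normally searched with established tools that also handle gaps and significance estimation.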

18.
Autophagy ; 19(12): 3189-3200, 2023 12.
Article in English | MEDLINE | ID: mdl-37530436

ABSTRACT

Several selective macroautophagy receptor and adaptor proteins bind members of the Atg8 (autophagy related 8) family using short linear motifs (SLiMs), most often referred to as Atg8-family interacting motifs (AIMs) or LC3-interacting regions (LIRs). AIM/LIR motifs have been extensively studied during the last fifteen years, since they can uncover the underlying biological mechanisms and possible substrates for this key catabolic process of eukaryotic cells. Prompted by the fact that experimental information regarding LIR motifs can be found scattered across heterogeneous literature resources, we have developed LIRcentral (https://lircentral.eu), a freely available online repository for user-friendly access to comprehensive, high-quality information regarding LIR motifs from manually curated publications. Herein, we describe the development of LIRcentral and showcase currently available data and features, along with our plans for the expansion of this resource. Information incorporated in LIRcentral is useful for accomplishing a variety of research tasks, including: (i) guiding wet biology researchers for the characterization of novel instances of LIR motifs, (ii) giving bioinformaticians/computational biologists access to high-quality LIR motifs for building novel prediction methods for LIR motifs and LIR containing proteins (LIRCPs) and (iii) performing analyses to better understand the biological importance/features of functional LIR motifs. 
We welcome feedback on the LIRcentral content and functionality by all interested researchers and anticipate this work to spearhead a community effort for sustaining this resource which will further promote progress in studying LIR motifs/LIRCPs. Abbreviations: AIM, Atg8-family interacting motif; Atg8, autophagy related 8; GABARAP, GABA type A receptor-associated protein; LIR, LC3-interacting region; LIRCP, LIR-containing protein; MAP1LC3/LC3, microtubule associated protein 1 light chain 3; PMID, PubMed identifier; PPI, protein-protein interaction; SLiM, short linear motif.


Subject(s)
Autophagy, Microtubule-Associated Proteins, Autophagy-Related Protein 8 Family/metabolism, Microtubule-Associated Proteins/metabolism, Autophagy/physiology, Amino Acid Motifs, Carrier Proteins/metabolism
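
Short linear motifs such as the LIR are commonly searched for with a simple pattern before any curation or experimental follow-up. The Python sketch below scans a sequence for the widely used core consensus [W/F/Y]-x-x-[L/I/V]. The pattern is general background knowledge rather than something stated in the abstract, the function name is hypothetical, and the code does not use or represent any LIRcentral interface.

```python
import re
from typing import List, Tuple

# Core LIR consensus [WFY]xx[LIV]; the lookahead allows overlapping matches.
LIR_CORE = re.compile(r"(?=([WFY]..[LIV]))")


def find_lir_candidates(seq: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched tetrapeptide) for each core match."""
    return [(m.start(), m.group(1)) for m in LIR_CORE.finditer(seq.upper())]


if __name__ == "__main__":
    # Made-up sequence used only to exercise the pattern.
    print(find_lir_candidates("MSDEFEVLAAWQQLP"))
```

A four-residue pattern of this kind matches very often by chance, so most hits are not functional motifs; that gap between pattern matches and verified instances is what a manually curated resource is meant to address.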
19.
Bioinformatics ; 27(23): 3327-8, 2011 Dec 01.
Article in English | MEDLINE | ID: mdl-21994227

ABSTRACT

SUMMARY: BioTextQuest combines automated discovery of significant terms in article clusters with structured knowledge annotation, via Named Entity Recognition services, offering interactive user-friendly visualization. The terms labeling each document cluster, shown as a tag cloud, are semantically annotated according to the biological entity, and a list of document titles enables users to simultaneously compare terms and documents of each cluster, facilitating concept association and hypothesis generation. BioTextQuest allows customization of analysis parameters, e.g. clustering/stemming algorithms, exclusion of documents/significant terms, to better match the biological question addressed. AVAILABILITY: http://biotextquest.biol.ucy.ac.cy CONTACT: vprobon@ucy.ac.cy; iliopj@med.uoc.gr SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Mining, Natural Language Processing, Algorithms, Animals, Cluster Analysis, Drosophila/embryology, Drosophila/genetics, Internet
20.
Biomolecules ; 12(10)2022 10 15.
Article in English | MEDLINE | ID: mdl-36291695

ABSTRACT

Intrinsically disordered regions (IDRs) in protein sequences are flexible, have low structural constraints and as a result have faster rates of evolution. This lack of evolutionary conservation greatly limits the use of sequence homology for the classification and functional assessment of IDRs, as opposed to globular domains. The study of IDRs requires other properties for their classification and functional prediction. While composition bias is not a necessary property of IDRs, compositionally biased regions (CBRs) have been noted as a frequent part of IDRs. We hypothesized that to characterize IDRs, it could be helpful to study their overlap with particular types of CBRs. Here, we evaluate this overlap in the human proteome. A total of 2/3 of residues in IDRs overlap CBRs. Considering CBRs enriched in one type of amino acid, we can distinguish CBRs that tend to be fully included within long IDRs (R, H, N, D, P, G), from those that partially overlap shorter IDRs (S, E, K, T), and others that tend to overlap IDR terminals (Q, A). CBRs overlap IDRs more often in nuclear proteins and in proteins involved in liquid-liquid phase separation (LLPS). Study of protein interaction networks reveals the enrichment of CBRs in IDRs by tandem repetition of short linear motifs (rich in S or P), and the existence of E-rich polar regions that could support specific protein interactions with non-specific interactions. Our results open ways to pin down the function of IDRs from their partial compositional biases.


Subject(s)
Intrinsically Disordered Proteins, Humans, Intrinsically Disordered Proteins/chemistry, Proteome, Bias, Amino Acids, Nuclear Proteins/metabolism, Protein Conformation
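
The residue-level overlap reported in the abstract above (the share of IDR residues that fall inside CBRs) can be computed from two lists of intervals. The Python sketch below shows one simple way to do that for a single protein. It is not the authors' pipeline: the function names, the 0-based half-open coordinate convention and the example intervals are assumptions.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[int, int]  # 0-based start, exclusive end (assumed convention)


def covered_residues(intervals: Iterable[Interval]) -> Set[int]:
    """Return the set of residue positions covered by any interval."""
    covered: Set[int] = set()
    for start, end in intervals:
        covered.update(range(start, end))
    return covered


def idr_fraction_in_cbr(idrs: Iterable[Interval], cbrs: Iterable[Interval]) -> float:
    """Fraction of IDR residues that also lie inside at least one CBR."""
    idr_positions = covered_residues(idrs)
    if not idr_positions:
        return 0.0
    return len(idr_positions & covered_residues(cbrs)) / len(idr_positions)


if __name__ == "__main__":
    # Made-up annotations for one protein.
    idrs = [(0, 30)]
    cbrs = [(10, 30), (50, 60)]
    print(idr_fraction_in_cbr(idrs, cbrs))
```

Using position sets keeps the sketch short and handles overlapping annotations without double counting; for proteome-scale data an interval-merging approach would use less memory.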