Results 1-20 of 103
1.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subject(s)
Metagenome, Metagenomics, Microbiology, Proteins, Cluster Analysis, Metagenome/genetics, Metagenomics/methods, Proteins/chemistry, Proteins/classification, Proteins/genetics, Protein Databases, Protein Conformation
2.
PLoS Comput Biol ; 19(11): e1011498, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934729

ABSTRACT

Public-domain availability for bioinformatics software resources is a key requirement that ensures long-term permanence and methodological reproducibility for research and development across the life sciences. These issues are particularly critical for widely used, efficient, and well-proven methods, especially those developed in research settings that often face funding discontinuities. We re-launch a range of established software components for computational genomics, as legacy version 1.0.1, suitable for sequence matching, masking, searching, clustering and visualization for protein family discovery, annotation and functional characterization on a genome scale. These applications are made available online as open source and include MagicMatch, GeneCAST, support scripts for CoGenT-like sequence collections, GeneRAGE and DifFuse, supported by centrally administered bioinformatics infrastructure funding. The toolkit may also be conceived as a flexible genome comparison software pipeline that supports research in this domain. We illustrate basic use by examples and pictorial representations of the registered tools, which are further described with appropriate documentation files in the corresponding GitHub release.


Subject(s)
Genomics, Software, Reproducibility of Results, Genomics/methods, Computational Biology/methods, Genome
3.
Nucleic Acids Res ; 50(D1): D480-D487, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850135

ABSTRACT

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.


Subject(s)
Protein Databases, Intrinsically Disordered Proteins/metabolism, Molecular Sequence Annotation, Software, Amino Acid Sequence, DNA/genetics, DNA/metabolism, Datasets as Topic, Gene Ontology, Humans, Internet, Intrinsically Disordered Proteins/chemistry, Intrinsically Disordered Proteins/genetics, Protein Binding, RNA/genetics, RNA/metabolism
4.
J Mol Evol ; 91(4): 471-481, 2023 08.
Article in English | MEDLINE | ID: mdl-37039856

ABSTRACT

Selenium-binding proteins represent a ubiquitous protein family, and SBP1 was recently described as a new stress response regulator in plants. SBP1 has been characterized as a methanethiol oxidase; however, its exact role remains unclear. Moreover, in mammals, it is involved in the regulation of anti-carcinogenic growth and progression as well as reduction/oxidation modulation and detoxification. In this work, we delineate the functional potential of certain SBP motifs in the context of evolutionary relationships. The phylogenetic profiling approach revealed the absence of SBP in the fungi phylum as well as in most non-eukaryotic organisms. The phylogenetic tree also indicates the differentiation and evolution of characteristic SBP motifs. The main evolutionary events concern the CSSC motif, for which Acidobacteria, Fungi and Archaea carry modifications. Moreover, the CC motif is harbored by some bacteria and remains conserved in Plants, while it is modified to CxxC in Animals. Thus, the characteristic sequence motifs of SBPs mainly appeared in Archaea and Bacteria and were retained in Animals and Plants. Our results demonstrate that SBP emerged from bacteria, most likely as a methanethiol oxidase.


Subject(s)
Proteins, Selenium-Binding Proteins, Animals, Selenium-Binding Proteins/genetics, Selenium-Binding Proteins/metabolism, Phylogeny, Bacteria/genetics, Bacteria/metabolism, Archaea/genetics, Archaea/metabolism, Plants, Oxidoreductases/genetics, Mammals/metabolism
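The SBP motifs named in entry 4 (CSSC, CC and CxxC, where x stands for any residue) are short enough to locate with regular expressions. The sketch below is a hypothetical helper for illustration only. It is not the phylogenetic-profiling workflow of the paper, and the motif table and the example sequence are assumptions.

```python
import re

# Hypothetical motif table: the abstract's motif names as regular expressions.
# "x" in CxxC means any amino acid, so it becomes "." here.
MOTIFS = {
    "CSSC": "CSSC",
    "CC": "CC",
    "CxxC": "C..C",
}

def find_motifs(sequence: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each motif, including overlapping hits."""
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences all be reported.
        hits[name] = [m.start() for m in re.finditer(f"(?=({pattern}))", seq)]
    return hits
```

For the made-up sequence `"MACSSCLLCCGKCPAC"`, `find_motifs` returns `{'CSSC': [2], 'CC': [8], 'CxxC': [2, 5, 9, 12]}`. Note that every CSSC occurrence is also counted as a CxxC occurrence.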
5.
Brief Bioinform ; 21(2): 458-472, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30698641

ABSTRACT

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. SHORT ABSTRACT: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.


Subject(s)
Proteins/chemistry, Algorithms, Amino Acid Sequence, Protein Databases, Molecular Evolution, Protein Conformation, Protein Domains
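The LCR review in entry 5 discusses statistical measures of sequence complexity. One widely used measure is the Shannon entropy of residue composition within a sliding window, where fewer amino-acid types give lower entropy. The sketch below is a simplified illustration and does not reproduce SEG or the review's own procedure. The window size of 12 and the 2.2-bit threshold are illustrative assumptions.

```python
import math
from collections import Counter

def shannon_entropy(segment: str) -> float:
    """Shannon entropy, in bits, of the residue composition of a segment."""
    n = len(segment)
    counts = Counter(segment)
    # H = sum over residue types of p * log2(1/p), with p = count / n.
    return sum((c / n) * math.log2(n / c) for c in counts.values())

def low_complexity_windows(sequence: str, window: int = 12,
                           threshold: float = 2.2) -> list[tuple[int, float]]:
    """Return (start, entropy) for every window whose entropy is below threshold."""
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h < threshold:
            hits.append((start, h))
    return hits
```

Because entropy only reflects composition, a compositionally biased but structured repeat can score as low as a disordered region. This limitation is why the review recommends combining several measures.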
6.
EMBO Rep ; 21(4): e50388, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32216085

ABSTRACT

University accountants and administrators should support scientists going to meetings, not further burden them with bureaucratic hurdles, expense claims or unnecessary auditing.


Subject(s)
Travel, Humans
7.
Environ Res ; 207: 112183, 2022 05 01.
Article in English | MEDLINE | ID: mdl-34637759

ABSTRACT

In urban ecosystems, microbes play a key role in maintaining major ecological functions that directly support human health and city life. However, knowledge about the species composition and functions involved in urban environments is still limited, largely because the lack of reference genomes leaves more than half of the reads in metagenomic studies unclassified. Here we uncovered 732 novel bacterial species from 4728 samples collected by the MetaSUB Consortium from various common surfaces (with the matching surface materials) in the mass transit systems of 60 cities. The number of novel species is significantly and positively correlated with city population, and more novel species can be identified in skin-associated samples. In-depth analysis of the new gene catalog showed that the functional terms have significant geographical distinguishability. Moreover, we found that more biosynthetic gene clusters (BGCs) occur in novel species. The co-occurrence relationship between BGCs and genera, and the geographical specificity of BGCs, can also provide more information on the synthesis pathways of natural products. These results expand the known diversity of the urban microbiome and suggest additional mechanisms for its taxonomic and functional characterization. Considering the great impact of urban microbiomes on human life, our study can also facilitate the analysis of microbial interactions between humans and the urban environment.


Subject(s)
Metagenome, Microbiota, Bacteria/genetics, Humans, Metagenomics, Microbial Interactions, Microbiota/genetics
8.
Nucleic Acids Res ; 48(D1): D269-D276, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31713636

ABSTRACT

The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore helping to illuminate the 'dark' proteome.


Subject(s)
Protein Databases, Intrinsically Disordered Proteins/chemistry, Biological Ontologies, Data Curation, Molecular Sequence Annotation
9.
Nucleic Acids Res ; 46(6): e33, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29315405

ABSTRACT

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.


Subject(s)
Algorithms, Cluster Analysis, Computational Biology/methods, Gene Regulatory Networks, Markov Chains, Gene Expression, Protein Interaction Maps/genetics
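The HipMCL abstract in entry 9 parallelizes the MCL iteration. MCL alternates two steps: expansion, which squares a column-stochastic matrix to simulate random-walk flow, and inflation, which raises entries to a power and renormalizes the columns. The sketch below is a minimal single-machine version on a dense NumPy matrix. It omits the pruning and the distributed sparse-matrix machinery that let HipMCL scale. The self-loop convention, the inflation value of 2.0, the tolerances and the toy graph are illustrative assumptions.

```python
import numpy as np

def mcl(adjacency: np.ndarray, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-6) -> list[set[int]]:
    """Minimal dense Markov Clustering on a symmetric adjacency matrix."""
    # Add self-loops (a common MCL convention), then make columns sum to 1.
    m = adjacency.astype(float) + np.eye(adjacency.shape[0])
    m /= m.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        prev = m
        m = m @ m                            # expansion: two-step flow
        m = m ** inflation                   # inflation: favour strong flows
        m /= m.sum(axis=0, keepdims=True)    # renormalize columns
        if np.abs(m - prev).max() < tol:
            break
    # Rows that keep mass act as attractors; the nonzero columns of such a row
    # are the nodes flowing to it. Identical rows are merged into one cluster.
    clusters: list[set[int]] = []
    for row in m:
        members = set(np.nonzero(row > 1e-8)[0].tolist())
        if members and members not in clusters:
            clusters.append(members)
    return clusters
```

On a toy graph of two disconnected triangles (nodes 0-2 and 3-5), this returns the two triangles as separate clusters. The dense `m @ m` is exactly the step whose memory and time cost the HipMCL abstract identifies as the bottleneck for large networks.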
10.
Bioinformatics ; 33(9): 1418-1420, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453679

ABSTRACT

Summary: BioPAXViz is a Cytoscape (version 3) application, providing a comprehensive framework for metabolic pathway visualization. Beyond the basic parsing, viewing and browsing roles, the main novel function that BioPAXViz provides is a visual comparative analysis of metabolic pathway topologies across pre-computed pathway phylogenomic profiles given a species phylogeny. Furthermore, BioPAXViz supports the display of hierarchical trees that allow efficient navigation through sets of variants of a single reference pathway. Thus, BioPAXViz can significantly facilitate, and contribute to, the study of metabolic pathway evolution and engineering. Availability and Implementation: BioPAXViz has been developed as a Cytoscape app and is available at: https://github.com/CGU-CERTH/BioPAX.Viz. The software is distributed under the MIT License and is accompanied by example files and data. Additional documentation is available at the aforementioned GitHub repository. Contact: ouzounis@certh.gr.


Subject(s)
Computational Biology/methods, Molecular Evolution, Metabolic Networks and Pathways/genetics, Software, Phylogeny
11.
PLoS Biol ; 12(8): e1001920, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25093819

ABSTRACT

Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.


Subject(s)
Archaeal Genome/genetics, Bacterial Genome/genetics, Genomics, DNA Sequence Analysis, Archaea/classification, Archaea/genetics, Bacteria/classification, Bacteria/genetics, Genetic Databases, Phylogeny
12.
BMC Bioinformatics ; 17(1): 212, 2016 May 11.
Article in English | MEDLINE | ID: mdl-27170263

ABSTRACT

BACKGROUND: The underlying molecular processes representing stress responses to low-dose ionising radiation (LDIR) in mammals are just beginning to be understood. In particular, LDIR effects on the brain and their possible association with neurodegenerative disease are currently being explored using omics technologies. RESULTS: We describe a light-weight approach for the storage, analysis and distribution of relevant LDIR omics datasets. The data integration platform, called BRIDE, contains information from the literature as well as experimental information from transcriptomics and proteomics studies. It deploys a hybrid, distributed solution using both local storage and cloud technology. CONCLUSIONS: BRIDE can act as a knowledge broker for LDIR researchers, to facilitate molecular research on the systems biology of LDIR response in mammals. Its flexible design can capture a range of experimental information for genomics, epigenomics, transcriptomics, and proteomics. The data collection is available at: .


Subject(s)
Brain/radiation effects, Ionizing Radiation, Research, Software, Radiation Dose-Response Relationship, Humans
13.
Brief Bioinform ; 15(3): 443-54, 2014 May.
Article in English | MEDLINE | ID: mdl-23220349

ABSTRACT

More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility of detecting functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected in citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of the 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.


Subject(s)
Computational Biology/methods, Gene Fusion, Animals, Fungal Genes, Human Genome, Genomics, Humans, Phylogeny, Protein Interaction Mapping/statistics & numerical data, Proteomics, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism
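The gene-fusion survey in entry 13 rests on a simple signal. If two separate proteins in one genome align to different, essentially non-overlapping regions of a single protein in another genome, the pair is a candidate functional association. The sketch below illustrates that check on precomputed similarity hits. It assumes the hits come from a sequence search that is not shown, and the `Hit` record, the overlap cutoff and the gene names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple

class Hit(NamedTuple):
    """Hypothetical record: a component protein aligned to a region of a target."""
    component: str   # protein in the genome being annotated
    target: str      # candidate fused protein in another genome
    start: int       # 1-based start of the aligned region on the target
    end: int         # 1-based inclusive end of that region

def fusion_links(hits: list[Hit], max_overlap: int = 10) -> set[tuple[str, str, str]]:
    """Return (componentA, componentB, target) for distinct components that
    align to essentially non-overlapping regions of the same target."""
    by_target: dict[str, list[Hit]] = defaultdict(list)
    for h in hits:
        by_target[h.target].append(h)
    links = set()
    for target, target_hits in by_target.items():
        for a, b in combinations(target_hits, 2):
            if a.component == b.component:
                continue
            # Residues shared by the two aligned regions (negative means a gap).
            overlap = min(a.end, b.end) - max(a.start, b.start) + 1
            if overlap <= max_overlap:
                first, second = sorted((a.component, b.component))
                links.add((first, second, target))
    return links
```

Consider hits `geneA` at 1-200, `geneB` at 210-450 and `geneC` at 50-180 on a target `fusedX`. Here `geneA` and `geneC` overlap too much to be linked, while both `geneA`/`geneB` and `geneB`/`geneC` are reported. A real pipeline would also filter out hits on promiscuous domains, which otherwise produce many spurious links.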
14.
Bioinformatics ; 36(9): 2963-2965, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32129821
15.
Bioinformatics ; 30(22): 3249-56, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25100685

ABSTRACT

SUMMARY: The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest+, a web-based interactive knowledge exploration platform with significant advances over its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest+ enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest+ integrates its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest+ through several exemplary research scenarios, including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY: The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest.
CONTACT: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Data Mining/methods, Software, Authorship, Cluster Analysis, Disease/genetics, Genes, Humans, Internet, Medical Subject Headings, Proteins, PubMed, Publications
17.
Bioinform Adv ; 4(1): vbae069, 2024.
Article in English | MEDLINE | ID: mdl-38799705

ABSTRACT

Summary: We explore the nuanced temporal and epistemological distinctions among natural sciences, particularly the contrasting treatment of time and the interplay between theory and experimentation. Physics, an exemplar of mature science, relies on theoretical models for predictability and simulations. In contrast, biology, traditionally experimental, is witnessing a computational surge, with data analytics and simulations reshaping its research paradigms. Despite these strides, a unified theoretical framework in biology remains elusive. We propose that contemporary global challenges might usher in a renewed emphasis, presenting an opportunity for the establishment of a novel theoretical underpinning for the life sciences. Availability and implementation: https://github.com/ouzounis/CLS-emerges (data in JSON format, images in PNG format).

18.
Biosystems ; 239: 105199, 2024 May.
Article in English | MEDLINE | ID: mdl-38641198

ABSTRACT

Over the past quarter-century, the field of evolutionary biology has been transformed by the emergence of complete genome sequences and the conceptual framework known as the 'Net of Life.' This paradigm shift challenges traditional notions of evolution as a tree-like process, emphasizing the complex, interconnected network of gene flow that may blur the boundaries between distinct lineages. In this context, gene loss, rather than horizontal gene transfer, is the primary driver of gene content, with vertical inheritance playing a principal role. The 'Net of Life' not only impacts our understanding of genome evolution but also has profound implications for classification systems, the rapid appearance of new traits, and the spread of diseases. Here, we explore the core tenets of the 'Net of Life' and its implications for genome-scale phylogenetic divergence, providing a comprehensive framework for further investigations in evolutionary biology.


Subject(s)
Molecular Evolution, Gene Flow, Genome, Phylogeny, Genome/genetics, Animals, Humans, Horizontal Gene Transfer, Genetic Models, Biological Evolution
19.
PLoS Comput Biol ; 8(4): e1002487, 2012.
Article in English | MEDLINE | ID: mdl-22570600

ABSTRACT

The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.


Subject(s)
Computational Biology/trends, Forecasting
20.
NAR Genom Bioinform ; 5(1): lqad025, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36968432

ABSTRACT

The nuclear pore complex exhibits different manifestations across eukaryotes, with certain components being restricted to specific clades. Several studies have been conducted to delineate the nuclear pore complex composition in various model organisms. Due to its pivotal role in cell viability, traditional lab experiments, such as gene knockdowns, can prove inconclusive and need to be complemented by a high-quality computational process. Here, using an extensive data collection, we create a robust library of nucleoporin protein sequences and their respective family-specific position-specific scoring matrices. By extensively validating each profile in different settings, we propose that the created profiles can be used to detect nucleoporins in proteomes with high sensitivity and specificity compared to existing methods. This library of profiles and the underlying sequence data can be used for the detection of nucleoporins in target proteomes.
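Entry 20 builds family-specific position-specific scoring matrices (PSSMs). The general idea can be sketched in three steps: count residues in each column of an alignment, add pseudocounts, and convert the counts to log-odds scores against a background distribution. A candidate segment is then scored by summing its per-position values. This toy version rests on several assumptions: ungapped equal-length sequences, only the 20 standard residues, a uniform background and a pseudocount of 1. It is not the authors' pipeline, real profile tools use more elaborate sequence weighting and pseudocount schemes, and the example sequences are made up.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment: list[str], pseudocount: float = 1.0) -> list[dict[str, float]]:
    """Log-odds PSSM (bits) from equal-length, ungapped aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must have equal length")
    background = 1.0 / len(AMINO_ACIDS)   # uniform background (assumption)
    pssm = []
    for col in range(length):
        counts = {aa: pseudocount for aa in AMINO_ACIDS}
        for seq in alignment:
            counts[seq[col]] += 1          # KeyError for non-standard residues
        total = sum(counts.values())
        pssm.append({aa: math.log2((c / total) / background)
                     for aa, c in counts.items()})
    return pssm

def score(pssm: list[dict[str, float]], segment: str) -> float:
    """Sum of per-position log-odds scores for a segment as long as the PSSM."""
    return sum(column[aa] for column, aa in zip(pssm, segment))

def best_hit(pssm: list[dict[str, float]], sequence: str) -> tuple[int, float]:
    """Start position and score of the highest-scoring window in a sequence."""
    w = len(pssm)
    windows = ((i, score(pssm, sequence[i:i + w]))
               for i in range(len(sequence) - w + 1))
    return max(windows, key=lambda t: t[1])
```

For example, a PSSM built from `["FGAF", "FGSF", "FGAF"]` places its best window in `"MKWWFGAFKK"` at position 4. Sensitivity and specificity, as evaluated in the abstract, would then depend on choosing a score cutoff validated against known members and non-members of each family.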
