Results 1 - 20 of 119
1.
Nature ; 622(7983): 594-602, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37821698

ABSTRACT

Metagenomes encode an enormous diversity of proteins, reflecting a multiplicity of functions and activities [1,2]. Exploration of this vast sequence space has been limited to a comparative analysis against reference microbial genomes and protein families derived from those genomes. Here, to examine the scale of yet untapped functional diversity beyond what is currently possible through the lens of reference genomes, we develop a computational approach to generate reference-free protein families from the sequence space in metagenomes. We analyse 26,931 metagenomes and identify 1.17 billion protein sequences longer than 35 amino acids with no similarity to any sequences from 102,491 reference genomes or the Pfam database [3]. Using massively parallel graph-based clustering, we group these proteins into 106,198 novel sequence clusters with more than 100 members, doubling the number of protein families obtained from the reference genomes clustered using the same approach. We annotate these families on the basis of their taxonomic, habitat, geographical and gene neighbourhood distributions and, where sufficient sequence diversity is available, predict protein three-dimensional models, revealing novel structures. Overall, our results uncover an enormously diverse functional space, highlighting the importance of further exploring the microbial functional dark matter.


Subjects
Metagenome; Metagenomics; Microbiology; Proteins; Cluster Analysis; Metagenome/genetics; Metagenomics/methods; Proteins/chemistry; Proteins/classification; Proteins/genetics; Databases, Protein; Protein Conformation
2.
PLoS Comput Biol ; 19(11): e1011498, 2023 Nov.
Article in English | MEDLINE | ID: mdl-37934729

ABSTRACT

Public-domain availability for bioinformatics software resources is a key requirement that ensures long-term permanence and methodological reproducibility for research and development across the life sciences. These issues are particularly critical for widely used, efficient, and well-proven methods, especially those developed in research settings that often face funding discontinuities. We re-launch a range of established software components for computational genomics, as legacy version 1.0.1, suitable for sequence matching, masking, searching, clustering and visualization for protein family discovery, annotation and functional characterization on a genome scale. These applications are made available online as open source and include MagicMatch, GeneCAST, support scripts for CoGenT-like sequence collections, GeneRAGE and DifFuse, supported by centrally administered bioinformatics infrastructure funding. The toolkit may also be conceived as a flexible genome comparison software pipeline that supports research in this domain. We illustrate basic use by examples and pictorial representations of the registered tools, which are further described with appropriate documentation files in the corresponding GitHub release.


Subjects
Genomics; Software; Reproducibility of Results; Genomics/methods; Computational Biology/methods; Genome
3.
Nucleic Acids Res ; 50(D1): D480-D487, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850135

ABSTRACT

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.


Subjects
Databases, Protein; Intrinsically Disordered Proteins/metabolism; Molecular Sequence Annotation; Software; Amino Acid Sequence; DNA/genetics; DNA/metabolism; Datasets as Topic; Gene Ontology; Humans; Internet; Intrinsically Disordered Proteins/chemistry; Intrinsically Disordered Proteins/genetics; Protein Binding; RNA/genetics; RNA/metabolism
4.
J Mol Evol ; 91(4): 471-481, 2023 08.
Article in English | MEDLINE | ID: mdl-37039856

ABSTRACT

Selenium-binding proteins (SBPs) represent a ubiquitous protein family, and SBP1 was recently described as a new stress response regulator in plants. SBP1 has been characterized as a methanethiol oxidase; however, its exact role remains unclear. Moreover, in mammals, it is involved in the regulation of anti-carcinogenic growth and progression as well as reduction/oxidation modulation and detoxification. In this work, we delineate the functional potential of certain motifs of SBP in the context of evolutionary relationships. The phylogenetic profiling approach revealed the absence of SBP in Fungi as well as in most non-eukaryotic organisms. The phylogenetic tree also indicates the differentiation and evolution of characteristic SBP motifs. The main evolutionary events concern the CSSC motif, for which Acidobacteria, Fungi and Archaea carry modifications. Moreover, the CC motif is harbored by some bacteria and remains conserved in Plants, while it is modified to CxxC in Animals. Thus, the characteristic sequence motifs of SBPs mainly appeared in Archaea and Bacteria and were retained in Animals and Plants. Our results indicate that SBP emerged in bacteria, most likely as a methanethiol oxidase.


Subjects
Proteins; Selenium-Binding Proteins; Animals; Selenium-Binding Proteins/genetics; Selenium-Binding Proteins/metabolism; Phylogeny; Bacteria/genetics; Bacteria/metabolism; Archaea/genetics; Archaea/metabolism; Plants; Oxidoreductases/genetics; Mammals/metabolism
5.
Brief Bioinform ; 21(2): 458-472, 2020 03 23.
Article in English | MEDLINE | ID: mdl-30698641

ABSTRACT

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, and more generally the overlaps between different properties related to LCRs, using examples. We argue that statistical measures alone cannot capture all structural aspects of LCRs and recommend the combined usage of a variety of predictive tools and measurements. While the methodologies available to study LCRs are already very advanced, we foresee that a more comprehensive annotation of sequences in the databases will enable the improvement of predictions and a better understanding of the evolution and the connection between structure and function of LCRs. This will require the use of standards for the generation and exchange of data describing all aspects of LCRs. SHORT ABSTRACT: There are multiple definitions for low complexity regions (LCRs) in protein sequences. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichotomy, plus overlaps between different properties related to LCRs, using examples.
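One statistical measure of the kind the review discusses is the Shannon entropy of the residue composition within a sliding window. The sketch below is only an illustration of that idea under stated assumptions: the function names, the window length of 12 and the 1.0-bit threshold are hypothetical choices made for this example, not values taken from the review or from any specific tool.

```python
import math
from collections import Counter


def shannon_entropy(segment):
    """Shannon entropy (in bits) of the residue composition of a segment."""
    counts = Counter(segment)
    total = len(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def low_entropy_windows(sequence, window=12, threshold=1.0):
    """Return (start, entropy) for every window whose entropy is <= threshold.

    The window length and threshold are illustrative assumptions.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h <= threshold:
            hits.append((start, h))
    return hits


# Toy sequence (hypothetical): a poly-Q stretch followed by a diverse stretch.
toy = "QQQQQQQQQQQQ" + "ACDEFGHIKLMN"
flagged_starts = [start for start, _ in low_entropy_windows(toy)]
```

As the abstract argues, a composition-only score like this cannot tell a disordered homopolymeric stretch from an ordered repeat, so it would be one input among several rather than a complete LCR definition.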


Subjects
Proteins/chemistry; Algorithms; Amino Acid Sequence; Databases, Protein; Evolution, Molecular; Protein Conformation; Protein Domains
6.
EMBO Rep ; 21(4): e50388, 2020 04 03.
Article in English | MEDLINE | ID: mdl-32216085

ABSTRACT

University accountants and administrators should support scientists going to meetings, not further burden them with bureaucratic hurdles, expense claims or unnecessary auditing.


Subjects
Travel; Humans
7.
Environ Res ; 207: 112183, 2022 05 01.
Article in English | MEDLINE | ID: mdl-34637759

ABSTRACT

In urban ecosystems, microbes play a key role in maintaining major ecological functions that directly support human health and city life. However, knowledge of the species composition and functions involved in urban environments is still limited, largely because of the lack of reference genomes: in metagenomic studies, more than half of the reads remain unclassified. Here we uncovered 732 novel bacterial species from 4728 samples collected by the MetaSUB Consortium from various common surfaces, with matching materials, in the mass transit systems of 60 cities. The number of novel species is significantly and positively correlated with city population, and more novel species can be identified in skin-associated samples. In-depth analysis of the new gene catalog showed that the functional terms are significantly distinguishable by geography. Moreover, we revealed that more biosynthetic gene clusters (BGCs) can be found in novel species. The co-occurrence relationship between BGCs and genera and the geographical specificity of BGCs can also provide more information on the synthesis pathways of natural products. This work expands the known urban microbiome diversity and suggests additional mechanisms for taxonomic and functional characterization of the urban microbiome. Considering the great impact of urban microbiomes on human life, our study can also facilitate analysis of microbial interactions between humans and the urban environment.


Subjects
Metagenome; Microbiota; Bacteria/genetics; Humans; Metagenomics; Microbial Interactions; Microbiota/genetics
8.
Nucleic Acids Res ; 48(D1): D269-D276, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31713636

ABSTRACT

The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and thereby helping to illuminate the 'dark' proteome.


Subjects
Databases, Protein; Intrinsically Disordered Proteins/chemistry; Biological Ontologies; Data Curation; Molecular Sequence Annotation
9.
Nucleic Acids Res ; 46(6): e33, 2018 04 06.
Article in English | MEDLINE | ID: mdl-29315405

ABSTRACT

Biological networks capture structural or functional properties of relevant entities such as molecules, proteins or genes. Characteristic examples are gene expression networks or protein-protein interaction networks, which hold information about functional affinities or structural similarities. Such networks have been expanding in size due to increasing scale and abundance of biological data. While various clustering algorithms have been proposed to find highly connected regions, Markov Clustering (MCL) has been one of the most successful approaches to cluster sequence similarity or expression networks. Despite its popularity, MCL's scalability to cluster large datasets still remains a bottleneck due to high running times and memory demands. Here, we present High-performance MCL (HipMCL), a parallel implementation of the original MCL algorithm that can run on distributed-memory computers. We show that HipMCL can efficiently utilize 2000 compute nodes and cluster a network of ∼70 million nodes with ∼68 billion edges in ∼2.4 h. By exploiting distributed-memory environments, HipMCL clusters large-scale networks several orders of magnitude faster than MCL and enables clustering of even bigger networks. HipMCL is based on MPI and OpenMP and is freely available under a modified BSD license.
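The expansion and inflation steps of the original MCL algorithm, which HipMCL parallelizes, can be sketched on a toy graph. This is a minimal single-machine illustration in plain Python, not HipMCL's distributed implementation; the function names, the inflation value of 2.0, the added self-loops and the cut-off used to read out clusters are all assumptions chosen for the example.

```python
def normalize_columns(m):
    """Scale every column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Dense square matrix product; fine for toy graphs only."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(adjacency, inflation=2.0, max_iter=50, tol=1e-9):
    """Toy Markov Clustering: alternate expansion and inflation until stable."""
    n = len(adjacency)
    # Self-loops are a common convention; treated here as an assumption.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                       # expansion: random-walk step
        inflated = normalize_columns(                 # inflation: sharpen columns
            [[v ** inflation for v in row] for row in expanded])
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows with non-zero entries act as attractors; their non-zero columns
    # are the cluster members.
    clusters = {frozenset(j for j in range(n) if m[i][j] > 1e-6)
                for i in range(n)}
    clusters.discard(frozenset())
    return sorted(sorted(c) for c in clusters)


# Toy graph (hypothetical): two disconnected triangles, nodes 0-2 and 3-5.
triangles = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
result = mcl(triangles)
```

The dense matrix product here grows cubically with the number of nodes, which is a small-scale view of the running-time and memory bottleneck the abstract describes.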


Subjects
Algorithms; Cluster Analysis; Computational Biology/methods; Gene Regulatory Networks; Markov Chains; Gene Expression; Protein Interaction Maps/genetics
10.
Bioinformatics ; 33(9): 1418-1420, 2017 05 01.
Article in English | MEDLINE | ID: mdl-28453679

ABSTRACT

Summary: BioPAXViz is a Cytoscape (version 3) application, providing a comprehensive framework for metabolic pathway visualization. Beyond the basic parsing, viewing and browsing roles, the main novel function that BioPAXViz provides is a visual comparative analysis of metabolic pathway topologies across pre-computed pathway phylogenomic profiles given a species phylogeny. Furthermore, BioPAXViz supports the display of hierarchical trees that allow efficient navigation through sets of variants of a single reference pathway. Thus, BioPAXViz can significantly facilitate, and contribute to, the study of metabolic pathway evolution and engineering. Availability and Implementation: BioPAXViz has been developed as a Cytoscape app and is available at: https://github.com/CGU-CERTH/BioPAX.Viz. The software is distributed under the MIT License and is accompanied by example files and data. Additional documentation is available at the aforementioned GitHub repository. Contact: ouzounis@certh.gr.


Subjects
Computational Biology/methods; Evolution, Molecular; Metabolic Networks and Pathways/genetics; Software; Phylogeny
11.
PLoS Biol ; 12(8): e1001920, 2014 Aug.
Article in English | MEDLINE | ID: mdl-25093819

ABSTRACT

Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ∼11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.


Subjects
Genome, Archaeal/genetics; Genome, Bacterial/genetics; Genomics; Sequence Analysis, DNA; Archaea/classification; Archaea/genetics; Bacteria/classification; Bacteria/genetics; Databases, Genetic; Phylogeny
12.
BMC Bioinformatics ; 17(1): 212, 2016 May 11.
Article in English | MEDLINE | ID: mdl-27170263

ABSTRACT

BACKGROUND: The underlying molecular processes representing stress responses to low-dose ionising radiation (LDIR) in mammals are just beginning to be understood. In particular, LDIR effects on the brain and their possible association with neurodegenerative disease are currently being explored using omics technologies. RESULTS: We describe a light-weight approach for the storage, analysis and distribution of relevant LDIR omics datasets. The data integration platform, called BRIDE, contains information from the literature as well as experimental information from transcriptomics and proteomics studies. It deploys a hybrid, distributed solution using both local storage and cloud technology. CONCLUSIONS: BRIDE can act as a knowledge broker for LDIR researchers, to facilitate molecular research on the systems biology of LDIR response in mammals. Its flexible design can capture a range of experimental information for genomics, epigenomics, transcriptomics, and proteomics. The data collection is available at: .


Subjects
Brain/radiation effects; Radiation, Ionizing; Research; Software; Dose-Response Relationship, Radiation; Humans
13.
Brief Bioinform ; 15(3): 443-54, 2014 May.
Article in English | MEDLINE | ID: mdl-23220349

ABSTRACT

More than a decade ago, a number of methods were proposed for the inference of protein interactions, using whole-genome information from gene clusters, gene fusions and phylogenetic profiles. This structural and evolutionary view of entire genomes has provided a valuable approach for the functional characterization of proteins, especially those without sequence similarity to proteins of known function. Furthermore, this view has raised the real possibility to detect functional associations of genes and their corresponding proteins for any entire genome sequence. Yet, despite these exciting developments, there have been relatively few cases of real use of these methods outside the computational biology field, as reflected from citation analysis. These methods have the potential to be used in high-throughput experimental settings in functional genomics and proteomics to validate results with very high accuracy and good coverage. In this critical survey, we provide a comprehensive overview of 30 most prominent examples of single pairwise protein interaction cases in small-scale studies, where protein interactions have either been detected by gene fusion or yielded additional, corroborating evidence from biochemical observations. Our conclusion is that with the derivation of a validated gold-standard corpus and better data integration with big experiments, gene fusion detection can truly become a valuable tool for large-scale experimental biology.


Subjects
Computational Biology/methods; Gene Fusion; Animals; Genes, Fungal; Genome, Human; Genomics; Humans; Phylogeny; Protein Interaction Mapping/statistics & numerical data; Proteomics; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
14.
Bioinformatics ; 36(9): 2963-2965, 2020 05 01.
Article in English | MEDLINE | ID: mdl-32129821
15.
Bioinformatics ; 30(22): 3249-56, 2014 Nov 15.
Article in English | MEDLINE | ID: mdl-25100685

ABSTRACT

SUMMARY: The iterative process of finding relevant information in biomedical literature and performing bioinformatics analyses might result in an endless loop for an inexperienced user, considering the exponential growth of scientific corpora and the plethora of tools designed to mine PubMed® and related biological databases. Herein, we describe BioTextQuest+, a web-based interactive knowledge exploration platform with significant advances to its predecessor (BioTextQuest), aiming to bridge processes such as bioentity recognition, functional annotation, document clustering and data integration towards literature mining and concept discovery. BioTextQuest+ enables PubMed and OMIM querying, retrieval of abstracts related to a targeted request and optimal detection of genes, proteins, molecular functions, pathways and biological processes within the retrieved documents. The front-end interface facilitates the browsing of document clustering per subject, the analysis of term co-occurrence, the generation of tag clouds containing highly represented terms per cluster and at-a-glance popup windows with information about relevant genes and proteins. Moreover, to support experimental research, BioTextQuest+ addresses integration of its primary functionality with biological repositories and software tools able to deliver further bioinformatics services. The Google-like interface extends beyond simple use by offering a range of advanced parameterization for expert users. We demonstrate the functionality of BioTextQuest+ through several exemplary research scenarios including author disambiguation, functional term enrichment, knowledge acquisition and concept discovery linking major human diseases, such as obesity and ageing. AVAILABILITY: The service is accessible at http://bioinformatics.med.uoc.gr/biotextquest. CONTACT: g.pavlopoulos@gmail.com or georgios.pavlopoulos@esat.kuleuven.be SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Data Mining/methods; Software; Authorship; Cluster Analysis; Disease/genetics; Genes; Humans; Internet; Medical Subject Headings; Proteins; PubMed; Publications
17.
Biosystems ; 239: 105199, 2024 May.
Article in English | MEDLINE | ID: mdl-38641198

ABSTRACT

Over the past quarter-century, the field of evolutionary biology has been transformed by the emergence of complete genome sequences and the conceptual framework known as the 'Net of Life.' This paradigm shift challenges traditional notions of evolution as a tree-like process, emphasizing the complex, interconnected network of gene flow that may blur the boundaries between distinct lineages. In this context, gene loss, rather than horizontal gene transfer, is the primary driver of gene content, with vertical inheritance playing a principal role. The 'Net of Life' not only impacts our understanding of genome evolution but also has profound implications for classification systems, the rapid appearance of new traits, and the spread of diseases. Here, we explore the core tenets of the 'Net of Life' and its implications for genome-scale phylogenetic divergence, providing a comprehensive framework for further investigations in evolutionary biology.


Subjects
Evolution, Molecular; Gene Flow; Genome; Animals; Humans; Gene Transfer, Horizontal; Genome/genetics; Models, Genetic; Phylogeny
18.
Bioinform Adv ; 4(1): vbae069, 2024.
Article in English | MEDLINE | ID: mdl-38799705

ABSTRACT

Summary: We explore the nuanced temporal and epistemological distinctions among natural sciences, particularly the contrasting treatment of time and the interplay between theory and experimentation. Physics, an exemplar of mature science, relies on theoretical models for predictability and simulations. In contrast, biology, traditionally experimental, is witnessing a computational surge, with data analytics and simulations reshaping its research paradigms. Despite these strides, a unified theoretical framework in biology remains elusive. We propose that contemporary global challenges might usher in a renewed emphasis, presenting an opportunity for the establishment of a novel theoretical underpinning for the life sciences. Availability and implementation: https://github.com/ouzounis/CLS-emerges (data in JSON format, images in PNG format).

19.
Microbiology (Reading) ; 159(Pt 4): 757-770, 2013 Apr.
Article in English | MEDLINE | ID: mdl-23429746

ABSTRACT

Continuous updating of the genome sequence of Bacillus subtilis, the model of the Firmicutes, is a basic requirement needed by the biology community. In this work new genomic objects have been included (toxin/antitoxin genes and small RNA genes) and the metabolic network has been entirely updated. The curated view of the validated metabolic pathways present in the organism as of 2012 shows several significant differences from pathways present in the other bacterial reference, Escherichia coli: variants in synthesis of cofactors (thiamine, biotin, bacillithiol), amino acids (lysine, methionine), branched-chain fatty acids, tRNA modification and RNA degradation. In this new version, gene products that are enzymes or transporters are explicitly linked to the biochemical reactions of the RHEA reaction resource (http://www.ebi.ac.uk/rhea/), while novel compound entries have been created in the database Chemical Entities of Biological Interest (http://www.ebi.ac.uk/chebi/). The newly annotated sequence is deposited at the International Nucleotide Sequence Data Collaboration with accession number AL009126.4.


Subjects
Bacillus subtilis/metabolism; Bacterial Proteins/metabolism; Genome, Bacterial; Metabolic Networks and Pathways/genetics; Bacillus subtilis/genetics; Bacterial Proteins/genetics; Genomics; Molecular Sequence Annotation; Molecular Sequence Data; Sequence Analysis, DNA
20.
PLoS Comput Biol ; 8(4): e1002487, 2012.
Article in English | MEDLINE | ID: mdl-22570600

ABSTRACT

The field of bioinformatics and computational biology has gone through a number of transformations during the past 15 years, establishing itself as a key component of new biology. This spectacular growth has been challenged by a number of disruptive changes in science and technology. Despite the apparent fatigue of the linguistic use of the term itself, bioinformatics has grown perhaps to a point beyond recognition. We explore both historical aspects and future trends and argue that as the field expands, key questions remain unanswered and acquire new meaning while at the same time the range of applications is widening to cover an ever increasing number of biological disciplines. These trends appear to be pointing to a redefinition of certain objectives, milestones, and possibly the field itself.


Subjects
Computational Biology/trends; Forecasting