1.
Nucleic Acids Res ; 52(D1): D891-D899, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953337

ABSTRACT

Ensembl (https://www.ensembl.org) is a freely available genomic resource that has produced high-quality annotations, tools, and services for vertebrates and model organisms for more than two decades. In recent years, there has been a dramatic shift in the genomic landscape, with a large increase in the number and phylogenetic breadth of high-quality reference genomes, alongside major advances in the pan-genome representations of higher species. In order to support these efforts and accelerate downstream research, Ensembl continues to focus on scaling for the rapid annotation of new genome assemblies, developing new methods for comparative analysis, and expanding the depth and quality of our genome annotations. This year we have continued our expansion to support global biodiversity research, doubling the number of annotated genomes we support on our Rapid Release site to over 1700, driven by our close collaboration with biodiversity projects such as Darwin Tree of Life. We have also strengthened support for key agricultural species, including the first regulatory builds for farmed animals, and have updated key tools and resources that support the global scientific community, notably the Ensembl Variant Effect Predictor. Ensembl data, software, and tools are freely available.


Subjects
Databases, Genetic; Genomics; Animals; Genome; Molecular Sequence Annotation; Phylogeny; Software; Humans
2.
Nucleic Acids Res ; 52(D1): D808-D816, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37953350

ABSTRACT

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) is a Bioinformatics Resource Center funded by the National Institutes of Health with additional funding from the Wellcome Trust. VEuPathDB supports >600 organisms that comprise invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Since 2004, VEuPathDB has analyzed omics data from the public domain using contemporary bioinformatic workflows, including orthology predictions via OrthoMCL, and integrated the analysis results with analysis tools, visualizations, and advanced search capabilities. The unique data mining platform coupled with >3000 pre-analyzed data sets facilitates the exploration of pertinent omics data in support of hypothesis driven research. Comparisons are easily made across data sets, data types and organisms. A Galaxy workspace offers the opportunity for the analysis of private large-scale datasets and for porting to VEuPathDB for comparisons with integrated data. The MapVEu tool provides a platform for exploration of spatially resolved data such as vector surveillance and insecticide resistance monitoring. To address the growing body of omics data and advances in laboratory techniques, VEuPathDB has added several new data types, searches and features, improved the Galaxy workspace environment, redesigned the MapVEu interface and updated the infrastructure to accommodate these changes.


Subjects
Computational Biology; Eukaryota; Animals; Computational Biology/methods; Invertebrates; Databases, Factual
3.
Nucleic Acids Res ; 50(D1): D898-D911, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34718728

ABSTRACT

The Eukaryotic Pathogen, Vector and Host Informatics Resource (VEuPathDB, https://veupathdb.org) represents the 2019 merger of VectorBase with the EuPathDB projects. As a Bioinformatics Resource Center funded by the National Institutes of Health, with additional support from the Wellcome Trust, VEuPathDB supports >500 organisms comprising invertebrate vectors, eukaryotic pathogens (protists and fungi) and relevant free-living or non-pathogenic species or hosts. Designed to empower researchers with access to Omics data and bioinformatic analyses, VEuPathDB projects integrate >1700 pre-analysed datasets (and associated metadata) with advanced search capabilities, visualizations, and analysis tools in a graphic interface. Diverse data types are analysed with standardized workflows including an in-house OrthoMCL algorithm for predicting orthology. Comparisons are easily made across datasets, data types and organisms in this unique data mining platform. A new site-wide search facilitates access for both experienced and novice users. Upgraded infrastructure and workflows support numerous updates to the web interface, tools, searches and strategies, and to the Galaxy workspace, where users can privately analyse their own data. Forthcoming upgrades include cloud-ready application architecture, expanded support for the Galaxy workspace, tools for interrogating host-pathogen interactions, improved interactions with affiliated databases (ClinEpiDB, MicrobiomeDB) and other scientific resources, and increased interoperability with the Bacterial & Viral BRC.


Subjects
Databases, Factual; Disease Vectors/classification; Host-Pathogen Interactions/genetics; Phenotype; User-Computer Interface; Animals; Apicomplexa/classification; Apicomplexa/genetics; Apicomplexa/pathogenicity; Bacteria/classification; Bacteria/genetics; Bacteria/pathogenicity; Communicable Diseases/microbiology; Communicable Diseases/parasitology; Communicable Diseases/pathology; Communicable Diseases/transmission; Computational Biology/methods; Data Mining/methods; Diplomonadida/classification; Diplomonadida/genetics; Diplomonadida/pathogenicity; Fungi/classification; Fungi/genetics; Fungi/pathogenicity; Humans; Insecta/classification; Insecta/genetics; Insecta/pathogenicity; Internet; Nematoda/classification; Nematoda/genetics; Nematoda/pathogenicity; Phylogeny; Virulence; Workflow
4.
Nucleic Acids Res ; 50(D1): D996-D1003, 2022 Jan 07.
Article in English | MEDLINE | ID: mdl-34791415

ABSTRACT

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.


Subjects
Databases, Genetic; Genomics; Internet; Software; Animals; Computational Biology; Genome, Bacterial/genetics; Genome, Fungal/genetics; Genome, Plant/genetics; Plants/classification; Plants/genetics; Vertebrates/classification; Vertebrates/genetics
5.
Nucleic Acids Res ; 48(D1): D689-D695, 2020 Jan 08.
Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Subjects
Computational Biology/methods; Databases, Genetic; Genetic Variation; Genome, Bacterial; Genome, Fungal; Genome, Plant; Algorithms; Animals; Caenorhabditis elegans/genetics; Genomics; Internet; Molecular Sequence Annotation; Phenotype; Plants/genetics; Reference Values; Software; User-Computer Interface
6.
Nucleic Acids Res ; 46(D1): D802-D808, 2018 Jan 04.
Article in English | MEDLINE | ID: mdl-29092050

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of programmatic and interactive interfaces to a rich range of data including genome sequence, gene models, transcript sequence, genetic variation, and comparative analysis. This paper provides an update to the previous publications about the resource, with a focus on recent developments and expansions. These include the incorporation of almost 20 000 additional genome sequences and over 35 000 tracks of RNA-Seq data, which have been aligned to genomic sequence and made available for visualization. Other advances since 2015 include the release of the database in Resource Description Framework (RDF) format, a large increase in community-derived curation, a new high-performance protein sequence search, additional cross-references, improved annotation of non-protein-coding genes, and the launch of pre-release and archival sites. Collectively, these changes are part of a continuing response to the increasing quantity of publicly-available genome-scale data, and the consequent need to archive, integrate, annotate and disseminate these using automated, scalable methods.


Subjects
Archaea/genetics; Bacteria/genetics; Databases, Genetic; Databases, Protein; Eukaryota/genetics; Genomics; Amino Acid Sequence; Animals; Base Sequence; Data Mining; Forecasting; Genome; Molecular Sequence Annotation; RNA/genetics; User-Computer Interface
7.
NAR Genom Bioinform ; 6(2): lqae069, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38915823

ABSTRACT

Microbial specialized metabolite biosynthetic gene clusters (SMBGCs) are a formidable source of natural products of pharmaceutical interest. With the multiplication of available genomic data, very efficient bioinformatic tools for automatic SMBGC detection have been developed. Nevertheless, most of these tools identify SMBGCs based on sequence similarity with enzymes typically involved in specialised metabolism and thus may miss SMBGCs coding for undercharacterised enzymes. Here we present Synteruptor (https://bioi2.i2bc.paris-saclay.fr/synteruptor), a program that identifies genomic islands, known to be enriched in SMBGCs, in the genomes of closely related species. With this tool, we identified an SMBGC in the genome of Streptomyces ambofaciens ATCC23877, undetected by antiSMASH versions prior to antiSMASH 5, and experimentally demonstrated that it directs the biosynthesis of two metabolites, one of which was identified as sphydrofuran. Synteruptor is also a valuable resource for the delineation of individual SMBGCs within antiSMASH regions that may encompass multiple clusters, and for refining the boundaries of these SMBGCs.

8.
J Mol Evol ; 77(3): 70-80, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23979262

ABSTRACT

Dihydroorotases are universal proteins catalyzing the third step of pyrimidine biosynthesis. These zinc metalloenzymes belong to the superfamily of cyclic amidohydrolases, which also comprises other enzymes involved in the degradation of either purines (allantoinases), pyrimidines (dihydropyrimidinases) or hydantoins (hydantoinases). The evolutionary relationships between these mechanistically related enzymes were estimated after designing a method to build an accurate multiple sequence alignment. The amino acid sequences of the enzymes that have been crystallized were used to build a seed alignment. All the remaining homologues were progressively added by aligning their HMM profiles to the seed HMM profile, which yielded a reliable phylogeny of the superfamily. This helped us to propose a new evolutionary classification of dihydroorotases into three major types, while at the same time disentangling an important part of the history of their complex structure-function relationships. Although differing in their substrate specificity, allantoinases, hydantoinases and dihydropyrimidinases are found to be phylogenetically closer to DHOase Type I than the three DHOase types are to each other. This suggests that the primordial cyclic amidohydrolase was a multifunctional, highly evolvable generalist, with high conformational diversity allowing for promiscuous activities. Successive gene duplications then resolved the primordial substrate ambiguity into various substrate specificities. The present-day superfamily of cyclic amidohydrolases is the result of the progressive divergence of these ancestral paralogous copies by descent with modification.


Subjects
Amidohydrolases/chemistry; Amidohydrolases/classification; Amidohydrolases/genetics; Evolution, Molecular; Amidohydrolases/metabolism; Phylogeny; Pyrimidines/biosynthesis; Sequence Alignment; Substrate Specificity
9.
BMC Genomics ; 9: 501, 2008 Oct 24.
Article in English | MEDLINE | ID: mdl-18950477

ABSTRACT

BACKGROUND: Curated databases of completely sequenced genomes have been designed independently at the NCBI (RefSeq) and EBI (Genome Reviews) to cope with non-standard annotation found in the version of the sequenced genome that has been published by the databanks GenBank/EMBL/DDBJ. These curation attempts were expected to review the annotations and to improve their pertinence when using them to annotate newly released genome sequences by homology to previously annotated genomes. However, we observed that such an uncoordinated effort has two unwanted consequences. First, it is not trivial to map the protein identifiers of the same sequence in both databases. Second, the two reannotated versions of the same genome differ at the level of their structural annotation. RESULTS: Here, we propose CorBank, a program devised to cross-reference protein identifiers whatever the level of identity between their matching sequences. Approximately 98% of the 1,983,258 amino acid sequences match, allowing instantaneous retrieval of their respective cross-references. CorBank also detects any differences between the independently curated versions of the same genome. We found that the RefSeq and Genome Reviews versions match perfectly for only 50 of the 641 complete genomes we analyzed. In all other cases there are differences at the level of the coding sequence (CDS), and/or in the total number of CDS in the respective versions of the same genome. CorBank is freely accessible at http://www.corbank.u-psud.fr. The CorBank site also publishes updated, exhaustive results obtained by comparing the RefSeq and Genome Reviews versions of each genome. Accordingly, this web site allows easy search of cross-references between RefSeq, Genome Reviews, and UniProt, for either a single CDS or a whole replicon.
CONCLUSION: CorBank is very efficient in rapid detection of the numerous differences existing between RefSeq and Genome Reviews versions of the same curated genome. Although such differences are acceptable as reflecting different views, we suggest that curators of both genome databases could help reduce further divergence by agreeing on a minimal dialogue and attempting to publish the point of view of the other database whenever it is technically possible.
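
The identifier-mapping problem described in this abstract can be illustrated with a minimal sketch. It is not CorBank's implementation: it handles only the simplest case, identical sequences, whereas the abstract states that CorBank also cross-references sequences that are not fully identical. The function name `cross_reference`, the identifiers and the sequences are all made up for illustration.

```python
from collections import defaultdict


def cross_reference(db_a, db_b):
    """Map each identifier in db_a to the db_b identifiers sharing its sequence.

    db_a and db_b are dicts of {identifier: amino acid sequence}.
    Only exact (case-insensitive) sequence matches are considered.
    """
    # Index the second database by sequence so each lookup is a dict access.
    by_sequence = defaultdict(list)
    for ident, seq in db_b.items():
        by_sequence[seq.upper()].append(ident)

    matches = {}
    unmatched = []
    for ident, seq in db_a.items():
        hits = by_sequence.get(seq.upper(), [])
        if hits:
            matches[ident] = sorted(hits)
        else:
            # No identical sequence: a curation difference worth inspecting.
            unmatched.append(ident)
    return matches, unmatched


# Hypothetical records standing in for two curated versions of one genome.
refseq_like = {"A1": "MKTAYIAK", "A2": "MSLNE", "A3": "MGGH"}
genome_reviews_like = {"B1": "MKTAYIAK", "B2": "msLNE", "B3": "MQQQ"}

matches, unmatched = cross_reference(refseq_like, genome_reviews_like)
print(matches)
print(unmatched)
```

Entries left in `unmatched` correspond to the kind of discrepancy the abstract reports between independently curated versions; resolving them would need a similarity search rather than an exact lookup.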


Subjects
Computational Biology/methods; Database Management Systems; Databases, Nucleic Acid; Databases, Protein; Genomics/methods; Sequence Alignment
10.
BMC Syst Biol ; 7: 99, 2013 Oct 05.
Article in English | MEDLINE | ID: mdl-24093154

ABSTRACT

BACKGROUND: Enzymes belonging to mechanistically diverse superfamilies often display similar catalytic mechanisms. We previously observed such an association in the case of the cyclic amidohydrolase superfamily whose members play a role in related steps of purine and pyrimidine metabolic pathways. To establish a possible link between enzyme homology and chemical similarity, we investigated further the neighbouring steps in the respective pathways. RESULTS: We found that successive reactions of the purine and pyrimidine pathways display similar chemistry. These mechanistically related reactions are often catalyzed by homologous enzymes. The detection of series of similar catalytic steps performed by successive enzyme families suggested some modularity in the architecture of the central metabolism. Accordingly, we introduce the concept of a reaction module to define at least two successive steps catalyzed by homologous enzymes in pathways alignable by similar chemical reactions. Applying this concept allowed us to propose new functions for misannotated paralogues. In particular, we discovered a putative ureidoglycine carbamoyltransferase (UGTCase) activity. Finally, we present experimental data supporting the conclusion that this UGTCase is likely to be involved in a new route in purine catabolism. CONCLUSIONS: Using the reaction module concept should be of great value. It will help us to trace how the primordial promiscuous enzymes were assembled progressively in functional modules, as the present pathways diverged from ancestral pathways to give birth to the present-day mechanistically diversified superfamilies. In addition, the concept allows the determination of the actual function of misannotated proteins.
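
The definition of a reaction module given in this abstract (at least two successive steps catalyzed by homologous enzymes, in pathways aligned by similar chemistry) can be made concrete with a small sketch. This is an illustration under strong assumptions, not the authors' method: the two pathways are assumed to be already aligned step by step, each step is reduced to a hypothetical `(reaction_class, enzyme_family)` pair, and "homologous" is simplified to "same family label". All names and data are invented.

```python
def reaction_modules(pathway_a, pathway_b, min_len=2):
    """Return (start, end) index ranges where both aligned pathways share
    the same reaction class and enzyme family for at least min_len steps.

    Each pathway is a list of (reaction_class, enzyme_family) tuples;
    end is exclusive, as in Python slices.
    """
    modules = []
    run_start = None
    n = min(len(pathway_a), len(pathway_b))
    # Iterate one position past the end so a run reaching the end is closed.
    for i in range(n + 1):
        same = i < n and pathway_a[i] == pathway_b[i]
        if same and run_start is None:
            run_start = i
        elif not same and run_start is not None:
            if i - run_start >= min_len:
                modules.append((run_start, i))
            run_start = None
    return modules


# Hypothetical, pre-aligned pathways (labels are placeholders, not real data).
purine_like = [
    ("hydrolysis", "fam1"),
    ("ring_opening", "fam2"),
    ("group_transfer", "fam3"),
    ("oxidation", "fam4"),
]
pyrimidine_like = [
    ("hydrolysis", "fam9"),
    ("ring_opening", "fam2"),
    ("group_transfer", "fam3"),
    ("reduction", "fam4"),
]

print(reaction_modules(purine_like, pyrimidine_like))
```

Here steps 1 and 2 form one module because both the chemistry and the enzyme family agree in consecutive positions. Step 0 shares chemistry but not family, and step 3 shares family but not chemistry, so neither extends the module.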


Subjects
Computational Biology/methods; Metabolic Networks and Pathways; Purines/metabolism; Carboxyl and Carbamoyl Transferases/metabolism; Dihydroorotate Dehydrogenase; Dihydrouracil Dehydrogenase (NADP)/metabolism; Glycine/analogs & derivatives; Glycine/metabolism; Oxidoreductases Acting on CH-CH Group Donors/metabolism; Phylogeny; Urea/analogs & derivatives; Urea/metabolism