1.
Bioinform Adv ; 3(1): vbad064, 2023.
Article in English | MEDLINE | ID: mdl-37359723

ABSTRACT

Motivation: The visualization of biological data is a fundamental technique that enables researchers to understand and explain biology. Some of these visualizations have become iconic: tree views for taxonomy, cartoon renderings of 3D protein structures, and tracks that represent features in a gene or protein, as in a genome browser. Nightingale provides visualizations in the context of proteins and protein features. Results: Nightingale is a library of re-usable data visualization web components that are currently used by UniProt and InterPro, among other projects. The components can be used to display protein sequence features, variants, interaction data, 3D structure, etc. These components are flexible, allowing users to easily view multiple data sources within the same context, as well as compose these components to create a customized view. Availability and implementation: Nightingale examples and documentation are freely available at https://ebi-webcomponents.github.io/nightingale/. It is distributed under the MIT license, and its source code can be found at https://github.com/ebi-webcomponents/nightingale.

2.
J Mol Biol ; 435(14): 168016, 2023 07 15.
Article in English | MEDLINE | ID: mdl-36806692

ABSTRACT

An increasingly common output arising from the analysis of shotgun metagenomic datasets is the generation of metagenome-assembled genomes (MAGs), with tens of thousands of MAGs now described in the literature. However, the discovery and comparison of these MAG collections is hampered by the lack of uniformity in their generation, annotation and storage. To address this, we have developed MGnify Genomes, a growing collection of biome-specific non-redundant microbial genome catalogues generated using MAGs and publicly available isolate genomes. Genomes within a biome-specific catalogue are organised into species clusters. For species that contain multiple conspecific genomes, the highest quality genome is selected as the representative, always prioritising an isolate genome over a MAG. The species representative sequences and annotations can be visualised on the MGnify website and the full catalogue and associated analysis outputs can be downloaded from MGnify servers. A suite of online search tools is provided allowing users to compare their own sequences, ranging from a gene to sets of genomes, against the catalogues. Seven biomes are available currently, comprising over 300,000 genomes that represent 11,048 non-redundant species, and include 36 taxonomic classes not currently represented by cultured genomes. MGnify Genomes is available at https://www.ebi.ac.uk/metagenomics/browse/genomes/.


Subjects
Genome, Microbial; Metagenome; Metagenome/genetics; Metagenomics
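
The representative-selection rule described in the abstract above (pick the highest quality genome in a species cluster, always preferring an isolate over a MAG) can be sketched as follows. This is a minimal illustration, not the MGnify pipeline: the `Genome` fields, the accessions, and the `quality_score` formula (completeness minus five times contamination) are assumptions introduced here, since the abstract does not say how quality is measured.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Genome:
    """Hypothetical record for one genome in a species cluster."""
    accession: str
    is_isolate: bool       # True for an isolate genome, False for a MAG
    completeness: float    # percent, assumed quality input
    contamination: float   # percent, assumed quality input


def quality_score(genome: Genome) -> float:
    # Assumed scoring formula for illustration only.
    return genome.completeness - 5.0 * genome.contamination


def pick_representative(cluster: Iterable[Genome]) -> Genome:
    """Return the species representative for one cluster.

    Isolates always outrank MAGs; within the same type, the genome
    with the higher quality score wins.
    """
    genomes = list(cluster)
    if not genomes:
        raise ValueError("a species cluster must contain at least one genome")
    # Tuples compare element by element, so is_isolate is decisive
    # and quality_score only breaks ties between genomes of the same type.
    return max(genomes, key=lambda g: (g.is_isolate, quality_score(g)))
```

Sorting on the tuple `(is_isolate, quality_score)` keeps the "isolate first" rule strict: a lower-scoring isolate is still chosen over a higher-scoring MAG, which matches the wording "always prioritising an isolate genome over a MAG".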
3.
Nucleic Acids Res ; 51(D1): D418-D427, 2023 01 06.
Article in English | MEDLINE | ID: mdl-36350672

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. Here, we report recent developments with InterPro (version 90.0) and its associated software, including updates to data content and to the website. These developments extend and enrich the information provided by InterPro, and provide more user-friendly access to the data. Additionally, we have worked on adding Pfam website features to the InterPro website, as the Pfam website will be retired in late 2022. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB. Moreover, we report the development of a card game as a method of engaging the non-scientific community. Finally, we discuss the benefits and challenges brought by the use of artificial intelligence for protein structure prediction.


Subjects
Databases, Protein; Humans; Amino Acid Sequence; Artificial Intelligence; Internet; Proteins/chemistry; Software
4.
Nucleic Acids Res ; 49(D1): D344-D354, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33156333

ABSTRACT

The InterPro database (https://www.ebi.ac.uk/interpro/) provides an integrative classification of protein sequences into families, and identifies functionally important domains and conserved sites. InterProScan is the underlying software that allows protein and nucleic acid sequences to be searched against InterPro's signatures. Signatures are predictive models which describe protein families, domains or sites, and are provided by multiple databases. InterPro combines signatures representing equivalent families, domains or sites, and provides additional information such as descriptions, literature references and Gene Ontology (GO) terms, to produce a comprehensive resource for protein classification. Founded in 1999, InterPro has become one of the most widely used resources for protein family annotation. Here, we report the status of InterPro (version 81.0) in its 20th year of operation, and its associated software, including updates to database content, the release of a new website and REST API, and performance improvements in InterProScan.


Subjects
Databases, Protein; Proteins/chemistry; Amino Acid Sequence; COVID-19/metabolism; Internet; Molecular Sequence Annotation; Protein Domains; Protein Interaction Maps; SARS-CoV-2/metabolism; Sequence Alignment
5.
Nucleic Acids Res ; 49(D1): D412-D419, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33125078

ABSTRACT

The Pfam database is a widely used resource for classifying protein sequences into families and domains. Since Pfam was last described in this journal, over 350 new families have been added in Pfam 33.1 and numerous improvements have been made to existing entries. To facilitate research on COVID-19, we have revised the Pfam entries that cover the SARS-CoV-2 proteome, and built new entries for regions that were not covered by Pfam. We have reintroduced Pfam-B which provides an automatically generated supplement to Pfam and contains 136 730 novel clusters of sequences that are not yet matched by a Pfam family. The new Pfam-B is based on a clustering by the MMseqs2 software. We have compared all of the regions in the RepeatsDB to those in Pfam and have started to use the results to build and refine Pfam repeat families. Pfam is freely available for browsing and download at http://pfam.xfam.org/.


Subjects
Computational Biology/statistics & numerical data; Databases, Protein; Proteins/metabolism; Proteome/metabolism; Animals; COVID-19/epidemiology; COVID-19/prevention & control; COVID-19/virology; Computational Biology/methods; Epidemics; Humans; Internet; Models, Molecular; Protein Structure, Tertiary; Proteins/chemistry; Proteins/genetics; Proteome/classification; Proteome/genetics; Repetitive Sequences, Amino Acid/genetics; SARS-CoV-2/genetics; SARS-CoV-2/physiology; Sequence Analysis, Protein/methods
6.
Nucleic Acids Res ; 48(D1): D314-D319, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31733063

ABSTRACT

Genome3D (https://www.genome3d.eu) is a freely available resource that provides consensus structural annotations for representative protein sequences taken from a selection of model organisms. Since the last NAR update in 2015, the method of data submission has been overhauled, with annotations now being 'pushed' to the database via an API. As a result, contributing groups are now able to manage their own structural annotations, making the resource more flexible and maintainable. The new submission protocol brings a number of additional benefits, including instant validation of data and no requirement to synchronise releases between resources. It also makes it possible to implement the submission of these structural annotations as an automated part of existing internal workflows. In turn, these improvements facilitate opening Genome3D up to new prediction algorithms and groups. For the latest release of Genome3D (v2.1), the underlying dataset of sequences used as prediction targets has been updated using the latest reference proteomes available in UniProtKB. A number of new reference proteomes of particular interest to the wider scientific community have also been added: cow, pig, wheat and Mycobacterium tuberculosis. These additions, along with improvements to the underlying predictions from contributing resources, have ensured that the number of annotations in Genome3D has nearly doubled since the last NAR update article. The new API has also been used to facilitate the dissemination of Genome3D data into InterPro, thereby widening the visibility of both the annotation data and annotation algorithms.


Subjects
Proteins/chemistry; Databases, Protein; Proteins/classification; Proteins/genetics; User-Computer Interface
7.
Nucleic Acids Res ; 47(D1): D351-D360, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30398656

ABSTRACT

The InterPro database (http://www.ebi.ac.uk/interpro/) classifies protein sequences into families and predicts the presence of functionally important domains and sites. Here, we report recent developments with InterPro (version 70.0) and its associated software, including an 18% growth in the size of the database in terms of new InterPro entries, updates to content, the inclusion of an additional entry type, refined modelling of discontinuous domains, and the development of a new programmatic interface and website. These developments extend and enrich the information provided by InterPro, and provide greater flexibility in terms of data access. We also show that InterPro's sequence coverage has kept pace with the growth of UniProtKB, and discuss how our evaluation of residue coverage may help guide future curation activities.


Subjects
Databases, Protein; Molecular Sequence Annotation; Animals; Databases, Genetic; Gene Ontology; Humans; Internet; Multigene Family; Protein Domains/genetics; Sequence Homology, Amino Acid; Software; User-Computer Interface
8.
Nucleic Acids Res ; 47(D1): D564-D572, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30364992

ABSTRACT

Automatic annotation of protein function is routinely applied to newly sequenced genomes. While this provides a fine-grained view of an organism's functional protein repertoire, proteins more commonly function in a coordinated manner, such as in pathways or multimeric complexes. Genome Properties (GPs) define such functional entities as a series of steps, originally described by either TIGRFAMs or Pfam entries. To increase the scope of coverage, we have migrated GPs to function as a companion resource utilizing InterPro entries. Having introduced GPs-specific versioned releases, we provide software and data via a GitHub repository, and have developed a new web interface to GPs (available at https://www.ebi.ac.uk/interpro/genomeproperties). In addition to exploring each of the 1286 GPs, the website contains GPs pre-calculated for a representative set of proteomes; these results can be used to profile GPs phylogenetically via an interactive viewer. Users can upload novel data to the viewer for comparison with the pre-calculated results. Over the last year, we have added ∼700 new GPs, increasing the coverage of eukaryotic systems, as well as increasing general coverage through automatic generation of GPs from related resources. All data are freely available via the website and the GitHub repository.


Subjects
Databases, Protein; Genome; Proteins/genetics; Genome, Microbial; Metabolic Networks and Pathways/genetics; Multiprotein Complexes/genetics; Proteins/metabolism; Proteome
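
The idea in the abstract above, that a Genome Property is a series of steps each evidenced by InterPro entries, can be sketched as a small data structure plus an evaluation function. This is an illustrative sketch only, not the Genome Properties software: the `Step` fields, the YES/PARTIAL/NO outcome labels, the rule that any single matching entry satisfies a step, and the accessions used in the example are all assumptions made here.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Set


@dataclass(frozen=True)
class Step:
    """Hypothetical model of one step in a Genome Property."""
    name: str
    evidence: FrozenSet[str]   # InterPro accessions; any one match satisfies the step (assumed)
    required: bool = True      # optional steps do not affect the outcome (assumed)


def evaluate_property(steps: Sequence[Step], proteome_matches: Set[str]) -> str:
    """Classify a property for one proteome as "YES", "PARTIAL" or "NO".

    proteome_matches is the set of InterPro accessions found in the proteome.
    A property with no required steps is reported as "NO" in this sketch.
    """
    required = [s for s in steps if s.required]
    satisfied = [s for s in required if s.evidence & proteome_matches]
    if required and len(satisfied) == len(required):
        return "YES"       # every required step has supporting evidence
    if satisfied:
        return "PARTIAL"   # some, but not all, required steps are supported
    return "NO"


# Usage with placeholder accessions (not real InterPro entries):
pathway = [
    Step("step 1", frozenset({"IPR900001", "IPR900002"})),
    Step("step 2", frozenset({"IPR900003"})),
    Step("step 3 (optional)", frozenset({"IPR900004"}), required=False),
]
print(evaluate_property(pathway, {"IPR900002", "IPR900003"}))
```

Running the same function over many proteomes gives one outcome per proteome and property, which is the kind of table a phylogenetic profile viewer would display.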
9.
Nucleic Acids Res ; 47(D1): D427-D432, 2019 01 08.
Article in English | MEDLINE | ID: mdl-30357350

ABSTRACT

The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.


Subjects
Databases, Protein; Proteins/classification; Molecular Sequence Annotation; Protein Domains; Proteins/chemistry; Repetitive Sequences, Amino Acid
10.
Nucleic Acids Res ; 46(D1): D726-D735, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29069476

ABSTRACT

EBI metagenomics (http://www.ebi.ac.uk/metagenomics) provides a free-to-use platform for the analysis and archiving of sequence data derived from the microbial populations found in a particular environment. Over the past two years, EBI metagenomics has increased the number of datasets analysed 10-fold. In addition to increased throughput, the underlying analysis pipeline has been overhauled to include new and updated tools and reference databases. Of particular note is a new workflow for taxonomic assignments that has been extended to include assignments based on both the large and small subunit RNA marker genes and to encompass all cellular micro-organisms. We also describe the addition of metagenomic assembly as a new analysis service. Our pilot studies have produced over 2400 assemblies from datasets in the public domain. From these assemblies, we have produced a searchable, non-redundant protein database of over 50 million sequences. To provide improved access to the data stored within the resource, we have developed a programmatic interface that provides access to the analysis results and associated sample metadata. Finally, we have integrated the results of a series of statistical analyses that provide estimations of diversity and sample comparisons.


Subjects
Databases, Genetic; Metagenomics; Microbiota; Algorithms; Base Sequence; Classification/methods; Datasets as Topic; Metagenomics/methods; RNA, Archaeal/genetics; RNA, Bacterial/genetics; RNA, Viral/genetics; Ribotyping; Software; Transcriptome; User-Computer Interface; Web Browser; Workflow
11.
Nucleic Acids Res ; 44(D1): D279-85, 2016 Jan 04.
Article in English | MEDLINE | ID: mdl-26673716

ABSTRACT

In the last two years the Pfam database (http://pfam.xfam.org) has undergone a substantial reorganisation to reduce the effort involved in making a release, thereby permitting more frequent releases. Arguably the most significant of these changes is that Pfam is now primarily based on the UniProtKB reference proteomes, with the counts of matched sequences and species reported on the website restricted to this smaller set. Building families on reference proteome sequences brings greater stability, which decreases the amount of manual curation required to maintain them. It also reduces the number of sequences displayed on the website, whilst still providing access to many important model organisms. Matches to the full UniProtKB database are, however, still available and Pfam annotations for individual UniProtKB sequences can still be retrieved. Some Pfam entries (1.6%) which have no matches to reference proteomes remain; we are working with UniProt to see if sequences from them can be incorporated into reference proteomes. Pfam-B, the automatically-generated supplement to Pfam, has been removed. The current release (Pfam 29.0) includes 16 295 entries and 559 clans. The facility to view the relationship between families within a clan has been improved by the introduction of a new tool.


Subjects
Databases, Protein; Proteins/classification; Proteome/chemistry; Sequence Alignment; Sequence Analysis, Protein; Molecular Sequence Annotation
12.
Angew Chem Int Ed Engl ; 54(16): 4842-6, 2015 Apr 13.
Article in English | MEDLINE | ID: mdl-25735258

ABSTRACT

The trinuclear triangle-shaped system [tris{3,5-bis(heptafluoropropyl)-1,2,4-triazolatosilver(I)}] (1) and the multi-armed square-shaped metalloporphyrin PtOEP or the free porphyrin base H2OEP serve as excellent octopus hosts (OEP=2,3,7,8,12,13,17,18-octaethyl-21H,23H-porphine). Coupling of the fluorous/organic molecular octopi 1 and H2OEP or PtOEP by strong quadrupole-quadrupole and metal-π interactions affords the supramolecular assemblies [1⋅PtOEP] or [1⋅H2OEP] (2 a), which feature nanoscopic cavities surrounding the upper triangular and lower square cores. The fluorous/organic biphasic configuration of [1⋅PtOEP] leads to an increase in the phosphorescence of PtOEP under ambient conditions. Guest molecules can be included in the biphasic double-octopus assembly in three different site-selective modes.

13.
F1000Res ; 3: 50, 2014.
Article in English | MEDLINE | ID: mdl-25075288

ABSTRACT

SUMMARY: We present two web-based components for the display of Protein-Protein Interaction networks using different self-organizing layout methods: force-directed and circular. These components conform to the BioJS standard and can be rendered in an HTML5-compliant browser without the need for third-party plugins. We provide examples of interaction networks and how the components can be used to visualize them, and refer to a more complex tool that uses these components. AVAILABILITY: http://github.com/biojs/biojs; http://dx.doi.org/10.5281/zenodo.7753.

14.
BMC Bioinformatics ; 15: 129, 2014 May 06.
Article in English | MEDLINE | ID: mdl-24885165

ABSTRACT

BACKGROUND: Interaction between proteins is one of the most important mechanisms in the execution of cellular functions. The study of these interactions has provided insight into the functioning of an organism's processes. As of October 2013, Homo sapiens had over 170000 protein-protein interactions (PPIs) registered in the Interologous Interaction Database, which is only one of the many public resources where protein interactions can be accessed. These numbers exemplify the volume of data that research on the topic has generated. Visualization of large data sets is a well-known strategy to make sense of information, and protein interaction data is no exception. There are several tools that allow the exploration of this data, providing different methods to visualize protein network interactions. However, there is still no native web tool that allows this data to be explored interactively online. RESULTS: Given the advances that web technologies have made recently, it is time to bring these interactive views to the web to provide an easily accessible forum to visualize PPIs. We have created a Web-based Protein Interaction Network Visualizer: PINV, an open source, native web application that facilitates the visualization of protein interactions (http://biosual.cbio.uct.ac.za/pinv.html). We developed PINV as a set of components that follow the protocol defined in BioJS and use the D3 library to create the graphic layouts. We demonstrate the use of PINV with multi-organism interaction networks for a predicted target from Mycobacterium tuberculosis, its interacting partners and its orthologs. CONCLUSIONS: The resultant tool provides an attractive view of complex, fully interactive networks with components that allow the querying, filtering and manipulation of the visible subset. Moreover, as a web resource, PINV simplifies sharing and publishing, activities which are vital in today's collaborative research environments. The source code is freely available for download at https://github.com/4ndr01d3/biosual.


Subjects
Protein Interaction Maps; Software; Computer Graphics; Humans; Internet; Protein Interaction Mapping
15.
Bioinformatics ; 29(15): 1919-21, 2013 Aug 01.
Article in English | MEDLINE | ID: mdl-23742982

ABSTRACT

SUMMARY: We present iAnn, an open source community-driven platform for dissemination of life science events, such as courses, conferences and workshops. iAnn allows automatic visualisation and integration of customised event reports. A central repository lies at the core of the platform: curators add submitted events, and these are subsequently accessed via web services. Thus, once an iAnn widget is incorporated into a website, it permanently shows timely relevant information as if it were native to the remote site. At the same time, announcements submitted to the repository are automatically disseminated to all portals that query the system. To facilitate the visualization of announcements, iAnn provides powerful filtering options and views, integrated in Google Maps and Google Calendar. All iAnn widgets are freely available. AVAILABILITY: http://iann.pro/iannviewer CONTACT: manuel.corpas@tgac.ac.uk.


Subjects
Biological Science Disciplines; Software; Anniversaries and Special Events; Congresses as Topic; Internet
16.
Bioinformatics ; 29(8): 1103-4, 2013 Apr 15.
Article in English | MEDLINE | ID: mdl-23435069

ABSTRACT

SUMMARY: BioJS is an open-source project whose main objective is the visualization of biological data in JavaScript. BioJS provides an easy-to-use consistent framework for bioinformatics application programmers. It follows a community-driven standard specification that includes a collection of components purposely designed to require a very simple configuration and installation. In addition to the programming framework, BioJS provides a centralized repository of components available for reutilization by the bioinformatics community. AVAILABILITY AND IMPLEMENTATION: http://code.google.com/p/biojs/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Computer Graphics; Software; Programming Languages
17.
PLoS One ; 7(9): e44180, 2012.
Article in English | MEDLINE | ID: mdl-23028496

ABSTRACT

A large number of diverse, complex, and distributed data resources are currently available in the Bioinformatics domain. The pace of discovery and the diversity of information mean that centralised reference databases like UniProt and Ensembl cannot integrate all potentially relevant information sources. From a user perspective, however, centralised access to all relevant information concerning a specific query is essential. The Distributed Annotation System (DAS) defines a communication protocol to exchange annotations on genomic and protein sequences; this standardisation enables clients to retrieve data from a myriad of sources, thus offering centralised access to end-users. We introduce MyDas, a web server that facilitates the publishing of biological annotations according to the DAS specification. It deals with the common functionality requirements of making data available, while also providing an extension mechanism in order to implement the specifics of data store interaction. MyDas allows the user to define where the required information is located along with its structure, and is then responsible for the communication protocol details.


Subjects
Computational Biology; Computer Communication Networks; Databases, Genetic; Computational Biology/instrumentation; Computational Biology/methods; Computer Communication Networks/instrumentation
18.
PLoS One ; 6(10): e26345, 2011.
Article in English | MEDLINE | ID: mdl-22046276

ABSTRACT

The Distributed Annotation System (DAS) is a protocol for easy sharing and integration of biological annotations. In order to visualize feature annotations in a genomic context a client is required. Here we present myKaryoView, a simple light-weight DAS tool for visualization of genomic annotation. myKaryoView has been specifically configured to help analyse data derived from personal genomics, although it can also be used as a generic genome browser visualization. Several well-known data sources are provided to facilitate comparison of known genes and normal variation regions. The navigation experience is enhanced by simultaneous rendering of different levels of detail across chromosomes. A simple interface is provided to allow searches for any SNP, gene or chromosomal region. User-defined DAS data sources may also be added when querying the system. We demonstrate myKaryoView capabilities for adding user-defined sources with a set of genetic profiles of family-related individuals downloaded directly from 23andMe. myKaryoView is a web tool for visualization of genomic data specifically designed for direct-to-consumer genomic data that uses publicly available data distributed throughout the Internet. It does not require data to be held locally and it is capable of rendering any feature as long as it conforms to DAS specifications. Configuration and addition of sources to myKaryoView can be done through the interface. Here we show a proof of principle of myKaryoView's ability to display personal genomics data with 23andMe genome data sources. The tool is available at: http://mykaryoview.com.


Subjects
Genomics/methods; Molecular Sequence Annotation/methods; Computer Graphics; Databases, Genetic; Internet; User-Computer Interface
19.
Bioinformatics ; 27(18): 2616-7, 2011 Sep 15.
Article in English | MEDLINE | ID: mdl-21798964

ABSTRACT

MOTIVATION: Dasty3 is a highly interactive and extensible Web-based framework. It provides a rich Application Programming Interface upon which it is possible to develop specialized clients capable of retrieving information from DAS sources as well as from data providers not using the DAS protocol. Dasty3 provides significant improvements on previous Web-based frameworks and is implemented using the 1.6 DAS specification. AVAILABILITY: Dasty3 is an open-source tool freely available at http://www.ebi.ac.uk/dasty/ under the terms of the GNU General Public License. Source and documentation can be found at http://code.google.com/p/dasty/. CONTACT: hhe@ebi.ac.uk.


Subjects
Databases, Genetic; Databases, Protein; Software; Computational Biology/methods; Internet; Registries
20.
BMC Bioinformatics ; 12: 143, 2011 May 10.
Article in English | MEDLINE | ID: mdl-21569281

ABSTRACT

BACKGROUND: Centralised resources such as GenBank and UniProt are perfect examples of the major international efforts that have been made to integrate and share biological information. However, additional data that adds value to these resources needs a simple and rapid route to public access. The Distributed Annotation System (DAS) provides an adequate environment to integrate genomic and proteomic information from multiple sources, making this information accessible to the community. DAS offers a way to distribute and access information but it does not provide domain experts with the mechanisms to participate in the curation process of the available biological entities and their annotations. RESULTS: We designed and developed a Collaborative Annotation System for proteins called DAS Writeback. DAS writeback is a protocol extension of DAS to provide the functionalities of adding, editing and deleting annotations. We implemented this new specification as extensions of both a DAS server and a DAS client. The architecture was designed with the involvement of the DAS community and it was improved after performing usability experiments emulating a real annotation task. CONCLUSIONS: We demonstrate that DAS Writeback is effective, usable and will provide the appropriate environment for the creation and evolution of community protein annotation.


Subjects
Databases, Genetic; Information Storage and Retrieval; Software; Computer Communication Networks; Molecular Sequence Annotation