1.
Nucleic Acids Res ; 2021 Nov 25.
Article in English | MEDLINE | ID: mdl-34850134

ABSTRACT

The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been developed to EMBL-EBI's online training offering.

2.
Nucleic Acids Res ; 2021 Dec 06.
Article in English | MEDLINE | ID: mdl-34871441

ABSTRACT

The rapid development of proteomics studies has resulted in large volumes of experimental data. The emergence of big data platforms provides the opportunity to handle these large amounts of data. The integrated proteome resource, iProX (https://www.iprox.cn), which was initiated in 2017, has been greatly improved with an up-to-date big data platform implemented in 2021. Here, we describe the main iProX developments since its first publication in Nucleic Acids Research in 2019. First, a hyper-converged architecture with high scalability supports the submission process. A Hadoop cluster can store large amounts of proteomics datasets, and a distributed, RESTful-styled Elasticsearch engine can query millions of records within one second. Also, several new features, including the Universal Spectrum Identifier (USI) mechanism proposed by ProteomeXchange, a RESTful Web Service API, and a high-efficiency reanalysis pipeline, have been added to iProX for better open data sharing. By the end of August 2021, 1526 datasets had been submitted to iProX, reaching a total data volume of 92.42 TB. With the implementation of the big data platform, iProX can support PB-level data storage, hundreds of billions of spectra records, and second-level latency service capabilities that meet the requirements of the fast-growing field of proteomics.

3.
Nucleic Acids Res ; 2021 Nov 12.
Article in English | MEDLINE | ID: mdl-34788843

ABSTRACT

The Reactome Knowledgebase (https://reactome.org), an Elixir core resource, provides manually curated molecular details across a broad range of physiological and pathological biological processes in humans, including both hereditary and acquired disease processes. The processes are annotated as an ordered network of molecular transformations in a single consistent data model. Reactome thus functions both as a digital archive of manually curated human biological processes and as a tool for discovering functional relationships in data such as gene expression profiles or somatic mutation catalogs from tumor cells. Recent curation work has expanded our annotations of normal and disease-associated signaling processes and of the drugs that target them, in particular infections caused by the SARS-CoV-1 and SARS-CoV-2 coronaviruses and the host response to infection. New tools support better simultaneous analysis of high-throughput data from multiple sources and the placement of understudied ('dark') proteins from analyzed datasets in the context of Reactome's manually curated pathways.

4.
Front Pharmacol ; 12: 708296, 2021.
Article in English | MEDLINE | ID: mdl-34721010

ABSTRACT

The early prediction of drug adverse effects is of great interest to pharmaceutical research, as toxicity is one of the leading reasons for drug attrition. Understanding the cell signaling and regulatory pathways affected by a drug candidate is crucial to the study of drug toxicity. In this study, we present a computational technique that employs the propagation of drug-protein interactions to connect compounds to biological pathways. Target profiles for drugs were built by retrieving drug target proteins from public repositories such as ChEMBL, DrugBank, IUPHAR, PharmGKB, and TTD. A subsequent enrichment test of the protein pool using Reactome revealed potential pathways affected by the drugs. Furthermore, an optional tissue filter utilizing the Human Protein Atlas was applied to identify tissue-specific pathways. The analysis pipeline was implemented in an open-source KNIME workflow called Path4Drug to allow automated data retrieval and reconstruction for any given drug present in ChEMBL. The pipeline was applied to withdrawn drugs and to cardio- and hepatotoxic drugs with black box warnings to identify the biochemical pathways they affect and to find pathways that can potentially be connected to the toxic events. To complement this approach, drugs used in cardiac therapy without any record of toxicity were also analyzed. The results reproduce already known associations and provide a large number of additional potential connections. Consequently, our approach can link drugs to biological pathways by leveraging big data available in public resources. The developed tool is openly available and modifiable to support other systems biology analyses.
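
The enrichment step mentioned in this abstract can be illustrated with a generic over-representation test. The abstract does not say which statistic Path4Drug or Reactome applies, so the hypergeometric upper-tail test below is an assumption chosen for illustration, and the function name and the numbers in the usage note are hypothetical.

```python
from math import comb


def hypergeom_pvalue(background: int, in_pathway: int, drawn: int, hits: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= hits).

    background: number of proteins in the reference set (assumed known)
    in_pathway: how many of those belong to the pathway being tested
    drawn:      number of drug target proteins in the target profile
    hits:       how many of the drug targets fall in the pathway
    """
    if not (0 <= in_pathway <= background and 0 <= drawn <= background):
        raise ValueError("counts must lie within the background size")
    if hits < 0:
        raise ValueError("hits must be non-negative")
    upper = min(in_pathway, drawn)
    # Sum the probability of observing `hits` or more pathway members.
    tail = sum(
        comb(in_pathway, i) * comb(background - in_pathway, drawn - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(background, drawn)
```

For example, `hypergeom_pvalue(10000, 50, 20, 3)` asks how surprising it is that 3 of 20 targets fall in a 50-protein pathway drawn from a 10,000-protein background. A real analysis would also correct for testing many pathways at once.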

5.
Nucleic Acids Res ; 2021 Nov 11.
Article in English | MEDLINE | ID: mdl-34761267

ABSTRACT

The IntAct molecular interaction database (https://www.ebi.ac.uk/intact) is a curated resource of molecular interactions, derived from the scientific literature and from direct data depositions. As of August 2021, IntAct provides more than one million binary interactions, curated by twelve global partners of the International Molecular Exchange consortium, for which the IntAct database provides a shared curation and dissemination platform. The IMEx curation policy has always emphasised a fine-grained data and curation model, aiming to capture the relevant experimental detail essential for the interpretation of the provided molecular interaction data. Here, we present recent curation focus and progress, as well as a completely redeveloped website which presents IntAct data in a much more user-friendly and detailed way.

6.
Nucleic Acids Res ; 2021 Oct 29.
Article in English | MEDLINE | ID: mdl-34718729

ABSTRACT

The Complex Portal (www.ebi.ac.uk/complexportal) is a manually curated, encyclopaedic database of macromolecular complexes with known function from a range of model organisms. It summarizes complex composition, topology and function along with links to a large range of domain-specific resources (e.g. wwPDB, EMDB and Reactome). Since the last update in 2019, we have produced a first draft complexome for Escherichia coli, maintained and updated that of Saccharomyces cerevisiae, added over 40 coronavirus complexes and increased the human complexome to over 1100 complexes that include approximately 200 complexes that act as targets for viral proteins or are part of the immune system. The display of protein features in ComplexViewer has been improved and the participant table is now colour-coordinated with the nodes in ComplexViewer. Community collaboration has expanded, for example by contributing to an analysis of putative transcription cofactors and providing data accessible to semantic web tools through Wikidata, which is now populated with manually curated Complex Portal content through a new bot. Our data license is now CC0 to encourage data reuse. Users are encouraged to get in touch, provide us with feedback and send curation requests through the 'Support' link.

7.
Bioinformatics ; 2021 May 07.
Article in English | MEDLINE | ID: mdl-33961020

ABSTRACT

SUMMARY: IntAct App is a Cytoscape 3 application that grants in-depth access to IntAct's molecular interaction data. It builds networks where nodes are interacting molecules (mainly proteins, but also genes, RNA, chemicals…) and edges represent evidence of interaction. Users can query a network by providing its molecules, identified by different fields, and optionally include all their interacting partners in the resulting network. The app offers three visualisations: one displaying only interactions, another representing every piece of evidence, and a third emphasizing evidence where mutated versions of proteins were used. Users can also filter networks and click on nodes and edges to access all their related details. Finally, the application supports automation of its main features via Cytoscape commands. AVAILABILITY AND IMPLEMENTATION: Implementation available at https://apps.cytoscape.org/apps/intactapp, while the source code is available at https://github.com/EBI-IntAct/IntactApp. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

8.
Bioinformatics ; 2021 Apr 08.
Article in English | MEDLINE | ID: mdl-33830216

ABSTRACT

MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and challenging to develop further. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It therefore relies only on its public application programming interfaces (APIs) and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design's flexibility can be utilised to streamline and enhance the curator's workflow with the platform's existing web interface. AVAILABILITY: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

9.
Nucleic Acids Res ; 49(6): 3156-3167, 2021 04 06.
Article in English | MEDLINE | ID: mdl-33677561

ABSTRACT

The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.


Subjects
Saccharomyces cerevisiae Proteins/metabolism; Saccharomyces cerevisiae/metabolism; Datasets as Topic; Gene Ontology; Knowledge Bases; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae Proteins/genetics

10.
Mol Syst Biol ; 17(2): e9982, 2021 02.
Article in English | MEDLINE | ID: mdl-33620773

ABSTRACT

Reproducibility of scientific results is a key element of science and credibility. The lack of reproducibility across many scientific fields has emerged as an important concern. In this piece, we assess mathematical model reproducibility and propose a scorecard for improving reproducibility in this field.


Subjects
Systems Biology/methods; Data Curation; Humans; Models, Theoretical; Reproducibility of Results

11.
Bioinformatics ; 37(12): 1781-1782, 2021 07 19.
Article in English | MEDLINE | ID: mdl-33031499

ABSTRACT

MOTIVATION: Since its launch in 2010, Identifiers.org has become an important tool for the annotation and cross-referencing of Life Science data. In 2016, we established the Compact Identifier (CID) scheme (prefix:accession) to generate globally unique identifiers for data resources using their locally assigned accession identifiers. Since then, we have developed and improved services to support the growing need to create, reference and resolve CIDs, in systems ranging from human-readable text to cloud-based e-infrastructures, by providing high-availability, low-latency cloud-based services, backed by a high-quality, manually curated resource. RESULTS: We describe a set of services that can be used to construct and resolve CIDs in Life Sciences and beyond. We have developed a new front end for accessing the Identifiers.org registry data and APIs to simplify integration of Identifiers.org CID services with third-party applications. We have also deployed the new Identifiers.org infrastructure in a commercial cloud environment, bringing our services closer to the data. AVAILABILITY AND IMPLEMENTATION: https://identifiers.org.
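
The prefix-plus-accession scheme described in this abstract can be sketched in a few lines. The helper names below are hypothetical, and the URL pattern (resolver base followed by the compact identifier) is an assumption of this sketch rather than something the abstract spells out. The code only builds the string and does not contact the resolver.

```python
from typing import NamedTuple

RESOLVER_BASE = "https://identifiers.org"  # URL given in the abstract


class CompactIdentifier(NamedTuple):
    prefix: str      # namespace assigned in the registry, e.g. "uniprot"
    accession: str   # identifier assigned locally by the data resource


def parse_cid(cid: str) -> CompactIdentifier:
    """Split 'prefix:accession' at the first colon only.

    Splitting once keeps accessions that themselves contain a colon intact.
    """
    prefix, sep, accession = cid.partition(":")
    if not sep or not prefix or not accession:
        raise ValueError(f"not a compact identifier: {cid!r}")
    return CompactIdentifier(prefix, accession)


def resolver_url(cid: str) -> str:
    """Build a resolvable URL for a CID (assumed pattern: base/prefix:accession)."""
    parsed = parse_cid(cid)
    return f"{RESOLVER_BASE}/{parsed.prefix}:{parsed.accession}"
```

Calling `resolver_url("uniprot:P12345")` yields `https://identifiers.org/uniprot:P12345`. Whether a given prefix is actually registered can only be checked against the registry itself.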


Subjects
Biological Science Disciplines; Cloud Computing; Humans

12.
J Proteomics ; 232: 104070, 2021 02 10.
Article in English | MEDLINE | ID: mdl-33307250

ABSTRACT

Spectral similarity calculation is widely used in protein identification tools and mass spectra clustering algorithms when comparing theoretical or experimental spectra. The performance of the spectral similarity calculation plays an important role in these tools and algorithms, especially in the analysis of large-scale datasets. Recently, deep learning methods have been proposed to improve the performance of clustering algorithms and protein identification by training the algorithms with existing data and the use of multiple spectra and identified peptide features. While the efficiency of these algorithms is still under study in comparison with traditional approaches, their application in proteomics data analysis is becoming more common. Here, we propose the use of deep learning to improve spectral similarity comparison. We assessed the performance of deep learning for spectral similarity with GLEAMS and a newly trained embedder model (DLEAMSE), which uses high-quality spectra from PRIDE Cluster. Also, we developed a new bioinformatics tool (mslookup - https://github.com/bigbio/DLEAMSE/) that allows users to quickly search for spectra among previously identified mass spectra published in public repositories and spectral libraries. Finally, we released a human database to enable bioinformaticians and biologists to search for identified spectra on their own machines. SIGNIFICANCE STATEMENT: Spectral similarity calculation plays an important role in proteomics data analysis. With deep learning's ability to learn implicit and effective features from large-scale training datasets, deep learning-based MS/MS spectra embedding models have emerged as a solution to improve mass spectral clustering similarity calculation algorithms. We compare multiple similarity scoring and deep learning methods in terms of accuracy (computing the similarity for a pair of mass spectra) and computing-time performance. The benchmark results showed no major differences in accuracy between DLEAMSE and the normalized dot product (NDP) for spectrum similarity calculations. The DLEAMSE GPU implementation is faster than NDP in preprocessing on the GPU server, and the similarity calculation of DLEAMSE (Euclidean distance on 32-D vectors) takes about one third of the time of the dot product calculation. The deep learning model (DLEAMSE) encoding and embedding steps need to run only once for each spectrum, and the embedded 32-D points can be persisted in the repository, which speeds up future comparisons and large-scale analyses. Based on these results, we propose a new tool, mslookup, that enables the researcher to find spectra previously identified in public data. The tool can also be used to generate in-house databases of previously identified spectra to share with other laboratories and consortiums.
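
The two similarity measures compared in this abstract can be written out as a minimal sketch. It assumes the spectra have already been binned into equal-length intensity vectors and that 32-D embeddings are already available. The trained DLEAMSE model that produces those embeddings is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def normalized_dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Normalized dot product of two binned intensity vectors.

    Returns 1.0 for identical peak patterns and 0.0 when no bins are shared.
    """
    if len(a) != len(b):
        raise ValueError("spectra must be binned to the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors (smaller = more similar)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    return math.dist(u, v)
```

The design point made in the abstract is that the embedding is computed once per spectrum and stored, so later comparisons reduce to a distance between short fixed-length vectors rather than a comparison of full binned spectra.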


Subjects
Deep Learning; Tandem Mass Spectrometry; Algorithms; Cluster Analysis; Databases, Protein; Humans; Proteomics; Software

13.
Bioinformatics ; 36(24): 5712-5718, 2021 04 05.
Article in English | MEDLINE | ID: mdl-32637990

ABSTRACT

MOTIVATION: A large variety of molecular interactions occurs between biomolecular components in cells. When a molecular interaction results in a regulatory effect, exerted by one component onto a downstream component, a so-called 'causal interaction' takes place. Causal interactions constitute the building blocks in our understanding of larger regulatory networks in cells. These causal interactions and the biological processes they enable (e.g. gene regulation) need to be described with a careful appreciation of the underlying molecular reactions. A proper description of this information enables archiving, sharing and reuse by humans and for automated computational processing. Various representations of causal relationships between biological components are currently used in a variety of resources. RESULTS: Here, we propose a checklist that accommodates current representations, called the Minimum Information about a Molecular Interaction CAusal STatement (MI2CAST). This checklist defines both the required core information, as well as a comprehensive set of other contextual details valuable to the end user and relevant for reusing and reproducing causal molecular interaction information. The MI2CAST checklist can be used as reporting guidelines when annotating and curating causal statements, while fostering uniformity and interoperability of the data across resources. AVAILABILITY AND IMPLEMENTATION: The checklist together with examples is accessible at https://github.com/MI2CAST/MI2CAST. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software; Causality; Humans

14.
Autophagy ; 17(6): 1543-1554, 2021 06.
Article in English | MEDLINE | ID: mdl-32486891

ABSTRACT

The 21st century has revealed much about the fundamental cellular process of autophagy. Autophagy controls the catabolism and recycling of various cellular components both as a constitutive process and as a response to stress and foreign material invasion. There is considerable knowledge of the molecular mechanisms of autophagy, and this is still growing as new modalities emerge. There is a need to investigate autophagy mechanisms reliably, comprehensively and conveniently. Reactome is a freely available knowledgebase that consists of manually curated molecular events (reactions) organized into cellular pathways (https://reactome.org/). Pathways/reactions in Reactome are hierarchically structured, graphically presented and extensively annotated. Data analysis tools, such as pathway enrichment, expression data overlay and species comparison, are also available. For customized analysis, information can also be programmatically queried. Here, we discuss the curation and annotation of the molecular mechanisms of autophagy in Reactome. We also demonstrate the value that Reactome adds to research by reanalyzing a previously published work on genome-wide CRISPR screening of autophagy components. Abbreviations: CMA: chaperone-mediated autophagy; GO: Gene Ontology; MA: macroautophagy; MI: microautophagy; MTOR: mechanistic target of rapamycin kinase; SQSTM1: sequestosome 1.

15.
Nat Commun ; 11(1): 6144, 2020 12 01.
Article in English | MEDLINE | ID: mdl-33262342

ABSTRACT

The International Molecular Exchange (IMEx) Consortium provides scientists with a single body of experimentally verified protein interactions curated in rich contextual detail to an internationally agreed standard. In this update to the work of the IMEx Consortium, we discuss how this initiative has been working in practice, how it has ensured database sustainability, and how it is meeting emerging annotation challenges through the introduction of new interactor types and data formats. Additionally, we provide examples of how IMEx data are being used by biomedical researchers and integrated in other bioinformatic tools and resources.


Subjects
Access to Information; Databases, Genetic; Humans; Information Dissemination; International Cooperation

16.
Mol Cell Proteomics ; 19(12): 2115-2125, 2020 12.
Article in English | MEDLINE | ID: mdl-32907876

ABSTRACT

Pathway analyses are key methods to analyze 'omics experiments. Nevertheless, integrating data from different 'omics technologies and different species still requires considerable bioinformatics knowledge. Here we present the novel ReactomeGSA resource for comparative pathway analyses of multi-omics datasets. ReactomeGSA can be used through Reactome's existing web interface and the novel ReactomeGSA R Bioconductor package with explicit support for scRNA-seq data. Data from different species is automatically mapped to a common pathway space. Public data from ExpressionAtlas and Single Cell ExpressionAtlas can be directly integrated in the analysis. ReactomeGSA greatly reduces the technical barrier for multi-omics, cross-species, comparative pathway analyses. We used ReactomeGSA to characterize the role of B cells in anti-tumor immunity. We compared B cell-rich and B cell-poor human cancer samples from five of the Cancer Genome Atlas (TCGA) transcriptomics studies and two of the Clinical Proteomic Tumor Analysis Consortium (CPTAC) proteomics studies. B cell-rich lung adenocarcinoma samples lacked the otherwise present activation through NFkappaB. This may be linked to the presence of a specific subset of tumor-associated IgG+ plasma cells that lack NFkappaB activation in scRNA-seq data from human melanoma. This showcases how ReactomeGSA can derive novel biomedical insights by integrating large multi-omics datasets.


Subjects
Databases, Genetic; Proteomics; Software; B-Lymphocytes/immunology; Humans; Internet; User-Computer Interface

17.
Mol Syst Biol ; 16(8): e9110, 2020 08.
Article in English | MEDLINE | ID: mdl-32845085

ABSTRACT

Systems biology has experienced dramatic growth in the number, size, and complexity of computational models. To reproduce simulation results and reuse models, researchers must exchange unambiguous model descriptions. We review the latest edition of the Systems Biology Markup Language (SBML), a format designed for this purpose. A community of modelers and software authors developed SBML Level 3 over the past decade. Its modular form consists of a core suited to representing reaction-based models and packages that extend the core with features suited to other model types including constraint-based models, reaction-diffusion models, logical network models, and rule-based models. The format leverages two decades of SBML and a rich software ecosystem that transformed how systems biologists build and interact with models. More recently, the rise of multiscale models of whole cells and organs, and new data sources such as single-cell measurements and live imaging, has precipitated new ways of integrating data with models. We provide our perspectives on the challenges presented by these developments and how SBML Level 3 provides the foundation needed to support this evolution.


Subjects
Systems Biology/methods; Animals; Humans; Logistic Models; Models, Biological; Software

19.
Bioinformatics ; 36(17): 4649-4654, 2020 11 01.
Article in English | MEDLINE | ID: mdl-32573648

ABSTRACT

MOTIVATION: One of the major bottlenecks in building systems biology models is identification and estimation of model parameters for model calibration. Searching for model parameters from published literature and models is an essential, yet laborious task. RESULTS: We have developed a new service, BioModels Parameters, to facilitate search and retrieval of parameter values from the Systems Biology Markup Language models stored in BioModels. Modellers can now directly search for a model entity (e.g. a protein or drug) to retrieve the rate equations describing it; the associated parameter values (e.g. degradation rate, production rate, Kcat, Michaelis-Menten constant, etc.) and the initial concentrations. Currently, BioModels Parameters contains entries from over 84,000 reactions and 60 different taxa with cross-references. The retrieved rate equations and parameters can be used for scanning parameter ranges, model fitting and model extension. Thus, BioModels Parameters will be a valuable service for systems biology modellers. AVAILABILITY AND IMPLEMENTATION: The data are accessible via web interface and API. BioModels Parameters is free to use and is publicly available at https://www.ebi.ac.uk/biomodels/parameterSearch. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
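
As an illustration of how a retrieved rate equation and its parameters might be used for scanning parameter ranges, the sketch below evaluates the standard Michaelis-Menten rate law over several candidate Km values. It does not call the BioModels Parameters service. The function names are hypothetical, and any values passed in would stand in for parameters looked up there.

```python
from typing import Dict, Iterable


def michaelis_menten_rate(substrate: float, kcat: float, km: float, enzyme: float) -> float:
    """Standard Michaelis-Menten rate: v = kcat * [E] * [S] / (Km + [S])."""
    if km <= 0 or substrate < 0:
        raise ValueError("Km must be positive and substrate non-negative")
    return kcat * enzyme * substrate / (km + substrate)


def scan_km(substrate: float, kcat: float, enzyme: float,
            km_values: Iterable[float]) -> Dict[float, float]:
    """Evaluate the rate for each candidate Km, keeping other parameters fixed."""
    return {km: michaelis_menten_rate(substrate, kcat, km, enzyme) for km in km_values}
```

A scan such as `scan_km(2.0, 10.0, 1.0, [0.5, 2.0, 8.0])` shows how sensitive the rate is to the choice of Km at a fixed substrate concentration, which is the kind of question a parameter range retrieved from several models can help answer.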


Subjects
Models, Biological; Systems Biology; Software

20.
J Integr Bioinform ; 17(2-3)2020 Jun 29.
Article in English | MEDLINE | ID: mdl-32598315

ABSTRACT

This paper presents a report on outcomes of the 10th Computational Modeling in Biology Network (COMBINE) meeting that was held in Heidelberg, Germany, in July of 2019. The annual event brings together researchers, biocurators and software engineers to present recent results and discuss future work in the area of standards for systems and synthetic biology. The COMBINE initiative coordinates the development of various community standards and formats for computational models in the life sciences. Over the past 10 years, COMBINE has brought together standard communities that have further developed and harmonized their standards for better interoperability of models and data. COMBINE 2019 was co-located with a stakeholder workshop of the European EU-STANDS4PM initiative that aims at harmonized data and model standardization for in silico models in the field of personalized medicine, as well as with the FAIRDOM PALs meeting to discuss findable, accessible, interoperable and reusable (FAIR) data sharing. This report briefly describes the work discussed in invited and contributed talks as well as during breakout sessions. It also highlights recent advancements in data, model, and annotation standardization efforts. Finally, this report concludes with some challenges and opportunities that this community will face during the next 10 years.


Subjects
Computational Biology; Synthetic Biology; Germany; Reference Standards; Software