ABSTRACT
With the ever-growing emphasis on open science and its invaluable benefits to the scientific community, a research project no longer simply ends with a scientific publication. Data sharing and reproducibility of results have taken centre stage in life science research, supported by the FAIR principles, which firmly underline the importance of open data. Current data-intensive, multidisciplinary research has also highlighted the significance of how data are mined and managed. Here we describe some of the features adopted by EMBL-EBI data resources to support data mining, data quality, and data management. We also highlight how EMBL-EBI has responded to the current pandemic through its data resources.
Subjects
Biological Science Disciplines; Data Management; Data Mining; Information Dissemination; Reproducibility of Results
ABSTRACT
The European Bioinformatics Institute (EMBL-EBI) maintains a comprehensive range of freely available and up-to-date molecular data resources, which includes over 40 resources covering every major data type in the life sciences. This year's service update for EMBL-EBI includes new resources, PGS Catalog and AlphaFold DB, and updates on existing resources, including the COVID-19 Data Platform, trRosetta and RoseTTAfold models introduced in Pfam and InterPro, and the launch of Genome Integrations with Function and Sequence by UniProt and Ensembl. Furthermore, we highlight projects through which EMBL-EBI has contributed to the development of community-driven data standards and guidelines, including the Recommended Metadata for Biological Images (REMBI), and the BioModels Reproducibility Scorecard. Training is one of EMBL-EBI's core missions and a key component of the provision of bioinformatics services to users: this year's update includes many of the improvements that have been made to EMBL-EBI's online training offering.
Subjects
Computational Biology/education; Computational Biology/methods; Databases, Factual; Academies and Institutes; Artificial Intelligence; COVID-19; Databases, Factual/economics; Databases, Factual/statistics & numerical data; Databases, Pharmaceutical; Databases, Protein; Europe; Genome, Human; Humans; Information Storage and Retrieval; RNA, Untranslated/genetics; SARS-CoV-2/genetics
ABSTRACT
The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic will be remembered as one of the defining events of the 21st century. The rapid global outbreak has had significant impacts on human society and is already responsible for millions of deaths. Understanding and tackling the impact of the virus has required a worldwide mobilisation and coordination of scientific research. The COVID-19 Data Portal (https://www.covid19dataportal.org/) was first released on 20 April 2020, as part of the European COVID-19 Data Platform, to facilitate rapid and open data sharing and analysis and thereby accelerate global SARS-CoV-2 and COVID-19 research. The COVID-19 Data Portal has fortnightly feature releases that continue to add new data types, search options, visualisations and improvements based on user feedback and research. The open datasets and intuitive suite of search, identification and download services represent a truly FAIR (Findable, Accessible, Interoperable and Reusable) resource that enables researchers to easily identify and quickly obtain the key datasets needed for their COVID-19 research.
Subjects
Biomedical Research; COVID-19; Databases, Factual; Datasets as Topic; Information Dissemination; Open Access Publishing; SARS-CoV-2; COVID-19/epidemiology; COVID-19/genetics; COVID-19/virology; Databases, Bibliographic; Disease Outbreaks; Humans; Pandemics; SARS-CoV-2/chemistry; SARS-CoV-2/genetics; SARS-CoV-2/metabolism; SARS-CoV-2/ultrastructure; Time Factors; Viral Proteins/chemistry; Viral Proteins/genetics
ABSTRACT
The General Data Protection Regulation (GDPR) became binding law in the European Union Member States in 2018, as a step toward harmonizing personal data protection legislation in the European Union. The Regulation governs almost all types of personal data processing, including those pertaining to biomedical research. The purpose of this article is to highlight the main practical issues related to data and biological sample sharing that biomedical researchers face regularly, and to specify how these are addressed in the context of GDPR, after consulting with ethics/legal experts. We identify areas in which clarifications of the GDPR are needed, particularly those related to consent requirements by study participants. Amendments should target the following: (1) restricting exceptions based on national laws and increasing harmonization, (2) confirming the concept of broad consent, and (3) defining a roadmap for secondary use of data. These changes will be achieved by acknowledged learned societies in the field taking the lead in preparing a document giving guidance for the optimal interpretation of the GDPR, which will be finalized following a period of commenting by a broad multistakeholder audience. In parallel, promoting engagement and education of the public in the relevant issues (such as different consent types or residual risk for re-identification), on both local/national and international levels, is considered critical for advancement. We hope that this article will open this broad discussion involving all major stakeholders, toward optimizing the GDPR and allowing a harmonized transnational research approach.
Subjects
Biomedical Research; Computer Security; Health Records, Personal/ethics; Information Dissemination; Biomedical Research/ethics; Biomedical Research/legislation & jurisprudence; Computer Security/legislation & jurisprudence; Computer Security/trends; Europe; Humans; Information Dissemination/legislation & jurisprudence; Information Dissemination/methods
ABSTRACT
The European Bioinformatics Institute (EMBL-EBI; https://www.ebi.ac.uk/) provides freely available data and bioinformatics services to the scientific community, alongside its research activity and training provision. The 2020 COVID-19 pandemic has brought to the forefront a need for the scientific community to work even more cooperatively to effectively tackle a global health crisis. EMBL-EBI has been able to build on its position to contribute to the fight against COVID-19 in a number of ways. Firstly, EMBL-EBI has used its infrastructure, expertise and network of international collaborations to help build the European COVID-19 Data Platform (https://www.covid19dataportal.org/), which brings together COVID-19 biomolecular data and connects it to researchers, clinicians and public health professionals. By September 2020, the COVID-19 Data Platform had integrated in excess of 170 000 COVID-19 biomolecular data and literature records, collected through a number of EMBL-EBI resources. Secondly, EMBL-EBI has striven to continue its support of the life science communities through the crisis, with updated training provision and improved service provision throughout its resources. The COVID-19 pandemic has highlighted the importance of EMBL-EBI's core principles, including international cooperation, resource sharing and central data brokering, and has further empowered scientific cooperation.
Subjects
COVID-19/prevention & control; Computational Biology/statistics & numerical data; Databases, Nucleic Acid/statistics & numerical data; Information Storage and Retrieval/methods; SARS-CoV-2/genetics; Viral Proteins/genetics; COVID-19/epidemiology; COVID-19/virology; Computational Biology/methods; Computational Biology/organization & administration; Databases, Nucleic Acid/organization & administration; Global Health; Humans; Information Storage and Retrieval/statistics & numerical data; Internet; Pandemics; SARS-CoV-2/metabolism; SARS-CoV-2/physiology; Viral Proteins/metabolism
ABSTRACT
Data resources at the European Bioinformatics Institute (EMBL-EBI, https://www.ebi.ac.uk/) archive, organize and provide added-value analysis of research data produced around the world. This year's update for EMBL-EBI focuses on data exchanges among resources, both within the institute and with a wider global infrastructure. Within EMBL-EBI, data resources exchange data through a rich network of data flows mediated by automated systems. This network ensures that users are served with as much information as possible from any search and any starting point within EMBL-EBI's websites. EMBL-EBI data resources also exchange data with hundreds of other data resources worldwide and collectively are a key component of a global infrastructure of interconnected life sciences data resources. We also describe the BioImage Archive, a deposition database for raw images derived from primary research that will supply data for future knowledgebases that will add value through curation of primary image data. We also report a new release of the PRIDE database with an improved technical infrastructure, a new API, a new webpage, and improved data exchange with UniProt and Expression Atlas. Training is a core mission of EMBL-EBI and in 2018 our training team served more users, both in-person and through web-based programmes, than ever before.
Subjects
Academies and Institutes; Biological Science Disciplines/organization & administration; Computational Biology/methods; Computational Biology/organization & administration; Databases, Genetic; Data Management; Europe; Humans; Information Storage and Retrieval
ABSTRACT
The European Bioinformatics Institute (https://www.ebi.ac.uk/) archives, curates and analyses life sciences data produced by researchers throughout the world, and makes these data available for re-use globally. Data volumes continue to grow exponentially: total raw storage capacity now exceeds 160 petabytes, and we manage these increasing data flows while maintaining the quality of our services. This year we have improved the efficiency of our computational infrastructure and doubled the bandwidth of our connection to the World Wide Web. We report two new data resources, the Single Cell Expression Atlas (https://www.ebi.ac.uk/gxa/sc/), which is a component of the Expression Atlas; and the PDBe-Knowledgebase (https://www.ebi.ac.uk/pdbe/pdbe-kb), which collates functional annotations and predictions for structure data in the Protein Data Bank. Additionally, Europe PMC (http://europepmc.org/) has added preprint abstracts to its search results, supplementing results from peer-reviewed publications. EMBL-EBI maintains over 150 analytical bioinformatics tools that complement our data resources. We make these tools available for users through a web interface as well as programmatically using application programming interfaces, whilst ensuring the latest versions are available for our users. Our training team, with support from all of our staff, continued to provide on-site, off-site and web-based training opportunities for thousands of researchers worldwide this year.
Subjects
Academies and Institutes; Computational Biology/organization & administration; Computational Biology/trends; Computational Biology/history; Databases, Genetic; Europe; History, 21st Century; Humans; Software
ABSTRACT
New technologies to generate, store and retrieve medical and research data are inducing a rapid change in clinical and translational research and health care. Systems medicine is the interdisciplinary approach wherein physicians and clinical investigators team up with experts from biology, biostatistics, informatics, mathematics and computational modeling to develop methods to use new and stored data to the benefit of the patient. We here provide a critical assessment of the opportunities and challenges arising out of systems approaches in medicine and from this provide a definition of what systems medicine entails. Based on our analysis of current developments in medicine and healthcare and associated research needs, we emphasize the role of systems medicine as a multilevel and multidisciplinary methodological framework for informed data acquisition and interdisciplinary data analysis to extract previously inaccessible knowledge for the benefit of patients.
Subjects
Biomedical Research; Systems Analysis; Decision Support Systems, Clinical; Humans; Translational Research, Biomedical
ABSTRACT
The European Bioinformatics Institute (EMBL-EBI) supports life-science research throughout the world by providing open data, open-source software and analytical tools, and technical infrastructure (https://www.ebi.ac.uk). We accommodate an increasingly diverse range of data types and integrate them, so that biologists in all disciplines can explore life in ever-increasing detail. We maintain over 40 data resources, many of which are run collaboratively with partners in 16 countries (https://www.ebi.ac.uk/services). Submissions continue to increase exponentially: our data storage has doubled in less than two years to 120 petabytes. Recent advances in cellular imaging and single-cell sequencing techniques are generating a vast amount of high-dimensional data, bringing to light new cell types and new perspectives on anatomy. Accordingly, one of our main focus areas is integrating high-quality information from bioimaging, biobanking and other types of molecular data. This is reflected in our deep involvement in Open Targets, stewardship of plant phenotyping standards (MIAPPE) and partnership in the Human Cell Atlas data coordination platform, as well as the 2017 launch of the Omics Discovery Index. This update gives a bird's-eye view of EMBL-EBI's approach to data integration and service development as genomics begins to enter the clinic.
Subjects
Computational Biology; Databases, Genetic; Academies and Institutes; Animals; Biological Ontologies; Biological Science Disciplines; Biological Specimen Banks; Cloud Computing; Computational Biology/education; Computational Biology/trends; Data Analysis; Data Collection; Databases, Factual; Europe; Humans; Image Processing, Computer-Assisted; Internet
ABSTRACT
The core mission of ELIXIR is to build a stable and sustainable infrastructure for biological information across Europe. At the heart of this are the data resources, tools and services that ELIXIR offers to the life-sciences community, providing stable and sustainable access to biological data. ELIXIR aims to ensure that these resources are available long-term and that the life-cycles of these resources are managed such that they support the scientific needs of the life-sciences, including biological research. ELIXIR Core Data Resources are defined as a set of European data resources that are of fundamental importance to the wider life-science community and the long-term preservation of biological data. They are complete collections of generic value to life-science, are considered an authority in their field with respect to one or more characteristics, and show high levels of scientific quality and service. Thus, ELIXIR Core Data Resources are of wide applicability and usage. This paper describes the structures, governance and processes that support the identification and evaluation of ELIXIR Core Data Resources. It identifies key indicators which reflect the essence of the definition of an ELIXIR Core Data Resource and support the promotion of excellence in resource development and operation. It describes the specific indicators in more detail and explains their application within ELIXIR's sustainability strategy and science policy actions, and in capacity building, life-cycle management and technical actions. The identification process is currently being implemented and tested for the first time. The findings and outcome will be evaluated by the ELIXIR Scientific Advisory Board in March 2017. Establishing the portfolio of ELIXIR Core Data Resources and ELIXIR Services is a key priority for ELIXIR and publicly marks the transition towards a cohesive infrastructure.
ABSTRACT
New technologies are revolutionising biological research and its applications by making it easier and cheaper to generate ever-greater volumes and types of data. In response, the services and infrastructure of the European Bioinformatics Institute (EMBL-EBI, www.ebi.ac.uk) are continually expanding: total disk capacity increases significantly every year to keep pace with demand (75 petabytes as of December 2015), and interoperability between resources remains a strategic priority. Since 2014 we have launched two new resources: the European Variation Archive for genetic variation data and EMPIAR for two-dimensional electron microscopy data, as well as a Resource Description Framework platform. We also launched the Embassy Cloud service, which allows users to run large analyses in a virtual environment next to EMBL-EBI's vast public data resources.
Subjects
Databases, Factual; Computational Biology; Databases, Chemical; Databases, Nucleic Acid; Databases, Protein; Genes; Genetic Variation; Genome; Microscopy, Electron; Sequence Analysis, DNA; Sequence Analysis, RNA; Software; Systems Integration
ABSTRACT
Gene Ontology (GO) provides dynamic controlled vocabularies to aid in the description of the functional biological attributes and subcellular locations of gene products from all taxonomic groups (www.geneontology.org). Here we describe a collaboration between the renal biomedical research community and the GO Consortium to improve the quality and quantity of GO terms describing renal development. In the associated annotation activity, the new and revised terms were associated with gene products involved in renal development and function. This project resulted in a total of 522 GO terms being added to the ontology and the creation of approximately 9,600 kidney-related GO term associations to 940 UniProt Knowledgebase (UniProtKB) entries, covering 66 taxonomic groups. We demonstrate the impact of these improvements on the interpretation of GO term analyses performed on genes differentially expressed in kidney glomeruli affected by diabetic nephropathy. In summary, we have produced a resource that can be used to interpret data from small- and large-scale experiments investigating molecular mechanisms of kidney function and development, and thereby help towards alleviating renal disease.
Subjects
Gene Ontology; Kidney/embryology; Kidney/metabolism; Animals; Databases, Genetic; Databases, Protein; Humans; Mice; Molecular Sequence Annotation; Species Specificity; Statistics as Topic
ABSTRACT
Molecular biology has been at the heart of the 'big data' revolution from its very beginning, and the need for access to biological data is a common thread running from the 1965 publication of Dayhoff's 'Atlas of Protein Sequence and Structure' through the Human Genome Project in the late 1990s and early 2000s to today's population-scale sequencing initiatives. The European Bioinformatics Institute (EMBL-EBI; http://www.ebi.ac.uk) is one of three organizations worldwide that provide free access to comprehensive, integrated molecular data sets. Here, we summarize the principles underpinning the development of these public resources and provide an overview of EMBL-EBI's database collection to complement the reviews of individual databases provided elsewhere in this issue.
Subjects
Databases, Chemical; Databases, Nucleic Acid; Databases, Protein; Animals; Europe; Gene Expression Profiling; Gene Ontology; Genomics; Genotype; Humans; Internet; Metabolomics; Metagenomics; Mice; Phenotype; Proteomics
ABSTRACT
Mitochondria are a common energy source for organs and organisms; their diverse functions are specialized according to the unique phenotypes of their hosting environment. Perturbation of mitochondrial homeostasis accompanies significant pathological phenotypes. However, the connections between mitochondrial proteome properties and function remain to be experimentally established on a systematic level. This uncertainty impedes the contextualization and translation of proteomic data to the molecular derivations of mitochondrial diseases. We present a collection of mitochondrial features and functions from four model systems, including two cardiac mitochondrial proteomes from distinct genomes (human and mouse), two unique organ mitochondrial proteomes from identical genetic codons (mouse heart and mouse liver), as well as a relevant metazoan out-group (Drosophila). The data, composed of mitochondrial protein abundances and their biochemical activities, capture the core functionalities of these mitochondria. This investigation allowed us to redefine the core mitochondrial proteome from organs and organisms, as well as the relevant contributions from genetic information and hosting milieu. Our study has identified significant enrichment of disease-associated genes and their products. Furthermore, correlational analyses suggest that mitochondrial proteome design is primarily driven by cellular environment. Taken together, these results connect proteome features with mitochondrial function, providing a prospective resource for studying mitochondrial pathophysiology and for developing novel therapeutic targets in medicine.
Subjects
Mitochondrial Proteins/metabolism; Proteome; Animals; Chromatography, Liquid; Drosophila melanogaster; Electrophoresis, Polyacrylamide Gel; Humans; Mice; Tandem Mass Spectrometry
ABSTRACT
RATIONALE: Omics sciences enable a systems-level perspective in characterizing cardiovascular biology. Integration of diverse proteomics data via a computational strategy will catalyze the assembly of contextualized knowledge, foster discoveries through multidisciplinary investigations, and minimize unnecessary redundancy in research efforts. OBJECTIVE: The goal of this project is to develop a consolidated cardiac proteome knowledgebase with a novel bioinformatics pipeline and Web portals, thereby serving as a new resource to advance cardiovascular biology and medicine. METHODS AND RESULTS: We created the Cardiac Organellar Protein Atlas Knowledgebase (COPaKB; www.HeartProteome.org), a centralized platform of high-quality cardiac proteomic data, bioinformatics tools, and relevant cardiovascular phenotypes. Currently, COPaKB features 8 organellar modules, comprising 4203 LC-MS/MS experiments from human, mouse, Drosophila, and Caenorhabditis elegans, as well as expression images of 10,924 proteins in human myocardium. In addition, the Java-coded bioinformatics tools provided by COPaKB enable cardiovascular investigators in all disciplines to retrieve and analyze pertinent organellar protein properties of interest. CONCLUSIONS: COPaKB provides an innovative and interactive resource that connects research interests with the new biological discoveries in protein sciences. With an array of intuitive tools in this unified Web server, nonproteomics investigators can conveniently collaborate with proteomics specialists to dissect the molecular signatures of cardiovascular phenotypes.
Subjects
Databases, Protein; Knowledge Bases; Muscle Proteins/metabolism; Myocardium/metabolism; Proteomics/methods; Systems Biology; Systems Integration; Access to Information; Animals; Caenorhabditis elegans; Diffusion of Innovation; Drosophila; Humans; Mice; Software Design; Workflow
ABSTRACT
Transcriptional control ensures genes are expressed in the right amounts at the correct times and locations. Understanding quantitatively how regulatory systems convert input signals to appropriate outputs remains a challenge. For the first time, we successfully model even skipped (eve) stripes 2 and 3+7 across the entire fly embryo at cellular resolution. A straightforward statistical relationship explains how transcription factor (TF) concentrations define eve's complex spatial expression, without the need for pairwise interactions or cross-regulatory dynamics. Simulating thousands of TF combinations, we recover known regulators and suggest new candidates. Finally, we accurately predict the intricate effects of perturbations including TF mutations and misexpression. Our approach imposes minimal assumptions about regulatory function; instead we infer underlying mechanisms from models that best fit the data, like the lack of TF-specific thresholds and the positional value of homotypic interactions. Our study provides a general and quantitative method for elucidating the regulation of diverse biological systems. DOI:http://dx.doi.org/10.7554/eLife.00522.001.
Subjects
Drosophila melanogaster/embryology; Models, Biological; RNA, Messenger/genetics; Animals; Logistic Models
ABSTRACT
Protein sequence databases are the pillar upon which modern proteomics is supported, representing a stable reference space of predicted and validated proteins. One example of such resources is UniProt, enriched with both expertly curated and automatic annotations. Although largely taken for granted, mature resources comparable to UniProt are not yet available in some other "omics" fields, lipidomics being one of them. While having a seasoned community of wet lab scientists, lipidomics lags significantly behind proteomics in the adoption of data standards and other core bioinformatics concepts. This work aims to reduce the gap by developing an equivalent resource to UniProt called 'LipidHome', providing theoretically generated lipid molecules and useful metadata. Using the 'FASTLipid' Java library, a database was populated with theoretical lipids, generated from a set of community-agreed chemical bounds. In parallel, a web application was developed to present the information and provide computational access via a web service. Designed specifically to accommodate high-throughput mass spectrometry-based approaches, lipids are organised into a hierarchy that reflects the variety in the structural resolution of lipid identifications. Additionally, cross-references to other lipid-related resources and papers that cite specific lipids were used to annotate lipid records. The web application encompasses a browser for viewing lipid records and a 'tools' section where an MS1 search engine is currently implemented. LipidHome can be accessed at http://www.ebi.ac.uk/apweiler-srv/lipidhome.
Subjects
Computational Biology/methods; Databases, Factual; Lipid Metabolism; Mass Spectrometry; Internet
ABSTRACT
The Gene Ontology (GO) is the de facto standard for the functional description of gene products, providing a consistent, information-rich terminology applicable across species and information repositories. The UniProt Consortium uses both manual and automatic GO annotation approaches to curate UniProt Knowledgebase (UniProtKB) entries. The selection of a protein set prioritized for manual annotation has implications for the characteristics of the information provided to users working in a specific field or interested in particular pathways or processes. In this article, we describe an organelle-focused, manual curation initiative targeting proteins from the human peroxisome. We discuss the steps taken to define the peroxisome proteome and the challenges encountered in defining the boundaries of this protein set. We illustrate with the use of examples how GO annotations now capture cell and tissue type information and the advantages that such an annotation approach provides to users. Database URL: http://www.ebi.ac.uk/GOA/ and http://www.uniprot.org.
Subjects
Molecular Sequence Annotation; Peroxisomes/metabolism; Proteome/metabolism; Databases, Protein; Humans; Organ Specificity; Peroxisomes/genetics; Protein Binding; Protein Interaction Mapping; Protein Isoforms/genetics; Protein Isoforms/metabolism; Protein Transport; Proteome/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/metabolism; Species Specificity
ABSTRACT
The community working on model organisms is growing steadily and the number of model organisms for which proteome data are being generated is continuously increasing. To standardize efforts and to make optimal use of proteomics data acquired from model organisms, a new Human Proteome Organisation (HUPO) initiative on model organism proteomes (iMOP) was approved at the HUPO Ninth Annual World Congress in Sydney, 2010. iMOP will seek to stimulate scientific exchange and disseminate HUPO best practices. The needs of model organism researchers for central databases will be better represented, catalyzing the integration of proteomics and organism-specific databases. Full details of iMOP activities, members, tools and resources can be found at our website http://www.imop.uzh.ch/ and new members are invited to join us.