Search | BVS (Virtual Health Library) - Ministry of Health

A framework to assess the quality and impact of bioinformatics training across ELIXIR.

Gurwitz, Kim T; Singh Gaur, Prakash; Bellis, Louisa J; Larcombe, Lee; Alloza, Eva; Balint, Balint Laszlo; Botzki, Alexander; Dimec, Jure; Dominguez Del Angel, Victoria; Fernandes, Pedro L; Korpelainen, Eija; Krause, Roland; Kuzak, Mateusz; Le Pera, Loredana; Leskosek, Brane; Lindvall, Jessica M; Marek, Diana; Martinez, Paula A; Muyldermans, Tuur; Nygård, Ståle; Palagi, Patricia M; Peterson, Hedi; Psomopoulos, Fotis; Spiwok, Vojtech; van Gelder, Celia W G; Via, Allegra; Vidak, Marko; Wibberg, Daniel; Morgan, Sarah L; Rustici, Gabriella.

PLoS Comput Biol ; 16(7): e1007976, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32702016

ABSTRACT

ELIXIR is a pan-European intergovernmental organisation for life science that aims to coordinate bioinformatics resources in a single infrastructure across Europe; bioinformatics training is central to its strategy, which aims to develop a training community that spans all ELIXIR member states. In an evidence-based approach for strengthening bioinformatics training programmes across Europe, the ELIXIR Training Platform, led by the ELIXIR EXCELERATE Quality and Impact Assessment Subtask in collaboration with the ELIXIR Training Coordinators Group, has implemented an assessment strategy to measure quality and impact of its entire training portfolio. Here, we present ELIXIR's framework for assessing training quality and impact, which includes the following: specifying assessment aims, determining what data to collect in order to address these aims, and our strategy for centralised data collection to allow for ELIXIR-wide analyses. In addition, we present an overview of the ELIXIR training data collected over the past 4 years. We highlight the importance of a coordinated and consistent data collection approach and the relevance of defining specific metrics and answer scales for consortium-wide analyses as well as for comparison of data across iterations of the same course.

Subjects

Computational Biology/education; Quality Control; Algorithms; Biomedical Research; Computational Biology/standards; Curriculum; Data Collection; Databases, Factual; Education, Continuing; Europe; Program Evaluation; Reproducibility of Results; Research Personnel; Software; User-Computer Interface

An open source chemical structure curation pipeline using RDKit.

Bento, A Patrícia; Hersey, Anne; Félix, Eloy; Landrum, Greg; Gaulton, Anna; Atkinson, Francis; Bellis, Louisa J; De Veij, Marleen; Leach, Andrew R.

J Cheminform ; 12(1): 51, 2020 Sep 01.

Article in English | MEDLINE | ID: mdl-33431044

ABSTRACT

BACKGROUND: The ChEMBL database is one of a number of public databases that contain bioactivity data on small molecule compounds curated from diverse sources. Incoming compounds are typically not standardised according to consistent rules. In order to maintain the quality of the final database and to easily compare and integrate data on the same compound from different sources it is necessary for the chemical structures in the database to be appropriately standardised. RESULTS: A chemical curation pipeline has been developed using the open source toolkit RDKit. It comprises three components: a Checker to test the validity of chemical structures and flag any serious errors; a Standardizer which formats compounds according to defined rules and conventions and a GetParent component that removes any salts and solvents from the compound to create its parent. This pipeline has been applied to the latest version of the ChEMBL database as well as uncurated datasets from other sources to test the robustness of the process and to identify common issues in database molecular structures. CONCLUSION: All the components of the structure pipeline have been made freely available for other researchers to use and adapt for their own use. The code is available in a GitHub repository and it can also be accessed via the ChEMBL Beaker webservices. It has been used successfully to standardise the nearly 2 million compounds in the ChEMBL database and the compound validity checker has been used to identify compounds with the most serious issues so that they can be prioritised for manual curation.
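The three-stage flow described in this abstract (check, then standardise, then derive the parent) can be illustrated with a minimal sketch. This is not the published pipeline and it does not use RDKit: the function names `check`, `standardize`, `get_parent` and `curate`, and the string-based rules inside them, are hypothetical stand-ins that operate on raw SMILES text only to show how the stages chain together.

```python
from dataclasses import dataclass, field


@dataclass
class CurationResult:
    """Outcome of the toy pipeline: final SMILES plus any issues flagged."""
    smiles: str
    issues: list = field(default_factory=list)


def check(smiles):
    """Checker stand-in (hypothetical rules): list problems in the raw string."""
    issues = []
    if not smiles.strip():
        issues.append("empty structure")
    if smiles.count("(") != smiles.count(")"):
        issues.append("unbalanced parentheses")
    return issues


def standardize(smiles):
    """Standardizer stand-in: this sketch only trims surrounding whitespace."""
    return smiles.strip()


def get_parent(smiles):
    """GetParent stand-in: keep the longest dot-separated fragment.

    A chemistry toolkit would compare fragments by structure, not by
    string length; length is an assumption made for this illustration.
    """
    return max(smiles.split("."), key=len)


def curate(smiles):
    """Run the stages in order; flagged structures are returned unchanged."""
    issues = check(smiles)
    if issues:
        return CurationResult(smiles, issues)
    return CurationResult(get_parent(standardize(smiles)))
```

For example, `curate("CC(=O)[O-].[Na+]")` drops the sodium counter-ion fragment and keeps the acetate fragment, while `curate("CC(=O")` is returned untouched with an "unbalanced parentheses" issue so that it could be routed to manual curation.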

Training bioinformaticians in High Performance Computing.

Pérez-Wohlfeil, Esteban; Torreno, Oscar; Bellis, Louisa J; Fernandes, Pedro L; Leskosek, Brane; Trelles, Oswaldo.

Heliyon ; 4(12): e01057, 2018 Dec.

Article in English | MEDLINE | ID: mdl-30582061

ABSTRACT

In the last decade, bioinformatics has become an indispensable branch of modern science research, experiencing an explosion in financial support, developed applications and data collection. The growth of the datasets that are emerging from research laboratories, industry, the health sector, etc., are increasingly raising the levels of demand in computing power and storage. Processing biological data, in the large scales of these datasets, often requires the use of High Performance Computing (HPC) resources, especially when dealing with certain types of omics data, such as genomic and metagenomic data. Such computational resources not only require substantial investments, but they also involve high maintenance costs. More importantly, in order to keep good returns from the investments, specific training needs to be put in place to ensure that wasting is minimized. Furthermore, given that bioinformatics is a highly interdisciplinary field where several other domains intersect (such as biology, chemistry, physics and computer science), researchers from these areas also require bioinformatics-specific training in HPC, in order to fully take advantage of supercomputing centers. In this document, we describe our experience in training researchers from several different disciplines in HPC, as applied to bioinformatics under the framework of the leading European bioinformatics platform ELIXIR, and analyze both the content and outcomes of the course.

The ChEMBL database in 2017.

Gaulton, Anna; Hersey, Anne; Nowotka, Michal; Bento, A Patrícia; Chambers, Jon; Mendez, David; Mutowo, Prudence; Atkinson, Francis; Bellis, Louisa J; Cibrián-Uhalte, Elena; Davies, Mark; Dedman, Nathan; Karlsson, Anneli; Magariños, María Paula; Overington, John P; Papadatos, George; Smit, Ines; Leach, Andrew R.

Nucleic Acids Res ; 45(D1): D945-D954, 2017 Jan 04.

Article in English | MEDLINE | ID: mdl-27899562

ABSTRACT

ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 and 2014 Nucleic Acids Research Database Issues. Since then, alongside the continued extraction of data from the medicinal chemistry literature, new sources of bioactivity data have also been added to the database. These include: deposited data sets from neglected disease screening; crop protection data; drug metabolism and disposition data and bioactivity data from patents. A number of improvements and new features have also been incorporated. These include the annotation of assays and targets using ontologies, the inclusion of targets and indications for clinical candidates, addition of metabolic pathways for drugs and calculation of structural alerts. The ChEMBL data can be accessed via a web-interface, RDF distribution, data downloads and RESTful web-services.

Subjects

Databases, Chemical; Databases, Nucleic Acid; Search Engine; Computational Biology/methods; Crop Protection; Drug Discovery; Gene Ontology; Humans; Molecular Sequence Annotation; Pharmacology/methods; User-Computer Interface; Web Browser
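The abstract above mentions access through RESTful web-services. As a rough sketch of what a client request could look like, the snippet below builds a resource URL for one molecule record and, optionally, fetches it as JSON. The base path `https://www.ebi.ac.uk/chembl/api/data`, the `molecule` resource name and the `.json` suffix are assumptions not stated in the record, and the helper names `molecule_url` and `fetch_molecule` are hypothetical; check the current ChEMBL web services documentation before relying on them.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base path for the web services; not given in the abstract.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data"


def molecule_url(chembl_id):
    """Build the (assumed) URL of the JSON record for one ChEMBL identifier."""
    return f"{BASE_URL}/molecule/{quote(chembl_id, safe='')}.json"


def fetch_molecule(chembl_id, timeout=30):
    """Fetch and decode the record; requires network access."""
    with urlopen(molecule_url(chembl_id), timeout=timeout) as response:
        return json.load(response)
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before anything is sent, for example `molecule_url("CHEMBL25")`.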

A large-scale crop protection bioassay data set.

Gaulton, Anna; Kale, Namrata; van Westen, Gerard J P; Bellis, Louisa J; Bento, A Patrícia; Davies, Mark; Hersey, Anne; Papadatos, George; Forster, Mark; Wege, Philip; Overington, John P.

Sci Data ; 2: 150032, 2015.

Article in English | MEDLINE | ID: mdl-26175909

ABSTRACT

ChEMBL is a large-scale drug discovery database containing bioactivity information primarily extracted from scientific literature. Due to the medicinal chemistry focus of the journals from which data are extracted, the data are currently of most direct value in the field of human health research. However, many of the scientific use-cases for the current data set are equally applicable in other fields, such as crop protection research: for example, identification of chemical scaffolds active against a particular target or endpoint, the de-convolution of the potential targets of a phenotypic assay, or the potential targets/pathways for safety liabilities. In order to broaden the applicability of the ChEMBL database and allow more widespread use in crop protection research, an extensive data set of bioactivity data of insecticidal, fungicidal and herbicidal compounds and assays was collated and added to the database.

Subjects

Crop Protection; Databases, Chemical; Biological Assay; Herbicides; Insecticides

The ChEMBL bioactivity database: an update.

Bento, A Patrícia; Gaulton, Anna; Hersey, Anne; Bellis, Louisa J; Chambers, Jon; Davies, Mark; Krüger, Felix A; Light, Yvonne; Mak, Lora; McGlinchey, Shaun; Nowotka, Michal; Papadatos, George; Santos, Rita; Overington, John P.

Nucleic Acids Res ; 42(Database issue): D1083-90, 2014 Jan.

Article in English | MEDLINE | ID: mdl-24214965

ABSTRACT

ChEMBL is an open large-scale bioactivity database (https://www.ebi.ac.uk/chembl), previously described in the 2012 Nucleic Acids Research Database Issue. Since then, a variety of new data sources and improvements in functionality have contributed to the growth and utility of the resource. In particular, more comprehensive tracking of compounds from research stages through clinical development to market is provided through the inclusion of data from United States Adopted Name applications; a new richer data model for representing drug targets has been developed; and a number of methods have been put in place to allow users to more easily identify reliable data. Finally, access to ChEMBL is now available via a new Resource Description Framework format, in addition to the web-based interface, data downloads and web services.

Subjects

Databases, Chemical; Drug Discovery; Binding Sites; Humans; Internet; Ligands; Pharmaceutical Preparations/chemistry; Proteins/chemistry; Proteins/drug effects

ChEMBL: a large-scale bioactivity database for drug discovery.

Gaulton, Anna; Bellis, Louisa J; Bento, A Patricia; Chambers, Jon; Davies, Mark; Hersey, Anne; Light, Yvonne; McGlinchey, Shaun; Michalovich, David; Al-Lazikani, Bissan; Overington, John P.

Nucleic Acids Res ; 40(Database issue): D1100-7, 2012 Jan.

Article in English | MEDLINE | ID: mdl-21948594

ABSTRACT

ChEMBL is an Open Data database containing binding, functional and ADMET information for a large number of drug-like bioactive compounds. These data are manually abstracted from the primary published literature on a regular basis, then further curated and standardized to maximize their quality and utility across a wide range of chemical biology and drug-discovery research problems. Currently, the database contains 5.4 million bioactivity measurements for more than 1 million compounds and 5200 protein targets. Access is available through a web-based interface, data downloads and web services at: https://www.ebi.ac.uk/chembldb.

Subjects

Databases, Factual; Drug Discovery; Databases, Protein; Humans; Pharmaceutical Preparations/chemistry; Proteins/chemistry; Proteins/metabolism; User-Computer Interface

Collation and data-mining of literature bioactivity data for drug discovery.

Bellis, Louisa J; Akhtar, Ruth; Al-Lazikani, Bissan; Atkinson, Francis; Bento, A Patricia; Chambers, Jon; Davies, Mark; Gaulton, Anna; Hersey, Anne; Ikeda, Kazuyoshi; Krüger, Felix A; Light, Yvonne; McGlinchey, Shaun; Santos, Rita; Stauch, Benjamin; Overington, John P.

Biochem Soc Trans ; 39(5): 1365-70, 2011 Oct.

Article in English | MEDLINE | ID: mdl-21936816

ABSTRACT

The challenge of translating the huge amount of genomic and biochemical data into new drugs is a costly and challenging task. Historically, there has been comparatively little focus on linking the biochemical and chemical worlds. To address this need, we have developed ChEMBL, an online resource of small-molecule SAR (structure-activity relationship) data, which can be used to support chemical biology, lead discovery and target selection in drug discovery. The database contains the abstracted structures, properties and biological activities for over 700000 distinct compounds and in excess of more than 3 million bioactivity records abstracted from over 40000 publications. Additional public domain resources can be readily integrated into the same data model (e.g. PubChem BioAssay data). The compounds in ChEMBL are largely extracted from the primary medicinal chemistry literature, and are therefore usually 'drug-like' or 'lead-like' small molecules with full experimental context. The data cover a significant fraction of the discovery of modern drugs, and are useful in a wide range of drug design and discovery tasks. In addition to the compound data, ChEMBL also contains information for over 8000 protein, cell line and whole-organism 'targets', with over 4000 of those being proteins linked to their underlying genes. The database is searchable both chemically, using an interactive compound sketch tool, protein sequences, family hierarchies, SMILES strings, compound research codes and key words, and biologically, using a variety of gene identifiers, protein sequence similarity and protein families. The information retrieved can then be readily filtered and downloaded into various formats. ChEMBL can be accessed online at https://www.ebi.ac.uk/chembldb.

Subjects

Data Mining; Databases, Factual; Drug Discovery; Animals; Computational Biology/methods; Genomics; Humans; Information Storage and Retrieval; Molecular Structure; Pharmaceutical Preparations/chemistry; Pharmaceutical Preparations/metabolism; Proteins/chemistry; Structure-Activity Relationship
