Search | Virtual Health Library

The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods.

Zdrazil, Barbara; Felix, Eloy; Hunter, Fiona; Manners, Emma J; Blackshaw, James; Corbett, Sybilla; de Veij, Marleen; Ioannidis, Harris; Lopez, David Mendez; Mosquera, Juan F; Magarinos, Maria Paula; Bosc, Nicolas; Arcila, Ricardo; Kizilören, Tevfik; Gaulton, Anna; Bento, A Patrícia; Adasme, Melissa F; Monecke, Peter; Landrum, Gregory A; Leach, Andrew R.

Nucleic Acids Res ; 52(D1): D1180-D1192, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37933841

ABSTRACT

ChEMBL (https://www.ebi.ac.uk/chembl/) is a manually curated, high-quality, large-scale, open, FAIR and Global Core Biodata Resource of bioactive molecules with drug-like properties, previously described in the 2012, 2014, 2017 and 2019 Nucleic Acids Research Database Issues. Since its introduction in 2009, ChEMBL's content has changed dramatically in size and diversity of data types. Through incorporation of multiple new datasets from depositors since the 2019 update, ChEMBL now contains slightly more bioactivity data from deposited data vs data extracted from literature. In collaboration with the EUbOPEN consortium, chemical probe data is now regularly deposited into ChEMBL. Release 27 made curated data available for compounds screened for potential anti-SARS-CoV-2 activity from several large-scale drug repurposing screens. In addition, new patent bioactivity data have been added to the latest ChEMBL releases, and various new features have been incorporated, including a Natural Product likeness score, updated flags for Natural Products, a new flag for Chemical Probes, and the initial annotation of the action type for ~270 000 bioactivity measurements.

Subject(s)

Drug Discovery , Databases, Factual , Time Factors

Ten simple rules for making training materials FAIR.

Garcia, Leyla; Batut, Bérénice; Burke, Melissa L; Kuzak, Mateusz; Psomopoulos, Fotis; Arcila, Ricardo; Attwood, Teresa K; Beard, Niall; Carvalho-Silva, Denise; Dimopoulos, Alexandros C; Del Angel, Victoria Dominguez; Dumontier, Michel; Gurwitz, Kim T; Krause, Roland; McQuilton, Peter; Le Pera, Loredana; Morgan, Sarah L; Rauste, Päivi; Via, Allegra; Kahlem, Pascal; Rustici, Gabriella; van Gelder, Celia W G; Palagi, Patricia M.

PLoS Comput Biol ; 16(5): e1007854, 2020 05.

Article in English | MEDLINE | ID: mdl-32437350

ABSTRACT

Everything we do today is becoming more and more reliant on the use of computers. The field of biology is no exception; but most biologists receive little or no formal preparation for the increasingly computational aspects of their discipline. In consequence, informal training courses are often needed to plug the gaps; and the demand for such training is growing worldwide. To meet this demand, some training programs are being expanded, and new ones are being developed. Key to both scenarios is the creation of new course materials. Rather than starting from scratch, however, it's sometimes possible to repurpose materials that already exist. Yet finding suitable materials online can be difficult: They're often widely scattered across the internet or hidden in their home institutions, with no systematic way to find them. This is a common problem for all digital objects. The scientific community has attempted to address this issue by developing a set of rules (which have been called the Findable, Accessible, Interoperable and Reusable [FAIR] principles) to make such objects more findable and reusable. Here, we show how to apply these rules to help make training materials easier to find, (re)use, and adapt, for the benefit of all.

Subject(s)

Computer-Assisted Instruction/standards , Guidelines as Topic , Biology/education , Computational Biology , Humans , Information Storage and Retrieval

Illuminating the druggable genome through patent bioactivity data.

Magariños, Maria P; Gaulton, Anna; Félix, Eloy; Kiziloren, Tevfik; Arcila, Ricardo; Oprea, Tudor I; Leach, Andrew R.

PeerJ ; 11: e15153, 2023.

Article in English | MEDLINE | ID: mdl-37151295

ABSTRACT

The patent literature is a potentially valuable source of bioactivity data. In this article we describe a process to prioritise 3.7 million life science relevant patents obtained from the SureChEMBL database (https://www.surechembl.org/), according to how likely they were to contain bioactivity data for potent small molecules on less-studied targets, based on the classification developed by the Illuminating the Druggable Genome (IDG) project. The overall goal was to select a smaller number of patents that could be manually curated and incorporated into the ChEMBL database. Using relatively simple annotation and filtering pipelines, we have been able to identify a substantial number of patents containing quantitative bioactivity data for understudied targets that had not previously been reported in the peer-reviewed medicinal chemistry literature. We quantify the added value of such methods in terms of the numbers of targets that are so identified, and provide some specific illustrative examples. Our work underlines the potential value in searching the patent corpus in addition to the more traditional peer-reviewed literature. The small molecules found in these patents, together with their measured activity against the targets, are now accessible via the ChEMBL database.

Subject(s)

Chemistry, Pharmaceutical , Drug Discovery , Drug Discovery/methods , Databases, Factual

MAIP: a web service for predicting blood-stage malaria inhibitors.

Bosc, Nicolas; Felix, Eloy; Arcila, Ricardo; Mendez, David; Saunders, Martin R; Green, Darren V S; Ochoada, Jason; Shelat, Anang A; Martin, Eric J; Iyer, Preeti; Engkvist, Ola; Verras, Andreas; Duffy, James; Burrows, Jeremy; Gardner, J Mark F; Leach, Andrew R.

J Cheminform ; 13(1): 13, 2021 Feb 22.

Article in English | MEDLINE | ID: mdl-33618772

ABSTRACT

Malaria is a disease affecting hundreds of millions of people across the world, mainly in developing countries and especially in sub-Saharan Africa. It is the cause of hundreds of thousands of deaths each year and there is an ever-present need to identify and develop effective new therapies to tackle the disease and overcome increasing drug resistance. Here, we extend a previous study in which a number of partners collaborated to develop a consensus in silico model that can be used to identify novel molecules that may have antimalarial properties. The performance of machine learning methods generally improves with the number of data points available for training. One practical challenge in building large training sets is that the data are often proprietary and cannot be straightforwardly integrated. Here, this was addressed by sharing QSAR models, each built on a private data set. We describe the development of an open-source software platform for creating such models, a comprehensive evaluation of methods to create a single consensus model and a web platform called MAIP available at https://www.ebi.ac.uk/chembl/maip/ . MAIP is freely available for the wider community to make large-scale predictions of potential malaria inhibiting compounds. This project also highlights some of the practical challenges in reproducing published computational methods and the opportunities that open-source software can offer to the community.
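The idea of combining independently built models into one consensus prediction, as described in the abstract above, can be sketched as follows. This is a minimal illustration only: the abstract does not say which consensus method MAIP uses, so plain averaging of per-model scores is an assumption here, and the names `Model`, `consensus_score` and `rank_compounds` are hypothetical and are not part of MAIP or ChEMBL.

```python
from statistics import mean
from typing import Callable, List, Sequence, Tuple

# Hypothetical type: each partner's model is treated as an opaque function
# that maps a SMILES string to a score (higher = more likely active).
# The underlying private training data never needs to be shared.
Model = Callable[[str], float]


def consensus_score(smiles: str, models: Sequence[Model]) -> float:
    """Average the scores of all models for one compound.

    Simple averaging is an assumption made for illustration; it is not
    claimed to be the method evaluated or chosen in the MAIP paper.
    """
    if not models:
        raise ValueError("at least one model is required")
    return mean(model(smiles) for model in models)


def rank_compounds(
    compounds: Sequence[str], models: Sequence[Model]
) -> List[Tuple[str, float]]:
    """Return (SMILES, consensus score) pairs, highest score first."""
    scored = [(smiles, consensus_score(smiles, models)) for smiles in compounds]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy stand-in models with made-up scoring rules, for demonstration only.
    toy_models: List[Model] = [
        lambda s: 0.2 if len(s) < 5 else 0.8,
        lambda s: 0.6,
    ]
    for smiles, score in rank_compounds(["CCO", "c1ccccc1O"], toy_models):
        print(smiles, round(score, 2))
```

Treating each model as an opaque scoring function mirrors the setting in the abstract, where models are shared while the private data sets behind them are not.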
