Results 1 - 7 of 7
1.
NPJ Syst Biol Appl ; 9(1): 63, 2023 Dec 18.
Article in English | MEDLINE | ID: mdl-38110446

ABSTRACT

Assessing the mutagenicity of chemicals is an essential task in the drug development process. Databases and other structured sources for Ames mutagenicity usually exist, and they have been carefully and laboriously curated from scientific publications. As knowledge accumulates over time, updating these databases is a constant overhead and is impractical. In this paper, we first propose the problem of predicting the mutagenicity of chemicals from textual information in scientific publications. More simply, given a chemical and natural language evidence from publications in which the mutagenicity of the chemical is described, the goal of the model/algorithm is to predict whether the chemical is potentially mutagenic. For this, we first construct a gold standard data set and then propose MutaPredBERT, a prediction model fine-tuned on BioLinkBERT based on a question-answering formulation of the problem. We leverage transfer learning and large transformer-based models to achieve a Macro F1 score of >0.88 even with relatively small data for fine-tuning. Our work establishes the utility of large language models for constructing structured knowledge bases directly from scientific publications.


Subject(s)
Mutagens , Mutagens/toxicity , Databases, Factual
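Note on the metric in record 1: the abstract reports a Macro F1 score, which is the unweighted mean of the per-class F1 scores. The following is a minimal sketch of that computation, not the authors' evaluation code; the label strings and the example predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold labels and model predictions (illustration only).
y_true = ["mutagenic", "mutagenic", "non-mutagenic", "non-mutagenic"]
y_pred = ["mutagenic", "non-mutagenic", "non-mutagenic", "non-mutagenic"]
example_score = macro_f1(y_true, y_pred)
print(round(example_score, 4))
```

Because each class contributes equally, Macro F1 penalizes a model that does well only on the majority class, which is likely why it suits a small, possibly imbalanced fine-tuning set.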
2.
Mutagenesis ; 37(3-4): 191-202, 2022 Oct 26.
Article in English | MEDLINE | ID: mdl-35554560

ABSTRACT

Assessing a compound's mutagenicity using machine learning is an important activity in the drug discovery and development process. Traditional methods of mutagenicity detection, such as the Ames test, are expensive, time-consuming, and labor-intensive. In this context, in silico methods that predict a compound's mutagenicity with high accuracy are important. Machine-learning (ML) models are increasingly being proposed to improve the accuracy of mutagenicity prediction. While these models are used in practice, there is further scope to improve their accuracy. We hypothesize that choosing the right features to train the model can lead to better accuracy. We systematically consider and evaluate a combination of novel structural and molecular features that have the maximal impact on the accuracy of models. We rigorously evaluate these features against multiple classification models (from classical ML models to deep neural network models). The performance of the models was assessed using 5- and 10-fold cross-validation, and we show that our approach, using the molecule structure, molecular properties, and structural alerts as feature sets, outperforms the state-of-the-art methods for mutagenicity prediction on the Hansen et al. benchmark dataset, with an area under the receiver operating characteristic curve of 0.93. More importantly, our framework shows how combining features can improve model accuracy.


Subject(s)
Machine Learning , Mutagens , Mutagens/toxicity , Mutagens/chemistry , Neural Networks, Computer , Mutagenesis
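Note on the evaluation in record 2: the abstract mentions k-fold cross-validation and the area under the ROC curve (AUC). The sketch below illustrates both ideas with the standard library only. It is not the authors' pipeline; the function names, the shuffling seed, and the toy labels and scores are assumptions made for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = mutagenic, 0 = non-mutagenic.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
splits = list(k_fold_indices(10, 5))
print(toy_auc, len(splits))
```

In a full evaluation, a model would be trained on each `train` split and scored on the matching `test` split, and the per-fold AUC values would then be averaged.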
3.
Data Brief ; 29: 105383, 2020 Apr.
Article in English | MEDLINE | ID: mdl-32195305

ABSTRACT

Intrinsically Disordered Proteins (IDPs) have become a hot topic since their characterisation in the 1990s. The data presented in this article are related to our research entitled "A structural entropy index to analyse local conformations in Intrinsically Disordered Proteins" published in Journal of Structural Biology [1]. In that study, we quantified, for the first time, the continuum from rigidity to flexibility and finally disorder. Non-disordered regions were also highlighted in the ensemble of disordered proteins. This work was done using the Protein Ensemble Database (PED), a database collecting series of protein structures considered as IDPs. The data set consists of a collection of cleaned protein files in classical pdb format that can be readily used as input for most automatic analysis software. The accompanying data include the coding of all structural information in terms of a structural alphabet, namely Protein Blocks (PBs). An entropy index derived from PBs, which captures the continuum from protein rigidity to flexibility to disorder, is included, along with information from secondary structure assignment, protein accessibility, and prediction of disorder from the sequences. The data may be used for further structural bioinformatics studies of IDPs. They can also be used as a benchmark for evaluating disorder prediction methods.

4.
Int J Mol Sci ; 21(6), 2020 Mar 24.
Article in English | MEDLINE | ID: mdl-32213914

ABSTRACT

The number of available protein structures in the Protein Data Bank (PDB) has considerably increased in recent years. Thanks to the growth in the number of structures and complexes, numerous large-scale studies have been done in various research areas, e.g., protein-protein interactions, protein-DNA interactions, or drug discovery. While protein redundancy has typically been managed using a simple protein sequence identity threshold, the similarity of protein-ligand complexes should also be considered from a structural perspective. Protein-ligand duplicates in the PDB are widely known, but were never quantitatively assessed, as they are quite complex to analyze and compare. Here, we present a specific clustering of protein-ligand structures to avoid the bias found in different studies. The methodology is based on binding site superposition and a combination of weighted Root Mean Square Deviation (RMSD) assessment and hierarchical clustering. Repeated structures of proteins of interest are highlighted, and only representative conformations were retained for a non-biased view of protein distribution. Three types of cases are described based on the number of distinct conformations identified for each complex. Defining these categories decreases the number of complexes 3.84-fold and offers more refined results compared to a protein sequence-based method. Widely distinct conformations were analyzed using normalized B-factors. Furthermore, a non-redundant dataset was generated for future molecular interaction analysis or virtual screening studies.


Subject(s)
Databases, Protein , Molecular Docking Simulation/methods , Sequence Analysis, Protein/methods , Software , Binding Sites , Humans , Ligands , Protein Binding
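Note on the method in record 4: the abstract combines a weighted RMSD with hierarchical clustering of superposed binding sites. The sketch below shows one way these two pieces can fit together. It is not the published protocol: the per-atom weights, the average-linkage rule, the distance cutoff, and the toy coordinates are all assumptions, and the superposition step is taken as already done.

```python
import math


def weighted_rmsd(coords_a, coords_b, weights):
    """Weighted RMSD between two atom-matched, already superposed sites."""
    total_weight = sum(weights)
    weighted_sq = sum(w * math.dist(a, b) ** 2
                      for a, b, w in zip(coords_a, coords_b, weights))
    return math.sqrt(weighted_sq / total_weight)


def average_linkage_clusters(dist, cutoff):
    """Agglomerative clustering: repeatedly merge the two closest clusters
    until the smallest mean inter-cluster distance exceeds the cutoff."""
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        return sum(dist[i][j] for i in c1 for j in c2) / (len(c1) * len(c2))

    while len(clusters) > 1:
        d, a, b = min((linkage(clusters[a], clusters[b]), a, b)
                      for a in range(len(clusters))
                      for b in range(a + 1, len(clusters)))
        if d > cutoff:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy binding sites with two atoms each (hypothetical coordinates).
sites = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1)],  # near-duplicate of the first site
    [(0.0, 3.0, 0.0), (1.0, 3.0, 0.0)],  # distinct conformation
]
atom_weights = [1.0, 2.0]  # assumed weighting scheme
matrix = [[weighted_rmsd(a, b, atom_weights) for b in sites] for a in sites]
site_clusters = average_linkage_clusters(matrix, cutoff=1.0)
print(site_clusters)
```

Keeping one representative per resulting cluster is the kind of reduction the abstract describes when it reports fewer complexes after clustering.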
5.
J Struct Biol ; 210(1): 107464, 2020 Apr 01.
Article in English | MEDLINE | ID: mdl-31978465

ABSTRACT

The sequence-structure-function paradigm was revolutionized by the discovery of disordered regions and disordered proteins more than two decades ago. While the definition of rigidity is simple with X-ray structures, the notion of flexibility is linked to high experimental B-factors. The definition of disordered regions is more complex: in these same X-ray structures, it is associated with the positions of missing residues. Thus, a continuum seems to exist between rigidity, flexibility and disorder. However, it had not been precisely described. In this study, we used an ensemble of disordered proteins (or regions) and applied a structural alphabet to analyse their local conformation. This structural alphabet, namely Protein Blocks, had been efficiently used to highlight rigid local domains within flexible regions and so to discriminate between the concepts of deformability and mobility. Using an entropy index derived from this structural alphabet, we underlined its interest for measuring these local dynamics and for quantifying, for the first time, continuum states from rigidity to flexibility and finally disorder. We also highlight non-disordered regions in the ensemble of disordered proteins in our study.


Subject(s)
Intrinsically Disordered Proteins/chemistry , Entropy , Protein Conformation
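Note on the entropy index in record 5: the abstract does not state the formula. A common entropy-based measure for Protein Blocks is the equivalent number of PBs, Neq = exp(-Σ p_x ln p_x), where p_x is the frequency of block x at a given residue position across the ensemble. The sketch below assumes that definition; the function name and the example strings of PB letters are hypothetical.

```python
import math
from collections import Counter


def neq(pb_column):
    """Equivalent number of Protein Blocks observed at one position.

    pb_column holds the PB letter assigned to the same residue in each
    conformation of the ensemble. Neq is 1 when a single block is always
    observed and approaches 16 when all 16 blocks are equally likely.
    """
    counts = Counter(pb_column)
    total = len(pb_column)
    entropy = -sum((c / total) * math.log(c / total) for c in counts.values())
    return math.exp(entropy)


# Hypothetical PB assignments for one residue over several conformations.
rigid_position = "dddddddd"     # same local conformation everywhere
variable_position = "ddmmaacc"  # four blocks, equally frequent
print(neq(rigid_position), round(neq(variable_position), 3))
```

Under this assumed definition, low values would indicate rigid positions and high values positions that sample many local conformations, which matches the rigidity-to-disorder continuum the abstract describes.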
6.
J Biomol Struct Dyn ; 38(10): 2988-3002, 2020 Jul.
Article in English | MEDLINE | ID: mdl-31361191

ABSTRACT

Proteins are highly dynamic macromolecules. These dynamics are often analysed through experimental and/or computational methods only for an isolated protein or a limited number of proteins. Here, we explore large-scale protein dynamics simulation to observe the dynamics of local protein conformations from different perspectives. We analysed molecular dynamics to investigate protein flexibility locally, using classical approaches such as RMSf and solvent accessibility, but also innovative approaches such as local entropy. First, we focussed on classical secondary structures and analysed specifically how β-strands, β-turns, and bends evolve during molecular simulations. We underlined an interesting specific bias between β-turns and bends, which are considered as the same category although their dynamics show differences. Second, we used a structural alphabet that is able to approximate every part of protein structure conformations, namely protein blocks (PBs), to analyse (i) how each initial local protein conformation evolves during dynamics and (ii) whether some exchange can exist among these PBs. Interestingly, the results are far more complex than a simple regular/rigid and coil/flexible exchange.

Abbreviations: Neq, number of equivalent; PB, Protein Blocks; PDB, Protein Data Bank; RMSf, root mean square fluctuations.

Communicated by Ramaswamy H. Sarma.


Subject(s)
Molecular Dynamics Simulation , Proteins , Entropy , Protein Conformation , Protein Structure, Secondary , Proteins/genetics
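Note on RMSf in record 6: the root mean square fluctuation of an atom is the square root of its mean squared deviation from its average position over the trajectory. The sketch below is a minimal illustration that assumes the frames are already aligned to a common reference; the trajectory layout and the toy coordinates are hypothetical.

```python
import math


def rmsf(trajectory):
    """Per-atom root mean square fluctuation.

    trajectory[t][i] is the (x, y, z) position of atom i in frame t.
    Frames are assumed to be superposed on a common reference already.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for i in range(n_atoms):
        # Average position of atom i over all frames.
        mean = tuple(sum(frame[i][k] for frame in trajectory) / n_frames
                     for k in range(3))
        # Mean squared deviation from that average position.
        msf = sum(math.dist(frame[i], mean) ** 2
                  for frame in trajectory) / n_frames
        result.append(math.sqrt(msf))
    return result


# Toy two-frame trajectory: atom 0 is fixed, atom 1 moves along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
]
toy_rmsf = rmsf(toy_trajectory)
print(toy_rmsf)
```

A high value flags a mobile atom or residue, while a value near zero flags a rigid one.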
7.
J Med Chem ; 62(21): 9341-9356, 2019 Nov 14.
Article in English | MEDLINE | ID: mdl-31117513

ABSTRACT

Halogen atoms have been at the center of many rational medicinal chemistry applications in drug design. While fluorine and chlorine atoms are often added to enhance physicochemical properties, bromine and iodine are generally inserted to improve selectivity. Favorable halogen interactions such as the halogen bond have been thoroughly studied through quantum mechanics and statistical analyses. Although most studies focus on halogen interaction through its σ-hole, hydrogen bonding also has a significant impact. Here, we present an analysis describing the interacting environment of halogen atoms in a protein-ligand context. With consideration of structural redundancy in the PDB, tendencies toward specific molecular interactions have been refined, and implications for rational drug design with halogens are further discussed. Finally, we highlight the moderate occurrence of halogen bonding and present the other roles of halogens in protein-ligand complexes, completing the medicinal chemistry guide to rational halogen interactions.


Subject(s)
Drug Design , Halogens/chemistry , Proteins/metabolism , Databases, Protein , Ligands , Protein Binding , Proteins/chemistry