Results 1 - 17 of 17
1.
Bioinformatics ; 38(19): 4466-4473, 2022 09 30.
Article in English | MEDLINE | ID: mdl-35929780

ABSTRACT

MOTIVATION: Whole-genome sequencing has revolutionized biosciences by providing tools for constructing complete DNA sequences of individuals. With entire genomes at hand, scientists can pinpoint DNA fragments responsible for oncogenesis and predict patient responses to cancer treatments. Machine learning plays a paramount role in this process. However, the sheer volume of whole-genome data makes it difficult to encode the characteristics of genomic variants as features for learning algorithms. RESULTS: In this article, we propose three feature extraction methods that facilitate classifier learning from sets of genomic variants. The core contributions of this work include: (i) strategies for determining features using variant length binning, clustering and density estimation; (ii) a programming library for automating distribution-based feature extraction in machine learning pipelines. The proposed methods have been validated on five real-world datasets using four different classification algorithms and a clustering approach. Experiments on genomes of 219 ovarian, 61 lung and 929 breast cancer patients show that the proposed approaches automatically identify genomic biomarkers associated with cancer subtypes and clinical response to oncological treatment. Finally, we show that the extracted features can be used alongside unsupervised learning methods to analyze genomic samples. AVAILABILITY AND IMPLEMENTATION: The source code of the presented algorithms and reproducible experimental scripts are available on GitHub at https://github.com/MNMdiagnostics/dbfe. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
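The variant length binning mentioned in this abstract can be illustrated with a minimal sketch. It is not the API of the dbfe library: the function name `length_bin_features` and the bin edges are hypothetical, chosen only to show how a variable-size set of variant lengths could become a fixed-size feature vector for a classifier.

```python
from bisect import bisect_right

# Hypothetical bin edges in base pairs (an assumption; the abstract gives no edges).
BIN_EDGES = [1, 10, 100, 1_000, 10_000, 100_000]


def length_bin_features(variant_lengths, edges=BIN_EDGES, normalize=True):
    """Summarize one sample's variant lengths as a fixed-size vector.

    Feature i counts variants with edges[i] <= length < edges[i + 1];
    the last feature counts lengths >= edges[-1].
    Lengths below edges[0] are ignored.
    """
    counts = [0] * len(edges)
    for length in variant_lengths:
        idx = bisect_right(edges, length) - 1  # index of the bin's left edge
        if idx >= 0:
            counts[idx] += 1
    if normalize:
        total = sum(counts)
        # Fractions make samples with different variant counts comparable.
        return [c / total if total else 0.0 for c in counts]
    return counts


# Five made-up variant lengths for a single sample.
print(length_bin_features([5, 50, 50, 500, 2_000_000], normalize=False))
# [1, 2, 1, 0, 0, 1]
```

Every sample yields a vector of the same length regardless of how many variants it carries, which is what lets standard classifiers consume the result.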


Subjects
Genome, Software, Humans, Genomics/methods, Algorithms, Machine Learning
2.
Expert Syst Appl ; 174, 2021 Jul 15.
Article in English | MEDLINE | ID: mdl-34366575

ABSTRACT

Our understanding of life is based upon the interpretation of macromolecular structures and their dynamics. Almost 90% of currently known macromolecular models originated from electron density maps constructed using X-ray diffraction images. Even though diffraction images are critical for structure determination, due to their vast amounts and noisy, non-intuitive nature, their quality is rarely inspected. In this paper, we use recent advances in machine learning to automatically detect seven types of anomalies in X-ray diffraction images. For this purpose, we utilize a novel X-ray beam center detection algorithm, propose three different image representations, and compare the predictive performance of general-purpose classifiers and deep convolutional neural networks (CNNs). In benchmark tests on a set of 6,311 X-ray diffraction images, the proposed CNN achieved between 87% and 99% accuracy depending on the type of anomaly. Experimental results show that the proposed anomaly detection system can be considered suitable for early detection of sub-optimal data collection conditions and malfunctions at X-ray experimental stations.

3.
IUCrJ ; 8(Pt 3): 395-407, 2021 May 01.
Article in English | MEDLINE | ID: mdl-33953926

ABSTRACT

As part of the global mobilization to combat the present pandemic, almost 100 000 COVID-19-related papers have been published and nearly a thousand models of macromolecules encoded by SARS-CoV-2 have been deposited in the Protein Data Bank within less than a year. The avalanche of new structural data has given rise to multiple resources dedicated to assessing the correctness and quality of structural data and models. Here, an approach to evaluate the massive amounts of such data using the resource https://covid19.bioreproducibility.org is described, which offers a template that could be used in large-scale initiatives undertaken in response to future biomedical crises. Broader use of the described methodology could considerably curtail information noise and significantly improve the reproducibility of biomedical research.

4.
Nucleic Acids Res ; 49(W1): W86-W92, 2021 07 02.
Article in English | MEDLINE | ID: mdl-33905501

ABSTRACT

Structure-guided drug design depends on the correct identification of ligands in crystal structures of protein complexes. However, the interpretation of the electron density maps is challenging and often burdened with confirmation bias. Ligand identification can be aided by automatic methods such as CheckMyBlob, a machine learning algorithm that learns to generalize ligand descriptions from sets of moieties deposited in the Protein Data Bank. Here, we present the CheckMyBlob web server, a platform that can identify ligands in unmodeled fragments of electron density maps or validate ligands in existing models. The server processes PDB/mmCIF and MTZ files and returns a ranking of 10 most likely ligands for each detected electron density blob along with interactive 3D visualizations. Additionally, for each prediction/validation, a plugin script is generated that enables users to conduct a detailed analysis of the server results in Coot. The CheckMyBlob web server is available at https://checkmyblob.bioreproducibility.org.
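The "ranking of 10 most likely ligands" per blob amounts to a top-k selection over per-ligand scores. The sketch below is only an illustration of that step, not the CheckMyBlob server's code or response format: the function `top_k_ligands` is hypothetical, and the scores are invented for the example.

```python
def top_k_ligands(scores, k=10):
    """Return the k highest-scoring (ligand, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Invented scores for one electron density blob, keyed by ligand code.
blob_scores = {
    "SO4": 0.41, "GOL": 0.22, "EDO": 0.11, "PO4": 0.09, "CL": 0.05,
    "ACT": 0.04, "PEG": 0.03, "ZN": 0.02, "MG": 0.012, "CA": 0.008,
    "NAG": 0.006, "HEM": 0.004,
}

for rank, (ligand, score) in enumerate(top_k_ligands(blob_scores), start=1):
    print(f"{rank:2d}. {ligand:<4} {score:.3f}")
```

Returning a ranked shortlist rather than a single label leaves the final choice to the crystallographer, who can check each candidate against the density.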


Subjects
Ligands, Software, Cluster Analysis, Crystallography, Protein Databases, Machine Learning, Metals/chemistry, Peptides/chemistry, Water/chemistry
5.
IUCrJ ; 8(Pt 2): 238-256, 2021 Mar 01.
Article in English | MEDLINE | ID: mdl-33708401

ABSTRACT

The appearance at the end of 2019 of the new SARS-CoV-2 coronavirus led to an unprecedented response by the structural biology community, resulting in the rapid determination of many hundreds of structures of proteins encoded by the virus. As part of an effort to analyze and, if necessary, remediate these structures as deposited in the Protein Data Bank (PDB), this work presents a detailed analysis of 81 crystal structures of the main protease 3CLpro, an important target for the design of drugs against COVID-19. The structures of the unliganded enzyme and its complexes with a number of inhibitors were determined by multiple research groups using different experimental approaches and conditions; the resulting structures span 13 different polymorphs representing seven space groups. The structures of the enzyme itself, all determined by molecular replacement, are highly similar, with the exception of one polymorph with a different inter-domain orientation. However, a number of complexes with bound inhibitors were found to pose significant problems. Some of these could be traced to faulty definitions of geometrical restraints for ligands and to the general problem of a lack of such information in the PDB depositions. Several problems with ligand definition in the PDB itself were also noted. In several cases extensive corrections to the models were necessary to adhere to the evidence of the electron-density maps. Taken together, this analysis of a large number of structures of a single, medically important protein, all determined within less than a year using modern experimental tools, should be useful in future studies of other systems of high interest to the biomedical community.

6.
Nucl Instrum Methods Phys Res B ; 489: 30-40, 2021 Feb 15.
Article in English | MEDLINE | ID: mdl-33603257

ABSTRACT

Intense X-rays available at powerful synchrotron beamlines provide macromolecular crystallographers with an incomparable tool for investigating biological phenomena on an atomic scale. The resulting insights into the mechanisms underlying biological processes have played an essential role and shaped biomedical sciences during the last 30 years, considered the "golden age" of structural biology. In this review, we analyze selected aspects of the impact of synchrotron radiation on structural biology. Synchrotron beamlines have been used to determine over 70% of all macromolecular structures deposited into the Protein Data Bank (PDB). These structures were deposited by over 13,000 different research groups. Interestingly, despite the impressive advances in synchrotron technologies, the median resolution of macromolecular structures determined using synchrotrons has remained constant throughout the last 30 years, at about 2 Å. Similarly, the median times from the data collection to the deposition and release have not changed significantly. We describe challenges to reproducibility related to recording all relevant data and metadata during the synchrotron experiments, including diffraction images. Finally, we discuss some of the recent opinions suggesting a diminishing importance of X-ray crystallography due to impressive advances in Cryo-EM and theoretical modeling. We believe that synchrotrons of the future will increasingly evolve towards a life science center model, where X-ray crystallography, Cryo-EM, and other experimental and computational resources and knowledge are encompassed within a versatile research facility. The recent response of crystallographers to the COVID-19 pandemic suggests that X-ray crystallography conducted at synchrotron beamlines will continue to play an essential role in structural biology and drug discovery for years to come.

7.
Protein Sci ; 30(1): 115-124, 2021 01.
Article in English | MEDLINE | ID: mdl-32981130

ABSTRACT

The COVID-19 pandemic has triggered numerous scientific activities aimed at understanding the SARS-CoV-2 virus and ultimately developing treatments. Structural biologists have already determined hundreds of experimental X-ray, cryo-EM, and NMR structures of proteins and nucleic acids related to this coronavirus, and this number is still growing. To help biomedical researchers, who may not necessarily be experts in structural biology, navigate through the flood of structural models, we have created an online resource, covid19.bioreproducibility.org, that aggregates expert-verified information about SARS-CoV-2-related macromolecular models. In this article, we describe this web resource along with the suite of tools and methodologies used for assessing the structures presented therein.


Subjects
COVID-19/genetics, Internet, SARS-CoV-2/ultrastructure, Viral Proteins/ultrastructure, COVID-19/virology, Chemical Databases, Humans, Structural Models, Pandemics, Research, SARS-CoV-2/genetics, SARS-CoV-2/pathogenicity, Viral Proteins/chemistry, Viral Proteins/genetics
8.
IUCrJ ; 7(Pt 6), 2020 Sep 29.
Article in English | MEDLINE | ID: mdl-33063792

ABSTRACT

Dexamethasone, a widely used corticosteroid, has recently been reported as the first drug to increase the survival chances of patients with severe COVID-19. Therapeutic agents, including dexamethasone, are mostly transported through the body by binding to serum albumin. Here, the first structure of serum albumin in complex with dexamethasone is reported. Dexamethasone binds to drug site 7, which is also the binding site for commonly used nonsteroidal anti-inflammatory drugs and testosterone, suggesting potentially problematic binding competition. This study bridges structural findings with an analysis of publicly available clinical data from Wuhan and suggests that an adjustment of the dexamethasone regimen should be further investigated as a strategy for patients affected by two major COVID-19 risk factors: low albumin levels and diabetes.

9.
bioRxiv ; 2020 Jul 23.
Article in English | MEDLINE | ID: mdl-32743572

ABSTRACT

Dexamethasone, a widely used corticosteroid, has recently been reported as the first drug to increase the survival chances of patients with severe COVID-19. Therapeutic agents, including dexamethasone, are mostly transported through the body by binding to serum albumin. Herein, we report the first structure of serum albumin in complex with dexamethasone. We show that it binds to Drug Site 7, which is also the binding site for commonly used nonsteroidal anti-inflammatory drugs and testosterone, suggesting potentially problematic binding competition. This study bridges structural findings with our analysis of publicly available clinical data from Wuhan and suggests that an adjustment of dexamethasone regimen should be considered for patients affected by two major COVID-19 risk-factors: low albumin levels and diabetes.

10.
FEBS J ; 287(17): 3703-3718, 2020 09.
Article in English | MEDLINE | ID: mdl-32418327

ABSTRACT

A bright spot in the SARS-CoV-2 (CoV-2) coronavirus pandemic has been the immediate mobilization of the biomedical community, working to develop treatments and vaccines for COVID-19. Rational drug design against emerging threats depends on well-established methodology, mainly utilizing X-ray crystallography, to provide accurate structure models of the macromolecular drug targets and of their complexes with candidates for drug development. In the current crisis, the structural biological community has responded by presenting structure models of CoV-2 proteins and depositing them in the Protein Data Bank (PDB), usually without time embargo and before publication. Since the structures from the first-line research are produced in an accelerated mode, there is an elevated chance of mistakes and errors, with the ultimate risk of hindering, rather than speeding up, drug development. In the present work, we have used model-validation metrics and examined the electron density maps for the deposited models of CoV-2 proteins and a sample of related proteins available in the PDB as of April 1, 2020. We present these results with the aim of helping the biomedical community establish a better-validated pool of data. The proteins are divided into groups according to their structure and function. In most cases, no major corrections were necessary. However, in several cases significant revisions in the functionally sensitive area of protein-inhibitor complexes or for bound ions justified correction, re-refinement, and eventually reversioning in the PDB. The re-refined coordinate files and a tool for facilitating model comparisons are available at https://covid-19.bioreproducibility.org. DATABASE: Validated models of CoV-2 proteins are available in a dedicated, publicly accessible web service https://covid-19.bioreproducibility.org.


Subjects
Angiotensin-Converting Enzyme 2/chemistry, Antiviral Agents/chemistry, Coronavirus 3C Proteases/chemistry, Virus Receptors/chemistry, SARS-CoV-2/chemistry, Coronavirus Spike Glycoprotein/chemistry, Angiotensin-Converting Enzyme 2/antagonists & inhibitors, Angiotensin-Converting Enzyme 2/genetics, Angiotensin-Converting Enzyme 2/metabolism, Antiviral Agents/pharmacology, Binding Sites, COVID-19/virology, Coronavirus 3C Proteases/antagonists & inhibitors, Coronavirus 3C Proteases/genetics, Coronavirus 3C Proteases/metabolism, Cryoelectron Microscopy, X-Ray Crystallography, Protein Databases/standards, Drug Design, Humans, Ligands, Molecular Models, Protease Inhibitors/chemistry, Protease Inhibitors/pharmacology, Protein Binding, alpha-Helical Protein Conformation, beta-Strand Protein Conformation, Protein Interaction Domains and Motifs, Virus Receptors/antagonists & inhibitors, Virus Receptors/genetics, Virus Receptors/metabolism, Coronavirus Spike Glycoprotein/antagonists & inhibitors, Coronavirus Spike Glycoprotein/genetics, Coronavirus Spike Glycoprotein/metabolism, Thermodynamics
11.
FEBS J ; 287(13): 2685-2698, 2020 07.
Article in English | MEDLINE | ID: mdl-32311227

ABSTRACT

Crystallographic models of biological macromolecules have been ranked using the quality criteria associated with them in the Protein Data Bank (PDB). The outcomes of this quality analysis have been correlated with time and with the journals that published papers based on those models. The results show that the overall quality of PDB structures has substantially improved over the last ten years, but this period of progress was preceded by several years of stagnation or even depression. Moreover, the study shows that the historically observed negative correlation between journal impact and the quality of structural models presented therein seems to disappear as time progresses.


Subjects
Computational Biology/methods, Protein Databases/standards, Macromolecular Substances/chemistry, Molecular Models, Proteins/chemistry, Quality Control, Algorithms, Protein Conformation, Protein Domains
12.
IEEE Trans Neural Netw Learn Syst ; 31(8): 2868-2878, 2020 Aug.
Article in English | MEDLINE | ID: mdl-30892237

ABSTRACT

As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
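The claim that class proportions affect each measure differently can be checked with a small calculation. The sketch below is an illustration, not the paper's method: it fixes a classifier's true positive and true negative rates, varies only the share of the positive class, and reports accuracy, F1 and G-mean. These three measures are an assumption chosen for the example, since the abstract does not name the eight it analyzes.

```python
import math


def measures(tpr, tnr, pos_ratio, n=10_000):
    """Expected accuracy, F1 and G-mean for fixed per-class rates.

    tpr / tnr: true positive / true negative rate of the classifier.
    pos_ratio: fraction of the n examples that belong to the positive class.
    """
    pos = n * pos_ratio
    neg = n - pos
    tp = tpr * pos   # expected true positives
    tn = tnr * neg   # expected true negatives
    fp = neg - tn    # expected false positives
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    g_mean = math.sqrt(tpr * tnr)  # depends only on the per-class rates
    return {"accuracy": accuracy, "f1": f1, "g_mean": g_mean}


# Same classifier, three class proportions.
for ratio in (0.5, 0.1, 0.01):
    row = measures(tpr=0.8, tnr=0.9, pos_ratio=ratio)
    print(ratio, {name: round(value, 3) for name, value in row.items()})
```

With the rates held fixed, G-mean stays constant, accuracy drifts toward the true negative rate, and F1 falls sharply as positives become rare. This is the kind of measure-specific behavior the abstract says should be considered when evaluating classifiers.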

13.
Nucleic Acids Res ; 48(2): 962-973, 2020 01 24.
Article in English | MEDLINE | ID: mdl-31799624

ABSTRACT

Stereochemical restraints are commonly used to aid the refinement of macromolecular structures obtained by experimental methods at lower resolution. The standard restraint library for nucleic acids has not been updated for over two decades and needs revision. In this paper, geometrical restraints for nucleic acid sugars are derived using information from high-resolution crystal structures in the Cambridge Structural Database. In contrast to the existing restraints, this work shows that different parts of the sugar moiety form groups of covalent geometry dependent on various chemical and conformational factors, such as the type of ribose or the attached nucleobase, and ring puckering or rotamers of the glycosidic (χ) or side-chain (γ) torsion angles. Moreover, the geometry of the glycosidic link and the endocyclic ribose bond angles are functionally dependent on χ and sugar pucker amplitude (τm), respectively. The proposed restraints have been positively validated against data from the Nucleic Acid Database, compared with an ultrahigh-resolution Z-DNA structure in the Protein Data Bank, and tested by re-refining hundreds of crystal structures in the Protein Data Bank. The conformation-dependent sugar restraints presented in this work are publicly available in REFMAC, PHENIX and SHELXL format through a dedicated RestraintLib web server with an API function.


Subjects
Nucleic Acids/chemistry, Polynucleotides/chemistry, Proteins/chemistry, Sugars/chemistry, X-Ray Crystallography, Nucleic Acid Databases, Protein Databases, Molecular Models, Molecular Structure, Nucleic Acids/genetics, Protein Conformation, Proteins/classification, Software
14.
Bioinformatics ; 35(3): 452-461, 2019 02 01.
Article in English | MEDLINE | ID: mdl-30016407

ABSTRACT

Motivation: The correct identification of ligands in crystal structures of protein complexes is the cornerstone of structure-guided drug design. However, cognitive bias can sometimes mislead investigators into modeling fictitious compounds without solid support from the electron density maps. Ligand identification can be aided by automatic methods, but existing approaches are based on time-consuming iterative fitting. Results: Here we report a new machine learning algorithm called CheckMyBlob that identifies ligands from experimental electron density maps. In benchmark tests on portfolios of up to 219 931 ligand binding sites containing the 200 most popular ligands found in the Protein Data Bank, CheckMyBlob markedly outperforms the existing automatic methods for ligand identification, in some cases doubling the recognition rates, while requiring significantly less time. Our work shows that machine learning can improve the automation of structure modeling and significantly accelerate the drug screening process of macromolecule-ligand complexes. Availability and implementation: Code and data are available on GitHub at https://github.com/dabrze/CheckMyBlob. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Electrons, Ligands, Machine Learning, Protein Binding, Algorithms, Binding Sites
15.
Acta Crystallogr B Struct Sci Cryst Eng Mater ; 75(Pt 2): 235-245, 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-32830749

ABSTRACT

Geometrical restraints provide key structural information for the determination of biomolecular structures at lower resolution by experimental methods such as crystallography or cryo-electron microscopy. In this work, restraint targets for nucleic acid bases are derived from three different sources and compared: small-molecule crystal structures in the Cambridge Structural Database (CSD), ultrahigh-resolution structures in the Protein Data Bank (PDB) and quantum-mechanical (QM) calculations. The best parameters are those based on CSD structures. After over two decades, the standard library of Parkinson et al. [(1996), Acta Cryst. D52, 57-64] is still valid, but improvements are possible with the use of the current CSD database. The CSD-derived geometry is fully compatible with Watson-Crick base pairs, as comparisons with QM results for isolated and paired bases clearly show that the CSD targets closely correspond to proper base pairing. While the QM results are capable of distinguishing between single and paired bases, their level of accuracy is, on average, nearly two times lower than for the CSD-derived targets when gauged by root-mean-square deviations from ultrahigh-resolution structures in the PDB. Nevertheless, the accuracy of QM results appears sufficient to provide stereochemical targets for synthetic base pairs where no reliable experimental structural information is available. To enable future tests for this approach, QM calculations are provided for isocytosine, isoguanine and the iCiG base pair.
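The accuracy comparison in this abstract rests on root-mean-square deviations between restraint targets and values observed in reference structures. A minimal sketch of that calculation follows. The function name `rmsd` and all bond-length numbers are hypothetical placeholders, not values from the paper.

```python
import math


def rmsd(targets, observed):
    """Root-mean-square deviation between paired target and observed values."""
    if len(targets) != len(observed) or not targets:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((t - o) ** 2 for t, o in zip(targets, observed))
    return math.sqrt(squared / len(targets))


# Placeholder bond lengths in angstroms (invented for illustration only).
reference = [1.370, 1.340, 1.390, 1.480]   # stand-in for observed geometry
targets_a = [1.372, 1.338, 1.391, 1.478]   # stand-in for one target set
targets_b = [1.376, 1.345, 1.384, 1.487]   # stand-in for another target set

print(f"set A RMSD: {rmsd(targets_a, reference):.4f}")
print(f"set B RMSD: {rmsd(targets_b, reference):.4f}")
```

A lower RMSD means the target set reproduces the reference geometry more closely, which is the sense in which the abstract ranks the CSD-derived and QM-derived targets.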

16.
Nucleic Acids Res ; 44(17): 8479-89, 2016 09 30.
Article in English | MEDLINE | ID: mdl-27521371

ABSTRACT

The refinement of macromolecular structures is usually aided by prior stereochemical knowledge in the form of geometrical restraints. Such restraints are also used for the flexible sugar-phosphate backbones of nucleic acids. However, recent highly accurate structural studies of DNA suggest that the phosphate bond angles may have inadequate description in the existing stereochemical dictionaries. In this paper, we analyze the bonding deformations of the phosphodiester groups in the Cambridge Structural Database, cluster the studied fragments into six conformation-related categories and propose a revised set of restraints for the O-P-O bond angles and distances. The proposed restraints have been positively validated against data from the Nucleic Acid Database and an ultrahigh-resolution Z-DNA structure in the Protein Data Bank. Additionally, the manual classification of PO4 geometry is compared with geometrical clusters automatically discovered by machine learning methods. The machine learning cluster analysis provides useful insights and a practical example for general applications of clustering algorithms for automatic discovery of hidden patterns of molecular geometry. Finally, we describe the implementation and application of a public-domain web server for automatic generation of the proposed restraints.


Subjects
Esters/chemistry, Nucleic Acid Conformation, Polynucleotides/chemistry, Nucleic Acid Databases, Protein Databases, Internet, Reproducibility of Results, Staining and Labeling
17.
IEEE Trans Neural Netw Learn Syst ; 25(1): 81-94, 2014 Jan.
Article in English | MEDLINE | ID: mdl-24806646

ABSTRACT

Data stream mining has been receiving increased attention due to its presence in a wide range of applications, such as sensor networks, banking, and telecommunication. One of the most important challenges in learning from data streams is reacting to concept drift, i.e., unforeseen changes of the stream's underlying data distribution. Several classification algorithms that cope with concept drift have been put forward; however, most of them specialize in one type of change. In this paper, we propose a new data stream classifier, called the Accuracy Updated Ensemble (AUE2), which aims at reacting equally well to different types of drift. AUE2 combines accuracy-based weighting mechanisms known from block-based ensembles with the incremental nature of Hoeffding Trees. The proposed algorithm is experimentally compared with 11 state-of-the-art stream methods, including single classifiers, block-based and online ensembles, and hybrid approaches in different drift scenarios. Out of all the compared algorithms, AUE2 provided the best average classification accuracy while proving to be less memory-consuming than other ensemble approaches. Experimental results show that AUE2 can be considered suitable for scenarios involving many types of drift as well as static environments.
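The general idea of accuracy-based weighting in block-based ensembles can be sketched in a few lines. This is a deliberately simplified illustration and not the AUE2 algorithm: the members are plain Python functions rather than Hoeffding Trees, the weight is the raw accuracy on the most recent block (an assumption; the abstract does not give AUE2's weighting formula), and the names `block_accuracy` and `weighted_vote` are hypothetical.

```python
from collections import defaultdict


def block_accuracy(predict, block):
    """Fraction of (x, y) examples in the block that `predict` labels correctly."""
    correct = sum(1 for x, y in block if predict(x) == y)
    return correct / len(block)


def weighted_vote(members, weights, x):
    """Return the label with the largest total weight among member predictions."""
    totals = defaultdict(float)
    for predict, weight in zip(members, weights):
        totals[predict(x)] += weight
    return max(totals, key=totals.get)


# Three toy members and the most recent labeled block of the stream.
members = [lambda x: x > 0, lambda x: x > 5, lambda x: True]
latest_block = [(1, True), (-1, False), (7, True), (3, True)]

# Re-weight every member after each block, so members that fit the
# current concept gain influence and outdated ones lose it.
weights = [block_accuracy(m, latest_block) for m in members]
print(weights)                               # [1.0, 0.5, 0.75]
print(weighted_vote(members, weights, -2))   # False
```

Recomputing the weights on every new block is what lets such an ensemble follow drift: a member trained on an outdated concept scores poorly on recent data and contributes less to the vote.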
