ABSTRACT
Larger training datasets have been shown to improve the accuracy of machine learning (ML)-based scoring functions (SFs) for structure-based virtual screening (SBVS). In addition, massive test sets for SBVS, known as ultra-large compound libraries, have been demonstrated to enable the fast discovery of selective drug leads with low-nanomolar potency. This proof-of-concept was carried out on two targets using a single docking tool along with its SF. It is thus unclear whether this high level of performance would generalise to other targets, docking tools and SFs. We found that screening a larger compound library results in more potent actives being identified in all six additional targets using a different docking tool along with its classical SF. Furthermore, we established that a way to improve the potency of the retrieved molecules further is to rank them with more accurate ML-based SFs (we found this to be true in four of the six targets; the difference was not significant in the remaining two targets). A 3-fold increase in average hit rate across targets was also achieved by the ML-based SFs. Lastly, we observed that classical and ML-based SFs often find different actives, which supports using both types of SFs on those targets.
Subjects
Databases, Protein , Machine Learning , Molecular Docking Simulation , Proteins/chemistry , Proteins/genetics
ABSTRACT
A central goal of precision oncology is to administer an optimal drug treatment to each cancer patient. A common preclinical approach to tackle this problem has been to characterize the tumors of patients at the molecular and drug response levels, and employ the resulting datasets for predictive in silico modeling (mostly using machine learning). Understanding how and why the different variants of these datasets are generated is an important component of this process. This review focuses on providing such an introduction, aimed at scientists with little previous exposure to this research area.
Subjects
Biomarkers, Tumor , Computational Biology/methods , Neoplasms/etiology , Neoplasms/metabolism , Pharmacogenetics/methods , Animals , Antineoplastic Agents/pharmacology , Antineoplastic Agents/therapeutic use , Biopsy , Cell Line, Tumor , Databases, Genetic , Disease Models, Animal , Drug Resistance, Neoplasm , Epigenomics/methods , Gene Expression Profiling/methods , Genomics/methods , High-Throughput Screening Assays , Humans , Neoplasms/drug therapy , Neoplasms/pathology , Precision Medicine/methods , Proteomics/methods
ABSTRACT
We discuss how data unbiasing and simple methods such as protein-ligand Interaction FingerPrint (IFP) can overestimate virtual screening performance. We also show that IFP is strongly outperformed by target-specific machine-learning scoring functions, which were not considered in a recent report concluding that simple methods were better than machine-learning scoring functions at virtual screening.
Subjects
Ligands , Proteins , Proteins/chemistry , Machine Learning
ABSTRACT
BACKGROUND: Despite limited clinical evidence of its efficacy, cannabis use has been commonly reported for the management of various mental health concerns in naturalistic field studies. The aim of the current study was to use machine learning methods to investigate predictors of perceived symptom change across various mental health symptoms with acute cannabis use in a large naturalistic sample. METHODS: Data from 68,819 unique observations of cannabis use from 1307 individuals using cannabis to manage mental health symptoms were analyzed. Data were extracted from Strainprint®, a mobile app that allows users to monitor their cannabis use for therapeutic purposes. Machine learning models were employed to predict self-perceived symptom change after cannabis use, and SHapley Additive exPlanations (SHAP) value plots were used to assess feature importance of individual predictors in the model. Interaction effects of symptom severity pre-scores of anxiety, depression, insomnia, and gender were also examined. RESULTS: The factors that were most strongly associated with perceived symptom change following acute cannabis use were pre-symptom severity, age, gender, and the ratio of CBD to THC. Further examination on the impact of baseline severity for the most commonly reported symptoms revealed distinct responses, with cannabis being reported to more likely benefit individuals with lower pre-symptom severity for depression, and higher pre-symptom severity for insomnia. Responses to cannabis use also differed between genders. CONCLUSIONS: Findings from this study highlight the importance of several factors in predicting perceived symptom change with acute cannabis use for mental health symptom management. Mental health profiles and baseline symptom severity may play a large role in perceived responses to cannabis. 
Distinct response patterns were also noted across commonly reported mental health symptoms, emphasizing the need for placebo-controlled cannabis trials for specific user profiles.
Subjects
Cannabis , Sleep Initiation and Maintenance Disorders , Humans , Male , Female , Mental Health , Anxiety/therapy , Anxiety Disorders
ABSTRACT
Májovský and colleagues have investigated the important issue of ChatGPT being used for the complete generation of scientific works, including fake data and tables. The issues behind why ChatGPT poses a significant concern to research reach far beyond the model itself. Once again, the lack of reproducibility and visibility of scientific works creates an environment where fraudulent or inaccurate work can thrive. What are some of the ways in which we can handle this new situation?
Subjects
Artificial Intelligence , Software , Humans , Reproducibility of Results
ABSTRACT
BACKGROUND: Patients with obsessive-compulsive disorder (OCD) are at increased risk for suicide attempt (SA) compared to the general population. However, the significant risk factors for SA in this population remain unclear: it is not known whether these factors are associated with the disorder itself or related to extrinsic factors, such as comorbidities and sociodemographic variables. This study aimed to identify predictors of SA in OCD patients using a machine learning algorithm. METHODS: A total of 959 outpatients with OCD were included. An elastic net model was used to identify the predictors of SA among OCD patients, using clinical and sociodemographic variables. RESULTS: The prevalence of SA in our sample was 10.8%. Relevant predictors of SA found by the elastic net algorithm were the following: previous suicide planning, previous suicide thoughts, lifetime depressive episode, and intermittent explosive disorder. Our elastic net model performed well, with an area under the curve of 0.95. CONCLUSIONS: This is the first study to evaluate risk factors for SA among OCD patients using machine learning algorithms. Our results demonstrate that an accurate risk algorithm can be created using clinical and sociodemographic variables. All aspects of suicidal phenomena need to be carefully investigated by clinicians in every evaluation of OCD patients. Particular attention should be given to comorbidity with depressive symptoms.
Subjects
Obsessive-Compulsive Disorder , Suicide, Attempted , Comorbidity , Humans , Machine Learning , Obsessive-Compulsive Disorder/diagnosis , Obsessive-Compulsive Disorder/epidemiology , Prevalence , Suicidal Ideation
ABSTRACT
BACKGROUND: There is still little knowledge of objective suicide risk stratification. METHODS: This study aims to develop models using machine-learning approaches to predict suicide attempt (1) among survey participants in a nationally representative sample and (2) among participants with lifetime major depressive episodes. We used a cohort called the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC) that was conducted in two waves and included a nationally representative sample of the adult population in the United States. Wave 1 involved 43 093 respondents and wave 2 involved 34 653 completed face-to-face reinterviews with wave 1 participants. Predictor variables included clinical, stressful life events, and sociodemographic variables from wave 1; outcome included suicide attempt between wave 1 and wave 2. RESULTS: The model built with elastic net regularization distinguished individuals who had attempted suicide from those who had not with an area under the ROC curve (AUC) of 0.89, balanced accuracy 81.86%, specificity 89.22%, and sensitivity 74.51% for the general population. For participants with lifetime major depressive episodes, AUC was 0.89, balanced accuracy 81.64%, specificity 85.86%, and sensitivity 77.42%. The most important predictor variables were a diagnosis of borderline personality disorder, post-traumatic stress disorder, and being of Asian descent for the model in all participants; and previous suicide attempt, borderline personality disorder, and overnight stay in hospital because of depressive symptoms for the model in participants with lifetime major depressive episodes. Random forest and artificial neural networks had similar performance. CONCLUSIONS: Risk for suicide attempt can be estimated with high accuracy.
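The balanced accuracy figures reported above can be related to the reported sensitivity and specificity with a short calculation. The sketch below assumes the standard binary-classification definition (the arithmetic mean of sensitivity and specificity); the abstract does not state which formula the authors used, and the function name `balanced_accuracy` is purely illustrative.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Arithmetic mean of sensitivity and specificity (assumed definition)."""
    for name, value in (("sensitivity", sensitivity), ("specificity", specificity)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return (sensitivity + specificity) / 2.0


# Values taken from the abstract, expressed as fractions.
general_population = balanced_accuracy(sensitivity=0.7451, specificity=0.8922)
lifetime_mde = balanced_accuracy(sensitivity=0.7742, specificity=0.8586)

print(f"general population: {general_population:.4f}")
print(f"lifetime major depressive episodes: {lifetime_mde:.4f}")
```

Under this assumed definition, the recomputed values agree with the reported 81.86% and 81.64% to within the rounding of the published sensitivity and specificity.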
Subjects
Alcohol-Related Disorders , Depressive Disorder, Major , Stress Disorders, Post-Traumatic , Adult , Humans , United States/epidemiology , Suicide, Attempted , Depressive Disorder, Major/epidemiology , Depressive Disorder, Major/diagnosis , Prospective Studies , Alcohol-Related Disorders/epidemiology , Risk Factors
ABSTRACT
BACKGROUND: The clinical effects of smartphone-based interventions for bipolar disorder (BD) have yet to be established. OBJECTIVES: To examine the efficacy of smartphone-based interventions in BD and how the included studies reported user-engagement indicators. METHODS: We conducted a systematic search on January 24, 2022, in PubMed, Scopus, Embase, APA PsycINFO, and Web of Science. We used random-effects meta-analysis to calculate the standardized difference (Hedges' g) in pre-post change scores between smartphone intervention and control conditions. The study was pre-registered with PROSPERO (CRD42021226668). RESULTS: The literature search identified 6034 studies. Thirteen articles fulfilled the selection criteria. We included seven RCTs and performed meta-analyses comparing the pre-post change in depressive and (hypo)manic symptom severity, functioning, quality of life, and perceived stress between smartphone interventions and control conditions. There was significant heterogeneity among studies and no meta-analysis reached statistical significance. Results were also inconclusive regarding affective relapses and psychiatric readmissions. All studies reported positive user-engagement indicators. CONCLUSION: We did not find evidence to support that smartphone interventions may reduce the severity of depressive or manic symptoms in BD. The high heterogeneity of studies supports the need for expert consensus to establish ideally how studies should be designed and the use of more sensitive outcomes, such as affective relapses and psychiatric hospitalizations, as well as the quantification of mood instability. The ISBD Big Data Task Force provides preliminary recommendations to reduce the heterogeneity and achieve more valid evidence in the field.
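The effect size named in the METHODS above, Hedges' g, can be illustrated for a single study as follows. This is a minimal sketch under stated assumptions: it uses the textbook formula (Cohen's d with a pooled standard deviation, multiplied by the usual small-sample correction factor), it does not implement the random-effects pooling across studies, and the function name and the numbers in the usage lines are hypothetical rather than taken from any included trial.

```python
import math


def hedges_g(mean_t: float, sd_t: float, n_t: int,
             mean_c: float, sd_c: float, n_c: int) -> float:
    """Standardized mean difference between two groups with small-sample correction.

    mean_*/sd_* are the mean and standard deviation of the pre-post change
    scores in the treatment (t) and control (c) arms; n_* are the arm sizes.
    """
    if n_t < 2 or n_c < 2:
        raise ValueError("each arm needs at least two participants")
    # Pooled variance of the two arms.
    pooled_var = ((n_t - 1) * sd_t ** 2 + (n_c - 1) * sd_c ** 2) / (n_t + n_c - 2)
    cohen_d = (mean_t - mean_c) / math.sqrt(pooled_var)
    # Common approximation of the small-sample correction factor J.
    correction = 1.0 - 3.0 / (4.0 * (n_t + n_c) - 9.0)
    return correction * cohen_d


# Hypothetical change scores: a larger symptom reduction in the intervention arm.
g = hedges_g(mean_t=-5.0, sd_t=4.0, n_t=30, mean_c=-3.0, sd_c=4.0, n_c=30)
print(f"Hedges' g = {g:.3f}")
```

A negative g here means a larger reduction in the intervention arm; the sign convention depends on how the change scores are defined in each study.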
Subjects
Bipolar Disorder , Smartphone , Big Data , Bipolar Disorder/psychology , Humans , Quality of Life , Recurrence
ABSTRACT
OBJECTIVE: To evaluate whether accelerated brain aging occurs in individuals with mood or psychotic disorders. METHODS: A systematic review following PRISMA guidelines was conducted. A meta-analysis was then performed to assess neuroimaging-derived brain age gap in three independent groups: (1) schizophrenia and first-episode psychosis, (2) major depressive disorder, and (3) bipolar disorder. RESULTS: A total of 18 papers were included. The random-effects model meta-analysis showed a significantly increased neuroimaging-derived brain age gap relative to age-matched controls for the three major psychiatric disorders, with schizophrenia (3.08; 95%CI [2.32; 3.85]; p < 0.01) presenting the largest effect, followed by bipolar disorder (1.93; [0.53; 3.34]; p < 0.01) and major depressive disorder (1.12; [0.41; 1.83]; p < 0.01). The brain age gap was larger in older compared to younger individuals. CONCLUSION: Individuals with mood and psychotic disorders may undergo a process of accelerated brain aging reflected in patterns captured by neuroimaging data. The brain age gap tends to be more pronounced in older individuals, indicating a possible cumulative biological effect of illness burden.
Subjects
Bipolar Disorder , Depressive Disorder, Major , Psychotic Disorders , Schizophrenia , Aged , Bipolar Disorder/diagnostic imaging , Bipolar Disorder/epidemiology , Brain/diagnostic imaging , Depressive Disorder, Major/diagnostic imaging , Depressive Disorder, Major/epidemiology , Humans , Psychotic Disorders/diagnostic imaging , Psychotic Disorders/epidemiology , Schizophrenia/diagnostic imaging
ABSTRACT
MOTIVATION: Studies have shown that the accuracy of random forest (RF)-based scoring functions (SFs), such as RF-Score-v3, increases with more training samples, whereas that of classical SFs, such as X-Score, does not. Nevertheless, the impact of the similarity between training and test samples on this matter has not been studied in a systematic manner. It is therefore unclear how these SFs would perform when only trained on protein-ligand complexes that are highly dissimilar or highly similar to the test set. It is also unclear whether SFs based on machine learning algorithms other than RF can also improve accuracy with increasing training set size and to what extent they learn from dissimilar or similar training complexes. RESULTS: We present a systematic study to investigate how the accuracy of classical and machine-learning SFs varies with protein-ligand complex similarities between training and test sets. We considered three types of similarity metrics, based on the comparison of either protein structures, protein sequences or ligand structures. Regardless of the similarity metric, we found that incorporating a larger proportion of similar complexes into the training set did not make classical SFs more accurate. In contrast, RF-Score-v3 was able to outperform X-Score even when trained on just 32% of the most dissimilar complexes, showing that its superior performance owes considerably to learning from training complexes that are dissimilar to those in the test set. In addition, we generated the first SF employing Extreme Gradient Boosting (XGBoost), XGB-Score, and observed that it also improves with training set size while outperforming the other SFs. Given the continuous growth of training datasets, the development of machine-learning SFs has become very appealing. AVAILABILITY AND IMPLEMENTATION: https://github.com/HongjianLi/MLSF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Machine Learning , Ligands , Protein Binding , Proteins
ABSTRACT
BACKGROUND: Essential workers have been shown to present a higher prevalence of positive screenings for anxiety and depression during the COVID-19 pandemic. Individuals from countries with socioeconomic inequalities may be at increased risk for mental health disorders. OBJECTIVE: We aimed to assess the prevalence and predictors of depression, anxiety, and their comorbidity among essential workers in Brazil and Spain during the COVID-19 pandemic. METHODS: A web survey was conducted between April and May 2020 in both countries. The main outcome was a positive screening for depression only, anxiety only, or both. Lifestyle was measured using a lifestyle multidimensional scale adapted for the COVID-19 pandemic (Short Multidimensional Inventory Lifestyle Evaluation-Confinement). A multinomial logistic regression model was performed to evaluate the factors associated with depression, anxiety, and the presence of both conditions. RESULTS: From the 22,786 individuals included in the web survey, 3745 self-reported to be essential workers. Overall, 8.3% (n=311), 11.6% (n=434), and 27.4% (n=1027) presented positive screenings for depression, anxiety, and both, respectively. After adjusting for confounding factors, the multinomial model showed that an unhealthy lifestyle increased the likelihood of depression (adjusted odds ratio [AOR] 4.00, 95% CI 2.72-5.87), anxiety (AOR 2.39, 95% CI 1.80-3.20), and both anxiety and depression (AOR 8.30, 95% CI 5.90-11.7). Living in Brazil was associated with increased odds of depression (AOR 2.89, 95% CI 2.07-4.06), anxiety (AOR 2.81, 95%CI 2.11-3.74), and both conditions (AOR 5.99, 95% CI 4.53-7.91). CONCLUSIONS: Interventions addressing lifestyle may be useful in dealing with symptoms of common mental disorders during the strain imposed among essential workers by the COVID-19 pandemic. Essential workers who live in middle-income countries with higher rates of inequality may face additional challenges. 
Ensuring equitable treatment and support may be an important challenge ahead, considering the possible syndemic effect of the social determinants of health.
Subjects
Anxiety/epidemiology , Coronavirus Infections/epidemiology , Depression/epidemiology , Employment/economics , Employment/statistics & numerical data , Health Surveys , Life Style , Mental Health/statistics & numerical data , Pneumonia, Viral/epidemiology , Adult , Brazil/epidemiology , COVID-19 , Coronavirus Infections/psychology , Female , Humans , Male , Middle Aged , Odds Ratio , Pandemics , Pneumonia, Viral/psychology , Prevalence , Self Report , Socioeconomic Factors , Spain/epidemiology
ABSTRACT
Recent advances in deep learning methods have redefined the state-of-the-art for many medical imaging applications, surpassing previous approaches and sometimes even competing with human judgment in several tasks. Those models, however, when trained to reduce the empirical risk on a single domain, fail to generalize when applied to other domains, a very common scenario in medical imaging due to the variability of images and anatomical structures, even across the same imaging modality. In this work, we extend the method of unsupervised domain adaptation using self-ensembling for the semantic segmentation task and explore multiple facets of the method on a small and realistic publicly-available magnetic resonance (MRI) dataset. Through an extensive evaluation, we show that self-ensembling can indeed improve the generalization of the models even when using a small amount of unlabeled data.
Subjects
Diagnostic Imaging/methods , Image Processing, Computer-Assisted/methods , Unsupervised Machine Learning , Humans , Magnetic Resonance Imaging/methods
ABSTRACT
OBJECTIVES: The International Society for Bipolar Disorders Big Data Task Force assembled leading researchers in the field of bipolar disorder (BD), machine learning, and big data with extensive experience to evaluate the rationale of machine learning and big data analytics strategies for BD. METHOD: A task force was convened to examine and integrate findings from the scientific literature related to machine learning and big data-based studies to clarify terminology and to describe challenges and potential applications in the field of BD. We also systematically searched PubMed, Embase, and Web of Science for articles published up to January 2019 that used machine learning in BD. RESULTS: The results suggested that big data analytics has the potential to provide risk calculators to aid in treatment decisions and predict clinical prognosis, including suicidality, for individual patients. This approach can advance diagnosis by enabling discovery of more relevant data-driven phenotypes, as well as by predicting transition to the disorder in high-risk unaffected subjects. We also discuss the most frequent challenges that big data analytics applications can face, such as heterogeneity, lack of external validation and replication of some studies, cost and non-stationary distribution of the data, and lack of appropriate funding. CONCLUSION: Machine learning-based studies, including atheoretical data-driven big data approaches, provide an opportunity to more accurately detect those who are at risk, parse relevant phenotypes, and inform treatment selection and prognosis. However, several methodological challenges need to be addressed in order to translate research findings to clinical settings.
Subjects
Big Data , Bipolar Disorder/therapy , Clinical Decision-Making , Machine Learning , Suicidal Ideation , Advisory Committees , Bipolar Disorder/epidemiology , Data Science , Humans , Phenotype , Prognosis , Risk Assessment
ABSTRACT
Interest in docking technologies has grown in parallel with the ever-increasing number and diversity of 3D models for macromolecular therapeutic targets. Structure-Based Virtual Screening (SBVS) aims at leveraging these experimental structures to discover the necessary starting points for the drug discovery process. It is now established that Machine Learning (ML) can strongly enhance the predictive accuracy of scoring functions for SBVS by exploiting large datasets from targets, molecules and their associations. However, with greater choice, the question of which ML-based scoring function is the most suitable for prospective use on a given target has gained importance. Here we analyse two approaches to select an existing scoring function for the target, along with a third approach consisting of generating a scoring function tailored to the target. These analyses required discussing the limitations of popular SBVS benchmarks, the alternatives for benchmarking scoring functions for SBVS, and how to generate or use them with freely-available software.
Subjects
Drug Discovery , Machine Learning , Pharmaceutical Preparations/chemistry , Structure-Activity Relationship , Humans
ABSTRACT
Ligand-based Virtual Screening (VS) methods aim at identifying molecules with a similar activity profile across phenotypic and macromolecular targets to that of a query molecule used as search template. VS using 3D similarity methods has the advantage of biasing this search toward active molecules with innovative chemical scaffolds, which are highly sought after in drug design to provide novel leads with improved properties over the query molecule (e.g. patentable, of lower toxicity or increased potency). Ultrafast Shape Recognition (USR) has demonstrated excellent performance in the discovery of molecules with previously-unknown phenotypic or target activity, with retrospective studies suggesting that its pharmacophoric extension (USRCAT) should obtain even better hit rates once it is used prospectively. Here we present USR-VS (http://usr.marseille.inserm.fr/), the first web server using these two validated ligand-based 3D methods for large-scale prospective VS. In about 2 s, 93.9 million 3D conformers, expanded from 23.1 million purchasable molecules, are screened and the 100 most similar molecules among them in terms of 3D shape and pharmacophoric properties are shown. USR-VS functionality also provides interactive visualization of the similarity of the query molecule against the hit molecules as well as vendor information to purchase selected hits in order to be experimentally tested.
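The shape comparison behind USR can be sketched in a few lines. The abstract does not describe the algorithm, so the following is an illustration based on the commonly described formulation of USR and not the USR-VS server's code: each conformer is encoded as 12 numbers (three moments of the atomic distance distributions to four reference points), and two conformers are compared with an inverse scaled Manhattan distance. The exact choice of moments varies between implementations; the mean, standard deviation and cube root of the third central moment used here are an assumption, as are all function names and the toy coordinates.

```python
import math


def _moments(distances):
    """Mean, standard deviation and signed cube root of the third central moment."""
    n = len(distances)
    mean = sum(distances) / n
    variance = sum((d - mean) ** 2 for d in distances) / n
    third = sum((d - mean) ** 3 for d in distances) / n
    return [mean, math.sqrt(variance), math.copysign(abs(third) ** (1.0 / 3.0), third)]


def usr_descriptors(coords):
    """12 shape descriptors for one conformer given as a list of (x, y, z) tuples."""
    n = len(coords)
    centroid = tuple(sum(atom[i] for atom in coords) / n for i in range(3))
    dist_to_centroid = [math.dist(atom, centroid) for atom in coords]
    closest = coords[dist_to_centroid.index(min(dist_to_centroid))]
    farthest = coords[dist_to_centroid.index(max(dist_to_centroid))]
    dist_to_farthest = [math.dist(atom, farthest) for atom in coords]
    farthest_from_farthest = coords[dist_to_farthest.index(max(dist_to_farthest))]

    descriptors = []
    for reference in (centroid, closest, farthest, farthest_from_farthest):
        descriptors.extend(_moments([math.dist(atom, reference) for atom in coords]))
    return descriptors


def usr_similarity(desc_a, desc_b):
    """Similarity in (0, 1]; 1 means identical descriptor vectors."""
    manhattan = sum(abs(a - b) for a, b in zip(desc_a, desc_b))
    return 1.0 / (1.0 + manhattan / len(desc_a))


# Toy coordinates (hypothetical, in angstroms), not a real molecule.
query = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.2, 0.0), (3.1, 1.4, 0.7), (4.0, 2.9, 1.1)]
shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in query]
stretched = [(2.0 * x, 2.0 * y, 2.0 * z) for x, y, z in query]

print(usr_similarity(usr_descriptors(query), usr_descriptors(shifted)))
print(usr_similarity(usr_descriptors(query), usr_descriptors(stretched)))
```

Because the descriptors depend only on interatomic distances, no superposition of the molecules is needed before comparison, which is a large part of why screening with this kind of method can be so fast.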
Subjects
Drug Evaluation, Preclinical/methods , Internet , Pharmaceutical Preparations/analysis , Pharmaceutical Preparations/chemistry , Software , Drug Design , Fluspirilene/chemistry , Indoles/chemistry , Ligands , Reproducibility of Results , Sulfonamides/chemistry , Vemurafenib
ABSTRACT
BACKGROUND: Pose generation error is usually quantified as the difference between the geometry of the pose generated by the docking software and that of the same molecule co-crystallised with the considered protein. Surprisingly, the impact of this error on binding affinity prediction is yet to be systematically analysed across diverse protein-ligand complexes. RESULTS: Against commonly-held views, we have found that pose generation error generally has a small impact on the accuracy of binding affinity prediction. This is also true for large pose generation errors and it is not only observed with machine-learning scoring functions, but also with classical scoring functions such as AutoDock Vina. Furthermore, we propose a procedure to correct a substantial part of this error which consists of calibrating the scoring functions with re-docked, rather than co-crystallised, poses. In this way, the relationship between Vina-generated protein-ligand poses and their binding affinities is directly learned. As a result, test set performance after this error-correcting procedure is much closer to that of predicting the binding affinity in the absence of pose generation error (i.e. on crystal structures). We evaluated several strategies, obtaining better results for those using a single docked pose per ligand than those using multiple docked poses per ligand. CONCLUSIONS: Binding affinity prediction is often carried out on the docked pose of a known binder rather than its co-crystallised pose. Our results suggest that pose generation error is in general far less damaging for binding affinity prediction than is currently believed. Another contribution of our study is the proposal of a procedure that largely corrects for this error. The resulting machine-learning scoring function is freely available at http://istar.cse.cuhk.edu.hk/rf-score-4.tgz and http://ballester.marseille.inserm.fr/rf-score-4.tgz .
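The "difference between the geometry" of a docked pose and the co-crystallised pose is most often reported as a root-mean-square deviation (RMSD) over ligand atoms. The abstract does not name the measure, so RMSD is an assumption here. The sketch further assumes that both poses list the same atoms in the same order and share the protein's coordinate frame (so no superposition is applied), and it ignores ligand symmetry; the function name and coordinates are hypothetical.

```python
import math


def pose_rmsd(docked, crystal):
    """RMSD between two poses given as equally ordered lists of (x, y, z) tuples."""
    if not docked or len(docked) != len(crystal):
        raise ValueError("poses must be non-empty and contain the same number of atoms")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(docked, crystal))
    return math.sqrt(squared / len(docked))


# Hypothetical three-atom ligand: the docked pose is shifted 1 angstrom along x.
crystal_pose = [(0.0, 0.0, 0.0), (1.4, 0.0, 0.0), (2.1, 1.2, 0.0)]
docked_pose = [(x + 1.0, y, z) for x, y, z in crystal_pose]

print(f"pose generation error: {pose_rmsd(docked_pose, crystal_pose):.2f} angstrom")
```

With a measure of this kind, each complex's pose generation error can be recorded alongside the error of its predicted binding affinity to study how the two relate.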
Subjects
Molecular Docking Simulation/standards , Nuclear Proteins/metabolism , Pyrazines/metabolism , Software , Transcription Factors/metabolism , Humans , Ligands , Nuclear Proteins/chemistry , Protein Binding , Protein Conformation , Pyrazines/chemistry , Transcription Factors/chemistry
ABSTRACT
Docking scoring functions can be used to predict the strength of protein-ligand binding. It is widely believed that training a scoring function with low-quality data is detrimental to its predictive performance. Nevertheless, there is a surprising lack of systematic validation experiments in support of this hypothesis. In this study, we investigated to what extent training a scoring function with low-quality structural and binding data is detrimental to predictive performance. We found that low-quality data is not only non-detrimental, but beneficial for the predictive performance of machine-learning scoring functions, though the improvement is smaller than that coming from high-quality data. Furthermore, we observed that classical scoring functions are not able to effectively exploit data beyond an early threshold, regardless of its quality. This demonstrates that exploiting a larger data volume is more important for the performance of machine-learning scoring functions than restricting training to a smaller set of higher-quality data.