ABSTRACT
Applications of machine learning in the biomedical sciences are growing rapidly. This growth has been spurred by diverse cross-institutional and interdisciplinary collaborations, public availability of large datasets, an increase in the accessibility of analytic routines, and the availability of powerful computing resources. With this increased access and exposure to machine learning comes a responsibility for education and a deeper understanding of its bases and bounds, borne equally by data scientists seeking to ply their analytic wares in medical research and by biomedical scientists seeking to harness such methods to glean knowledge from data. This article provides an accessible and critical review of machine learning for a biomedically informed audience, as well as its applications in psychiatry. The review covers definitions and expositions of commonly used machine learning methods, and historical trends of their use in psychiatry. We also provide a set of standards, namely Guidelines for REporting Machine Learning Investigations in Neuropsychiatry (GREMLIN), for designing and reporting studies that use machine learning as a primary data-analysis approach. Lastly, we propose the establishment of the Machine Learning in Psychiatry (MLPsych) Consortium, enumerate its objectives, and identify areas of opportunity for future applications of machine learning in biological psychiatry. This review serves as a cautiously optimistic primer on machine learning for those on the precipice as they prepare to dive into the field, either as methodological practitioners or well-informed consumers.
Subject(s)
Biological Psychiatry, Machine Learning, Humans, Biological Psychiatry/methods, Psychiatry/methods, Biomedical Research/methods
ABSTRACT
BACKGROUND: Suicide is a major public health concern globally. Accurately predicting suicidal behavior remains challenging. This study aimed to use machine learning approaches to examine the potential of the Swedish national registry data for prediction of suicidal behavior. METHODS AND FINDINGS: The study sample consisted of 541,300 inpatient and outpatient visits by 126,205 Sweden-born patients (54% female and 46% male) aged 18 to 39 (mean age at the visit: 27.3) years to psychiatric specialty care in Sweden between January 1, 2011 and December 31, 2012. The most common psychiatric diagnoses at the visit were anxiety disorders (20.0%), major depressive disorder (16.9%), and substance use disorders (13.6%). A total of 425 candidate predictors covering demographic characteristics, socioeconomic status (SES), electronic medical records, criminality, as well as family history of disease and crime were extracted from the Swedish registry data. The sample was randomly split into an 80% training set containing 433,024 visits and a 20% test set containing 108,276 visits. Models were trained separately for suicide attempt/death within 90 and 30 days following a visit using multiple machine learning algorithms. Model discrimination and calibration were both evaluated. Among all eligible visits, 3.5% (18,682) were followed by a suicide attempt/death within 90 days and 1.7% (9,099) within 30 days. The final models were based on ensemble learning that combined predictions from elastic net penalized logistic regression, random forest, gradient boosting, and a neural network. The area under the receiver operating characteristic (ROC) curves (AUCs) on the test set were 0.88 (95% confidence interval [CI] = 0.87-0.89) and 0.89 (95% CI = 0.88-0.90) for the outcome within 90 days and 30 days, respectively, both being significantly better than chance (i.e., AUC = 0.50) (p < 0.01). Sensitivity, specificity, and predictive values were reported at different risk thresholds. 
A limitation of our study is that our models have not yet been externally validated, and thus, the generalizability of the models to other populations remains unknown. CONCLUSIONS: By combining the ensemble method of multiple machine learning algorithms and high-quality data solely from the Swedish registers, we developed prognostic models to predict short-term suicide attempt/death with good discrimination and calibration. Whether novel predictors can improve predictive performance requires further investigation.
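As a rough illustration of the evaluation described above, the following sketch averages predicted risks from two hypothetical base models and computes the AUC as the probability that a randomly chosen positive visit is scored above a randomly chosen negative one. The labels and scores are invented toy values, and simple averaging is an assumption: the abstract does not state how the ensemble combined its base learners.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_ensemble(model_scores):
    """Combine per-model predicted risks by unweighted averaging (assumed)."""
    return [sum(col) / len(col) for col in zip(*model_scores)]


# Toy data: 1 = visit followed by the outcome, 0 = not (hypothetical values).
labels = [0, 0, 1, 0, 1, 1]
model_a = [0.10, 0.40, 0.35, 0.20, 0.80, 0.70]  # hypothetical base model
model_b = [0.20, 0.30, 0.60, 0.50, 0.55, 0.90]  # hypothetical base model

ensemble = average_ensemble([model_a, model_b])
print("AUC model A :", round(auc(labels, model_a), 3))
print("AUC model B :", round(auc(labels, model_b), 3))
print("AUC ensemble:", round(auc(labels, ensemble), 3))
```

On real data the AUC would be computed on the held-out test set only, and a confidence interval would typically be obtained by a method such as bootstrapping.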
Subject(s)
Major Depressive Disorder/psychology, Machine Learning, Predictive Value of Tests, Suicide Attempt/psychology, Adult, Major Depressive Disorder/diagnosis, Female, Humans, Male, Registries, Risk Assessment/statistics & numerical data, Risk Factors, Suicidal Ideation, Sweden, Young Adult
ABSTRACT
Cambodia, in which both Plasmodium vivax and Plasmodium falciparum are endemic, has been the focus of numerous malaria-control interventions, resulting in a marked decline in overall malaria incidence. Despite this decline, the number of P vivax cases has actually increased. To understand better the factors underlying this resilience, we compared the genetic responses of the two species to recent selective pressures. We sequenced and studied the genomes of 70 P vivax and 80 P falciparum isolates collected between 2009 and 2013. We found that although P falciparum has undergone population fracturing, the coendemic P vivax population has grown undisrupted, resulting in a larger effective population size, no discernable population structure, and frequent multiclonal infections. Signatures of selection suggest recent, species-specific evolutionary differences. Particularly, in contrast to P falciparum, P vivax transcription factors, chromatin modifiers, and histone deacetylases have undergone strong directional selection, including a particularly strong selective sweep at an AP2 transcription factor. Together, our findings point to different population-level adaptive mechanisms used by P vivax and P falciparum parasites. Although population substructuring in P falciparum has resulted in clonal outgrowths of resistant parasites, P vivax may use a nuanced transcriptional regulatory approach to population maintenance, enabling it to preserve a larger, more diverse population better suited to facing selective threats. We conclude that transcriptional control may underlie P vivax's resilience to malaria control measures. Novel strategies to target such processes are likely required to eradicate P vivax and achieve malaria elimination.
Subject(s)
Vivax Malaria/prevention & control, Vivax Malaria/parasitology, Plasmodium vivax/genetics, Cambodia/epidemiology, DNA Copy Number Variations, Protozoan DNA/genetics, Drug Resistance/genetics, Endemic Diseases/prevention & control, Genetic Variation, Protozoan Genome, Haplotypes, Humans, Falciparum Malaria/epidemiology, Falciparum Malaria/parasitology, Falciparum Malaria/prevention & control, Vivax Malaria/epidemiology, Plasmodium falciparum/genetics, Single Nucleotide Polymorphism, Genetic Selection, Species Specificity, Genetic Transcription
ABSTRACT
BACKGROUND: Cotrimoxazole preventive therapy (CPT) is recommended for all human immunodeficiency virus (HIV)-exposed infants to avoid opportunistic infections. Cotrimoxazole has antimalarial effects and appears to reduce clinical malaria infections, but the impact on asymptomatic malaria infections is unknown. METHODS: We conducted an observational cohort study using data and dried blood spots (DBSs) from the Breastfeeding, Antiretrovirals and Nutrition study to evaluate the impact of CPT on malaria infection during peak malaria season in Lilongwe, Malawi. We compared malaria incidence 1 year before and after CPT implementation (292 and 682 CPT-unexposed and CPT-exposed infants, respectively), including only infants who remained HIV negative by 36 weeks of age. Malaria was defined as clinical, asymptomatic (using DBSs at 12, 24, and 36 weeks), or a composite outcome of clinical or asymptomatic. Linear and binomial regression with generalized estimating equations were used to estimate the association between CPT and malaria. Differences in characteristics of parasitemias and drug resistance polymorphisms by CPT status were also assessed in the asymptomatic infections. RESULTS: CPT was associated with a 70% (95% confidence interval, 53%-81%) relative reduction in the risk of asymptomatic infection between 6 and 36 weeks of age. CPT appeared to provide temporary protection against clinical malaria and more sustained protection against asymptomatic infections, with no difference in parasitemia characteristics. CONCLUSIONS: CPT appears to reduce overall malaria infections, with more prolonged impacts on asymptomatic infections. Asymptomatic infections are potentially important reservoirs for malaria transmission. Therefore, CPT prophylaxis may have important individual and public health benefits.
Subject(s)
Antimalarials/therapeutic use, Malaria/drug therapy, Malaria/epidemiology, Trimethoprim-Sulfamethoxazole Combination/therapeutic use, Antimalarials/administration & dosage, Antimalarials/pharmacology, Asymptomatic Infections, Drug Resistance, Female, HIV Infections, Humans, Infant, Malaria/parasitology, Malawi/epidemiology, Male, Parasitemia/drug therapy, Parasitemia/epidemiology, Parasitemia/parasitology, Plasmodium falciparum/drug effects, Plasmodium falciparum/genetics, Random Allocation, Trimethoprim-Sulfamethoxazole Combination/administration & dosage, Trimethoprim-Sulfamethoxazole Combination/pharmacology
ABSTRACT
Many studies have been conducted with the goal of correctly predicting diagnostic status of a disorder using the combination of genomic data and machine learning. It is often hard to judge which components of a study led to better results and whether better reported results represent a true improvement or an uncorrected bias inflating performance. We extracted information about the methods used and other differentiating features in genomic machine learning models. We used these features in linear regressions predicting model performance. We tested for univariate and multivariate associations as well as interactions between features. Of the models reviewed, 46% used feature selection methods that can lead to data leakage. Across our models, the number of hyperparameter optimizations reported, data leakage due to feature selection, model type, and modeling an autoimmune disorder were significantly associated with an increase in reported model performance. We found a significant, negative interaction between data leakage and training size. Our results suggest that methods susceptible to data leakage are prevalent among genomic machine learning research, resulting in inflated reported performance. Best practice guidelines that promote the avoidance and recognition of data leakage may help the field avoid biased results.
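The leakage mechanism described above can be sketched on pure noise, where no model should beat chance. In the example below, features are selected either on all samples before cross-validation (leaky) or inside each training fold only (honest). Everything here is an illustrative assumption rather than the reviewed studies' methods: the simulated data, the mean-difference filter, the nearest-centroid classifier, and the sample and feature counts.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def select_top_k(X, y, rows, k):
    """Rank features by absolute class-mean difference, using only `rows`."""
    scored = []
    for j in range(len(X[0])):
        pos = [X[i][j] for i in rows if y[i] == 1]
        neg = [X[i][j] for i in rows if y[i] == 0]
        scored.append((abs(mean(pos) - mean(neg)), j))
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def nearest_centroid_predict(X, y, train, test, feats):
    """Assign each test row to the closer class centroid (training rows only)."""
    centroids = {}
    for c in (0, 1):
        rows = [i for i in train if y[i] == c]
        centroids[c] = [mean([X[i][j] for i in rows]) for j in feats]
    preds = []
    for i in test:
        dist = {c: sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroids[c]))
                for c in (0, 1)}
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds


def cv_accuracy(X, y, k, leaky, n_folds=5):
    """Cross-validated accuracy with leaky or fold-internal feature selection."""
    rows = list(range(len(y)))
    # Leaky variant: the filter sees the labels of future test rows.
    leaked = select_top_k(X, y, rows, k) if leaky else None
    correct = 0
    for f in range(n_folds):
        test = [i for i in rows if i % n_folds == f]
        train = [i for i in rows if i % n_folds != f]
        feats = leaked if leaky else select_top_k(X, y, train, k)
        preds = nearest_centroid_predict(X, y, train, test, feats)
        correct += sum(p == y[i] for p, i in zip(preds, test))
    return correct / len(y)


rng = random.Random(0)
n_samples, n_features, top_k = 40, 2000, 20      # arbitrary toy sizes
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [i % 2 for i in range(n_samples)]            # labels unrelated to X

leaky_acc = cv_accuracy(X, y, top_k, leaky=True)
honest_acc = cv_accuracy(X, y, top_k, leaky=False)
print("leaky selection :", leaky_acc)
print("honest selection:", honest_acc)
```

Because the labels carry no signal, any accuracy well above 0.5 in the leaky run reflects information from the test folds reaching the feature filter, not real predictive performance.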
Subject(s)
Genomics, Machine Learning
ABSTRACT
In the past decade, significant advances have been made in finding genomic risk loci for schizophrenia (SCZ). This, in turn, has enabled the search for SCZ resilience loci that mitigate the impact of SCZ risk genes. Recently, we discovered the first genomic resilience profile for SCZ, completely independent from the established risk loci for SCZ. We posited that these resilience loci protect against SCZ for those having a heightened genomic risk for SCZ. Nevertheless, our understanding of genetic resilience remains limited. It remains unclear whether resilience loci foster protection against adverse states associated with SCZ risk related to clinical, cognitive, and brain-structural phenotypes. To address this knowledge gap, we analyzed data from 487,409 participants from the UK Biobank, and found that resilience loci for SCZ afforded protection against lifetime psychiatric (schizophrenia, bipolar disorder, anxiety, and depression) and non-psychiatric medical disorders (such as asthma, cardiovascular disease, digestive disorders, metabolic disorders, and external causes of morbidity and mortality). Resilience loci were also associated with protection against self-harm behaviors, with improved fluid intelligence, and with larger whole-brain and brain-regional sizes. Overall, this study sheds light on the range of phenotypes that are significantly associated with resilience loci within the general population, revealing distinct patterns separate from those associated with SCZ risk loci. Our findings indicate that resilience loci may offer protection against serious psychiatric and medical outcomes, co-morbidities, and cognitive impairment. Therefore, it is conceivable that resilience loci facilitate adaptive processes linked to improved health and life expectancy.
ABSTRACT
Objective: Mood disorders often co-occur with attention-deficit/hyperactivity disorder (ADHD), disruptive behavior disorders (DBDs), and aggression. We aimed to determine if polygenic risk scores (PRSs) based on external genome-wide association studies (GWASs) of these disorders could improve genetic identification of mood disorders. Methods: We combined 6 independent family studies that had genetic data and diagnoses for mood disorders that were made using different editions of the Diagnostic and Statistical Manual of Mental Disorders (DSM). We identified mood disorders, either concurrently or in the future, in participants between 6 and 17 years of age using PRSs calculated from summary statistics of GWASs for ADHD, ADHD with DBD, major depressive disorder (MDD), bipolar disorder (BPD), and aggression. Results: In our sample of 485 youths, 356 (73%) developed a subthreshold or full mood disorder and 129 (27%) did not. The cross-validated mean areas under the receiver operating characteristic curve (AUCs) for the 7 models identifying participants with any mood disorder ranged from 0.552 in the base model of age and sex to 0.648 in the base model + all 5 PRSs. When included in the base model individually, the ADHD PRS (OR = 1.65, P < .001), Aggression PRS (OR = 1.27, P = .02), and MDD PRS (OR = 1.23, P = .047) were significantly associated with the development of any mood disorder. Conclusions: Using PRSs for ADHD, MDD, BPD, DBDs, and aggression, we could modestly identify the presence of mood disorders. These findings extend evidence for transdiagnostic genetic components of psychiatric illness and demonstrate that PRSs calculated using traditional diagnostic boundaries can be useful within a transdiagnostic framework.
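For readers unfamiliar with how a PRS is built from GWAS summary statistics, a minimal sketch follows. It uses the common definition of a PRS as a sum of risk-allele dosages weighted by per-variant effect sizes; the variant weights and genotype below are invented, and the abstract does not specify which PRS construction method, variant filtering, or score scaling the authors used. The helper converting a logistic-regression coefficient to an odds ratio is likewise only illustrative.

```python
import math


def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of risk-allele dosages (0, 1, or 2 copies per variant).

    `effect_sizes` are per-variant log odds ratios taken from GWAS summary
    statistics; the values used below are hypothetical.
    """
    if len(dosages) != len(effect_sizes):
        raise ValueError("one effect size is needed per variant")
    return sum(beta * d for beta, d in zip(effect_sizes, dosages))


def odds_ratio(log_odds_coefficient):
    """Convert a logistic-regression coefficient to an odds ratio."""
    return math.exp(log_odds_coefficient)


# Hypothetical three-variant example for one participant.
effect_sizes = [0.10, -0.05, 0.20]   # invented GWAS log odds ratios
dosages = [2, 1, 0]                  # invented risk-allele counts

prs = polygenic_risk_score(dosages, effect_sizes)
print("PRS:", round(prs, 3))

# An odds ratio of 1.65 corresponds to a coefficient of roughly 0.50
# per unit of the score, whatever scaling that unit has.
print("coefficient for OR = 1.65:", round(math.log(1.65), 2))
```

In practice the score is usually standardized across the sample before it enters a regression model, so that the odds ratio is interpreted per standard deviation of the PRS; whether that was done here is not stated in the abstract.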
Subject(s)
Attention Deficit Disorder with Hyperactivity, Major Depressive Disorder, Adolescent, Attention Deficit Disorder with Hyperactivity/diagnosis, Attention Deficit Disorder with Hyperactivity/epidemiology, Attention Deficit Disorder with Hyperactivity/genetics, Child, Major Depressive Disorder/diagnosis, Major Depressive Disorder/epidemiology, Major Depressive Disorder/genetics, Genome-Wide Association Study, Humans, Mood Disorders/diagnosis, Mood Disorders/epidemiology, Mood Disorders/genetics, Multifactorial Inheritance/genetics, Risk Factors
ABSTRACT
Plasmodium falciparum in western Cambodia has developed resistance to artemisinin and its partner drugs, causing frequent treatment failure. Understanding this evolution can inform the deployment of new therapies. We investigated the genetic architecture of 78 P. falciparum isolates using whole-genome sequencing, correlating results to in vivo and ex vivo drug resistance and exploring the relationship between population structure, demographic history, and partner drug resistance. Principal component analysis, network analysis, and demographic inference identified a diverse central population with three clusters of clonally expanding parasite populations, each associated with specific K13 artemisinin resistance alleles and partner drug resistance profiles that were consistent with the sequential deployment of artemisinin combination therapies in the region. One cluster displayed ex vivo piperaquine resistance and mefloquine sensitivity with a high rate of in vivo failure of dihydroartemisinin-piperaquine. Another cluster displayed ex vivo mefloquine resistance and piperaquine sensitivity with high in vivo efficacy of dihydroartemisinin-piperaquine. The final cluster was clonal and displayed intermediate sensitivity to both drugs. Variations in recently described piperaquine resistance markers did not explain the difference in mean IC90 or clinical failures between the high and intermediate piperaquine resistance groups, suggesting additional loci may be involved in resistance. The results highlight an important role for partner drug resistance in shaping the P. falciparum genetic landscape in Southeast Asia and suggest that further work is needed to identify other mutations that drive piperaquine resistance.