ABSTRACT
Machine learning assembles a broad set of methods and techniques to solve a wide range of problems, such as identifying individuals with substance use disorders (SUD), finding patterns in neuroimages, understanding SUD prognostic factors and their association, or determining the genetic underpinnings of addiction. However, the addiction research field underuses machine learning. This two-part narrative review focuses on machine learning tools and concepts, offering an introduction to their capabilities so that addiction researchers can understand and adopt them. This first part presents supervised and unsupervised methods such as linear models, naive Bayes, support vector machines, artificial neural networks, and k-means. We illustrate each technique with examples of its use in current addiction research. We also present some open-source programming tools and methodological good practices that facilitate using these techniques. Throughout this work, we emphasize a continuum between applied statistics and machine learning, show their commonalities, and provide sources for further reading to deepen the understanding of these methods. This two-part review is a primer for the next generation of addiction researchers incorporating machine learning in their projects. Researchers will find a bridge between applied statistics and machine learning, ways to expand their analytical toolkit, recommendations to incorporate well-established good practices in addiction data analysis (e.g., stating the rationale for using newer analytical tools, calculating sample size, improving reproducibility), and the vocabulary to enhance collaboration between researchers who do not conduct data analyses and those who do.
Subjects
Addictive Behavior, Substance-Related Disorders, Bayes Theorem, Addictive Behavior/diagnosis, Humans, Machine Learning, Reproducibility of Results, Support Vector Machine
ABSTRACT
In a continuum with applied statistics, machine learning offers a wide variety of tools to explore, analyze, and understand addiction data. These tools include algorithms that can leverage useful information from data to build models; these models can solve particular tasks to answer scientific questions about addiction. In this second part of a two-part review on machine learning, we explain how to apply machine learning methods to addiction research. Like other analytical tools, machine learning methods require careful implementation to carry out a reproducible and transparent research process with reliable results. This review describes a workflow to guide the application of machine learning in addiction research, detailing study design, data collection, data pre-processing, modeling, and results communication. How to train, validate, and test a model, detect and characterize overfitting, and determine an adequate sample size are some of the key issues when applying machine learning. We also illustrate the process and its particular nuances with examples of how addiction researchers have applied machine learning techniques with different goals, study designs, or data sources, and we explain the main limitations of machine learning approaches and how best to address them. Used well, machine learning enriches the addiction research toolkit.