1.
Am J Drug Alcohol Abuse; 48(3): 260-271, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35389305

ABSTRACT

Machine learning comprises a broad set of methods and techniques for solving a wide range of problems, such as identifying individuals with substance use disorders (SUD), finding patterns in neuroimages, understanding SUD prognostic factors and their associations, or determining the genetic underpinnings of addiction. However, the addiction research field underuses machine learning. This two-part narrative review focuses on machine learning tools and concepts, providing an introductory overview of their capabilities so that addiction researchers can understand and adopt them. This first part presents supervised and unsupervised methods such as linear models, naive Bayes, support vector machines, artificial neural networks, and k-means. We illustrate each technique with examples of its use in current addiction research. We also present open-source programming tools and methodological good practices that facilitate using these techniques. Throughout this work, we emphasize the continuum between applied statistics and machine learning, show their commonalities, and provide sources for further reading to deepen understanding of these methods. This two-part review is a primer for the next generation of addiction researchers incorporating machine learning in their projects. Researchers will find a bridge between applied statistics and machine learning, ways to expand their analytical toolkit, recommendations for incorporating well-established good practices in addiction data analysis (e.g., stating the rationale for using newer analytical tools, calculating sample size, improving reproducibility), and the vocabulary to enhance collaboration between researchers who do not conduct data analyses and those who do.
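
The abstract names concrete techniques; below is a minimal sketch of how several of them might be tried side by side, assuming scikit-learn and purely synthetic stand-in data (the features and labels are hypothetical, not drawn from the review):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.naive_bayes import GaussianNB
    from sklearn.svm import SVC
    from sklearn.cluster import KMeans

    # Hypothetical data: 500 "individuals", 10 features, a binary outcome label.
    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Supervised: three of the classifiers named in the abstract.
    for model in (LogisticRegression(max_iter=1000), GaussianNB(), SVC()):
        model.fit(X_train, y_train)
        print(type(model).__name__, model.score(X_test, y_test))

    # Unsupervised: k-means groups the same individuals without using labels.
    clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

An artificial neural network slots into the same loop (e.g., sklearn.neural_network.MLPClassifier).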


Subjects
Behavior, Addictive; Substance-Related Disorders; Bayes Theorem; Behavior, Addictive/diagnosis; Humans; Machine Learning; Reproducibility of Results; Support Vector Machine
2.
Am J Drug Alcohol Abuse; 48(3): 272-283, 2022 May 04.
Article in English | MEDLINE | ID: mdl-35390266

ABSTRACT

In a continuum with applied statistics, machine learning offers a wide variety of tools to explore, analyze, and understand addiction data. These tools include algorithms that can leverage useful information from data to build models; these models can solve particular tasks to answer addiction scientific questions. In this second part of a two-part review on machine learning, we explain how to apply machine learning methods to addiction research. Like other analytical tools, machine learning methods require careful implementation to yield a reproducible, transparent research process with reliable results. This review describes a workflow to guide the application of machine learning in addiction research, detailing study design, data collection, data pre-processing, modeling, and results communication. Key issues when applying machine learning include how to train, validate, and test a model, how to detect and characterize overfitting, and how to determine an adequate sample size. We also illustrate the process and its particular nuances with examples of how addiction researchers have applied machine learning techniques with different goals, study designs, or data sources, and we explain the main limitations of machine learning approaches and how best to address them. Used well, machine learning enriches the addiction research toolkit.
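
A minimal sketch of the train/validate/test workflow described above, again assuming scikit-learn and synthetic data; the model choice, split sizes, and fold count are illustrative assumptions rather than the review's prescriptions:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split, cross_val_score
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=600, n_features=20, random_state=0)

    # Hold out a test set that is touched only once, at the very end.
    X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=0)

    model = RandomForestClassifier(random_state=0)

    # Validation: cross-validation on the development set guides model selection.
    cv_scores = cross_val_score(model, X_dev, y_dev, cv=5)
    print(f"CV accuracy: {cv_scores.mean():.2f} +/- {cv_scores.std():.2f}")

    # Overfitting check: a large gap between training accuracy and the CV
    # estimate suggests the model memorizes rather than generalizes.
    model.fit(X_dev, y_dev)
    print("Train accuracy:", model.score(X_dev, y_dev))

    # Final, one-shot evaluation on the untouched test set.
    print("Test accuracy:", model.score(X_test, y_test))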


Subjects
Machine Learning; Data Collection; Humans; Workflow
3.
Subj. procesos cogn; 14(2): 20-31, Dec. 2010. tab
Article in Spanish | BINACIS | ID: bin-125399

ABSTRACT

We present work in progress on word normalization for user-generated content. The approach is simple and helps reduce the amount of manual annotation characteristic of more classical approaches. First, orthographic variants of a word, mostly abbreviations, are grouped together. From these manually grouped examples, we learn an automated classifier that, given a previously unseen word, determines whether it is an orthographic variant of a known word or an entirely new word. To do that, we calculate the similarity between the unseen word and all known words, and classify the new word as an orthographic variant of its most similar word. The classifier applies a string similarity measure based on the Levenshtein edit distance. To improve the accuracy of this measure, we assign edit operations an error-based cost. This cost-assignment scheme aims to maximize the distance between similar strings that are variants of different words. This custom similarity measure achieves an accuracy of .68, a substantial improvement over the .54 obtained by the plain Levenshtein distance.
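
For illustration, a minimal sketch of an edit distance with pluggable per-operation costs, in the spirit of the error-based cost scheme described above. The costs the authors learned are not reproduced here; the unit-cost defaults below are hypothetical placeholders:

    def weighted_levenshtein(a, b,
                             sub_cost=lambda x, y: 1.0,
                             ins_cost=lambda c: 1.0,
                             del_cost=lambda c: 1.0):
        """Edit distance with pluggable per-operation costs (dynamic programming)."""
        m, n = len(a), len(b)
        d = [[0.0] * (n + 1) for _ in range(m + 1)]
        for i in range(1, m + 1):          # cost of deleting a's prefix
            d[i][0] = d[i - 1][0] + del_cost(a[i - 1])
        for j in range(1, n + 1):          # cost of inserting b's prefix
            d[0][j] = d[0][j - 1] + ins_cost(b[j - 1])
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                sub = 0.0 if a[i - 1] == b[j - 1] else sub_cost(a[i - 1], b[j - 1])
                d[i][j] = min(d[i - 1][j] + del_cost(a[i - 1]),    # delete
                              d[i][j - 1] + ins_cost(b[j - 1]),    # insert
                              d[i - 1][j - 1] + sub)               # substitute/match
        return d[m][n]

    # With unit costs this reduces to the plain Levenshtein distance:
    print(weighted_levenshtein("hello", "helo"))  # 1.0

Replacing the unit costs with error-derived weights stretches the distance between variants of different words, which is the effect the cost-assignment scheme aims for.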


Subjects
Psychology; Language; Speech; Pattern Recognition, Automated
4.
Subj. procesos cogn; 14(2): 20-31, Dec. 2010. tab
Article in Spanish | LILACS | ID: lil-576373

ABSTRACT

We present work in progress on word normalization for user-generated content. The approach is simple and helps reduce the amount of manual annotation characteristic of more classical approaches. First, orthographic variants of a word, mostly abbreviations, are grouped together. From these manually grouped examples, we learn an automated classifier that, given a previously unseen word, determines whether it is an orthographic variant of a known word or an entirely new word. To do that, we calculate the similarity between the unseen word and all known words, and classify the new word as an orthographic variant of its most similar word. The classifier applies a string similarity measure based on the Levenshtein edit distance. To improve the accuracy of this measure, we assign edit operations an error-based cost. This cost-assignment scheme aims to maximize the distance between similar strings that are variants of different words. This custom similarity measure achieves an accuracy of .68, a substantial improvement over the .54 obtained by the plain Levenshtein distance.
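
Complementing the distance sketch under record 3, a minimal sketch of the classification step described above: an unseen word is assigned to its most similar known word, or declared new when even the best match falls below a threshold. difflib's SequenceMatcher stands in for the paper's custom measure, and the lexicon and threshold are hypothetical:

    from difflib import SequenceMatcher

    def similarity(a, b):
        # Stand-in similarity in [0, 1]; the paper uses its own weighted measure.
        return SequenceMatcher(None, a, b).ratio()

    def classify(unseen, lexicon, threshold=0.6):
        """Return (best_match, 'variant') or (None, 'new word')."""
        best = max(lexicon, key=lambda w: similarity(unseen, w))
        if similarity(unseen, best) >= threshold:
            return best, "variant"
        return None, "new word"

    lexicon = {"hello", "world", "normalize"}  # hypothetical known words
    print(classify("helo", lexicon))   # ('hello', 'variant')
    print(classify("xyzzy", lexicon))  # (None, 'new word')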


Subjects
Speech; Language; Psychology; Pattern Recognition, Automated