ABSTRACT
We present an ensemble transfer learning method to predict suicide from Veterans Affairs (VA) electronic medical records (EMR). A diverse set of base models was trained to predict a binary outcome constructed from reported suicide, suicide attempt, and overdose diagnoses, with varying choices of study design and prediction methodology. Each model used 20 cross-sectional and 190 longitudinal variables observed in eight time intervals covering 7.5 years prior to the time of prediction. Ensembles of seven base models were created and fine-tuned with ten variables expected to change with study design and outcome definition in order to predict suicide and the combined outcome in a prospective cohort. The ensemble models achieved c-statistics of 0.73 on 2-year suicide risk and 0.83 on the combined outcome when predicting on a prospective cohort of approximately 4.2 million veterans. The ensembles rely on nonlinear base models trained using a matched retrospective nested case-control (Rcc) study cohort and show good calibration across a diversity of subgroups, including risk strata, age, sex, race, and level of healthcare utilization. In addition, a linear Rcc base model provided a rich set of biological predictors, including indicators of suicide, substance use disorder, mental health diagnoses and treatments, hypoxia and vascular damage, and demographics.
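The abstract gives no code; as a rough illustration of the ensemble idea it describes, the sketch below stacks scores from several hypothetical base models (plus a few extra "fine-tuning" covariates) with a logistic meta-model and reports the c-statistic, which for a binary outcome equals the area under the ROC curve. All data, model counts, and variable roles are synthetic assumptions, not the study's actual pipeline.

```python
# Minimal sketch: combine base-model risk scores with a logistic meta-model and
# summarize discrimination with the c-statistic (ROC AUC). Synthetic data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 10_000
y = rng.binomial(1, 0.02, size=n)                 # rare binary outcome

# Scores from seven hypothetical base models (noisy functions of the outcome).
base_scores = np.column_stack(
    [y * rng.normal(1.0, 1.0, n) + rng.normal(0.0, 1.0, n) for _ in range(7)]
)
# A few additional tuning covariates (e.g., design- or outcome-specific flags).
extra = rng.normal(size=(n, 3))

X = np.hstack([base_scores, extra])
train, test = np.arange(n) < n // 2, np.arange(n) >= n // 2

meta = LogisticRegression(max_iter=1000)
meta.fit(X[train], y[train])
risk = meta.predict_proba(X[test])[:, 1]

# For a binary outcome the c-statistic equals the area under the ROC curve.
print(f"c-statistic on held-out half: {roc_auc_score(y[test], risk):.3f}")
```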
Subjects
Renal Cell Carcinoma, Kidney Neoplasms, Veterans, Humans, Veterans/psychology, Retrospective Studies, Cross-Sectional Studies, Prospective Studies, Attempted Suicide, Machine Learning
ABSTRACT
OBJECTIVE: To determine whether automated, electronic alerts increased referrals for epilepsy surgery. METHODS: We conducted a prospective, randomized controlled trial of a natural language processing-based clinical decision support system embedded in the electronic health record (EHR) at 14 pediatric neurology outpatient clinic sites. Children with epilepsy and at least two prior neurology visits were screened by the system prior to their scheduled visit. Patients classified as potential surgical candidates were randomized 2:1 for their provider to receive an alert or standard of care (no alert). The primary outcome was referral for a neurosurgical evaluation. The likelihood of referral was estimated using a Cox proportional hazards regression model. RESULTS: Between April 2017 and April 2019, a total of 4858 children were screened by the system, and 284 (5.8%) were identified as potential surgical candidates. Two hundred four patients received an alert, and 96 patients received standard care. Median follow-up time was 24 months (range: 12-36 months). Compared to the control group, patients whose provider received an alert were more likely to be referred for a presurgical evaluation (3.1% in the control group vs 9.8% in the alert group; adjusted hazard ratio [HR] = 3.21, 95% confidence interval [CI]: 0.95-10.8; one-sided p = .03). Nine patients (4.4%) in the alert group underwent epilepsy surgery, compared to none (0%) in the control group (one-sided p = .03). SIGNIFICANCE: Machine learning-based automated alerts may improve the utilization of referrals for epilepsy surgery evaluations.
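A minimal sketch of the primary analysis design described above: time to surgical referral modeled with Cox proportional hazards regression, with alert assignment as the exposure. The data, column names, and the lifelines dependency are assumptions for illustration; the trial's actual model specification is not reproduced here.

```python
# Sketch of estimating the likelihood of referral with a Cox proportional
# hazards model. Data and column names are synthetic/illustrative; the
# lifelines package is assumed to be installed.
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 300
alert = rng.binomial(1, 2 / 3, size=n)            # ~2:1 randomization
# Simulate time-to-referral (months); alerts shorten the expected time.
time = rng.exponential(scale=np.where(alert == 1, 60, 180))
follow_up = rng.uniform(12, 36, size=n)           # administrative censoring
referred = (time <= follow_up).astype(int)
observed_time = np.minimum(time, follow_up)

df = pd.DataFrame({
    "months": observed_time,
    "referred": referred,
    "alert": alert,
    "age_years": rng.uniform(1, 18, size=n),      # illustrative covariate
})

cph = CoxPHFitter()
cph.fit(df, duration_col="months", event_col="referred")
print(cph.hazard_ratios_)   # hazard ratio for 'alert' vs standard of care
```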
Subjects
Electronic Health Records, Epilepsy, Humans, Child, Prospective Studies, Machine Learning, Epilepsy/diagnosis, Epilepsy/surgery, Referral and Consultation
ABSTRACT
OBJECTIVES: Epilepsy surgery is underutilized. Automating the identification of potential surgical candidates may facilitate earlier intervention. Our objective was to develop site-specific machine learning (ML) algorithms to identify candidates before they undergo surgery. MATERIALS & METHODS: In this multicenter, retrospective, longitudinal cohort study, ML algorithms were trained on n-grams extracted from free-text neurology notes, EEG and MRI reports, visit codes, medications, procedures, laboratory results, and demographic information. Site-specific algorithms were developed at two epilepsy centers: one pediatric and one adult. Cases were defined as patients who underwent resective epilepsy surgery, and controls were patients with epilepsy with no history of surgery. The output of the ML algorithms was the estimated likelihood of candidacy for resective epilepsy surgery. Model performance was assessed using 10-fold cross-validation. RESULTS: There were 5880 children (n = 137 had surgery [2.3%]) and 7604 adults with epilepsy (n = 56 had surgery [0.7%]) included in the study. Pediatric surgical patients could be identified 2.0 years (range: 0-8.6 years) before beginning their presurgical evaluation with AUC = 0.76 (95% CI: 0.70-0.82) and PR-AUC = 0.13 (95% CI: 0.07-0.18). Adult surgical patients could be identified 1.0 year (range: 0-5.4 years) before beginning their presurgical evaluation with AUC = 0.85 (95% CI: 0.78-0.93) and PR-AUC = 0.31 (95% CI: 0.14-0.48). By the time patients began their presurgical evaluation, the ML algorithms identified pediatric and adult surgical patients with AUC = 0.93 and 0.95, respectively. The mean squared error of the predicted probability of surgical candidacy (Brier scores) was 0.018 in pediatrics and 0.006 in adults. CONCLUSIONS: Site-specific machine learning algorithms can identify candidates for epilepsy surgery early in the disease course in diverse practice settings.
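A sketch of the kind of pipeline the methods describe: 1- to 3-gram text features, a linear classifier, 10-fold cross-validated probabilities, and AUC, PR-AUC, and Brier score summaries. The toy corpus and labels are invented, and the choice of logistic regression is an assumption rather than the sites' actual algorithms.

```python
# Sketch: n-gram features from clinical text, 10-fold cross-validated
# probabilities, and discrimination (AUC, PR-AUC) plus calibration (Brier)
# summaries. The corpus and labels below are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, average_precision_score, brier_score_loss
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline

notes = [
    "focal seizures refractory to two medications, MRI lesion noted",
    "generalized epilepsy, well controlled on monotherapy",
    "drug resistant epilepsy, referred for surgical evaluation",
    "new onset seizures, normal EEG, started first medication",
] * 25                                            # toy corpus
labels = [1, 0, 1, 0] * 25                        # 1 = later had resective surgery

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 3), min_df=2),
    LogisticRegression(max_iter=1000, class_weight="balanced"),
)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
proba = cross_val_predict(model, notes, labels, cv=cv, method="predict_proba")[:, 1]

print("AUC    :", round(roc_auc_score(labels, proba), 3))
print("PR-AUC :", round(average_precision_score(labels, proba), 3))
print("Brier  :", round(brier_score_loss(labels, proba), 3))
```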
Subjects
Algorithms, Epilepsy/diagnostic imaging, Epilepsy/surgery, Machine Learning, Adolescent, Adult, Child, Preschool Child, Cohort Studies, Early Diagnosis, Electroencephalography/methods, Epilepsy/physiopathology, Female, Humans, Longitudinal Studies, Magnetic Resonance Imaging/methods, Male, Middle Aged, Retrospective Studies, Young Adult
ABSTRACT
BACKGROUND: As adolescent suicide rates continue to rise, innovation in risk identification is warranted. Machine learning can identify suicidal individuals based on their language samples. This feasibility pilot was conducted to explore the use of this technology in adolescent therapy sessions and to assess machine learning model performance. METHOD: Natural language processing machine learning models that identify level of suicide risk were tested in outpatient therapy sessions using a smartphone app. Data collection included language samples, standardized depression and suicidality scale scores, and therapist impressions of the client's mental state. Previously developed models were used to predict suicide risk. RESULTS: A total of 267 interviews were collected from 60 students in eight schools by ten therapists, with 29 students indicating suicide or self-harm risk. During external validation, models were trained on suicidal speech samples collected from two separate studies. We found that support vector machines (AUC: 0.75; 95% CI: 0.69-0.81) and logistic regression (AUC: 0.76; 95% CI: 0.70-0.82) led to good discriminative ability, with an extreme gradient boosting model performing best (AUC: 0.78; 95% CI: 0.72-0.84). CONCLUSION: Voice collection technology and associated procedures can be integrated into mental health therapists' workflow. Collected language samples could be classified with good discrimination using machine learning methods.
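A hedged sketch of the model comparison reported in the results: support vector machine, logistic regression, and a gradient-boosting classifier evaluated with cross-validated AUC on language-derived features. Here scikit-learn's GradientBoostingClassifier stands in for the paper's extreme gradient boosting model, and the feature matrix is synthetic.

```python
# Sketch of comparing the three classifier families reported (SVM, logistic
# regression, gradient boosting) with cross-validated AUC. Synthetic features.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.svm import SVC

rng = np.random.default_rng(2)
n, p = 267, 50                       # e.g., 267 interviews, 50 language features
X = rng.normal(size=(n, p))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, :5].sum(axis=1))))  # weak signal

models = {
    "SVM": SVC(kernel="linear", probability=True),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for name, clf in models.items():
    proba = cross_val_predict(clf, X, y, cv=cv, method="predict_proba")[:, 1]
    print(f"{name:20s} AUC = {roc_auc_score(y, proba):.2f}")
```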
Subjects
Self-Injurious Behavior, Suicide Prevention, Adolescent, Feasibility Studies, Humans, Machine Learning, Male, Suicidal Ideation
ABSTRACT
OBJECTIVE: With early identification and intervention, many deaths by suicide are preventable. Tools that include machine learning methods have been able to identify suicidal language. This paper examines the persistence of this suicidal language up to 30 days after discharge from care. METHOD: In a multi-center study, 253 subjects were enrolled into either suicidal or control cohorts. Their responses to standardized instruments and interviews were analyzed using machine learning algorithms. Subjects were re-interviewed approximately 30 days later, and their language was compared to the original language to determine the presence of suicidal ideation. RESULTS: The results show that language characteristics used to classify suicidality at the initial encounter are still present in the speech 30 days later (AUC = 89%, 95% CI: 85%-95%, p < .0001) and that algorithms trained on the second interviews could also classify the subjects from their first-interview language (AUC = 85%, 95% CI: 81%-90%, p < .0001). CONCLUSIONS: This approach explores the stability of suicidal language. Using advanced computational methods, the results show that a patient's language remains similar 30 days after it is first captured, while responses to standard measures change. This can be useful when developing methods that identify the data-based phenotype of a subject.
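A small sketch of the cross-time validation design implied by the results: a classifier fit on features from the 30-day follow-up interviews is evaluated on features from the initial interviews of the same subjects. The feature vectors are synthetic stand-ins for the study's language features; only the design is illustrated, not the authors' algorithm.

```python
# Sketch of cross-time validation: train on follow-up interview features,
# test on initial-interview features for the same subjects. Synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(3)
n_subjects, p = 253, 40
labels = rng.binomial(1, 0.5, size=n_subjects)    # suicidal vs control cohort

# Correlated features at the two time points, reflecting stable language traits.
trait = labels[:, None] * 1.0 + rng.normal(size=(n_subjects, p))
first_interview = trait + rng.normal(scale=0.5, size=(n_subjects, p))
second_interview = trait + rng.normal(scale=0.5, size=(n_subjects, p))

clf = LogisticRegression(max_iter=1000).fit(second_interview, labels)
auc = roc_auc_score(labels, clf.predict_proba(first_interview)[:, 1])
print(f"Trained on 30-day interviews, tested on initial interviews: AUC = {auc:.2f}")
```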
Subjects
Language, Suicidal Ideation, Algorithms, Humans, Machine Learning, Risk Assessment
ABSTRACT
OBJECTIVE: People with epilepsy are at increased risk for mental health comorbidities. Machine-learning methods based on spoken language can detect suicidality in adults. This study's purpose was to use spoken words to create machine-learning classifiers that identify current or lifetime history of comorbid psychiatric conditions in teenagers and young adults with epilepsy. MATERIALS AND METHODS: Eligible participants were >12 years old with epilepsy. All participants were interviewed using the Mini International Neuropsychiatric Interview (MINI) or the MINI Kid Tracking and asked five open-ended conversational questions. N-grams and Linguistic Inquiry and Word Count (LIWC) word categories were used to construct machine learning classification models from language harvested from interviews. Data were analyzed for four individual MINI-identified disorders and for three mutually exclusive groups: participants with no psychiatric disorders, participants with non-suicidal psychiatric disorders, and participants with any degree of suicidality. Performance was measured using areas under the receiver operating characteristic curve (AROCs). RESULTS: Classifiers were constructed from 227 interviews with 122 participants (7.5 ± 3.1 minutes and 454 ± 299 words per interview). AROCs for models differentiating the non-overlapping groups and individual disorders ranged from 57% to 78% (many with P < .02). DISCUSSION AND CONCLUSION: Machine-learning classifiers of spoken language can reliably identify current or lifetime history of suicidality and depression in people with epilepsy. The data suggest that identification of anxiety and bipolar disorders may be achieved with larger data sets. Machine-learning analysis of spoken language is a promising screening alternative when traditional approaches are unwieldy (e.g., telephone calls, primary care offices, school health clinics).
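A sketch of how dictionary-based word-category proportions (the role LIWC plays in this study) can be combined with n-gram counts into one feature matrix. LIWC itself is proprietary, so a tiny hypothetical category dictionary is used here purely for illustration.

```python
# Sketch: combine dictionary-based word-category proportions with n-gram counts.
# The category dictionary is a hypothetical stand-in for LIWC categories.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

categories = {
    "negative_emotion": {"sad", "hopeless", "worthless", "afraid"},
    "social": {"friend", "family", "talk", "people"},
}

def category_proportions(text: str) -> np.ndarray:
    """Fraction of tokens falling in each word category."""
    tokens = text.lower().split()
    total = max(len(tokens), 1)
    return np.array([sum(t in words for t in tokens) / total
                     for words in categories.values()])

transcripts = [
    "i feel hopeless and worthless most days",
    "i talk with my friend and family about school",
]
ngrams = CountVectorizer(ngram_range=(1, 2)).fit_transform(transcripts).toarray()
liwc_like = np.vstack([category_proportions(t) for t in transcripts])
features = np.hstack([ngrams, liwc_like])   # combined feature matrix per interview
print(features.shape)
```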
Subjects
Epilepsy/psychology, Machine Learning, Mental Disorders/diagnosis, Mental Disorders/epidemiology, Adolescent, Child, Comorbidity, Female, Humans, Language, Male, Mental Disorders/etiology, Psychiatric Status Rating Scales, Young Adult
ABSTRACT
OBJECTIVE: Delay to resective epilepsy surgery results in avoidable disease burden and increased risk of mortality. The objective was to prospectively validate a natural language processing (NLP) application that uses provider notes to assign epilepsy surgery candidacy scores. METHODS: The application was trained on notes from (1) patients with a diagnosis of epilepsy and a history of resective epilepsy surgery and (2) patients who were seizure-free without surgery. The testing set included all patients with unknown surgical candidacy status and an upcoming neurology visit. Training and testing sets were updated weekly for 1 year. One- to three-word phrases contained in patients' notes were used as features. Patients prospectively identified by the application as candidates for surgery were manually reviewed by two epileptologists. Performance metrics were defined by comparing NLP-derived surgical candidacy scores with surgical candidacy status from expert chart review. RESULTS: The training set was updated weekly and included notes from a mean of 519 ± 67 patients. The area under the receiver operating characteristic curve (AUC) from 10-fold cross-validation was 0.90 ± 0.04 (range = 0.83-0.96) and improved by 0.002 per week (P < .001) as new patients were added to the training set. Of the 6395 patients who visited the neurology clinic, 4211 (67%) were evaluated by the model. The prospective AUC on this test set was 0.79 (95% confidence interval [CI] = 0.62-0.96). Using the optimal surgical candidacy score threshold, sensitivity was 0.80 (95% CI = 0.29-0.99), specificity was 0.77 (95% CI = 0.64-0.88), positive predictive value was 0.25 (95% CI = 0.07-0.52), and negative predictive value was 0.98 (95% CI = 0.87-1.00). The number needed to screen was 5.6. SIGNIFICANCE: An electronic health record-integrated NLP application can accurately assign surgical candidacy scores to patients in a clinical setting.
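A sketch of deriving the reported threshold metrics from continuous candidacy scores: choose an operating threshold (Youden's J is used here as an example rule, not necessarily the study's rule) and compute sensitivity, specificity, PPV, NPV, and patients flagged per true candidate found, which is one common reading of the number needed to screen. Scores and labels are synthetic.

```python
# Sketch: turn continuous surgical-candidacy scores into threshold metrics.
# Synthetic scores/labels; the study's exact threshold rule may differ.
import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(4)
y = rng.binomial(1, 0.03, size=4000)                 # rare surgical candidacy
scores = y * rng.normal(1.5, 1.0, 4000) + rng.normal(0.0, 1.0, 4000)

fpr, tpr, thresholds = roc_curve(y, scores)
best = np.argmax(tpr - fpr)                          # Youden's J as an example rule
flag = scores >= thresholds[best]

tp = np.sum(flag & (y == 1)); fp = np.sum(flag & (y == 0))
fn = np.sum(~flag & (y == 1)); tn = np.sum(~flag & (y == 0))

print("sensitivity:", round(tp / (tp + fn), 2))
print("specificity:", round(tn / (tn + fp), 2))
print("PPV        :", round(tp / (tp + fp), 2))
print("NPV        :", round(tn / (tn + fn), 2))
print("flagged per true candidate:", round((tp + fp) / tp, 1))
```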
Subjects
Electronic Health Records, Epilepsy/surgery, Machine Learning, Natural Language Processing, Patient Selection, Adolescent, Adult, Child, Preschool Child, Clinical Decision Support Systems, Female, Humans, Infant, Newborn Infant, Male, Middle Aged, Prospective Studies, Young Adult
ABSTRACT
Racial disparities in the utilization of epilepsy surgery are well documented, but it is unknown whether a natural language processing (NLP) algorithm trained on physician notes would produce biased recommendations for epilepsy presurgical evaluations. To assess this, an NLP algorithm was trained to identify potential surgical candidates using 1097 notes from 175 epilepsy patients with a history of resective epilepsy surgery and 268 patients who achieved seizure freedom without surgery (total N = 443 patients). The model was tested on 8340 notes from 3776 patients with epilepsy whose surgical candidacy status was unknown (2029 male, 1747 female; median age = 9 years; age range = 0-60 years). Multiple linear regression with demographic variables as covariates was used to test for associations between patient race and surgical candidacy scores. After accounting for other demographic and socioeconomic variables, patient race, gender, and primary language did not influence surgical candidacy scores (P > .35 for all). Higher scores were given to patients older than 18 years, those who traveled farther to receive care, those with higher family income, and those with public insurance (P < .001, < .001, < .001, and < .01, respectively). Demographic effects on surgical candidacy scores appeared to reflect patterns in patient referrals.
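A sketch of the bias check described: regress candidacy scores on race while adjusting for demographic and socioeconomic covariates, then inspect the race coefficients and p-values. Data, column names, and the statsmodels dependency are illustrative assumptions, not the study's dataset.

```python
# Sketch: multiple linear regression of NLP-derived candidacy scores on race
# plus covariates. Synthetic data; statsmodels assumed installed.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 3776
df = pd.DataFrame({
    "score": rng.normal(size=n),
    "race": rng.choice(["white", "black", "other"], size=n),
    "sex": rng.choice(["male", "female"], size=n),
    "age": rng.integers(0, 61, size=n),
    "distance_km": rng.exponential(50, size=n),
    "family_income": rng.normal(60_000, 20_000, size=n),
    "public_insurance": rng.binomial(1, 0.4, size=n),
})

model = smf.ols(
    "score ~ C(race) + C(sex) + age + distance_km + family_income + public_insurance",
    data=df,
).fit()
print(model.summary())   # race terms should be non-significant if scores are unbiased
```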
Subjects
Epilepsy/surgery, Healthcare Disparities, Machine Learning, Patient Selection, Prejudice, Adolescent, Adult, Age Factors, Algorithms, Child, Preschool Child, Electroencephalography, Humans, Infant, Middle Aged, Referral and Consultation, Young Adult
ABSTRACT
BACKGROUND: Probabilistic assessments of clinical care are essential for quality care. Yet machine learning, which supports this care process, has been limited to categorical results. To maximize its usefulness, it is important to find novel approaches that calibrate ML output to a likelihood scale. Current state-of-the-art calibration methods are generally accurate and applicable to many ML models, but improved granularity and accuracy of such methods would increase the information available for clinical decision making. Here, a novel non-parametric Bayesian calibration approach is demonstrated on a variety of data sets, including simulated classifier outputs, biomedical data sets from the University of California, Irvine (UCI) Machine Learning Repository, and a clinical data set built to determine suicide risk from the language of emergency department patients. RESULTS: The method is first demonstrated on support-vector machine (SVM) models, which generally produce well-behaved, well-understood scores. The method produces calibrations that are comparable to the state-of-the-art Bayesian Binning in Quantiles (BBQ) method when the SVM models are able to effectively separate cases and controls. However, as the SVM models' ability to discriminate classes decreases, our approach yields more granular and dynamic calibrated probabilities compared with the BBQ method. Improvements in granularity and range are even more dramatic when the discrimination between the classes is artificially degraded by replacing the SVM model with an ad hoc k-means classifier. CONCLUSIONS: The method allows both clinicians and patients to have a more nuanced view of the output of an ML model, supporting better decision making. The method is demonstrated on simulated data, various biomedical data sets, and a clinical data set, to which diverse ML methods are applied. A trivial extension of the method to (non-ML) clinical scores is also discussed.
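The sketch below is not the paper's non-parametric Bayesian method; it illustrates the calibration problem the abstract describes by mapping raw SVM decision scores to probabilities with isotonic regression (a common non-parametric alternative) and checking the result with a reliability curve and a Brier score on synthetic data.

```python
# Sketch of score calibration and its assessment (isotonic regression stands in
# for the paper's Bayesian method). Synthetic data only.
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.normal(size=(2000, 10))
y = rng.binomial(1, 1 / (1 + np.exp(-X[:, 0] - X[:, 1])))

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
svm = SVC(kernel="linear").fit(X_tr, y_tr)

# Map decision-function scores to [0, 1] probabilities on held-out data.
# (For brevity the map is fit and inspected on the same split; a separate
# test split would be cleaner.)
iso = IsotonicRegression(out_of_bounds="clip")
cal_scores = svm.decision_function(X_cal)
p_cal = iso.fit_transform(cal_scores, y_cal)

frac_pos, mean_pred = calibration_curve(y_cal, p_cal, n_bins=10)
print("Brier score:", round(brier_score_loss(y_cal, p_cal), 3))
print("reliability curve bins (predicted -> observed):")
for pred, obs in zip(mean_pred, frac_pos):
    print(f"  {pred:.2f} -> {obs:.2f}")
```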