Search | VHL Regional Portal

1.

Identifying Mentions of Pain in Mental Health Records Text: A Natural Language Processing Approach.

Chaturvedi, Jaya; Velupillai, Sumithra; Stewart, Robert; Roberts, Angus.

Stud Health Technol Inform ; 310: 695-699, 2024 Jan 25.

Article in English | MEDLINE | ID: mdl-38269898

ABSTRACT

Pain is a common reason for accessing healthcare resources and is a growing area of research, especially in its overlap with mental health. Mental health electronic health records are a good data source to study this overlap. However, much information on pain is held in the free text of these records, where mentions of pain present a unique natural language processing problem due to its ambiguous nature. This project uses data from an anonymised mental health electronic health records database. A machine learning based classification algorithm is trained to classify sentences as discussing patient pain or not. This will facilitate the extraction of relevant pain information from large databases. 1,985 documents were manually triple-annotated for creation of gold standard training data, which was used to train four classification algorithms. The best performing model achieved an F1-score of 0.98 (95% CI 0.98-0.99).
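The F1-score reported above is the harmonic mean of precision and recall. A minimal sketch of how it could be computed for a binary "discusses patient pain" label follows; the function name and the toy labels are hypothetical and are not taken from the study.

```python
def precision_recall_f1(gold, predicted, positive=1):
    """Return (precision, recall, F1) for one positive class.

    gold and predicted are equal-length sequences of labels.
    Illustrative only: the labels used with it here are made up.
    """
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example: 1 = sentence discusses patient pain, 0 = it does not.
gold_labels = [1, 1, 1, 0, 0]
model_labels = [1, 1, 0, 1, 0]
p, r, f1 = precision_recall_f1(gold_labels, model_labels)
```

A confidence interval such as the one quoted would need a resampling or analytic step on top of this, which the sketch does not attempt.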

Subject(s)

Mental Health , Natural Language Processing , Humans , Algorithms , Databases, Factual , Pain

2.

Identifying features of risk periods for suicide attempts using document frequency and language use in electronic health records.

Dutta, Rina; Gkotsis, George; Velupillai, Sumithra U; Downs, Johnny; Roberts, Angus; Stewart, Robert; Hotopf, Matthew.

Front Psychiatry ; 14: 1217649, 2023.

Article in English | MEDLINE | ID: mdl-38152362

ABSTRACT

Background: Individualising mental healthcare at times when a patient is most at risk of suicide involves shifting research emphasis from static risk factors to those that may be modifiable with interventions. Currently, risk assessment is based on a range of extensively reported stable risk factors, but critical to dynamic suicide risk assessment is an understanding of each individual patient's health trajectory over time. The use of electronic health records (EHRs) and analysis using machine learning has the potential to accelerate progress in developing early warning indicators. Setting: EHR data from the South London and Maudsley NHS Foundation Trust (SLaM) which provides secondary mental healthcare for 1.8 million people living in four South London boroughs. Objectives: To determine whether the time window proximal to a hospitalised suicide attempt can be discriminated from a distal period of lower risk by analysing the documentation and mental health clinical free text data from EHRs and (i) investigate whether the rate at which EHR documents are recorded per patient is associated with a suicide attempt; (ii) compare document-level word usage between documents proximal and distal to a suicide attempt; and (iii) compare n-gram frequency related to third-person pronoun use proximal and distal to a suicide attempt using machine learning. Methods: The Clinical Record Interactive Search (CRIS) system allowed access to de-identified information from the EHRs. CRIS has been linked with Hospital Episode Statistics (HES) data for Admitted Patient Care. We analysed document and event data for patients who had at some point between 1 April 2006 and 31 March 2013 been hospitalised with a HES ICD-10 code related to attempted suicide (X60-X84; Y10-Y34; Y87.0/Y87.2). Findings: n = 8,247 patients were identified to have made a hospitalised suicide attempt. 
Of these, n = 3,167 (39.8%) of patients had at least one document available in their EHR prior to their first suicide attempt. N = 1,424 (45.0%) of these patients had been "monitored" by mental healthcare services in the past 30 days. From 60 days prior to a first suicide attempt, there was a rapid increase in the monitoring level (document recording of the past 30 days) increasing from 35.1 to 45.0%. Documents containing words related to prescribed medications/drugs/overdose/poisoning/addiction had the highest odds of being a risk indicator used proximal to a suicide attempt (OR 1.88; precision 0.91 and recall 0.93), and documents with words citing a care plan were associated with the lowest risk for a suicide attempt (OR 0.22; precision 1.00 and recall 1.00). Function words, word sequence, and pronouns were most common in all three representations (uni-, bi-, and tri-gram). Conclusion: EHR documentation frequency and language use can be used to distinguish periods distal from and proximal to a suicide attempt. However, in our study 55.0% of patients with documentation, prior to their first suicide attempt, did not have a record in the preceding 30 days, meaning that there are a high number who are not seen by services at their most vulnerable point.
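The odds ratios quoted above compare how often a word group appears in documents proximal versus distal to an attempt. As a rough illustration, an unadjusted odds ratio can be computed from a 2x2 table of document counts. The function name and the counts below are hypothetical, and the abstract does not state how the study derived its estimates.

```python
def odds_ratio(term_proximal, term_distal, no_term_proximal, no_term_distal):
    """Unadjusted odds ratio from a 2x2 table of document counts.

    term_*    : documents that contain the word group
    no_term_* : documents that do not contain it
    *_proximal / *_distal : period relative to the suicide attempt
    """
    if term_distal == 0 or no_term_proximal == 0:
        raise ValueError("odds ratio is undefined when a denominator cell is zero")
    return (term_proximal * no_term_distal) / (term_distal * no_term_proximal)


# Made-up counts, for illustration only.
example_or = odds_ratio(term_proximal=30, term_distal=20,
                        no_term_proximal=10, no_term_distal=20)
```

A value above 1 means the word group is relatively more frequent in proximal documents, and a value below 1 means it is relatively more frequent in distal ones.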

3.

Development of a Corpus Annotated With Mentions of Pain in Mental Health Records: Natural Language Processing Approach.

Chaturvedi, Jaya; Chance, Natalia; Mirza, Luwaiza; Vernugopan, Veshalee; Velupillai, Sumithra; Stewart, Robert; Roberts, Angus.

JMIR Form Res ; 7: e45849, 2023 Jun 26.

Article in English | MEDLINE | ID: mdl-37358897

ABSTRACT

BACKGROUND: Pain is a widespread issue, with 20% of adults (1 in 5) experiencing it globally. A strong association has been demonstrated between pain and mental health conditions, and this association is known to exacerbate disability and impairment. Pain is also known to be strongly related to emotions, which can lead to damaging consequences. As pain is a common reason for people to access health care facilities, electronic health records (EHRs) are a potential source of information on this pain. Mental health EHRs could be particularly beneficial since they can show the overlap of pain with mental health. Most mental health EHRs contain the majority of their information within the free-text sections of the records. However, it is challenging to extract information from free text. Natural language processing (NLP) methods are therefore required to extract this information from the text. OBJECTIVE: This research describes the development of a corpus of manually labeled mentions of pain and pain-related entities from the documents of a mental health EHR database, for use in the development and evaluation of future NLP methods. METHODS: The EHR database used, Clinical Record Interactive Search, consists of anonymized patient records from The South London and Maudsley National Health Service Foundation Trust in the United Kingdom. The corpus was developed through a process of manual annotation where pain mentions were marked as relevant (ie, referring to physical pain afflicting the patient), negated (ie, indicating absence of pain), or not relevant (ie, referring to pain affecting someone other than the patient, or metaphorical and hypothetical mentions). Relevant mentions were also annotated with additional attributes such as anatomical location affected by pain, pain character, and pain management measures, if mentioned. RESULTS: A total of 5644 annotations were collected from 1985 documents (723 patients). 
Over 70% (n=4028) of the mentions found within the documents were annotated as relevant, and about half of these mentions also included the anatomical location affected by the pain. The most common pain character was chronic pain, and the most commonly mentioned anatomical location was the chest. Most annotations (n=1857, 33%) were from patients who had a primary diagnosis of mood disorders (International Classification of Diseases-10th edition, chapter F30-39). CONCLUSIONS: This research has helped better understand how pain is mentioned within the context of mental health EHRs and provided insight into the kind of information that is typically mentioned around pain in such a data source. In future work, the extracted information will be used to develop and evaluate a machine learning-based NLP application to automatically extract relevant pain information from EHR databases.

4.

Development of a Knowledge Graph Embeddings Model for Pain.

Chaturvedi, Jaya; Wang, Tao; Velupillai, Sumithra; Stewart, Robert; Roberts, Angus.

AMIA Annu Symp Proc ; 2023: 299-308, 2023.

Article in English | MEDLINE | ID: mdl-38222382

ABSTRACT

Pain is a complex concept that can interconnect with other concepts such as a disorder that might cause pain, a medication that might relieve pain, and so on. To fully understand the context of pain experienced by either an individual or across a population, we may need to examine all concepts related to pain and the relationships between them. This is especially useful when modeling pain that has been recorded in electronic health records. Knowledge graphs represent concepts and their relations by an interlinked network, enabling semantic and context-based reasoning in a computationally tractable form. These graphs can, however, be too large for efficient computation. Knowledge graph embeddings help to resolve this by representing the graphs in a low-dimensional vector space. These embeddings can then be used in various downstream tasks such as classification and link prediction. The various relations associated with pain which are required to construct such a knowledge graph can be obtained from external medical knowledge bases such as SNOMED CT, a hierarchical systematic nomenclature of medical terms. A knowledge graph built in this way could be further enriched with real-world examples of pain and its relations extracted from electronic health records. This paper describes the construction of such knowledge graph embedding models of pain concepts, extracted from the unstructured text of mental health electronic health records, combined with external knowledge created from relations described in SNOMED CT, and their evaluation on a subject-object link prediction task. The performance of the models was compared with other baseline models.

Subject(s)

Pattern Recognition, Automated , Systematized Nomenclature of Medicine , Humans , Knowledge Bases , Semantics , Electronic Health Records

5.

Assessing machine learning for fair prediction of ADHD in school pupils using a retrospective cohort study of linked education and healthcare data.

Ter-Minassian, Lucile; Viani, Natalia; Wickersham, Alice; Cross, Lauren; Stewart, Robert; Velupillai, Sumithra; Downs, Johnny.

BMJ Open ; 12(12): e058058, 2022 12 05.

Article in English | MEDLINE | ID: mdl-36576182

ABSTRACT

OBJECTIVES: Attention deficit hyperactivity disorder (ADHD) is a prevalent childhood disorder, but often goes unrecognised and untreated. To improve access to services, accurate predictions of populations at high risk of ADHD are needed for effective resource allocation. Using a unique linked health and education data resource, we examined how machine learning (ML) approaches can predict risk of ADHD. DESIGN: Retrospective population cohort study. SETTING: South London (2007-2013). PARTICIPANTS: n=56 258 pupils with linked education and health data. PRIMARY OUTCOME MEASURES: Using area under the curve (AUC), we compared the predictive accuracy of four ML models and one neural network for ADHD diagnosis. Ethnic group and language biases were weighted using a fair pre-processing algorithm. RESULTS: Random forest and logistic regression prediction models provided the highest predictive accuracy for ADHD in population samples (AUC 0.86 and 0.86, respectively) and clinical samples (AUC 0.72 and 0.70). Precision-recall curve analyses were less favourable. Sociodemographic biases were effectively reduced by a fair pre-processing algorithm without loss of accuracy. CONCLUSIONS: ML approaches using linked routinely collected education and health data offer accurate, low-cost and scalable prediction models of ADHD. These approaches could help identify areas of need and inform resource allocation. Introducing 'fairness weighting' attenuates some sociodemographic biases which would otherwise underestimate ADHD risk within minority groups.
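The AUC used as the outcome measure above can be read as the probability that a randomly chosen pupil with ADHD receives a higher predicted risk than a randomly chosen pupil without it. The sketch below computes it by direct pairwise comparison. The function name and the scores are hypothetical, and this quadratic-time version is meant for illustration rather than for a cohort of this size.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    if not positive_scores or not negative_scores:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up predicted risks, for illustration only.
auc_value = pairwise_auc([0.9, 0.8, 0.4], [0.5, 0.3])  # 5 of 6 pairs ranked correctly
```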

Subject(s)

Attention Deficit Disorder with Hyperactivity , Humans , Child , Attention Deficit Disorder with Hyperactivity/diagnosis , Attention Deficit Disorder with Hyperactivity/epidemiology , Retrospective Studies , Cohort Studies , Schools , Delivery of Health Care , Machine Learning

6.

Evaluating physical urban features in several mental illnesses using electronic health record data.

Mahabadi, Zahra; Mahabadi, Maryam; Velupillai, Sumithra; Roberts, Angus; McGuire, Philip; Ibrahim, Zina; Patel, Rashmi.

Front Digit Health ; 4: 874237, 2022.

Article in English | MEDLINE | ID: mdl-36158997

ABSTRACT

Objectives: Understanding the potential impact of physical characteristics of the urban environment on clinical outcomes in severe mental illnesses. Materials and Methods: Physical features of the urban environment were examined as predictors for affective and non-affective severe mental illnesses (SMI), the number and length of psychiatric hospital admissions, and the number of short and long-acting injectable antipsychotic prescriptions. In addition, the urban features with the greatest weight in the predicted model were determined. The data included 28 urban features and 6 clinical variables obtained from 30,210 people with SMI receiving care from the South London and Maudsley NHS Foundation Trust (SLaM) using the Clinical Record Interactive Search (CRIS) tool. Five machine learning regression models were evaluated for the highest prediction accuracy followed by the Self-Organising Map (SOM) to represent the results visually. Results: The prevalence of SMI, number and duration of psychiatric hospital admission, and antipsychotic prescribing were greater in urban areas. However, machine learning analysis was unable to accurately predict clinical outcomes using urban environmental data. Discussion: The urban environment is associated with an increased prevalence of SMI. However, urban features alone cannot explain the variation observed in psychotic disorder prevalence or clinical outcomes measured through psychiatric hospitalisation or exposure to antipsychotic treatments. Conclusion: Urban areas are associated with a greater prevalence of SMI but clinical outcomes are likely to depend on a combination of urban and individual patient-level factors. Future mental healthcare service planning should focus on providing appropriate resources to people with SMI in urban environments.

7.

Autism spectrum disorders as a risk factor for adolescent self-harm: a retrospective cohort study of 113,286 young people in the UK.

Widnall, Emily; Epstein, Sophie; Polling, Catherine; Velupillai, Sumithra; Jewell, Amelia; Dutta, Rina; Simonoff, Emily; Stewart, Robert; Gilbert, Ruth; Ford, Tamsin; Hotopf, Matthew; Hayes, Richard D; Downs, Johnny.

BMC Med ; 20(1): 137, 2022 04 29.

Article in English | MEDLINE | ID: mdl-35484575

ABSTRACT

BACKGROUND: Individuals with autism spectrum disorder (ASD) are at particularly high risk of suicide and suicide attempts. Presentation to a hospital with self-harm is one of the strongest risk factors for later suicide. We describe the use of a novel data linkage between routinely collected education data and child and adolescent mental health data to examine whether adolescents with ASD are at higher risk than the general population of presenting to emergency care with self-harm. METHODS: A retrospective cohort study was conducted on the population aged 11-17 resident in four South London boroughs between January 2009 and March 2013, attending state secondary schools, identified in the National Pupil Database (NPD). Exposure data on ASD status were derived from the NPD. We used Cox regression to model time to first self-harm presentation to the Emergency Department (ED). RESULTS: One thousand twenty adolescents presented to the ED with self-harm, and 763 matched to the NPD. The sample for analysis included 113,286 adolescents (2.2% with ASD). For boys only, there was an increased risk of self-harm associated with ASD (adjusted hazard ratio 2·79, 95% CI 1·40-5·57, P<0·01). Several other factors including school absence, exclusion from school and having been in foster care were also associated with a higher risk of self-harm. CONCLUSIONS: This study provides evidence that ASD in boys, and other educational, social and clinical factors, are risk factors for emergency presentation with self-harm in adolescents. These findings are an important step in developing early recognition and prevention programmes.

Subject(s)

Autism Spectrum Disorder , Self-Injurious Behavior , Adolescent , Autism Spectrum Disorder/epidemiology , Child , Humans , Male , Retrospective Studies , Risk Factors , Self-Injurious Behavior/epidemiology , Self-Injurious Behavior/psychology , United Kingdom/epidemiology

8.

Can natural language processing models extract and classify instances of interpersonal violence in mental healthcare electronic records: an applied evaluative study.

Botelle, Riley; Bhavsar, Vishal; Kadra-Scalzo, Giouliana; Mascio, Aurelie; Williams, Marcus V; Roberts, Angus; Velupillai, Sumithra; Stewart, Robert.

BMJ Open ; 12(2): e052911, 2022 02 16.

Article in English | MEDLINE | ID: mdl-35172999

ABSTRACT

OBJECTIVE: This paper evaluates the application of a natural language processing (NLP) model for extracting clinical text referring to interpersonal violence using electronic health records (EHRs) from a large mental healthcare provider. DESIGN: A multidisciplinary team iteratively developed guidelines for annotating clinical text referring to violence. Keywords were used to generate a dataset which was annotated (ie, classified as affirmed, negated or irrelevant) for: presence of violence, patient status (ie, as perpetrator, witness and/or victim of violence) and violence type (domestic, physical and/or sexual). An NLP approach using a pretrained transformer model, BioBERT (Bidirectional Encoder Representations from Transformers for Biomedical Text Mining) was fine-tuned on the annotated dataset and evaluated using 10-fold cross-validation. SETTING: We used the Clinical Records Interactive Search (CRIS) database, comprising over 500 000 de-identified EHRs of patients within the South London and Maudsley NHS Foundation Trust, a specialist mental healthcare provider serving an urban catchment area. PARTICIPANTS: Searches of CRIS were carried out based on 17 predefined keywords. Randomly selected text fragments were taken from the results for each keyword, amounting to 3771 text fragments from the records of 2832 patients. OUTCOME MEASURES: We estimated precision, recall and F1 score for each NLP model. We examined sociodemographic and clinical variables in patients giving rise to the text data, and frequencies for each annotated violence characteristic. RESULTS: Binary classification models were developed for six labels (violence presence, perpetrator, victim, domestic, physical and sexual). Among annotations affirmed for the presence of any violence, 78% (1724) referred to physical violence, 61% (1350) referred to patients as perpetrator and 33% (731) to domestic violence. 
NLP models' precision ranged from 89% (perpetrator) to 98% (sexual); recall ranged from 89% (victim, perpetrator) to 97% (sexual). CONCLUSIONS: State of the art NLP models can extract and classify clinical text on violence from EHRs at acceptable levels of scale, efficiency and accuracy.
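The 10-fold cross-validation mentioned above partitions the annotated fragments into ten folds and holds each fold out once for evaluation. A minimal sketch of the index bookkeeping follows. The function name is hypothetical, the split here is a simple unshuffled stride, and the abstract does not say whether the study shuffled or stratified its folds.

```python
def k_fold_indices(n_items, k=10):
    """Yield (train_indices, test_indices) for each of k folds.

    Items are assigned to folds by striding (item i goes to fold i % k),
    so fold sizes differ by at most one. No shuffling is performed.
    """
    if k < 2 or k > n_items:
        raise ValueError("k must be between 2 and n_items")
    folds = [list(range(start, n_items, k)) for start in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [idx for j, fold in enumerate(folds) if j != held_out for idx in fold]
        yield train, test


# 3771 is the number of text fragments reported in the abstract.
for train_idx, test_idx in k_fold_indices(3771, k=10):
    pass  # fine-tune on train_idx and evaluate on test_idx here
```

Each fragment is used for evaluation exactly once, so the per-fold precision, recall and F1 can be averaged over the ten runs.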

Subject(s)

Mental Health Services , Natural Language Processing , Electronic Health Records , Electronics , Humans , Plant Extracts , Violence

9.

Portability of natural language processing methods to detect suicidality from clinical text in US and UK electronic health records.

Cusick, Marika; Velupillai, Sumithra; Downs, Johnny; Campion, Thomas R; Sholle, Evan T; Dutta, Rina; Pathak, Jyotishman.

J Affect Disord Rep ; 10, 2022 Dec.

Article in English | MEDLINE | ID: mdl-36644339

ABSTRACT

Background: In the global effort to prevent death by suicide, many academic medical institutions are implementing natural language processing (NLP) approaches to detect suicidality from unstructured clinical text in electronic health records (EHRs), with the hope of targeting timely, preventative interventions to individuals most at risk of suicide. Despite the international need, the development of these NLP approaches in EHRs has been largely local and not shared across healthcare systems. Methods: In this study, we developed a process to share NLP approaches that were individually developed at King's College London (KCL), UK and Weill Cornell Medicine (WCM), US - two academic medical centers based in different countries with vastly different healthcare systems. We tested and compared the algorithms' performance on manually annotated clinical notes (KCL: n = 4,911 and WCM: n = 837). Results: After a successful technical porting of the NLP approaches, our quantitative evaluation determined that independently developed NLP approaches can detect suicidality at another healthcare organization with a different EHR system, clinical documentation processes, and culture, yet do not achieve the same level of success as at the institution where the NLP algorithm was developed (KCL approach: F1-score 0.85 vs. 0.68, WCM approach: F1-score 0.87 vs. 0.72). Limitations: Independent NLP algorithm development and patient cohort selection at the two institutions compromised direct comparability. Conclusions: Shared use of these NLP approaches is a critical step forward towards improving data-driven algorithms for early suicide risk identification and timely prevention.

10.

Development of a Lexicon for Pain.

Chaturvedi, Jaya; Mascio, Aurelie; Velupillai, Sumithra U; Roberts, Angus.

Front Digit Health ; 3: 778305, 2021.

Article in English | MEDLINE | ID: mdl-34966903

ABSTRACT

Pain has been an area of growing interest in the past decade and is known to be associated with mental health issues. Due to the ambiguous nature of how pain is described in text, it presents a unique natural language processing (NLP) challenge. Understanding how pain is described in text and utilizing this knowledge to improve NLP tasks would be of substantial clinical importance. Not much work has previously been done in this space. For this reason, and in order to develop an English lexicon for use in NLP applications, an exploration of pain concepts within free text was conducted. The exploratory text sources included two hospital databases, a social media platform (Twitter), and an online community (Reddit). This exploration helped select appropriate sources and inform the construction of a pain lexicon. The terms within the final lexicon were derived from three sources-literature, ontologies, and word embedding models. This lexicon was validated by two clinicians as well as compared to an existing 26-term pain sub-ontology and MeSH (Medical Subject Headings) terms. The final validated lexicon consists of 382 terms and will be used in downstream NLP tasks by helping select appropriate pain-related documents from electronic health record (EHR) databases, as well as pre-annotating these words to help in development of an NLP application for classification of mentions of pain within the documents. The lexicon and the code used to generate the embedding models have been made publicly available.

11.

A Review of Recent Work in Transfer Learning and Domain Adaptation for Natural Language Processing of Electronic Health Records.

Laparra, Egoitz; Mascio, Aurelie; Velupillai, Sumithra; Miller, Timothy.

Yearb Med Inform ; 30(1): 239-244, 2021 Aug.

Article in English | MEDLINE | ID: mdl-34479396

ABSTRACT

OBJECTIVES: We survey recent work in biomedical NLP on building more adaptable or generalizable models, with a focus on work dealing with electronic health record (EHR) texts, to better understand recent trends in this area and identify opportunities for future research. METHODS: We searched PubMed, the Institute of Electrical and Electronics Engineers (IEEE), the Association for Computational Linguistics (ACL) anthology, the Association for the Advancement of Artificial Intelligence (AAAI) proceedings, and Google Scholar for the years 2018-2020. We reviewed abstracts to identify the most relevant and impactful work, and manually extracted data points from each of these papers to characterize the types of methods and tasks that were studied, in which clinical domains, and current state-of-the-art results. RESULTS: The ubiquity of pre-trained transformers in clinical NLP research has contributed to an increase in domain adaptation and generalization-focused work that uses these models as the key component. Most recently, work has started to train biomedical transformers and to extend the fine-tuning process with additional domain adaptation techniques. We also highlight recent research in cross-lingual adaptation, as a special case of adaptation. CONCLUSIONS: While pre-trained transformer models have led to some large performance improvements, general domain pre-training does not always transfer adequately to the clinical domain due to its highly specialized language. There is also much work to be done in showing that the gains obtained by pre-trained transformers are beneficial in real world use cases. The amount of work in domain adaptation and transfer learning is limited by dataset availability and creating datasets for new domains is challenging. The growing body of research in languages other than English is encouraging, and more collaboration between researchers across the language divide would likely accelerate progress in non-English clinical NLP.

Subject(s)

Electronic Health Records , Machine Learning , Natural Language Processing , Datasets as Topic , Language

12.

Temporal and diurnal variation in social media posts to a suicide support forum.

Dutta, Rina; Gkotsis, George; Velupillai, Sumithra; Bakolis, Ioannis; Stewart, Robert.

BMC Psychiatry ; 21(1): 259, 2021 05 19.

Article in English | MEDLINE | ID: mdl-34011346

ABSTRACT

BACKGROUND: Rates of suicide attempts and deaths are highest on Mondays and these occur more frequently in the morning or early afternoon, suggesting weekly temporal and diurnal variation in suicidal behaviour. It is unknown whether there are similar time trends on social media, of posts relevant to suicide. We aimed to determine temporal and diurnal variation in posting patterns on the Reddit forum SuicideWatch, an online community for individuals who might be at risk of, or who know someone at risk of suicide. METHODS: We used time series analysis to compare date and time stamps of 90,518 SuicideWatch posts from 1st December 2008 to 31st August 2015 to (i) 6,616,431 posts on the most commonly subscribed general subreddit, AskReddit and (ii) 66,934 of these AskReddit posts, which were posted by the SuicideWatch authors. RESULTS: Mondays showed the highest proportion of posts on SuicideWatch. Clear diurnal variation was observed, with a peak in the early morning (2:00-5:00 h), and a subsequent decrease to a trough in late morning/early afternoon (11:00-14:00 h). Conversely, the highest volume of posts in the control data was between 20:00-23:00 h. CONCLUSIONS: Posts on SuicideWatch occurred most frequently on Mondays: the day most associated with suicide risk. The early morning peak in SuicideWatch posts precedes the time of day during which suicide attempts and deaths most commonly occur. Further research of these weekly and diurnal rhythms should help target populations with support and suicide prevention interventions when needed most.
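The weekly and diurnal patterns described above come down to tallying posts by day of week and by hour of day before any time series modelling. A minimal sketch of that tallying step follows. The function name and the sample timestamps are hypothetical, and time zone handling, which matters for real forum data, is ignored here.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def posting_profile(timestamps):
    """Return (posts per weekday, posts per hour) as two Counters.

    timestamps is an iterable of datetime objects, assumed to be
    already converted to the time zone of interest.
    """
    by_weekday = Counter()
    by_hour = Counter()
    for ts in timestamps:
        by_weekday[WEEKDAYS[ts.weekday()]] += 1
        by_hour[ts.hour] += 1
    return by_weekday, by_hour


# Made-up timestamps, for illustration only.
sample = [
    datetime(2008, 12, 1, 4, 0),
    datetime(2015, 8, 30, 21, 0),
    datetime(2015, 8, 31, 3, 15),
]
weekday_counts, hour_counts = posting_profile(sample)
```

Dividing each count by the total gives the proportions that can then be compared between a forum of interest and a control forum.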

Subject(s)

Social Media , Circadian Rhythm , Humans , Suicidal Ideation

13.

Using General-purpose Sentiment Lexicons for Suicide Risk Assessment in Electronic Health Records: Corpus-Based Analysis.

Bittar, André; Velupillai, Sumithra; Roberts, Angus; Dutta, Rina.

JMIR Med Inform ; 9(4): e22397, 2021 Apr 13.

Article in English | MEDLINE | ID: mdl-33847595

ABSTRACT

BACKGROUND: Suicide is a serious public health issue, accounting for 1.4% of all deaths worldwide. Current risk assessment tools are reported as performing little better than chance in predicting suicide. New methods for studying dynamic features in electronic health records (EHRs) are being increasingly explored. One avenue of research involves using sentiment analysis to examine clinicians' subjective judgments when reporting on patients. Several recent studies have used general-purpose sentiment analysis tools to automatically identify negative and positive words within EHRs to test correlations between sentiment extracted from the texts and specific medical outcomes (eg, risk of suicide or in-hospital mortality). However, little attention has been paid to analyzing the specific words identified by general-purpose sentiment lexicons when applied to EHR corpora. OBJECTIVE: This study aims to quantitatively and qualitatively evaluate the coverage of six general-purpose sentiment lexicons against a corpus of EHR texts to ascertain the extent to which such lexical resources are fit for use in suicide risk assessment. METHODS: The data for this study were a corpus of 198,451 EHR texts made up of two subcorpora drawn from a 1:4 case-control study comparing clinical notes written over the period leading up to a suicide attempt (cases, n=2913) with those not preceding such an attempt (controls, n=14,727). We calculated word frequency distributions within each subcorpus to identify representative keywords for both the case and control subcorpora. We quantified the relative coverage of the 6 lexicons with respect to this list of representative keywords in terms of weighted precision, recall, and F score. RESULTS: The six lexicons achieved reasonable precision (0.53-0.68) but very low recall (0.04-0.36). Many of the most representative keywords in the suicide-related (case) subcorpus were not identified by any of the lexicons. 
The sentiment-bearing status of these keywords for this use case is thus doubtful. CONCLUSIONS: Our findings indicate that these 6 sentiment lexicons are not optimal for use in suicide risk assessment. We propose a set of guidelines for the creation of more suitable lexical resources for distinguishing suicide-related from non-suicide-related EHR texts.

14.

A natural language processing approach for identifying temporal disease onset information from mental healthcare text.

Viani, Natalia; Botelle, Riley; Kerwin, Jack; Yin, Lucia; Patel, Rashmi; Stewart, Robert; Velupillai, Sumithra.

Sci Rep ; 11(1): 757, 2021 01 12.

Article in English | MEDLINE | ID: mdl-33436814

ABSTRACT

Receiving timely and appropriate treatment is crucial for better health outcomes, and research on the contribution of specific variables is essential. In the mental health domain, an important research variable is the date of psychosis symptom onset, as longer delays in treatment are associated with worse intervention outcomes. The growing adoption of electronic health records (EHRs) within mental health services provides an invaluable opportunity to study this problem at scale retrospectively. However, disease onset information is often only available in open text fields, requiring natural language processing (NLP) techniques for automated analyses. Since this variable can be documented at different points during a patient's care, NLP methods that model clinical and temporal associations are needed. We address the identification of psychosis onset by: 1) manually annotating a corpus of mental health EHRs with disease onset mentions, 2) modelling the underlying NLP problem as a paragraph classification approach, and 3) combining multiple onset paragraphs at the patient level to generate a ranked list of likely disease onset dates. For 22/31 test patients (71%) the correct onset date was found among the top-3 NLP predictions. The proposed approach was also applied at scale, allowing an onset date to be estimated for 2483 patients.
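The top-3 result above (22 of 31 test patients, about 71%) is a hit rate over ranked predictions: a patient counts as correct if the true onset date appears among the first three candidates. A minimal sketch of that evaluation follows. The function name, the patient identifiers and the dates are hypothetical and only illustrate the calculation.

```python
def top_k_hit_rate(ranked_predictions, gold, k=3):
    """Fraction of patients whose gold value is in their top-k predictions.

    ranked_predictions maps a patient id to a list of candidates ordered
    from most to least likely; gold maps a patient id to the true value.
    Patients with no predictions count as misses.
    """
    if not gold:
        raise ValueError("gold must contain at least one patient")
    hits = sum(
        1 for patient_id, truth in gold.items()
        if truth in ranked_predictions.get(patient_id, [])[:k]
    )
    return hits / len(gold)


# Made-up onset dates (year-month), for illustration only.
predictions = {
    "patient_a": ["2010-03", "2011-01", "2009-07", "2012-05"],
    "patient_b": ["2005-06", "2004-02", "2006-11", "2003-09"],
}
truth = {"patient_a": "2009-07", "patient_b": "2003-09"}
rate = top_k_hit_rate(predictions, truth, k=3)  # patient_a hits, patient_b misses
```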

Subject(s)

Electronic Health Records/statistics & numerical data , Mental Health Services/statistics & numerical data , Natural Language Processing , Psychotic Disorders/diagnosis , Symptom Assessment/methods , Humans , Mental Health , Retrospective Studies

15.

Applied natural language processing in mental health big data.

Stewart, Robert; Velupillai, Sumithra.

Neuropsychopharmacology ; 46(1): 252-253, 2021 01.

Article in English | MEDLINE | ID: mdl-32895453

Subject(s)

Big Data , Natural Language Processing , Electronic Health Records , Mental Health

16.

Using natural language processing to extract self-harm and suicidality data from a clinical sample of patients with eating disorders: a retrospective cohort study.

Cliffe, Charlotte; Seyedsalehi, Aida; Vardavoulia, Katerina; Bittar, André; Velupillai, Sumithra; Shetty, Hitesh; Schmidt, Ulrike; Dutta, Rina.

BMJ Open ; 11(12): e053808, 2021 12 31.

Article in English | MEDLINE | ID: mdl-34972768

ABSTRACT

OBJECTIVES: The objective of this study was to determine risk factors for those diagnosed with eating disorders who report self-harm and suicidality. DESIGN AND SETTING: This study was a retrospective cohort study within a secondary mental health service, South London and Maudsley National Health Service Trust. PARTICIPANTS: All diagnosed with an F50 diagnosis of eating disorder from January 2009 to September 2019 were included. INTERVENTION AND MEASURES: Electronic health records (EHRs) for these patients were extracted and two natural language processing tools were used to determine documentation of self-harm and suicidality in their clinical notes. These tools were validated manually for attribute agreement scores within this study. RESULTS: The attribute agreements for precision of positive mentions of self-harm were 0.96 and for suicidality were 0.80; this demonstrates a 'near perfect' and 'strong' agreement and highlights the reliability of the tools in identifying the EHRs reporting self-harm or suicidality. There were 7434 patients with EHRs available and diagnosed with eating disorders included in the study from the dates January 2007 to September 2019. Of these, 4591 (61.8%) had a mention of self-harm within their records and 4764 (64.0%) had a mention of suicidality; 3899 (52.4%) had mentions of both. Patients reporting either self-harm or suicidality were more likely to have a diagnosis of anorexia nervosa (AN) (self-harm, AN OR=3.44, 95% CI 1.05 to 11.3, p=0.04; suicidality, AN OR=8.20, 95% CI 2.17 to 30.1; p=0.002). They were also more likely to have a diagnosis of borderline personality disorder (p≤0.001), bipolar disorder (p<0.001) or substance misuse disorder (p<0.001). CONCLUSION: A high percentage of patients (>60%) diagnosed with eating disorders report either self-harm or suicidal thoughts. Relative to other eating disorders, those diagnosed with AN were more likely to report either self-harm or suicidal thoughts. 
Psychiatric comorbidity, in particular borderline personality disorder and substance misuse, was also associated with an increased risk of self-harm and suicidality. Therefore, risk assessment among patients diagnosed with eating disorders is crucial.

Subject(s)

Feeding and Eating Disorders , Self-Injurious Behavior , Suicide , Feeding and Eating Disorders/epidemiology , Humans , Natural Language Processing , Reproducibility of Results , Retrospective Studies , Self-Injurious Behavior/epidemiology , Self-Injurious Behavior/psychology , State Medicine , Suicidal Ideation , Suicide/psychology

17.

Reviewing a Decade of Research Into Suicide and Related Behaviour Using the South London and Maudsley NHS Foundation Trust Clinical Record Interactive Search (CRIS) System.

Bittar, André; Velupillai, Sumithra; Downs, Johnny; Sedgwick, Rosemary; Dutta, Rina.

Front Psychiatry ; 11: 553463, 2020.

Article in English | MEDLINE | ID: mdl-33329090

ABSTRACT

Suicide is a serious public health issue worldwide, yet current clinical methods for assessing a person's risk of taking their own life remain unreliable and new methods for assessing suicide risk are being explored. The widespread adoption of electronic health records (EHRs) has opened up new possibilities for epidemiological studies of suicide and related behaviour amongst those receiving healthcare. These types of records capture valuable information entered by healthcare practitioners at the point of care. However, much recent work has relied heavily on the structured data of EHRs, whilst much of the important information about a patient's care pathway is recorded in the unstructured text of clinical notes. Accessing and structuring text data for use in clinical research, and particularly for suicide and self-harm research, is a significant challenge that is increasingly being addressed using methods from the fields of natural language processing (NLP) and machine learning (ML). In this review, we provide an overview of the range of suicide-related studies that have been carried out using the Clinical Records Interactive Search (CRIS): a database for epidemiological and clinical research that contains de-identified EHRs from the South London and Maudsley NHS Foundation Trust. We highlight the variety of clinical research questions, cohorts and techniques that have been explored for suicide and related behaviour research using CRIS, including the development of NLP and ML approaches. We demonstrate how EHR data provides comprehensive material to study prevalence of suicide and self-harm in clinical populations. Structured data alone is insufficient and NLP methods are needed to more accurately identify relevant information from EHR data. We also show how the text in clinical notes provides signals for ML approaches to suicide risk assessment. 
We envision increased progress in the decades to come, particularly in externally validating findings across multiple sites and countries, both in terms of clinical evidence and in terms of NLP and machine learning method transferability.

18.

User Perspectives of Mood-Monitoring Apps Available to Young People: Qualitative Content Analysis.

Widnall, Emily; Grant, Claire Ellen; Wang, Tao; Cross, Lauren; Velupillai, Sumithra; Roberts, Angus; Stewart, Robert; Simonoff, Emily; Downs, Johnny.

JMIR Mhealth Uhealth ; 8(10): e18140, 2020 10 10.

Article in English | MEDLINE | ID: mdl-33037875

ABSTRACT

BACKGROUND: Mobile health apps are increasingly available and used in a clinical context to monitor young people's mood and mental health. Despite the benefits of accessibility and cost-effectiveness, consumer engagement remains a hurdle for uptake and continued use. Hundreds of mood-monitoring apps are publicly available to young people on app stores; however, few studies have examined consumer perspectives. App store reviews held on Google and Apple platforms provide a large, rich source of naturally generated, publicly available user reviews. Although commercial developers use these data to modify and improve their apps, to date, there has been very little in-depth evaluation of app store user reviews within scientific research, and our current understanding of what makes apps engaging and valuable to young people is limited. OBJECTIVE: This study aims to gain a better understanding of what app users consider useful to encourage frequent and prolonged use of mood-monitoring apps appropriate for young people. METHODS: A systematic approach was applied to the selection of apps and reviews. We identified mood-monitoring apps (n=53) by a combination of automated application programming interface (API) methods. We only included apps appropriate for young people based on app store age categories (apps available to those younger than 18 years). We subsequently downloaded all available user reviews via API data scraping methods and selected a representative subsample of reviews (n=1803) for manual qualitative content analysis. RESULTS: The qualitative content analysis revealed 8 main themes: accessibility (34%), flexibility (21%), recording and representation of mood (18%), user requests (17%), reflecting on mood (16%), technical features (16%), design (13%), and health promotion (11%). A total of 6 minor themes were also identified: notification and reminders; recommendation; privacy, security, and transparency; developer; adverts; and social/community. 
CONCLUSIONS: Users value mood-monitoring apps that can be personalized to their needs, have a simple and intuitive design, and allow accurate representation and review of complex and fluctuating moods. App store reviews are a valuable repository of user engagement feedback and provide a wealth of information about what users value in an app and what user needs are not being met. Users perceive mood-monitoring apps positively, but over 20% of reviews identified the need for improvement.

Subject(s)

Mobile Applications , Adolescent , Health Promotion , Humans , Mental Health , Privacy

19.

Enhancing predictions of patient conveyance using emergency call handler free text notes for unconscious and fainting incidents reported to the London Ambulance Service.

Tollinton, Liam; Metcalf, Alexander M; Velupillai, Sumithra.

Int J Med Inform ; 141: 104179, 2020 09.

Article in English | MEDLINE | ID: mdl-32663739

ABSTRACT

OBJECTIVE: Pre-hospital emergency medical services use clinical decision support systems (CDSS) to triage calls. Call handlers often supplement this by making free text notes covering key incident information. We investigate whether machine learning approaches using features from such free text notes can improve prediction of unconscious patients who require conveyance. MATERIALS AND METHODS: We analysed a subset of all London Ambulance Service calls that were triaged through the Medical Priority Dispatch System (MPDS) as involving an unconscious or fainting patient in 2018. We use and compare two machine learning algorithms: random forest (RF) and gradient boosting machine (GBM). For each incident, we predict whether the patient will be conveyed to a hospital emergency department or equivalent using as features 1) the MPDS code, 2) the free text notes and 3) the two together. We evaluate model performance using the area under the curve (AUC) metric. Given the imbalance of outcomes (patient conveyed 71 %, not conveyed 29 %), we also consider sensitivity and specificity. RESULTS: Using only the MPDS code resulted in an AUC of 0.57. Using the text notes gave an improved AUC score of 0.63 and combining the two gave an AUC score of 0.64 (scores were similar for RF and GBM). GBM models scored better on sensitivity (0.93 vs 0.62 for RF in the combined model), but specificity was lower (0.17 vs. 0.56 for RF in the combined model). CONCLUSIONS: Using information contained in the free text notes made by call handlers in combination with MPDS improves prediction of unconscious and fainting patients requiring conveyance to a hospital emergency department (or equivalent) when compared with machine learning models using MPDS codes only. This suggests there is some useful information in unstructured data captured by emergency call handlers that complements MPDS codes. 
Quantifying this gain can help inform emergency medical service policy when evaluating the decision to expand or augment existing CDSS.
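The sensitivity and specificity figures reported above pull in opposite directions for the two models, so a single summary can help when comparing them. The sketch below computes balanced accuracy (the mean of sensitivity and specificity) from the values quoted in the abstract for the combined model. Balanced accuracy is not reported in the abstract; it is derived here purely as an illustration, and the function name is hypothetical.

```python
def balanced_accuracy(sensitivity, specificity):
    """Mean of sensitivity and specificity; insensitive to class imbalance."""
    return (sensitivity + specificity) / 2


# (sensitivity, specificity) for the combined MPDS + free text model,
# as quoted in the abstract.
reported = {
    "GBM": (0.93, 0.17),
    "RF": (0.62, 0.56),
}

summary = {
    name: balanced_accuracy(sens, spec)
    for name, (sens, spec) in reported.items()
}

for name, value in summary.items():
    print(f"{name}: balanced accuracy = {value:.2f}")
```

On this derived measure the two models are close (about 0.55 for GBM and 0.59 for RF), which is in line with the abstract's note that AUC scores were similar for RF and GBM.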

Subject(s)

Ambulances , Emergency Medical Service Communication Systems , Emergency Service, Hospital , Humans , London , Retrospective Studies , Syncope , Triage

20.

The association between neighbourhood characteristics and physical victimisation in men and women with mental disorders.

Bhavsar, Vishal; Sanyal, Jyoti; Patel, Rashmi; Shetty, Hitesh; Velupillai, Sumithra; Stewart, Robert; Broadbent, Matthew; MacCabe, James H; Das-Munshi, Jayati; Howard, Louise M.

BJPsych Open ; 6(4): e73, 2020 Jul 16.

Article in English | MEDLINE | ID: mdl-32669154

ABSTRACT

BACKGROUND: How neighbourhood characteristics affect the physical safety of people with mental illness is unclear. AIMS: To examine neighbourhood effects on physical victimisation towards people using mental health services. METHOD: We developed and evaluated a machine-learning-derived free-text-based natural language processing (NLP) algorithm to ascertain clinical text referring to physical victimisation. This was applied to records on all patients attending National Health Service mental health services in Southeast London. Sociodemographic and clinical data, and diagnostic information on use of acute hospital care (from Hospital Episode Statistics, linked to Clinical Record Interactive Search), were collected in this group, defined as 'cases' and concurrently sampled controls. Multilevel logistic regression models estimated associations (odds ratios, ORs) between neighbourhood-level fragmentation, crime, income deprivation, and population density and physical victimisation. RESULTS: Based on a human-rated gold standard, the NLP algorithm had a positive predictive value of 0.92 and sensitivity of 0.98 for (clinically recorded) physical victimisation. A 1 s.d. increase in neighbourhood crime was accompanied by a 7% increase in odds of physical victimisation in women and a 13% increase in men (adjusted OR (aOR) for women: 1.07, 95% CI 1.01-1.14, aOR for men: 1.13, 95% CI 1.06-1.21, P for gender interaction, 0.218). Although small, adjusted associations for neighbourhood fragmentation appeared greater in magnitude for women (aOR = 1.05, 95% CI 1.01-1.11) than men, where this association was not statistically significant (aOR = 1.00, 95% CI 0.95-1.04, P for gender interaction, 0.096). Neighbourhood income deprivation was associated with victimisation in men and women with similar magnitudes of association. 
CONCLUSIONS: Neighbourhood factors influencing safety, as well as individual characteristics including gender, may be relevant to understanding pathways to physical victimisation towards people with mental illness.
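The link between the adjusted odds ratios and the percentage increases quoted above can be made explicit with a short sketch. It converts an odds ratio into a percentage change in odds and, assuming the reported 95% CI is a Wald interval that is symmetric on the log-odds scale, backs out the approximate regression coefficient and its standard error. That symmetry assumption and both function names are illustrative and not taken from the paper.

```python
import math


def percent_change_in_odds(odds_ratio):
    """Percentage change in odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


def log_odds_and_se(odds_ratio, ci_low, ci_high, z=1.96):
    """Approximate log-odds coefficient and standard error.

    Assumes a Wald-type 95% CI symmetric on the log scale (an assumption;
    the abstract does not state how the intervals were constructed).
    """
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return beta, se


# aOR per 1 s.d. increase in neighbourhood crime, as quoted in the abstract.
for label, (aor, low, high) in {
    "women": (1.07, 1.01, 1.14),
    "men": (1.13, 1.06, 1.21),
}.items():
    beta, se = log_odds_and_se(aor, low, high)
    print(
        f"{label}: {percent_change_in_odds(aor):.0f}% change in odds, "
        f"beta ~ {beta:.3f}, SE ~ {se:.3f}"
    )
```

The percentage figures printed for women and men match the 7% and 13% stated in the abstract; the coefficient and standard error are approximations, since the published values are rounded to two decimals.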
