Search | Virtual Health Library (Biblioteca Virtual em Saúde)

1.

Cohort profile: The Social media, smartphone use and Self-harm in Young People (3S-YP) study-A prospective, observational cohort study of young people in contact with mental health services.

Bye, Amanda; Carter, Ben; Leightley, Daniel; Trevillion, Kylee; Liakata, Maria; Branthonne-Foster, Stella; Cross, Samantha; Zenasni, Zohra; Carr, Ewan; Williamson, Grace; Vega Viyuela, Alba; Dutta, Rina.

PLoS One ; 19(5): e0299059, 2024.

Article in English | MEDLINE | ID: mdl-38776261

ABSTRACT

OBJECTIVES: The Social media, Smartphone use and Self-Harm (3S-YP) study is a prospective observational cohort study to investigate the mechanisms underpinning associations between social media and smartphone use and self-harm in a clinical youth sample. We present here a comprehensive description of the cohort from baseline data and an overview of data available from baseline and follow-up assessments. METHODS: Young people aged 13-25 years were recruited from a mental health trust in England and followed up for 6 months. Self-report data were collected at baseline and monthly during follow-up and linked with electronic health records (EHR) and user-generated data. FINDINGS: A total of 362 young people enrolled and provided baseline questionnaire data. Most participants had a history of self-harm according to clinical (n = 295, 81.5%) and broader definitions (n = 296, 81.8%). At baseline, there were high levels of current moderate/severe anxiety (n = 244; 67.4%), depression (n = 255; 70.4%) and sleep disturbance (n = 171; 47.2%). Over half used social media and smartphones after midnight on weekdays (n = 197, 54.4%; n = 215, 59.4%) and weekends (n = 241, 66.6%; n = 263, 72.7%), and half met the cut-off for problematic smartphone use (n = 177; 48.9%). Of the cohort, we have questionnaire data at month 6 from 230 (63.5%), EHR data from 345 (95.3%), social media data from 110 (30.4%) and smartphone data from 48 (13.3%). CONCLUSION: The 3S-YP study is the first prospective study with a clinical youth sample to investigate the impact of digital technology on youth mental health using novel data linkages. Baseline findings indicate self-harm, anxiety, depression, sleep disturbance and digital technology overuse are prevalent among clinical youth. Future analyses will explore associations between outcomes and exposures over time and compare self-report with user-generated data in this cohort.
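The percentages in this abstract appear to share one denominator, the 362 participants who provided baseline data. A minimal Python sketch of that arithmetic follows; the helper name `pct` and the dictionary labels are illustrative assumptions, not part of the study.

```python
# Denominator: participants who enrolled and provided baseline data (from the abstract).
COHORT_SIZE = 362


def pct(count: int, total: int = COHORT_SIZE) -> float:
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts are taken from the abstract; the keys are illustrative labels only.
baseline_counts = {
    "self-harm history (clinical definition)": 295,
    "moderate/severe anxiety": 244,
    "month-6 questionnaire data": 230,
    "smartphone data": 48,
}

for label, n in baseline_counts.items():
    print(f"{label}: n = {n} ({pct(n)}%)")
```

Recomputing the figures this way is a quick consistency check on a reported table, since every count should reproduce its stated percentage.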

Subjects

Self-Injurious Behavior; Smartphone; Social Media; Humans; Adolescent; Self-Injurious Behavior/epidemiology; Self-Injurious Behavior/psychology; Male; Female; Prospective Studies; Young Adult; Adult; Mental Health Services; Anxiety/epidemiology; Surveys and Questionnaires; Depression/epidemiology; Self Report; England/epidemiology; Cohort Studies

2.

Development and validation of open-source deep neural networks for comprehensive chest x-ray reading: a retrospective, multicentre study.

Cid, Yashin Dicente; Macpherson, Matthew; Gervais-Andre, Louise; Zhu, Yuanyi; Franco, Giuseppe; Santeramo, Ruggiero; Lim, Chee; Selby, Ian; Muthuswamy, Keerthini; Amlani, Ashik; Hopewell, Heath; Indrajeet, Das; Liakata, Maria; Hutchinson, Charles E; Goh, Vicky; Montana, Giovanni.

Lancet Digit Health ; 6(1): e44-e57, 2024 Jan.

Article in English | MEDLINE | ID: mdl-38071118

ABSTRACT

BACKGROUND: Artificial intelligence (AI) systems for automated chest x-ray interpretation hold promise for standardising reporting and reducing delays in health systems with shortages of trained radiologists. Yet, there are few freely accessible AI systems trained on large datasets for practitioners to use with their own data with a view to accelerating clinical deployment of AI systems in radiology. We aimed to contribute an AI system for comprehensive chest x-ray abnormality detection. METHODS: In this retrospective cohort study, we developed open-source neural networks, X-Raydar and X-Raydar-NLP, for classifying common chest x-ray findings from images and their free-text reports. Our networks were developed using data from six UK hospitals from three National Health Service (NHS) Trusts (University Hospitals Coventry and Warwickshire NHS Trust, University Hospitals Birmingham NHS Foundation Trust, and University Hospitals Leicester NHS Trust) collectively contributing 2 513 546 chest x-ray studies taken from a 13-year period (2006-19), which yielded 1 940 508 usable free-text radiological reports written by the contemporary assessing radiologist (collectively referred to as the "historic reporters") and 1 896 034 frontal images. Chest x-rays were labelled using a taxonomy of 37 findings by a custom-trained natural language processing (NLP) algorithm, X-Raydar-NLP, from the original free-text reports. X-Raydar-NLP was trained on 23 230 manually annotated reports and tested on 4551 reports from all hospitals. 1 694 921 labelled images from the training set and 89 238 from the validation set were then used to train a multi-label image classifier. 
Our algorithms were evaluated on three retrospective datasets: a set of exams sampled randomly from the full NHS dataset reported during clinical practice and annotated using NLP (n=103 328); a consensus set sampled from all six hospitals annotated by three expert radiologists (two independent annotators for each image and a third consultant to facilitate disagreement resolution) under research conditions (n=1427); and an independent dataset, MIMIC-CXR, consisting of NLP-annotated exams (n=252 374). FINDINGS: X-Raydar achieved a mean AUC of 0·919 (SD 0·039) on the auto-labelled set, 0·864 (0·102) on the consensus set, and 0·842 (0·074) on the MIMIC-CXR test, demonstrating similar performance to the historic clinical radiologist reporters, as assessed on the consensus set, for multiple clinically important findings, including pneumothorax, parenchymal opacification, and parenchymal mass or nodules. On the consensus set, X-Raydar outperformed historical reporter balanced accuracy with significance on 27 of 37 findings, was non-inferior on nine, and inferior on one finding, resulting in an average improvement of 13·3% (SD 13·1) to 0·763 (0·110), including a mean 5·6% (13·2) improvement in critical findings to 0·826 (0·119). INTERPRETATION: Our study shows that automated classification of chest x-rays under a comprehensive taxonomy can achieve performance levels similar to those of historical reporters and exhibit robust generalisation to external data. The open-sourced neural networks can serve as foundation models for further research and are freely available to the research community. FUNDING: Wellcome Trust.
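Balanced accuracy, the metric used above to compare X-Raydar with the historic reporters, is conventionally the mean of sensitivity and specificity. The Python sketch below assumes that standard definition (the abstract does not spell it out) and uses hypothetical confusion counts rather than study data.

```python
def balanced_accuracy(tp: int, fn: int, tn: int, fp: int) -> float:
    """Mean of sensitivity and specificity for one binary finding."""
    sensitivity = tp / (tp + fn)  # fraction of positive cases detected
    specificity = tn / (tn + fp)  # fraction of negative cases correctly cleared
    return (sensitivity + specificity) / 2


# Hypothetical counts for a single rare finding (not taken from the study).
print(balanced_accuracy(tp=80, fn=20, tn=900, fp=100))
```

Unlike plain accuracy, this measure is not inflated by the large number of negative cases, which matters when many of the 37 findings are uncommon.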

Subjects

Artificial Intelligence; Image Interpretation, Computer-Assisted; Neural Networks, Computer; Humans; Retrospective Studies; X-Rays

3.

Observational prospective study of social media, smartphone use and self-harm in a clinical sample of young people: study protocol.

Bye, Amanda; Carter, Ben; Leightley, Daniel; Trevillion, Kylee; Liakata, Maria; Branthonne-Foster, Stella; Williamson, Grace; Zenasni, Zohra; Dutta, Rina.

BMJ Open ; 13(2): e069748, 2023 02 01.

Article in English | MEDLINE | ID: mdl-36725102

ABSTRACT

INTRODUCTION: Young people are the most frequent users of social media and smartphones and there has been an increasing speculation about the potential negative impacts of their use on mental health. This has coincided with a sharp increase in the levels of self-harm in young people. To date, studies researching this potential association are predominantly cross-sectional and reliant on self-report data, which precludes the ability to objectively analyse behaviour over time. This study is one of the first attempts to explore temporal patterns of real-world usage prior to self-harm, to identify whether there are usage patterns associated with an increased risk. METHODS AND ANALYSIS: To study the mechanisms by which social media and smartphone use underpin self-harm in a clinical sample of young people, the Social media, Smartphone use and Self-harm in Young People (3S-YP) study uses a prospective, observational study design. Up to 600 young people aged 13-25 years old from secondary mental health services will be recruited and followed for up to 6 months. Primary analysis will compare real-world data in the 7 days leading up to a participant or clinician recorded self-harm episode, to categorise patterns of problematic usage. Secondary analyses will explore potential mediating effects of anxiety, depression, sleep disturbance, loneliness and bullying. ETHICS AND DISSEMINATION: This study was approved by the National Research Ethics Service, London - Riverside, as well as by the Joint Research and Development Office of the Institute of Psychiatry, Psychology and Neuroscience and South London and Maudsley NHS Foundation Trust (SLaM), and the SLaM Clinical Research Interactive Search (CRIS) Oversight Committee. The findings from this study will be disseminated through peer-reviewed scientific journals, conferences, websites, social media and stakeholder engagement activities. TRIAL REGISTRATION NUMBER: NCT04601220.

Subjects

Self-Injurious Behavior; Social Media; Humans; Adolescent; Young Adult; Adult; Smartphone; Prospective Studies; Cross-Sectional Studies; Self-Injurious Behavior/epidemiology; Self-Injurious Behavior/psychology; Observational Studies as Topic

4.

Maximizing the positive and minimizing the negative: Social media data to study youth mental health with informed consent.

Leightley, Daniel; Bye, Amanda; Carter, Ben; Trevillion, Kylee; Branthonne-Foster, Stella; Liakata, Maria; Wood, Anthony; Ougrin, Dennis; Orben, Amy; Ford, Tamsin; Dutta, Rina.

Front Psychiatry ; 13: 1096253, 2022.

Article in English | MEDLINE | ID: mdl-36704745

ABSTRACT

Social media usage impacts upon the mental health and wellbeing of young people, yet there is not enough evidence to determine who is affected, how and to what extent. While it has widened and strengthened communication networks for many, the dangers posed to at-risk youth are serious. Social media data offers unique insights into the minute details of a user's online life. Timely consented access to data could offer many opportunities to transform understanding of its effects on mental wellbeing in different contexts. However, limited data access by researchers is preventing such advances from being made. Our multidisciplinary authorship includes a lived experience adviser, academic and practicing psychiatrists, and academic psychology, as well as computational, statistical, and qualitative researchers. In this Perspective article, we propose a framework to support secure and confidential access to social media platform data for research to make progress toward better public mental health.

5.

Natural Language Processing markers in first episode psychosis and people at clinical high-risk.

Morgan, Sarah E; Diederen, Kelly; Vértes, Petra E; Ip, Samantha H Y; Wang, Bo; Thompson, Bethany; Demjaha, Arsime; De Micheli, Andrea; Oliver, Dominic; Liakata, Maria; Fusar-Poli, Paolo; Spencer, Tom J; McGuire, Philip.

Transl Psychiatry ; 11(1): 630, 2021 12 13.

Article in English | MEDLINE | ID: mdl-34903724

ABSTRACT

Recent work has suggested that disorganised speech might be a powerful predictor of later psychotic illness in clinical high risk subjects. To that end, several automated measures to quantify disorganisation of transcribed speech have been proposed. However, it remains unclear which measures are most strongly associated with psychosis, how different measures are related to each other and what the best strategies are to collect speech data from participants. Here, we assessed whether twelve automated Natural Language Processing markers could differentiate transcribed speech excerpts from subjects at clinical high risk for psychosis, first episode psychosis patients and healthy control subjects (total N = 54). In line with previous work, several measures showed significant differences between groups, including semantic coherence, speech graph connectivity and a measure of whether speech was on-topic, the latter of which outperformed the related measure of tangentiality. Most NLP measures examined were only weakly related to each other, suggesting they provide complementary information. Finally, we compared the ability of transcribed speech generated using different tasks to differentiate the groups. Speech generated from picture descriptions of the Thematic Apperception Test and a story re-telling task outperformed free speech, suggesting that choice of speech generation method may be an important consideration. Overall, quantitative speech markers represent a promising direction for future clinical applications.
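Semantic coherence is often operationalised as the average cosine similarity between the vectors of consecutive sentences. The Python sketch below assumes that common formulation, which may differ from the exact measure used in the paper, and uses toy vectors in place of real sentence embeddings.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def semantic_coherence(sentence_vectors: List[Sequence[float]]) -> float:
    """Mean cosine similarity between each sentence vector and the next."""
    if len(sentence_vectors) < 2:
        raise ValueError("need at least two sentences")
    pairs = zip(sentence_vectors, sentence_vectors[1:])
    similarities = [cosine(u, v) for u, v in pairs]
    return sum(similarities) / len(similarities)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
vectors = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(semantic_coherence(vectors))
```

A lower score indicates that adjacent sentences are less related, which is one way disorganised speech could show up in a transcript.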

Subjects

Natural Language Processing; Psychotic Disorders; Biomarkers; Cognition; Humans; Psychotic Disorders/diagnosis; Speech

6.

How We Do Things With Words: Analyzing Text as Social and Cultural Data.

Nguyen, Dong; Liakata, Maria; DeDeo, Simon; Eisenstein, Jacob; Mimno, David; Tromble, Rebekah; Winters, Jane.

Front Artif Intell ; 3: 62, 2020.

Article in English | MEDLINE | ID: mdl-33733179

ABSTRACT

In this article we describe our experiences with computational text analysis involving rich social and cultural concepts. We hope to achieve three primary goals. First, we aim to shed light on thorny issues not always at the forefront of discussions about computational text analysis methods. Second, we hope to provide a set of key questions that can guide work in this area. Our guidance is based on our own experiences and is therefore inherently imperfect. Still, given our diversity of disciplinary backgrounds and research practices, we hope to capture a range of ideas and identify commonalities that resonate for many. This leads to our final goal: to help promote interdisciplinary collaborations. Interdisciplinary insights and partnerships are essential for realizing the full potential of any computational text analysis involving social and cultural concepts, and the more we bridge these divides, the more fruitful we believe our work will be.

7.

Using clinical Natural Language Processing for health outcomes research: Overview and actionable suggestions for future advances.

Velupillai, Sumithra; Suominen, Hanna; Liakata, Maria; Roberts, Angus; Shah, Anoop D; Morley, Katherine; Osborn, David; Hayes, Joseph; Stewart, Robert; Downs, Johnny; Chapman, Wendy; Dutta, Rina.

J Biomed Inform ; 88: 11-19, 2018 12.

Article in English | MEDLINE | ID: mdl-30368002

ABSTRACT

The importance of incorporating Natural Language Processing (NLP) methods in clinical informatics research has been increasingly recognized over the past years, and has led to transformative advances. Typically, clinical NLP systems are developed and evaluated on word, sentence, or document level annotations that model specific attributes and features, such as document content (e.g., patient status, or report type), document section types (e.g., current medications, past medical history, or discharge summary), named entities and concepts (e.g., diagnoses, symptoms, or treatments) or semantic attributes (e.g., negation, severity, or temporality). From a clinical perspective, on the other hand, research studies are typically modelled and evaluated on a patient- or population-level, such as predicting how a patient group might respond to specific treatments or patient monitoring over time. While some NLP tasks consider predictions at the individual or group user level, these tasks still constitute a minority. Owing to the discrepancy between scientific objectives of each field, and because of differences in methodological evaluation priorities, there is no clear alignment between these evaluation approaches. Here we provide a broad summary and outline of the challenging issues involved in defining appropriate intrinsic and extrinsic evaluation methods for NLP research that is to be used for clinical outcomes research, and vice versa. A particular focus is placed on mental health research, an area still relatively understudied by the clinical NLP research community, but where NLP methods are of notable relevance. Recent advances in clinical NLP method development have been significant, but we propose more emphasis needs to be placed on rigorous evaluation for the field to advance further. To enable this, we provide actionable suggestions, including a minimal protocol that could be used when reporting clinical NLP method development and its evaluation.

Subjects

Electronic Health Records; Medical Informatics/methods; Mental Health Services/organization & administration; Natural Language Processing; Semantics; Algorithms; Data Collection/methods; Humans; Medical Informatics/trends; Mental Disorders/therapy; Outcome Assessment, Health Care; Reproducibility of Results

8.

Building and evaluating resources for sentiment analysis in the Greek language.

Tsakalidis, Adam; Papadopoulos, Symeon; Voskaki, Rania; Ioannidou, Kyriaki; Boididou, Christina; Cristea, Alexandra I; Liakata, Maria; Kompatsiaris, Yiannis.

Lang Resour Eval ; 52(4): 1021-1044, 2018.

Article in English | MEDLINE | ID: mdl-30930705

ABSTRACT

Sentiment lexicons and word embeddings constitute well-established sources of information for sentiment analysis in online social media. Although their effectiveness has been demonstrated in state-of-the-art sentiment analysis and related tasks in the English language, such publicly available resources are much less developed and evaluated for the Greek language. In this paper, we tackle the problems arising when analyzing text in such an under-resourced language. We present and make publicly available a rich set of such resources, ranging from a manually annotated lexicon, to semi-supervised word embedding vectors and annotated datasets for different tasks. Our experiments using different algorithms and parameters on our resources show promising results over standard baselines; on average, we achieve a 24.9% relative improvement in F-score on the cross-domain sentiment analysis task when training the same algorithms with our resources, compared to training them on more traditional feature sources, such as n-grams. Importantly, while our resources were built with the primary focus on the cross-domain sentiment analysis task, they also show promising results in related tasks, such as emotion analysis and sarcasm detection.

9.

Corrigendum: Characterisation of mental health conditions in social media using Informed Deep Learning.

Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J P; Dobson, Richard J B; Dutta, Rina.

Sci Rep ; 7: 46813, 2017 05 16.

Article in English | MEDLINE | ID: mdl-28507325

10.

Characterisation of mental health conditions in social media using Informed Deep Learning.

Gkotsis, George; Oellrich, Anika; Velupillai, Sumithra; Liakata, Maria; Hubbard, Tim J P; Dobson, Richard J B; Dutta, Rina.

Sci Rep ; 7: 45141, 2017 03 22.

Article in English | MEDLINE | ID: mdl-28327593

ABSTRACT

The number of people affected by mental illness is on the increase and with it the burden on health and social care use, as well as the loss of both productivity and quality-adjusted life-years. Natural language processing of electronic health records is increasingly used to study mental health conditions and risk behaviours on a large scale. However, narrative notes written by clinicians do not capture first-hand the patients' own experiences, and only record cross-sectional, professional impressions at the point of care. Social media platforms have become a source of 'in the moment' daily exchange, with topics including well-being and mental health. In this study, we analysed posts from the social media platform Reddit and developed classifiers to recognise and classify posts related to mental illness according to 11 disorder themes. Using a neural network and deep learning approach, we could automatically recognise mental illness-related posts in our balanced dataset with an accuracy of 91.08% and select the correct theme with a weighted average accuracy of 71.37%. We believe that these results are a first step in developing methods to characterise large amounts of user-generated content that could support content curation and targeted interventions.

11.

Measuring scientific impact beyond academia: An assessment of existing impact metrics and proposed improvements.

Ravenscroft, James; Liakata, Maria; Clare, Amanda; Duma, Daniel.

PLoS One ; 12(3): e0173152, 2017.

Article in English | MEDLINE | ID: mdl-28278243

ABSTRACT

How does scientific research affect the world around us? Being able to answer this question is of great importance in order to appropriately channel efforts and resources in science. The impact by scientists in academia is currently measured by citation based metrics such as h-index, i-index and citation counts. These academic metrics aim to represent the dissemination of knowledge among scientists rather than the impact of the research on the wider world. In this work we are interested in measuring scientific impact beyond academia, on the economy, society, health and legislation (comprehensive impact). Indeed scientists are asked to demonstrate evidence of such comprehensive impact by authoring case studies in the context of the Research Excellence Framework (REF). We first investigate the extent to which existing citation based metrics can be indicative of comprehensive impact. We have collected all recent REF impact case studies from 2014 and we have linked these to papers in citation networks that we constructed and derived from CiteSeerX, arXiv and PubMed Central using a number of text processing and information retrieval techniques. We have demonstrated that existing citation-based metrics for impact measurement do not correlate well with REF impact results. We also consider metrics of online attention surrounding scientific works, such as those provided by the Altmetric API. We argue that in order to be able to evaluate wider non-academic impact we need to mine information from a much wider set of resources, including social media posts, press releases, news articles and political debates stemming from academic work. We also provide our data as a free and reusable collection for further analysis, including the PubMed citation network and the correspondence between REF case studies, grant applications and the academic literature.

Subjects

Achievement; Biomedical Research/standards; Journal Impact Factor; Models, Statistical; Publishing/statistics & numerical data; Humans; Science; Social Media

12.

Analysing How People Orient to and Spread Rumours in Social Media by Looking at Conversational Threads.

Zubiaga, Arkaitz; Liakata, Maria; Procter, Rob; Wong Sak Hoi, Geraldine; Tolmie, Peter.

PLoS One ; 11(3): e0150989, 2016.

Article in English | MEDLINE | ID: mdl-26943909

ABSTRACT

As breaking news unfolds people increasingly rely on social media to stay abreast of the latest updates. The use of social media in such situations comes with the caveat that new information being released piecemeal may encourage rumours, many of which remain unverified long after their point of release. Little is known, however, about the dynamics of the life cycle of a social media rumour. In this paper we present a methodology that has enabled us to collect, identify and annotate a dataset of 330 rumour threads (4,842 tweets) associated with 9 newsworthy events. We analyse this dataset to understand how users spread, support, or deny rumours that are later proven true or false, by distinguishing two levels of status in a rumour life cycle i.e., before and after its veracity status is resolved. The identification of rumours associated with each event, as well as the tweet that resolved each rumour as true or false, was performed by journalist members of the research team who tracked the events in real time. Our study shows that rumours that are ultimately proven true tend to be resolved faster than those that turn out to be false. Whilst one can readily see users denying rumours once they have been debunked, users appear to be less capable of distinguishing true from false rumours when their veracity remains in question. In fact, we show that the prevalent tendency for users is to support every unverified rumour. We also analyse the role of different types of users, finding that highly reputable users such as news organisations endeavour to post well-grounded statements, which appear to be certain and accompanied by evidence. Nevertheless, these often prove to be unverified pieces of information that give rise to false rumours. Our study reinforces the need for developing robust machine learning techniques that can provide assistance in real time for assessing the veracity of rumours. The findings of our study provide useful insights for achieving this aim.

Subjects

Communication; Social Media; Denial, Psychological; Social Support

13.

Biological network extraction from scientific literature: state of the art and challenges.

Li, Chen; Liakata, Maria; Rebholz-Schuhmann, Dietrich.

Brief Bioinform ; 15(5): 856-77, 2014 Sep.

Article in English | MEDLINE | ID: mdl-23434632

ABSTRACT

Networks of molecular interactions explain complex biological processes, and all known information on molecular events is contained in a number of public repositories including the scientific literature. Metabolic and signalling pathways are often viewed separately, even though both types are composed of interactions involving proteins and other chemical entities. It is necessary to be able to combine data from all available resources to judge the functionality, complexity and completeness of any given network overall, but especially the full integration of relevant information from the scientific literature is still an ongoing and complex task. Currently, the text-mining research community is steadily moving towards processing the full body of the scientific literature by making use of rich linguistic features such as full text parsing, to extract biological interactions. The next step will be to combine these with information from scientific databases to support hypothesis generation for the discovery of new knowledge and the extension of biological networks. The generation of comprehensive networks requires technologies such as entity grounding, coordination resolution and co-reference resolution, which are not fully solved and are required to further improve the quality of results. Here, we analyse the state of the art for the extraction of network information from the scientific literature and the evaluation of extraction methods against reference corpora, discuss challenges involved and identify directions for future research.

Subjects

Data Mining; Linguistics; Signal Transduction

14.

Dynamic enhancement of drug product labels to support drug safety, efficacy, and effectiveness.

Boyce, Richard D; Horn, John R; Hassanzadeh, Oktie; Waard, Anita de; Schneider, Jodi; Luciano, Joanne S; Rastegar-Mojarad, Majid; Liakata, Maria.

J Biomed Semantics ; 4(1): 5, 2013 Jan 26.

Article in English | MEDLINE | ID: mdl-23351881

ABSTRACT

Out-of-date or incomplete drug product labeling information may increase the risk of otherwise preventable adverse drug events. In recognition of these concerns, the United States Food and Drug Administration (FDA) requires drug product labels to include specific information. Unfortunately, several studies have found that drug product labeling fails to keep current with the scientific literature. We present a novel approach to addressing this issue. The primary goal of this novel approach is to better meet the information needs of persons who consult the drug product label for information on a drug's efficacy, effectiveness, and safety. Using FDA product label regulations as a guide, the approach links drug claims present in drug information sources available on the Semantic Web with specific product label sections. Here we report on pilot work that establishes the baseline performance characteristics of a proof-of-concept system implementing the novel approach. Claims from three drug information sources were linked to the Clinical Studies, Drug Interactions, and Clinical Pharmacology sections of the labels for drug products that contain one of 29 psychotropic drugs. The resulting Linked Data set maps 409 efficacy/effectiveness study results, 784 drug-drug interactions, and 112 metabolic pathway assertions derived from three clinically-oriented drug information sources (ClinicalTrials.gov, the National Drug File - Reference Terminology, and the Drug Interaction Knowledge Base) to the sections of 1,102 product labels. Proof-of-concept web pages were created for all 1,102 drug product labels that demonstrate one possible approach to presenting information that dynamically enhances drug product labeling. We found that approximately one in five efficacy/effectiveness claims were relevant to the Clinical Studies section of a psychotropic drug product, with most relevant claims providing new information.
We also identified several cases where all of the drug-drug interaction claims linked to the Drug Interactions section for a drug were potentially novel. The baseline performance characteristics of the proof-of-concept will enable further technical and user-centered research on robust methods for scaling the approach to the many thousands of product labels currently on the market.

15.

Three hybrid classifiers for the detection of emotions in suicide notes.

Liakata, Maria; Kim, Jee-Hyub; Saha, Shyamasree; Hastings, Janna; Rebholz-Schuhmann, Dietrich.

Biomed Inform Insights ; 5(Suppl. 1): 175-84, 2012.

Article in English | MEDLINE | ID: mdl-22879774

ABSTRACT

We describe our approach for creating a system able to detect emotions in suicide notes. Motivated by the sparse and imbalanced data as well as the complex annotation scheme, we have considered three hybrid approaches for distinguishing between the different categories. Each of the three approaches combines machine learning with manually derived rules, where the latter target very sparse emotion categories. The first approach considers the task as single label multi-class classification, where an SVM and a CRF classifier are trained to recognise fifteen different categories and their results are combined. Our second approach trains individual binary classifiers (SVM and CRF) for each of the fifteen sentence categories and returns the union of the classifiers as the final result. Finally, our third approach is a combination of binary and multi-class classifiers (SVM and CRF) trained on different subsets of the training data. We considered a number of different feature configurations. All three systems were tested on 300 unseen messages. Our second system had the best performance of the three, yielding an F1 score of 45.6% and a Precision of 60.1% whereas our best Recall (43.6%) was obtained using the third system.
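The second approach described above, one binary classifier per category with the union of positive outputs returned, can be sketched in Python as follows. The keyword functions are stand-ins for trained SVM and CRF models, and the category names and the helper `union_predict` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Set

# A binary classifier is modelled as a function from a sentence to True/False.
BinaryClassifier = Callable[[str], bool]


def union_predict(
    sentence: str, classifier_sets: List[Dict[str, BinaryClassifier]]
) -> Set[str]:
    """Return every category that at least one binary classifier fires on."""
    predicted: Set[str] = set()
    for classifiers in classifier_sets:
        for label, fires in classifiers.items():
            if fires(sentence):
                predicted.add(label)
    return predicted


# Keyword rules standing in for trained models (purely illustrative).
svm_like = {
    "thankfulness": lambda s: "thank" in s.lower(),
    "hopefulness": lambda s: "wish" in s.lower(),
}
crf_like = {
    "hopefulness": lambda s: "hope" in s.lower(),
    "instructions": lambda s: "please" in s.lower(),
}

sentence = "Thank you for everything, I hope things improve."
print(sorted(union_predict(sentence, [svm_like, crf_like])))
```

Taking the union lets a sentence carry several categories at once and favours recall, since a category is kept whenever either model proposes it.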

16.

Automatic recognition of conceptualization zones in scientific articles and two life science applications.

Liakata, Maria; Saha, Shyamasree; Dobnik, Simon; Batchelor, Colin; Rebholz-Schuhmann, Dietrich.

Bioinformatics ; 28(7): 991-1000, 2012 Apr 01.

Article in English | MEDLINE | ID: mdl-22321698

ABSTRACT

MOTIVATION: Scholarly biomedical publications report on the findings of a research investigation. Scientists use a well-established discourse structure to relate their work to the state of the art, express their own motivation and hypotheses and report on their methods, results and conclusions. In previous work, we have proposed ways to explicitly annotate the structure of scientific investigations in scholarly publications. Here we present the means to facilitate automatic access to the scientific discourse of articles by automating the recognition of 11 categories at the sentence level, which we call Core Scientific Concepts (CoreSCs). These include: Hypothesis, Motivation, Goal, Object, Background, Method, Experiment, Model, Observation, Result and Conclusion. CoreSCs provide the structure and context to all statements and relations within an article and their automatic recognition can greatly facilitate biomedical information extraction by characterizing the different types of facts, hypotheses and evidence available in a scientific publication. RESULTS: We have trained and compared machine learning classifiers (support vector machines and conditional random fields) on a corpus of 265 full articles in biochemistry and chemistry to automatically recognize CoreSCs. We have evaluated our automatic classifications against a manually annotated gold standard, and have achieved promising accuracies with 'Experiment', 'Background' and 'Model' being the categories with the highest F1-scores (76%, 62% and 53%, respectively). We have analysed the task of CoreSC annotation both from a sentence classification as well as sequence labelling perspective and we present a detailed feature evaluation. The most discriminative features are local sentence features such as unigrams, bigrams and grammatical dependencies while features encoding the document structure, such as section headings, also play an important role for some of the categories. We discuss the usefulness of automatically generated CoreSCs in two biomedical applications as well as work in progress. AVAILABILITY: A web-based tool for the automatic annotation of articles with CoreSCs and corresponding documentation is available online at http://www.sapientaproject.com/software. http://www.sapientaproject.com also contains detailed information pertaining to CoreSC annotation and links to annotation guidelines as well as a corpus of manually annotated articles, which served as our training data. CONTACT: liakata@ebi.ac.uk SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Artificial Intelligence; Pattern Recognition, Automated/methods; Periodicals as Topic/classification; Support Vector Machine; Algorithms; Internet; Software
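The abstract above reports that unigrams and bigrams were among the most discriminative sentence-level features. As a minimal illustration only, and not the authors' implementation, the Python sketch below counts such features for a single sentence. The function name `ngram_features`, the whitespace tokenizer and the `uni=`/`bi=` feature prefixes are all assumptions made for this example.

```python
from collections import Counter


def ngram_features(sentence: str) -> Counter:
    """Count lowercase unigram and bigram features for one sentence.

    Hypothetical helper: tokenization is a plain whitespace split,
    which is far simpler than what a real system would use.
    """
    tokens = sentence.lower().split()
    # One feature per token occurrence, e.g. "uni=growth".
    features = Counter(f"uni={tok}" for tok in tokens)
    # One feature per adjacent token pair, e.g. "bi=measured_growth".
    features.update(f"bi={a}_{b}" for a, b in zip(tokens, tokens[1:]))
    return features


if __name__ == "__main__":
    for name, count in sorted(ngram_features("We measured growth").items()):
        print(name, count)
```

Feature counts of this kind could be passed to any sentence classifier; the abstract names support vector machines and conditional random fields as the ones compared.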

17.

On the formalization and reuse of scientific research.

King, Ross D; Liakata, Maria; Lu, Chuan; Oliver, Stephen G; Soldatova, Larisa N.

J R Soc Interface ; 8(63): 1440-8, 2011 Oct 07.

Article in English | MEDLINE | ID: mdl-21490004

ABSTRACT

The reuse of scientific knowledge obtained from one investigation in another investigation is basic to the advance of science. Scientific investigations should therefore be recorded in ways that promote the reuse of the knowledge they generate. The use of logical formalisms to describe scientific knowledge has potential advantages in facilitating such reuse. Here, we propose a formal framework for using logical formalisms to promote reuse. We demonstrate the utility of this framework by using it in a worked example from biology: demonstrating cycles of investigation formalization [F] and reuse [R] to generate new knowledge. We first used logic to formally describe a Robot scientist investigation into yeast (Saccharomyces cerevisiae) functional genomics [f(1)]. With Robot scientists, unlike human scientists, the production of comprehensive metadata about their investigations is a natural by-product of the way they work. We then demonstrated how this formalism enabled the reuse of the research in investigating yeast phenotypes [r(1) = R(f(1))]. This investigation found that the removal of non-essential enzymes generally resulted in enhanced growth. The phenotype investigation was then formally described using the same logical formalism as the functional genomics investigation [f(2) = F(r(1))]. We then demonstrated how this formalism enabled the reuse of the phenotype investigation to investigate yeast systems-biology modelling [r(2) = R(f(2))]. This investigation found that yeast flux-balance analysis models fail to predict the observed changes in growth. Finally, the systems biology investigation was formalized for reuse in future investigations [f(3) = F(r(2))]. These cycles of reuse are a model for the general reuse of scientific knowledge.

Subjects

Biomedical Research/methods; Genomics/methods; Information Dissemination/methods; Saccharomyces cerevisiae/genetics; Computer Simulation; Fungal Proteins/genetics; Fungal Proteins/metabolism; Gene Expression Regulation, Fungal/physiology; Models, Theoretical; Systems Biology/methods
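The alternating formalization [F] and reuse [R] steps described in the abstract above follow a simple pattern: each reuse step r(k) draws on f(k), and each formalization step f(k+1) records r(k). The Python sketch below, built around the hypothetical helper `reuse_cycle`, only generates the step labels in that order. It is an illustration of the notation, not a model of the investigations or of the authors' logical formalism.

```python
from typing import List


def reuse_cycle(n_cycles: int) -> List[str]:
    """List the labelled steps of n formalize/reuse cycles, starting from f(1).

    Hypothetical helper: it reproduces the notation of the abstract
    and nothing more.
    """
    steps = ["f(1)"]  # initial formalization of the first investigation
    for k in range(1, n_cycles + 1):
        steps.append(f"r({k}) = R(f({k}))")      # reuse of formalization k
        steps.append(f"f({k + 1}) = F(r({k}))")  # formalization of that reuse
    return steps


if __name__ == "__main__":
    for step in reuse_cycle(2):
        print(step)
```

With `n_cycles=2` the labels match the five steps named in the abstract, from f(1) through f(3) = F(r(2)).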

18.

A comparison and user-based evaluation of models of textual information structure in the context of cancer risk assessment.

Guo, Yufan; Korhonen, Anna; Liakata, Maria; Silins, Ilona; Hogberg, Johan; Stenius, Ulla.

BMC Bioinformatics ; 12: 69, 2011 Mar 08.

Article in English | MEDLINE | ID: mdl-21385430

ABSTRACT

BACKGROUND: Many practical tasks in biomedicine require accessing specific types of information in scientific literature; e.g. information about the results or conclusions of the study in question. Several schemes have been developed to characterize such information in scientific journal articles. For example, a simple section-based scheme assigns individual sentences in abstracts under sections such as Objective, Methods, Results and Conclusions. Some schemes of textual information structure have proved useful for biomedical text mining (BIO-TM) tasks (e.g. automatic summarization). However, user-centered evaluation in the context of real-life tasks has been lacking. METHODS: We take three schemes of different type and granularity--those based on section names, Argumentative Zones (AZ) and Core Scientific Concepts (CoreSC)--and evaluate their usefulness for a real-life task which focuses on biomedical abstracts: Cancer Risk Assessment (CRA). We annotate a corpus of CRA abstracts according to each scheme, develop classifiers for automatic identification of the schemes in abstracts, and evaluate both the manual and automatic classifications directly as well as in the context of CRA. RESULTS: Our results show that for each scheme, the majority of categories appear in abstracts, although two of the schemes (AZ and CoreSC) were developed originally for full journal articles. All the schemes can be identified in abstracts relatively reliably using machine learning. Moreover, when cancer risk assessors are presented with scheme annotated abstracts, they find relevant information significantly faster than when presented with unannotated abstracts, even when the annotations are produced using an automatic classifier. Interestingly, in this user-based evaluation the coarse-grained scheme based on section names proved nearly as useful for CRA as the finest-grained CoreSC scheme. CONCLUSIONS: We have shown that existing schemes aimed at capturing information structure of scientific documents can be applied to biomedical abstracts and can be identified in them automatically with an accuracy which is high enough to benefit a real-life task in biomedicine.

Subjects

Artificial Intelligence; Data Mining; Electronic Data Processing/methods; Neoplasms; Abstracting and Indexing/classification; Computational Biology/methods; Humans; Risk Assessment

19.

Enhancement of plant metabolite fingerprinting by machine learning.

Scott, Ian M; Vermeer, Cornelia P; Liakata, Maria; Corol, Delia I; Ward, Jane L; Lin, Wanchang; Johnson, Helen E; Whitehead, Lynne; Kular, Baldeep; Baker, John M; Walsh, Sean; Dave, Anuja; Larson, Tony R; Graham, Ian A; Wang, Trevor L; King, Ross D; Draper, John; Beale, Michael H.

Plant Physiol ; 153(4): 1506-20, 2010 Aug.

Article in English | MEDLINE | ID: mdl-20566707

ABSTRACT

Metabolite fingerprinting of Arabidopsis (Arabidopsis thaliana) mutants with known or predicted metabolic lesions was performed by ¹H-nuclear magnetic resonance, Fourier transform infrared, and flow injection electrospray-mass spectrometry. Fingerprinting enabled processing of five times more plants than conventional chromatographic profiling and was competitive for discriminating mutants, other than those affected in only low-abundance metabolites. Despite their rapidity and complexity, fingerprints yielded metabolomic insights (e.g. that effects of single lesions were usually not confined to individual pathways). Among fingerprint techniques, ¹H-nuclear magnetic resonance discriminated the most mutant phenotypes from the wild type and Fourier transform infrared discriminated the fewest. To maximize information from fingerprints, data analysis was crucial. One-third of distinctive phenotypes might have been overlooked had data models been confined to principal component analysis score plots. Among several methods tested, machine learning (ML) algorithms, namely support vector machine or random forest (RF) classifiers, were unsurpassed for phenotype discrimination. Support vector machines were often the best performing classifiers, but RFs yielded some particularly informative measures. First, RFs estimated margins between mutant phenotypes, whose relations could then be visualized by Sammon mapping or hierarchical clustering. Second, RFs provided importance scores for the features within fingerprints that discriminated mutants. These scores correlated with analysis of variance F values (as did Kruskal-Wallis tests, true- and false-positive measures, mutual information, and the Relief feature selection algorithm). ML classifiers, as models trained on one data set to predict another, were ideal for focused metabolomic queries, such as the distinctiveness and consistency of mutant phenotypes. Accessible software for use of ML in plant physiology is highlighted.

Subjects

Arabidopsis/metabolism; Artificial Intelligence; Metabolomics; Algorithms; Cluster Analysis; Magnetic Resonance Spectroscopy; Mass Spectrometry; Phenotype; Principal Component Analysis; Spectroscopy, Fourier Transform Infrared

20.

Towards Robot Scientists for autonomous scientific discovery.

Sparkes, Andrew; Aubrey, Wayne; Byrne, Emma; Clare, Amanda; Khan, Muhammed N; Liakata, Maria; Markham, Magdalena; Rowland, Jem; Soldatova, Larisa N; Whelan, Kenneth E; Young, Michael; King, Ross D.

Autom Exp ; 2: 1, 2010 Jan 04.

Article in English | MEDLINE | ID: mdl-20119518

ABSTRACT

We review the main components of autonomous scientific discovery, and how they lead to the concept of a Robot Scientist. This is a system which uses techniques from artificial intelligence to automate all aspects of the scientific discovery process: it generates hypotheses from a computer model of the domain, designs experiments to test these hypotheses, runs the physical experiments using robotic systems, analyses and interprets the resulting data, and repeats the cycle. We describe our two prototype Robot Scientists: Adam and Eve. Adam has recently proven the potential of such systems by identifying twelve genes responsible for catalysing specific reactions in the metabolic pathways of the yeast Saccharomyces cerevisiae. This work has been formally recorded in great detail using logic. We argue that the reporting of science needs to become fully formalised and that Robot Scientists can help achieve this. This will make scientific information more reproducible and reusable, and promote the integration of computers in scientific reasoning. We believe the greater automation of both the physical and intellectual aspects of scientific investigations to be essential to the future of science. Greater automation improves the accuracy and reliability of experiments, increases the pace of discovery and, in common with conventional laboratory automation, removes tedious and repetitive tasks from the human scientist.
