ABSTRACT
Machine learning (ML) algorithms are powerful tools that are increasingly being used for sepsis biomarker discovery in RNA-Seq data. RNA-Seq datasets contain multiple sources and types of noise (operator, technical and non-systematic) that may bias ML classification. Normalisation and independent gene filtering approaches described in RNA-Seq workflows account for some of this variability, but they are typically targeted at differential expression analysis rather than ML applications. Pre-processing filtering steps substantially reduce the number of variables in the data and thereby increase the power of statistical testing, but they can potentially discard valuable and insightful classification features. The effect of transcript-level filtering on the robustness and stability of ML-based RNA-Seq classification has not yet been systematically assessed. In this report we examine the impact of filtering out low-count transcripts and transcripts with influential outlier read counts on downstream ML analysis for sepsis biomarker discovery using elastic net regularised logistic regression, L1-regularised support vector machines and random forests. We demonstrate that applying a systematic, objective strategy for removing uninformative and potentially biasing biomarkers, representing up to 60% of transcripts in datasets of different sample sizes (including two illustrative neonatal sepsis cohorts), leads to substantial improvements in classification performance, higher stability of the resulting gene signatures, and better agreement with previously reported sepsis biomarkers. We also demonstrate that the performance uplift from gene filtering depends on the ML classifier chosen, with L1-regularised support vector machines showing the greatest performance improvements on our experimental data.
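The abstract does not state the exact filtering rule used. As a rough illustration of what removing low-count transcripts involves, the sketch below applies one common heuristic: keep a transcript only if its counts per million (CPM) reach a threshold in a minimum number of samples. The thresholds `min_cpm` and `min_samples` and the helper names are illustrative assumptions, not the authors' settings, and the outlier-based filter mentioned in the abstract is not covered.

```python
from typing import Dict, List

# Hypothetical layout: transcript ID -> raw read counts, one entry per sample.
Counts = Dict[str, List[int]]


def counts_per_million(counts: Counts) -> Dict[str, List[float]]:
    """Scale raw read counts by each sample's library size (counts per million)."""
    genes = list(counts)
    n_samples = len(counts[genes[0]])
    # Library size: total reads assigned to all transcripts in one sample.
    lib_sizes = [sum(counts[g][j] for g in genes) for j in range(n_samples)]
    return {
        g: [1e6 * counts[g][j] / lib_sizes[j] for j in range(n_samples)]
        for g in genes
    }


def filter_low_counts(counts: Counts, min_cpm: float = 1.0,
                      min_samples: int = 2) -> Counts:
    """Keep transcripts whose CPM is >= min_cpm in at least min_samples samples.

    Both thresholds are assumptions chosen for illustration only.
    """
    cpm = counts_per_million(counts)
    return {
        g: c for g, c in counts.items()
        if sum(v >= min_cpm for v in cpm[g]) >= min_samples
    }
```

Thresholding on CPM rather than raw counts keeps the rule from being driven by differences in sequencing depth between samples; a transcript expressed in only one sample is dropped when `min_samples=2`.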
ABSTRACT
INTRODUCTION: Early recognition and appropriate management of paediatric sepsis are known to improve outcomes. A previous systems biology investigation of the systemic immune response to sepsis in neonates identified immune and metabolic markers that showed high accuracy for detecting bacterial infection. Further gene expression markers for discriminating sepsis from control cases have also been reported in the paediatric age group. More recently, specific gene signatures were identified to discriminate between COVID-19 and its associated inflammatory sequelae. Through the current prospective cohort study, we aim to evaluate immune and metabolic blood markers that discriminate sepsis (including COVID-19) from other acute illnesses in critically unwell children and young persons up to 18 years of age. METHODS AND ANALYSIS: We describe a prospective cohort study comparing immune and metabolic whole-blood markers in patients with sepsis, COVID-19 and other illnesses. Clinical phenotyping and blood culture results will provide a reference standard against which to evaluate the performance of blood markers from the research sample analysis. Serial whole-blood samples (50 µL each) will be collected from children admitted to intensive care with an acute illness to follow time-dependent changes in biomarkers. Integrated lipidomics and RNA-Seq transcriptomics analyses will be conducted to evaluate immune-metabolic networks that discriminate sepsis and COVID-19 from other acute illnesses. This study received approval for deferred consent. ETHICS AND DISSEMINATION: The study has received research ethics committee approval from the Yorkshire and Humber Leeds West Research Ethics Committee 2 (reference 20/YH/0214; IRAS reference 250612). Submission of study results for publication will involve making all anonymised primary and processed data available on public repository sites. TRIAL REGISTRATION NUMBER: NCT04904523.
Subjects
COVID-19, Sepsis, Adolescent, Child, Humans, Infant, Newborn, Acute Disease, COVID-19/diagnosis, Prospective Studies, SARS-CoV-2, Sepsis/diagnosis
ABSTRACT
Social media are sensors in the real world that can be used to measure the pulse of societies. However, the massive and unfiltered feed of messages posted on social media is a phenomenon that nowadays raises social alarm, especially when these messages contain hate speech targeted at a specific individual or group. In this context, governments and non-governmental organizations (NGOs) are concerned about the possible negative impact that these messages can have on individuals or on society. In this paper, we present HaterNet, an intelligent system currently used by the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that identifies and monitors the evolution of hate speech on Twitter. The contributions of this research are manifold: (1) It introduces the first intelligent system that monitors and visualizes hate speech in social media using social network analysis techniques. (2) It introduces a novel public dataset on hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification approaches based on different document representation strategies and text classification models. (4) The best approach consists of an LSTM+MLP neural network that takes as input the embeddings of the tweet's word, emoji, and expression tokens, enriched with tf-idf, and obtains an area under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the literature.
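The abstract says the token embeddings are enriched with tf-idf but does not give the exact formula or how the weights are combined with the embeddings. A minimal sketch of the tf-idf weights alone, assuming the common variant tf = count / document length and idf = log(N / df); the function name and example tokens are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {token: weight} map per tokenised document (e.g. a tweet).

    Assumed variant: term frequency normalised by document length,
    multiplied by idf = log(N / df). Other weighting schemes exist.
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each token at least once.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in tf.items()
        })
    return weights
```

A token that appears in every document (such as a common function word) receives weight 0, so when the weights are used to scale embeddings, tokens that are frequent across the whole corpus contribute little.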
ABSTRACT
[This corrects the article DOI: 10.1371/journal.pone.0217914.].
ABSTRACT
OBJECTIVES: This paper focuses on the issue of intimate partner violence and, specifically, on the distribution of femicides over time and the existence of copycat effects. This is the subject of an ongoing debate, often triggered by the social alarm following multiple intimate partner homicides (IPHs) occurring in a short span of time. The aim of this research is to study the evolution of IPHs and provide a far-reaching answer by rigorously analyzing and searching for patterns in data on femicides. METHODS: The study analyzes an official dataset, provided by the VioGén system of the Secretaría de Estado de Seguridad (Spanish State Secretariat for Security), including all femicides that occurred in Spain in 2007-2017. A statistical methodology to identify temporal interdependencies in count time series is proposed and applied to the dataset. The same methodology can be applied to other contexts. RESULTS: There has been a decreasing trend in the number of femicides per year. No interdependencies are observed in the temporal distribution of femicides. Therefore, according to the data, the existence of a copycat effect in femicides cannot be claimed. CONCLUSIONS: Around 2011 there was a clear change in the average number of femicides, which has not picked up since. The results allow for an informed answer to the debate on the copycat effect in Spanish femicides. The planning of femicide prevention activities should not be a reaction to a perceived increase in their occurrence. As a copycat effect is not detected in the studied time period, there is no evidence supporting the need to censor media reports on femicides.
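The abstract does not describe the proposed methodology in detail. To illustrate what "temporal interdependence" in a count series means, the sketch below uses a standard white-noise screen instead: compute sample autocorrelations and flag lags outside the approximate ±z/√n band expected under independence. This is a swapped-in textbook technique, not the authors' method, and it assumes the series has already been detrended, because the reported decline and the level change around 2011 would otherwise show up as spurious positive autocorrelation. Function names are hypothetical.

```python
import math
from typing import List


def acf(series: List[float], max_lag: int) -> List[float]:
    """Sample autocorrelations r_1 .. r_max_lag of a (detrended) series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        # A constant series carries no information about serial dependence.
        return [0.0] * max_lag
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def significant_lags(series: List[float], max_lag: int,
                     z: float = 1.96) -> List[int]:
    """Lags whose autocorrelation lies outside the approximate +/- z / sqrt(n)
    band expected if successive observations were independent."""
    bound = z / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1)
            if abs(r) > bound]
```

An empty result from `significant_lags` on residual counts (for example, monthly counts minus a fitted trend) is consistent with no serial dependence at the lags examined; the band is only approximate for short series and for low counts.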