Results 1 - 8 of 8
1.
J Med Internet Res ; 25: e42734, 2023 05 08.
Article in English | MEDLINE | ID: mdl-37155236

ABSTRACT

BACKGROUND: The use of social media data to predict mental health outcomes has the potential to allow for the continuous monitoring of mental health and well-being and provide timely information that can supplement traditional clinical assessments. However, it is crucial that the methodologies used to create models for this purpose are of high quality from both a mental health and machine learning perspective. Twitter has been a popular choice of social media because of the accessibility of its data, but access to big data sets is not a guarantee of robust results. OBJECTIVE: This study aims to review the current methodologies used in the literature for predicting mental health outcomes from Twitter data, with a focus on the quality of the underlying mental health data and the machine learning methods used. METHODS: A systematic search was performed across 6 databases, using keywords related to mental health disorders, algorithms, and social media. In total, 2759 records were screened, of which 164 (5.94%) papers were analyzed. Information about methodologies for data acquisition, preprocessing, model creation, and validation was collected, as well as information about replicability and ethical considerations. RESULTS: The 164 studies reviewed used 119 primary data sets. There were an additional 8 data sets identified that were not described in enough detail to include, and 6.1% (10/164) of the papers did not describe their data sets at all. Of these 119 data sets, only 16 (13.4%) had access to ground truth data (ie, known characteristics) about the mental health disorders of social media users. The other 86.6% (103/119) of data sets collected data by searching keywords or phrases, which may not be representative of patterns of Twitter use for those with mental health disorders. The annotation of mental health disorders for classification labels was variable, and 57.1% (68/119) of the data sets had no ground truth or clinical input on this annotation. 
Despite being a common mental health disorder, anxiety received little attention. CONCLUSIONS: The sharing of high-quality ground truth data sets is crucial for the development of trustworthy algorithms that have clinical and research utility. Further collaboration across disciplines and contexts is encouraged to better understand what types of predictions will be useful in supporting the management and identification of mental health disorders. A series of recommendations for researchers in this field and for the wider research community are made, with the aim of enhancing the quality and utility of future outputs.
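The proportions quoted in this abstract can be recomputed from its own counts. The short check below is only a reader's sanity check, not part of the study; the dictionary keys are descriptive labels chosen here.

```python
# Counts and percentages as quoted in the abstract: (part, whole, reported %).
reported = {
    "papers analyzed / records screened": (164, 2759, 5.94),
    "papers with no data set description": (10, 164, 6.1),
    "data sets with ground truth": (16, 119, 13.4),
    "data sets built from keyword searches": (103, 119, 86.6),
    "data sets without ground truth or clinical input": (68, 119, 57.1),
}


def percent(part: int, whole: int, digits: int) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


# Compare each reported value with the recomputed one, using the same
# number of decimals the abstract uses for that figure.
checks = {}
for label, (part, whole, quoted) in reported.items():
    digits = len(str(quoted).split(".")[1])
    checks[label] = abs(percent(part, whole, digits) - quoted) < 1e-9

for label, ok in checks.items():
    print(f"{label}: {'consistent' if ok else 'MISMATCH'}")
```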


Subjects
Mental Health , Social Media , Humans , Algorithms , Anxiety Disorders , Machine Learning
2.
Int J Epidemiol ; 52(3): 952-957, 2023 06 06.
Article in English | MEDLINE | ID: mdl-36847716

ABSTRACT

MOTIVATION: Social media represent an unrivalled opportunity for epidemiological cohorts to collect large amounts of high-resolution time course data on mental health. Equally, the high-quality data held by epidemiological cohorts could greatly benefit social media research as a source of ground truth for validating digital phenotyping algorithms. However, there is currently a lack of software for doing this in a secure and acceptable manner. We worked with cohort leaders and participants to co-design an open-source, robust and expandable software framework for gathering social media data in epidemiological cohorts. IMPLEMENTATION: Epicosm is implemented as a Python framework that is straightforward to deploy and run inside a cohort's data safe haven. GENERAL FEATURES: The software regularly gathers Tweets from a list of accounts and stores them in a database for linking to existing cohort data. AVAILABILITY: This open-source software is freely available at [https://dynamicgenetics.github.io/Epicosm/].
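The gather-and-store loop described under GENERAL FEATURES might look roughly like the sketch below. This is not Epicosm's code or schema: the `tweets` table layout, the `store_tweets` helper, and the `fake_fetch` stand-in for a real Twitter client are all hypothetical, chosen only to illustrate collecting tweets per account and keeping them in a database keyed so that repeated runs do not duplicate rows.

```python
import sqlite3
from typing import Callable, Dict, Iterable, List

Tweet = Dict[str, str]


def store_tweets(
    conn: sqlite3.Connection,
    accounts: Iterable[str],
    fetch: Callable[[str], List[Tweet]],
) -> int:
    """Fetch tweets for each account and store new ones; return how many were added."""
    # Hypothetical schema: the tweet ID is the primary key so re-runs are idempotent.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id   TEXT PRIMARY KEY,
               account    TEXT NOT NULL,
               created_at TEXT NOT NULL,
               text       TEXT NOT NULL
           )"""
    )
    before = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    for account in accounts:
        rows = [(t["id"], account, t["created_at"], t["text"]) for t in fetch(account)]
        # INSERT OR IGNORE skips tweets already stored by an earlier run.
        conn.executemany(
            "INSERT OR IGNORE INTO tweets (tweet_id, account, created_at, text) "
            "VALUES (?, ?, ?, ?)",
            rows,
        )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
    return after - before


def fake_fetch(account: str) -> List[Tweet]:
    """Hypothetical stand-in for an API client; returns two fixed tweets per account."""
    return [
        {"id": f"{account}-1", "created_at": "2023-01-01T00:00:00Z", "text": "first"},
        {"id": f"{account}-2", "created_at": "2023-01-02T00:00:00Z", "text": "second"},
    ]


conn = sqlite3.connect(":memory:")
first_run = store_tweets(conn, ["user_a", "user_b"], fake_fetch)
second_run = store_tweets(conn, ["user_a", "user_b"], fake_fetch)
print(first_run, second_run)
```

Keying on the tweet ID is one simple way to make a scheduled job safe to repeat; linking to cohort records would then be a join on the `account` column inside the safe haven.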


Subjects
Social Media , Humans , Software , Algorithms , Data Accuracy , Factual Databases
3.
Patterns (N Y) ; 3(7): 100537, 2022 Jul 08.
Article in English | MEDLINE | ID: mdl-35845834

ABSTRACT

Awareness and management of ethical issues in data science are becoming crucial skills for data scientists. Discussion of contemporary issues in collaborative and interdisciplinary spaces is an engaging way to allow data-science work to be influenced by those with expertise in sociological fields and so improve the ability of data scientists to think critically about the ethics of their work. However, opportunities to do so are limited. Data Ethics Club is a fortnightly discussion group about data science and ethics whose community-generated resources are hosted publicly online. These include a collaborative list of materials around topics of interest and guides for leading an online data-ethics discussion group. Our meetings and resources are designed to reduce the barriers to learning, reflection, and critique on data science and ethics, with the broader aim of building ethics into the cultural fabric of quality data-science work.

4.
Int J Popul Data Sci ; 5(4): 1409, 2021 Apr 19.
Article in English | MEDLINE | ID: mdl-34007892

ABSTRACT

BACKGROUND: Disasters such as the COVID-19 pandemic pose an overwhelming demand on resources that cannot always be met by official organisations. Limited resources and human response to crises can lead members of local communities to turn to one another to fulfil immediate needs. This spontaneous citizen-led response can be crucial to a community's ability to cope in a crisis. It is thus essential to understand the scope of such initiatives so that support can be provided where it is most needed. Nevertheless, quickly developing situations and varying definitions can make the community response challenging to measure. AIM: To create an accessible interactive map of the citizen-led community response to need during the COVID-19 pandemic in Wales, UK that combines information gathered from multiple data providers to reflect different interpretations of need and support. APPROACH: We gathered data from a combination of official data providers and community-generated sources to create 14 variables representative of need and support. These variables are derived by a reproducible data pipeline that enables flexible integration of new data. The interactive tool is available online (www.covidresponsemap.wales) and can map available data at two geographic resolutions. Users choose their variables of interest, and interpretation of the map is aided by a linked bee-swarm plot. DISCUSSION: The novel approach we developed enables people at all levels of community response to explore and analyse the distribution of need and support across Wales. While there can be limitations to the accuracy of community-generated data, we demonstrate that they can be effectively used alongside traditional data sources to maximise the understanding of community action. This adds to our overall aim to measure community response and resilience, as well as to make complex population health data accessible to a range of audiences. 
Future developments include the integration of other factors such as well-being.

5.
Sci Data ; 7(1): 234, 2020 07 13.
Article in English | MEDLINE | ID: mdl-32661247

ABSTRACT

We introduce TAASRAD19, a high-resolution radar reflectivity dataset collected by the Civil Protection weather radar of the Trentino South Tyrol Region, in the Italian Alps. The dataset includes 894,916 timesteps of precipitation from more than 9 years of data, offering a novel resource to develop and benchmark analog ensemble models and machine learning solutions for precipitation nowcasting. Data are expressed as 2D images of the maximum reflectivity on the vertical section, sampled every 5 min and covering an area 240 km in diameter at 500 m horizontal resolution. The TAASRAD19 distribution also includes a curated set of 1,732 sequences, for a total of 362,233 radar images, labeled with precipitation type tags assigned by expert meteorologists. We validate TAASRAD19 as a benchmark for nowcasting methods by introducing a TrajGRU deep learning model to forecast reflectivity, and a procedure based on the UMAP dimensionality reduction algorithm for interactive exploration. Software methods for data pre-processing, model training and inference, and a pre-trained model are publicly available on GitHub (https://github.com/MPBA/TAASRAD19) for study replication and reproducibility.
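The figures in this abstract imply a few derived quantities that are useful when sizing a model or a data loader. The arithmetic below uses only the numbers quoted above; treating the coverage as a square grid spanning the full diameter is an assumption made here, not something the abstract states.

```python
# Figures quoted in the abstract.
DIAMETER_KM = 240        # coverage diameter
RESOLUTION_M = 500       # horizontal resolution
SAMPLING_MIN = 5         # one frame every 5 minutes
TIMESTEPS = 894_916      # frames in the full dataset
SEQUENCES = 1_732        # curated, labeled sequences
SEQUENCE_IMAGES = 362_233

# Assumption: a square image whose side spans the whole coverage diameter.
cells_per_side = DIAMETER_KM * 1000 // RESOLUTION_M

# Frames produced by one full day of uninterrupted scanning.
frames_per_day = 24 * 60 // SAMPLING_MIN

# Continuous recording time that the frame count corresponds to.
equivalent_days = TIMESTEPS / frames_per_day
equivalent_years = equivalent_days / 365.25

# Average length of a curated sequence, in frames and in hours.
mean_sequence_frames = SEQUENCE_IMAGES / SEQUENCES
mean_sequence_hours = mean_sequence_frames * SAMPLING_MIN / 60

print(f"grid: {cells_per_side} x {cells_per_side} cells")
print(f"frames per day: {frames_per_day}")
print(f"equivalent continuous recording: {equivalent_years:.1f} years")
print(f"mean curated sequence: {mean_sequence_frames:.0f} frames "
      f"(~{mean_sequence_hours:.1f} h)")
```

The equivalent continuous recording time comes out somewhat shorter than the "more than 9 years" quoted, which would be consistent with the archive not containing every 5-minute slot of the period; the abstract does not say which slots are missing or why.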

6.
PLoS Comput Biol ; 15(3): e1006269, 2019 03.
Article in English | MEDLINE | ID: mdl-30917113

ABSTRACT

Artificial Intelligence is exponentially increasing its impact on healthcare. As deep learning is mastering computer vision tasks, its application to digital pathology is natural, with the promise of aiding in routine reporting and standardizing results across trials. Deep learning features inferred from digital pathology scans can improve validity and robustness of current clinico-pathological features, up to identifying novel histological patterns, e.g., from tumor infiltrating lymphocytes. In this study, we examine the issue of evaluating accuracy of predictive models from deep learning features in digital pathology, as a hallmark of reproducibility. We introduce the DAPPER framework for validation based on a rigorous Data Analysis Plan derived from the FDA's MAQC project, designed to analyze causes of variability in predictive biomarkers. We apply the framework to models that identify tissue of origin on 787 Whole Slide Images from the Genotype-Tissue Expression (GTEx) project. We test three different deep learning architectures (VGG, ResNet, Inception) as feature extractors and three classifiers (a fully connected multilayer, Support Vector Machine and Random Forests) and work with four datasets (5, 10, 20 or 30 classes), for a total of 53,000 tiles at 512 × 512 resolution. We analyze accuracy and feature stability of the machine learning classifiers, also demonstrating the need for diagnostic tests (e.g., random labels) to identify selection bias and risks for reproducibility. Further, we use the deep features from the VGG model from GTEx on the KIMIA24 dataset for identification of slide of origin (24 classes) to train a classifier on 1,060 annotated tiles and validate it on 265 unseen ones. The DAPPER software, including its deep learning pipeline and the Histological Imaging-Newsy Tiles (HINT) benchmark dataset derived from GTEx, is released as a basis for standardization and validation initiatives in AI for digital pathology.
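The random-labels diagnostic mentioned in this abstract can be illustrated in a few lines: permute the labels, rerun the identical pipeline, and confirm that performance falls to chance. The sketch below is not the DAPPER software; it uses synthetic two-class data and a nearest-centroid classifier as stand-ins, and the helper names are chosen here for illustration.

```python
import random
from typing import List, Sequence


def nearest_centroid_accuracy(
    X_train: Sequence[Sequence[float]], y_train: Sequence[int],
    X_test: Sequence[Sequence[float]], y_test: Sequence[int],
) -> float:
    """Fit one centroid per class on the training split, score on the test split."""
    centroids = {}
    for label in set(y_train):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(x: Sequence[float]) -> int:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == y for x, y in zip(X_test, y_test))
    return hits / len(y_test)


def evaluate(X: List[List[float]], y: List[int], rng: random.Random,
             n_repeats: int = 20, test_fraction: float = 0.25) -> float:
    """Mean test accuracy over repeated random train/test splits."""
    idx = list(range(len(y)))
    scores = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        cut = int(len(idx) * test_fraction)
        test, train = idx[:cut], idx[cut:]
        scores.append(nearest_centroid_accuracy(
            [X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test],
        ))
    return sum(scores) / len(scores)


rng = random.Random(0)

# Synthetic, well-separated two-class data (balanced classes).
y = [i % 2 for i in range(400)]
X = [[rng.gauss(4.0 * label, 1.0), rng.gauss(4.0 * label, 1.0)] for label in y]

# Same pipeline, once with the real labels and once with permuted labels.
true_score = evaluate(X, y, rng)
y_random = y[:]
rng.shuffle(y_random)
random_score = evaluate(X, y_random, rng)

print(f"accuracy with true labels:   {true_score:.2f}")
print(f"accuracy with random labels: {random_score:.2f}")
```

If the random-label score stayed well above chance (0.5 for these balanced classes), that would point to information leaking between training and evaluation, for example feature selection performed before the split.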


Subjects
Algorithms , Artificial Intelligence , Histological Techniques/methods , Computer-Assisted Image Interpretation/methods , Software , Humans , Reproducibility of Results
7.
PLoS One ; 13(12): e0208924, 2018.
Article in English | MEDLINE | ID: mdl-30532223

ABSTRACT

We introduce the CDRP (Concatenated Diagnostic-Relapse Prognostic) architecture for multi-task deep learning that incorporates a clinical algorithm, e.g., a risk stratification schema, to improve prognostic profiling. We present the first application to survival prediction in High-Risk (HR) Neuroblastoma from transcriptomics data, a task that studies from the MAQC consortium have shown to remain the hardest among multiple diagnostic and prognostic endpoints predictable from the same dataset. To obtain a more accurate risk stratification needed for appropriate treatment strategies, CDRP combines a first component (CDRP-A) synthesizing a diagnostic task and a second component (CDRP-N) dedicated to one or more prognostic tasks. The approach leverages the advent of semi-supervised deep learning structures that can flexibly integrate multimodal data or internally create multiple processing paths. CDRP-A is an autoencoder trained on gene expression on the HR/non-HR risk stratification by the Children's Oncology Group, obtaining a 64-node representation in the bottleneck layer. CDRP-N is a multi-task classifier for two prognostic endpoints, i.e., Event-Free Survival (EFS) and Overall Survival (OS). CDRP-A provides the HR embedding input to the CDRP-N shared layer, from which two branches depart to model EFS and OS, respectively. To control for selection bias, CDRP is trained and evaluated using a Data Analysis Protocol (DAP) developed within the MAQC initiative. CDRP was applied to Illumina RNA-Seq of 498 Neuroblastoma patients (HR: 176) from the SEQC study (12,464 Entrez genes) and to Affymetrix Human Exon Array expression profiles (17,450 genes) of 247 primary diagnostic Neuroblastoma of the TARGET NBL cohort. On the SEQC HR patients, CDRP achieves a Matthews Correlation Coefficient (MCC) of 0.38 for EFS and 0.19 for OS in external validation, improving over published SEQC models. We show that a CDRP-N embedding is indeed parametrically associated with increasing severity and that the embedding can be used to better stratify patients' survival.
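For readers unfamiliar with the metric reported here, the Matthews Correlation Coefficient for a binary endpoint is computed from the four confusion-matrix counts as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function below is a plain implementation of that standard formula; the counts in the usage lines are made-up illustrations, not results from the study.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews Correlation Coefficient from binary confusion-matrix counts.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when a marginal total is zero, the
    usual convention for the undefined case.
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return numerator / denominator if denominator else 0.0


# Illustrative counts only.
perfect = mcc(tp=50, tn=50, fp=0, fn=0)
chance = mcc(tp=25, tn=25, fp=25, fn=25)
all_positive = mcc(tp=50, tn=0, fp=50, fn=0)   # degenerate: one class never predicted
partial = mcc(tp=6, tn=3, fp=1, fn=2)

print(perfect, chance, all_positive, round(partial, 3))
```

Because MCC uses all four cells, a classifier that always predicts the majority class scores 0 rather than a misleadingly high accuracy, which is why it suits imbalanced endpoints such as survival in a high-risk subgroup.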


Subjects
Deep Learning , Local Neoplasm Recurrence/diagnosis , Neuroblastoma/diagnosis , Prognosis , Algorithms , Child , Female , Gene Expression Profiling , Neoplastic Gene Expression Regulation/genetics , Humans , Infant , Male , Local Neoplasm Recurrence/epidemiology , Local Neoplasm Recurrence/genetics , Local Neoplasm Recurrence/pathology , Neuroblastoma/epidemiology , Neuroblastoma/genetics , Neuroblastoma/pathology , Progression-Free Survival , Risk Assessment
8.
BMC Bioinformatics ; 19(Suppl 2): 49, 2018 03 08.
Article in English | MEDLINE | ID: mdl-29536822

ABSTRACT

BACKGROUND: Convolutional Neural Networks can be effectively used only when data are endowed with an intrinsic concept of neighbourhood in the input space, as is the case for pixels in images. We introduce here Ph-CNN, a novel deep learning architecture for the classification of metagenomics data based on Convolutional Neural Networks, with the patristic distance defined on the phylogenetic tree being used as the proximity measure. The patristic distance between variables is used together with a sparsified version of MultiDimensional Scaling to embed the phylogenetic tree in a Euclidean space. RESULTS: Ph-CNN is tested with a domain adaptation approach on synthetic data and on a metagenomics collection of gut microbiota of 38 healthy subjects and 222 Inflammatory Bowel Disease patients, divided into 6 subclasses. Classification performance is promising when compared to classical algorithms like Support Vector Machines and Random Forest and a baseline fully connected neural network, e.g. the Multi-Layer Perceptron. CONCLUSION: Ph-CNN represents a novel deep learning approach for the classification of metagenomics data. Operatively, the algorithm has been implemented as a custom Keras layer taking care of passing to the following convolutional layer not only the data but also the ranked list of neighbours of each sample, thus mimicking the case of image data, transparently to the user.
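The neighbourhood idea can be made concrete with a toy example: given pairwise patristic distances between variables (here, four taxa), rank each variable's nearest neighbours and gather their values into a local patch that a convolution filter could slide over. This sketch is not the Ph-CNN Keras layer. It ranks directly on the distance matrix and skips the MultiDimensional Scaling embedding the abstract describes, and the function names and toy distances are assumptions made for illustration.

```python
from typing import List, Sequence


def ranked_neighbours(dist: Sequence[Sequence[float]], k: int) -> List[List[int]]:
    """For each variable, return the indices of its k closest variables.

    The variable itself comes first (distance 0); ties are broken by index
    so the ranking is deterministic.
    """
    n = len(dist)
    return [
        sorted(range(n), key=lambda j: (dist[i][j], j))[:k]
        for i in range(n)
    ]


def gather_patches(sample: Sequence[float],
                   neighbours: Sequence[Sequence[int]]) -> List[List[float]]:
    """Rearrange one sample so each variable is followed by its neighbours' values."""
    return [[sample[j] for j in row] for row in neighbours]


# Toy patristic distances for four taxa on a tree with two cherries:
# (A, B) are siblings, (C, D) are siblings, and the two pairs are far apart.
patristic = [
    [0, 2, 6, 6],
    [2, 0, 6, 6],
    [6, 6, 0, 2],
    [6, 6, 2, 0],
]

neighbours = ranked_neighbours(patristic, k=2)
patches = gather_patches([0.1, 0.2, 0.7, 0.0], neighbours)

print(neighbours)
print(patches)
```

Each row of `patches` plays the role an image patch plays for a pixel: the filter sees a variable together with its phylogenetically closest relatives rather than whatever happens to be adjacent in column order.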


Subjects
Metagenomics , Computer Neural Networks , Phylogeny , Algorithms , Data Analysis , Genetic Databases , Humans , Inflammatory Bowel Diseases/genetics , Principal Component Analysis , Reproducibility of Results , Support Vector Machine