1.
Genome Res ; 33(10): 1734-1746, 2023 10.
Article in English | MEDLINE | ID: mdl-37879860

ABSTRACT

Although it is ubiquitous in genomics, the current human reference genome (GRCh38) is incomplete: It is missing large sections of heterochromatic sequence, and as a singular, linear reference genome, it does not represent the full spectrum of human genetic diversity. To characterize gaps in GRCh38 and human genetic diversity, we developed an algorithm for sequence location approximation using nuclear families (ASLAN) to identify the region of origin of reads that do not align to GRCh38. Using unmapped reads and variant calls from whole-genome sequences (WGSs), ASLAN uses a maximum likelihood model to identify the most likely region of the genome that a subsequence belongs to given the distribution of the subsequence in the unmapped reads and phasings of families. Validating ASLAN on synthetic data and on reads from the alternative haplotypes in the decoy genome, ASLAN localizes >90% of 100-bp sequences with >92% accuracy and ∼1 Mb of resolution. We then ran ASLAN on 100-mers from unmapped reads from WGS from more than 700 families, and compared ASLAN localizations to alignment of the 100-mers to the recently released T2T-CHM13 assembly. We found that many unmapped reads in GRCh38 originate from telomeres and centromeres that are gaps in GRCh38. ASLAN localizations are in high concordance with T2T-CHM13 alignments, except in the centromeres of the acrocentric chromosomes. Comparing ASLAN localizations and T2T-CHM13 alignments, we identified sequences missing from T2T-CHM13 or sequences with high divergence from their aligned region in T2T-CHM13, highlighting new hotspots for genetic diversity.


Subjects
Human Genome , Genomics , Humans , Algorithms , Telomere/genetics , Genetic Variation , DNA Sequence Analysis
2.
Genome Res ; 33(10): 1747-1756, 2023 10.
Article in English | MEDLINE | ID: mdl-37879861

ABSTRACT

Large, whole-genome sequencing (WGS) data sets containing families provide an important opportunity to identify crossovers and shared genetic material in siblings. However, the high variant calling error rates of WGS in some areas of the genome can result in spurious crossover calls, and the special inheritance status of the X Chromosome presents challenges. We have developed a hidden Markov model that addresses these issues by modeling the inheritance of variants in families in the presence of error-prone regions and inherited deletions. We call our method PhasingFamilies. We validate PhasingFamilies using the platinum genome family NA1281 (precision: 0.81; recall: 0.97), as well as simulated genomes with known crossover positions (precision: 0.93; recall: 0.92). Using 1925 quads from the Simons Simplex Collection, we found that PhasingFamilies resolves crossovers to a median resolution of 3527.5 bp. These crossovers recapitulate existing recombination rate maps, including for the X Chromosome; produce sibling pair IBD that matches expected distributions; and are validated by the haplotype estimation tool SHAPEIT. We provide an efficient, open-source implementation of PhasingFamilies that can be used to identify crossovers from family sequencing data.


Subjects
Genome , Inheritance Patterns , Humans , Whole Genome Sequencing , Haplotypes
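
To make the hidden Markov model idea above concrete, here is a minimal Viterbi sketch. It is not the PhasingFamilies model, which the abstract describes as modeling whole families, error-prone regions, and inherited deletions. The sketch assumes a single child, two hidden states (which of one parent's two haplotypes was inherited), one observation per informative site, and hypothetical `switch_prob` and `error_rate` parameters. A crossover is reported wherever the most likely state path changes.

```python
import math

def viterbi(observations, switch_prob=0.01, error_rate=0.05):
    """Most likely sequence of inherited haplotypes (0 or 1) for one child.

    observations -- list of 0/1: which parental haplotype each site appears to match.
    switch_prob and error_rate are illustrative values, not published parameters.
    """
    states = (0, 1)

    def emit(state, obs):
        return math.log(1 - error_rate if obs == state else error_rate)

    def trans(prev, cur):
        return math.log(1 - switch_prob if prev == cur else switch_prob)

    scores = [math.log(0.5) + emit(s, observations[0]) for s in states]
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = [], []
        for s in states:
            prev = max(states, key=lambda p: scores[p] + trans(p, s))
            pointers.append(prev)
            new_scores.append(scores[prev] + trans(prev, s) + emit(s, obs))
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path

def crossovers(path):
    """Indices where the inferred haplotype switches."""
    return [i for i in range(1, len(path)) if path[i] != path[i - 1]]
```

With these parameters, an isolated mismatching site is cheaper to explain as a genotyping error than as two crossovers, which is the behavior the abstract motivates for error-prone regions.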
3.
J Med Internet Res ; 26: e51138, 2024 Apr 11.
Article in English | MEDLINE | ID: mdl-38602750

ABSTRACT

Modern machine learning approaches have led to performant diagnostic models for a variety of health conditions. Several machine learning approaches, such as decision trees and deep neural networks, can, in principle, approximate any function. However, this power can be considered to be both a gift and a curse, as the propensity toward overfitting is magnified when the input data are heterogeneous and high dimensional and the output class is highly nonlinear. This issue can especially plague diagnostic systems that predict behavioral and psychiatric conditions that are diagnosed with subjective criteria. An emerging solution to this issue is crowdsourcing, where crowd workers annotate complex behavioral features in return for monetary compensation or a gamified experience. These labels can then be used to derive a diagnosis, either directly or by using the labels as inputs to a diagnostic machine learning model. This viewpoint describes existing work on crowd-powered diagnostic systems, a nascent field of study, and discusses ongoing challenges and opportunities. With the correct considerations, the addition of crowdsourcing to human-in-the-loop machine learning workflows for the prediction of complex and nuanced health conditions can accelerate screening, diagnostics, and ultimately access to care.


Subjects
Crowdsourcing , Mental Disorders , Humans , Precision Medicine , Workflow , Machine Learning
4.
Proc Natl Acad Sci U S A ; 116(12): 5411-5419, 2019 03 19.
Article in English | MEDLINE | ID: mdl-30824592

ABSTRACT

Recent advancements in life-science instrumentation and automation enable entirely new modes of human interaction with microbiological processes and corresponding applications for science and education through biology cloud laboratories. A critical barrier for remote and on-site life-science experimentation (for both experts and nonexperts alike) is the absence of suitable abstractions and interfaces for programming living matter. To this end we conceptualize a programming paradigm that provides stimulus and sensor control functions for real-time manipulation of physical biological matter. Additionally, a simulation mode facilitates higher user throughput, program debugging, and biophysical modeling. To evaluate this paradigm, we implemented a JavaScript-based web toolkit, "Bioty," that supports real-time interaction with swarms of phototactic Euglena cells hosted on a cloud laboratory. Studies with remote and on-site users demonstrate that individuals with little to no biology knowledge and intermediate programming knowledge were able to successfully create and use scientific applications and games. This work informs the design of programming environments for controlling living matter in general, for living material microfabrication and swarm robotics applications, and for lowering the access barriers to the life sciences for professional and citizen scientists, learners, and the lay public.


Subjects
Cloud Computing , User-Computer Interface , Biology/methods , Humans , Software
5.
J Med Internet Res ; 24(2): e31830, 2022 02 15.
Article in English | MEDLINE | ID: mdl-35166683

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is a widespread neurodevelopmental condition with a range of potential causes and symptoms. Standard diagnostic mechanisms for ASD, which involve lengthy parent questionnaires and clinical observation, often result in long waiting times for results. Recent advances in computer vision and mobile technology hold potential for speeding up the diagnostic process by enabling computational analysis of behavioral and social impairments from home videos. Such techniques can improve objectivity and contribute quantitatively to the diagnostic process. OBJECTIVE: In this work, we evaluate whether home videos collected from a game-based mobile app can be used to provide diagnostic insights into ASD. To the best of our knowledge, this is the first study attempting to identify potential social indicators of ASD from mobile phone videos without the use of eye-tracking hardware, manual annotations, and structured scenarios or clinical environments. METHODS: Here, we used a mobile health app to collect over 11 hours of video footage depicting 95 children engaged in gameplay in a natural home environment. We used automated data set annotations to analyze two social indicators that have previously been shown to differ between children with ASD and their neurotypical (NT) peers: (1) gaze fixation patterns, which represent regions of an individual's visual focus and (2) visual scanning methods, which refer to the ways in which individuals scan their surrounding environment. We compared the gaze fixation and visual scanning methods used by children during a 90-second gameplay video to identify statistically significant differences between the 2 cohorts; we then trained a long short-term memory (LSTM) neural network to determine if gaze indicators could be predictive of ASD. RESULTS: Our results show that gaze fixation patterns differ between the 2 cohorts; specifically, we could identify 1 statistically significant region of fixation (P<.001). 
In addition, we demonstrate that individuals with ASD show unique visual scanning patterns when compared to NT children (P<.001). A deep learning model trained on coarse gaze fixation annotations demonstrates mild predictive power in identifying ASD. CONCLUSIONS: Ultimately, our study demonstrates that heterogeneous video data sets collected from mobile devices hold potential for quantifying visual patterns and providing insights into ASD. We show the importance of automated labeling techniques in generating large-scale data sets while simultaneously preserving the privacy of participants, and we demonstrate that specific social engagement indicators associated with ASD can be identified and characterized using such data.


Subjects
Autism Spectrum Disorder , Mobile Applications , Autism Spectrum Disorder/diagnosis , Child , Handheld Computers , Ocular Fixation , Humans , Social Participation
6.
BMC Bioinformatics ; 22(1): 509, 2021 Oct 19.
Article in English | MEDLINE | ID: mdl-34666677

ABSTRACT

BACKGROUND: Sequencing partial 16S rRNA genes is a cost-effective method for quantifying the microbial composition of an environment, such as the human gut. However, downstream analysis relies on binning reads into microbial groups by either considering each unique sequence as a different microbe, querying a database to get taxonomic labels from sequences, or clustering similar sequences together. These approaches do not fully capture evolutionary relationships between microbes, limiting the ability to identify differentially abundant groups of microbes between a diseased and control cohort. We present sequence-based biomarkers (SBBs), an aggregation method that groups and aggregates microbes using single variants and combinations of variants within their 16S sequences. We compare SBBs against other existing aggregation methods (OTU clustering and Micropheno or DiTaxa features) in several benchmarking tasks: biomarker discovery via permutation test, biomarker discovery via linear discriminant analysis, and phenotype prediction power. We demonstrate that SBBs perform on par with or better than the state-of-the-art methods in biomarker discovery and phenotype prediction. RESULTS: On two independent datasets, SBBs identify differentially abundant groups of microbes with similar or higher statistical significance than existing methods in both a permutation-test-based analysis and using linear discriminant analysis effect size. By grouping microbes by SBB, we can identify several differentially abundant microbial groups (FDR<.1) between children with autism and neurotypical controls in a set of 115 discordant siblings. Porphyromonadaceae, Ruminococcaceae, and an unnamed species of Blastocystis were significantly enriched in autism, while Veillonellaceae was significantly depleted. Likewise, aggregating microbes by SBB on a dataset of obese and lean twins, we find several significantly differentially abundant microbial groups (FDR<.1). 
We observed Megasphaera and Sutterellaceae highly enriched in obesity, and Phocaeicola significantly depleted. SBBs also perform on par with or better than existing aggregation methods as features in a phenotype prediction model, predicting the autism phenotype with an ROC-AUC score of .64 and the obesity phenotype with an ROC-AUC score of .84. CONCLUSIONS: SBBs provide a powerful method for aggregating microbes to perform differential abundance analysis as well as phenotype prediction. Our source code can be freely downloaded from http://github.com/briannachrisman/16s_biomarkers .


Subjects
Gastrointestinal Microbiome , Biomarkers , Cluster Analysis , Gastrointestinal Microbiome/genetics , Humans , 16S Ribosomal RNA/genetics , Software
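
The single-variant grouping described in the abstract above can be sketched as follows. This is an illustrative reading, not the code from the linked repository: it assumes the 16S sequences are already aligned to equal length, treats every (position, base) pair as one candidate group, and sums the abundance of all sequences that carry that base at that position. The function name and input layout are hypothetical.

```python
from collections import defaultdict

def aggregate_by_variant(abundance_by_sequence):
    """Sum abundances of all sequences sharing a base at an alignment position.

    abundance_by_sequence -- dict mapping an aligned 16S sequence to its abundance
    in one sample (hypothetical layout). Returns {(position, base): total abundance}.
    """
    groups = defaultdict(float)
    for sequence, abundance in abundance_by_sequence.items():
        for position, base in enumerate(sequence):
            groups[(position, base)] += abundance
    return dict(groups)

# Three toy "microbes" that differ at the first and last positions.
sample = {"ACGT": 10, "ACGA": 5, "TCGA": 1}
features = aggregate_by_variant(sample)
```

Each resulting feature can then be compared between cohorts, for example with a permutation test, in the same way as an OTU or taxonomic abundance.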
7.
BMC Bioinformatics ; 21(1): 356, 2020 Aug 12.
Article in English | MEDLINE | ID: mdl-32787845

ABSTRACT

BACKGROUND: Complex human health conditions with etiological heterogeneity like Autism Spectrum Disorder (ASD) often pose a challenge for traditional genome-wide association study approaches in defining a clear genotype to phenotype model. Coalitional game theory (CGT) is an exciting method that can consider the combinatorial effect of groups of variants working in concert to produce a phenotype. CGT has been applied to associate likely-gene-disrupting variants encoded from whole genome sequence data to ASD; however, this previous approach cannot take into account prior biological knowledge. Here we extend CGT to incorporate a priori knowledge from biological networks through a game theoretic centrality measure based on Shapley value to rank genes by their relevance, that is, the individual gene's synergistic influence in a gene-to-gene interaction network. Game theoretic centrality extends the notion of Shapley value to the evaluation of a gene's contribution to the overall connectivity of its corresponding node in a biological network. RESULTS: We implemented and applied game theoretic centrality to rank genes on whole genomes from 756 multiplex autism families. Top ranking genes with the highest game theoretic centrality in both the weighted and unweighted approaches were enriched for pathways previously associated with autism, including pathways of the immune system. Four of the selected genes (HLA-A, HLA-B, HLA-G, and HLA-DRB1) have also been implicated in ASD and further support the link between ASD and the human leukocyte antigen complex. CONCLUSIONS: Game theoretic centrality can prioritize influential, disease-associated genes within biological networks, and assist in the decoding of polygenic associations to complex disorders like autism.


Subjects
Algorithms , Game Theory , Gene Regulatory Networks , Genetic Association Studies , Autism Spectrum Disorder/genetics , Genome-Wide Association Study , Humans , Protein Interaction Mapping , Reproducibility of Results
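
The Shapley value that underlies the centrality measure above can be computed exactly for a small network. The sketch below is a generic illustration under stated assumptions, not the authors' implementation: the gene labels are placeholders, and the characteristic function (number of interaction edges inside a coalition) is chosen only because its Shapley value has a simple closed form, half of each node's degree, which makes the result easy to check.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values by averaging marginal contributions over all orderings.

    players -- list of player labels (here, hypothetical gene names)
    value   -- function mapping a set of players to the coalition's worth
    Only practical for a handful of players, since it enumerates n! orderings.
    """
    totals = {p: 0.0 for p in players}
    orderings = list(permutations(players))
    for order in orderings:
        coalition = set()
        previous = value(coalition)
        for player in order:
            coalition.add(player)
            current = value(coalition)
            totals[player] += current - previous
            previous = current
    return {p: total / len(orderings) for p, total in totals.items()}

# Hypothetical star-shaped interaction network centered on gene "B".
edges = [("A", "B"), ("B", "C"), ("B", "D")]

def edges_inside(coalition):
    """Assumed characteristic function: edges with both endpoints in the coalition."""
    return sum(1 for u, v in edges if u in coalition and v in coalition)

centrality = shapley_values(["A", "B", "C", "D"], edges_inside)
```

In this toy network the hub gene receives the largest share, which is the intuition behind ranking genes by their contribution to network connectivity.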
8.
J Med Internet Res ; 21(7): e13094, 2019 07 10.
Article in English | MEDLINE | ID: mdl-31293243

ABSTRACT

BACKGROUND: Autism affects 1 in every 59 children in the United States, according to estimates from the Centers for Disease Control and Prevention's Autism and Developmental Disabilities Monitoring Network in 2018. Although similar rates of autism are reported in rural and urban areas, rural families report greater difficulty in accessing resources. An overwhelming number of families experience long waitlists for diagnostic and therapeutic services. OBJECTIVE: The objective of this study was to accurately identify gaps in access to autism care using GapMap, a mobile platform that connects families with local resources while continuously collecting up-to-date autism resource epidemiological information. METHODS: After being extracted from various databases, resources were deduplicated, validated, and allocated into 7 categories based on the keywords identified on the resource website. The average distance between the individuals from a simulated autism population and the nearest autism resource in our database was calculated for each US county. Resource load, an approximation of demand over supply for diagnostic resources, was calculated for each US county. RESULTS: There are approximately 28,000 US resources validated on the GapMap database, each allocated into 1 or more of the 7 categories. States with the greatest distances to autism resources included Alaska, Nevada, Wyoming, Montana, and Arizona. Of the 7 resource categories, diagnostic resources were the most underrepresented, comprising only 8.83% (2472/28,003) of all resources. Alarmingly, 83.86% (2635/3142) of all US counties lacked any diagnostic resources. States with the highest diagnostic resource load included West Virginia, Kentucky, Maine, Mississippi, and New Mexico. 
CONCLUSIONS: Results from this study demonstrate the sparsity and uneven distribution of diagnostic resources in the United States, which may contribute to the lengthy waitlists and travel distances that families in specific regions must overcome to receive a diagnosis. More data are needed on autism diagnosis demand to better quantify resource needs across the United States.


Subjects
Autistic Disorder/therapy , Crowdsourcing/methods , Autistic Disorder/epidemiology , Child , Female , Humans , Male , United States
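
The two county-level quantities described in the methods above can be sketched as follows. The abstract does not state which distance formula or load definition was used, so the great-circle (haversine) distance, the coordinate layout, and the simple cases-per-resource ratio are assumptions for illustration only.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def mean_nearest_distance_km(people, resources):
    """Average distance from each simulated individual to the closest resource."""
    return sum(min(haversine_km(p, r) for r in resources) for p in people) / len(people)

def resource_load(expected_cases, n_diagnostic_resources):
    """Assumed demand-over-supply ratio; infinite when a county has no resource."""
    if n_diagnostic_resources == 0:
        return math.inf
    return expected_cases / n_diagnostic_resources
```

Returning infinity for counties without any diagnostic resource keeps them at the top of a load ranking instead of raising a division error.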
9.
J Med Internet Res ; 21(4): e13822, 2019 04 24.
Article in English | MEDLINE | ID: mdl-31017583

ABSTRACT

BACKGROUND: Autism spectrum disorder (ASD) is currently diagnosed using qualitative methods that measure between 20 and 100 behaviors, can span multiple appointments with trained clinicians, and take several hours to complete. In our previous work, we demonstrated the efficacy of machine learning classifiers to accelerate the process by collecting home videos of US-based children, identifying a reduced subset of behavioral features that are scored by untrained raters using a machine learning classifier to determine children's "risk scores" for autism. We achieved an accuracy of 92% (95% CI 88%-97%) on US videos using a classifier built on five features. OBJECTIVE: Using videos of Bangladeshi children collected from Dhaka Shishu Children's Hospital, we aim to scale our pipeline to another culture and other developmental delays, including speech and language conditions. METHODS: Although our previously published and validated pipeline and set of classifiers perform reasonably well on Bangladeshi videos (75% accuracy, 95% CI 71%-78%), this work improves on that accuracy through the development and application of a powerful new technique for adaptive aggregation of crowdsourced labels. We enhance both the utility and performance of our model by building two classification layers: The first layer distinguishes between typical and atypical behavior, and the second layer distinguishes between ASD and non-ASD. In each of the layers, we use a unique rater weighting scheme to aggregate classification scores from different raters based on their expertise. We also determine Shapley values for the most important features in the classifier to understand how the classifiers' process aligns with clinical intuition. 
RESULTS: Using these techniques, we achieved an accuracy (area under the curve [AUC]) of 76% (SD 3%) and sensitivity of 76% (SD 4%) for identifying atypical children from among developmentally delayed children, and an accuracy (AUC) of 85% (SD 5%) and sensitivity of 76% (SD 6%) for identifying children with ASD from those predicted to have other developmental delays. CONCLUSIONS: These results show promise for using a mobile video-based and machine learning-directed approach for early and remote detection of autism in Bangladeshi children. This strategy could provide important resources for developmental health in developing countries with few clinical resources for diagnosis, helping children get access to care at an early age. Future research aimed at extending the application of this approach to identify a range of other conditions and determine the population-level burden of developmental disabilities and impairments will be of high value.


Subjects
Autism Spectrum Disorder/diagnosis , Developmental Disabilities/diagnosis , Machine Learning/standards , Video Recording/methods , Bangladesh , Child , Preschool Child , Female , Humans , Male , Validation Studies as Topic
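
The two-layer, rater-weighted aggregation described in the methods above might look roughly like the sketch below. The abstract does not specify the weighting scheme, so the weighted mean, the per-rater weights, the 0.5 threshold, and the label strings are all assumptions rather than the published method.

```python
def weighted_score(ratings, weights):
    """Weighted mean of per-rater classification scores (each between 0 and 1)."""
    total = sum(weights)
    if total == 0:
        raise ValueError("at least one rater weight must be positive")
    return sum(r * w for r, w in zip(ratings, weights)) / total

def two_layer_label(atypical_ratings, asd_ratings, weights, threshold=0.5):
    """Layer 1 separates typical from atypical; layer 2 separates ASD from non-ASD.

    weights are hypothetical expertise weights, one per rater.
    """
    if weighted_score(atypical_ratings, weights) < threshold:
        return "typical"
    if weighted_score(asd_ratings, weights) < threshold:
        return "atypical, non-ASD"
    return "ASD"
```

Running the second layer only on children flagged by the first mirrors the abstract's description of identifying ASD among those predicted to have a developmental delay.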
10.
J Med Internet Res ; 21(5): e13668, 2019 05 23.
Article in English | MEDLINE | ID: mdl-31124463

ABSTRACT

BACKGROUND: Obtaining a diagnosis of neuropsychiatric disorders such as autism requires long waiting times that can exceed a year and can be prohibitively expensive. Crowdsourcing approaches may provide a scalable alternative that can accelerate general access to care and permit underserved populations to obtain an accurate diagnosis. OBJECTIVE: We aimed to perform a series of studies to explore whether paid crowd workers on Amazon Mechanical Turk (AMT) and citizen crowd workers on a public website shared on social media can provide accurate online detection of autism, conducted via crowdsourced ratings of short home video clips. METHODS: Three online studies were performed: (1) a paid crowdsourcing task on AMT (N=54) where crowd workers were asked to classify 10 short video clips of children as "Autism" or "Not autism," (2) a more complex paid crowdsourcing task (N=27) with only those raters who correctly rated ≥8 of the 10 videos during the first study, and (3) a public unpaid study (N=115) identical to the first study. RESULTS: For Study 1, the mean score of the participants who completed all questions was 7.50/10 (SD 1.46). When only analyzing the workers who scored ≥8/10 (n=27/54), there was a weak negative correlation between the time spent rating the videos and the sensitivity (ρ=-0.44, P=.02). For Study 2, the mean score of the participants rating new videos was 6.76/10 (SD 0.59). The average deviation between the crowdsourced answers and gold standard ratings provided by two expert clinical research coordinators was 0.56, with an SD of 0.51 (maximum possible SD is 3). All paid crowd workers who scored 8/10 in Study 1 either expressed enjoyment in performing the task in Study 2 or provided no negative comments. For Study 3, the mean score of the participants who completed all questions was 6.67/10 (SD 1.61). 
There were weak correlations between age and score (r=0.22, P=.014), age and sensitivity (r=-0.19, P=.04), number of family members with autism and sensitivity (r=-0.195, P=.04), and number of family members with autism and precision (r=-0.203, P=.03). A two-tailed t test between the scores of the paid workers in Study 1 and the unpaid workers in Study 3 showed a significant difference (P<.001). CONCLUSIONS: Many paid crowd workers on AMT enjoyed answering screening questions from videos, suggesting higher intrinsic motivation to make quality assessments. Paid crowdsourcing provides promising screening assessments of pediatric autism with an average deviation <20% from professional gold standard raters, which is potentially a clinically informative estimate for parents. Parents of children with autism likely overfit their intuition to their own affected child. This work provides preliminary demographic data on raters who may have higher ability to recognize and measure features of autism across its wide range of phenotypic manifestations.


Subjects
Autism Spectrum Disorder/diagnosis , Crowdsourcing/methods , Data Collection/methods , Routine Diagnostic Tests/methods , Mass Screening/methods , Adult , Preschool Child , Humans , Internet , Social Media
12.
PLoS Med ; 15(11): e1002705, 2018 11.
Article in English | MEDLINE | ID: mdl-30481180

ABSTRACT

BACKGROUND: The standard approaches to diagnosing autism spectrum disorder (ASD) evaluate between 20 and 100 behaviors and take several hours to complete. This has in part contributed to long wait times for a diagnosis and subsequent delays in access to therapy. We hypothesize that the use of machine learning analysis on home video can speed the diagnosis without compromising accuracy. We have analyzed item-level records from 2 standard diagnostic instruments to construct machine learning classifiers optimized for sparsity, interpretability, and accuracy. In the present study, we prospectively test whether the features from these optimized models can be extracted by blinded nonexpert raters from 3-minute home videos of children with and without ASD to arrive at a rapid and accurate machine learning autism classification. METHODS AND FINDINGS: We created a mobile web portal for video raters to assess 30 behavioral features (e.g., eye contact, social smile) that are used by 8 independent machine learning models for identifying ASD, each with >94% accuracy in cross-validation testing and subsequent independent validation from previous work. We then collected 116 short home videos of children with autism (mean age = 4 years 10 months, SD = 2 years 3 months) and 46 videos of typically developing children (mean age = 2 years 11 months, SD = 1 year 2 months). Three raters blind to the diagnosis independently measured each of the 30 features from the 8 models, with a median time to completion of 4 minutes. Although several models (consisting of alternating decision trees, support vector machine [SVM], logistic regression (LR), radial kernel, and linear SVM) performed well, a sparse 5-feature LR classifier (LR5) yielded the highest accuracy (area under the curve [AUC]: 92% [95% CI 88%-97%]) across all ages tested. 
We used a prospectively collected independent validation set of 66 videos (33 ASD and 33 non-ASD) and 3 independent rater measurements to validate the outcome, achieving lower but comparable accuracy (AUC: 89% [95% CI 81%-95%]). Finally, we applied LR to the 162-video-feature matrix to construct an 8-feature model, which achieved 0.93 AUC (95% CI 0.90-0.97) on the held-out test set and 0.86 on the validation set of 66 videos. Validation on children with an existing diagnosis limited the ability to generalize the performance to undiagnosed populations. CONCLUSIONS: These results support the hypothesis that feature tagging of home videos for machine learning classification of autism can yield accurate outcomes in short time frames, using mobile devices. Further work will be needed to confirm that this approach can accelerate autism diagnosis at scale.


Subjects
Autistic Disorder/diagnosis , Computer-Assisted Diagnosis/methods , Machine Learning , Remote Consultation/methods , Video Recording/methods , Adolescent , Adolescent Behavior , Age Factors , Autistic Disorder/physiopathology , Autistic Disorder/psychology , Child , Child Behavior , Preschool Child , Early Diagnosis , Feasibility Studies , Female , Humans , Infant , Male , Predictive Value of Tests , Prospective Studies , Reproducibility of Results , Time Factors
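
The abstract above reports classifier quality as area under the ROC curve (AUC). As a reminder of what that number measures, the sketch below computes AUC directly from its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The labels and scores in the usage lines are made up and are not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels -- list of 0/1 ground-truth labels (1 = ASD in this illustration)
    scores -- list of classifier scores, higher meaning more likely positive
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up example: three of the four positive/negative pairs are ordered correctly.
example_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```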
13.
Algorithms ; 17(4)2024 Apr.
Article in English | MEDLINE | ID: mdl-38962581

ABSTRACT

Breast cancer is the most common cancer affecting women globally. Despite the significant impact of deep learning models on breast cancer diagnosis and treatment, achieving fairness or equitable outcomes across diverse populations remains a challenge when some demographic groups are underrepresented in the training data. We quantified the bias of models trained to predict breast cancer stage from a dataset consisting of 1000 biopsies from 842 patients provided by AIM-Ahead (Artificial Intelligence/Machine Learning Consortium to Advance Health Equity and Researcher Diversity). Notably, the majority of data (over 70%) were from White patients. We found that prior to post-processing adjustments, all deep learning models we trained consistently performed better for White patients than for non-White patients. After model calibration, we observed mixed results, with only some models demonstrating improved performance. This work provides a case study of bias in breast cancer medical imaging models and highlights the challenges in using post-processing to attempt to achieve fairness.
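
One simple way to quantify the kind of performance disparity described above is to compute accuracy separately for each demographic group and report the difference. The abstract does not say which bias metric the authors used, so the accuracy gap below is an assumed, illustrative choice, and the group labels in the usage lines are placeholders.

```python
from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Accuracy computed separately for each group label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, prediction, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == prediction)
    return {group: correct[group] / total[group] for group in total}

def accuracy_gap(per_group, reference, comparison):
    """Positive values mean the model performs better on the reference group."""
    return per_group[reference] - per_group[comparison]

# Placeholder predictions for six patients in two groups.
per_group = accuracy_by_group(
    [1, 0, 1, 1, 0, 1],
    [1, 0, 1, 0, 0, 0],
    ["White", "White", "White", "non-White", "non-White", "non-White"],
)
gap = accuracy_gap(per_group, "White", "non-White")
```

The same per-group breakdown can be repeated after a post-processing step such as calibration to see whether the gap narrows.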

14.
JMIR AI ; 3: e52171, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38875573

ABSTRACT

BACKGROUND: There are a wide range of potential adverse health effects, ranging from headaches to cardiovascular disease, associated with long-term negative emotions and chronic stress. Because many indicators of stress are imperceptible to observers, the early detection of stress remains a pressing medical need, as it can enable early intervention. Physiological signals offer a noninvasive method for monitoring affective states and are recorded by a growing number of commercially available wearables. OBJECTIVE: We aim to study the differences between personalized and generalized machine learning models for 3-class emotion classification (neutral, stress, and amusement) using wearable biosignal data. METHODS: We developed a neural network for the 3-class emotion classification problem using data from the Wearable Stress and Affect Detection (WESAD) data set, a multimodal data set with physiological signals from 15 participants. We compared the results between a participant-exclusive generalized, a participant-inclusive generalized, and a personalized deep learning model. RESULTS: For the 3-class classification problem, our personalized model achieved an average accuracy of 95.06% and an F1-score of 91.71%; our participant-inclusive generalized model achieved an average accuracy of 66.95% and an F1-score of 42.50%; and our participant-exclusive generalized model achieved an average accuracy of 67.65% and an F1-score of 43.05%. CONCLUSIONS: Our results emphasize the need for increased research in personalized emotion recognition models given that they outperform generalized models in certain contexts. We also demonstrate that personalized machine learning models for emotion classification are viable and can achieve high performance.
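
The difference between the evaluation settings compared above comes down to how samples are split. The sketch below shows one plausible reading: a participant-exclusive split holds out every sample from one participant, while a personalized split trains and tests on a single participant's own data. The tuple layout, the 80/20 fraction, and the function names are assumptions, not details taken from the paper.

```python
def participant_exclusive_folds(samples):
    """Yield (held_out_id, train, test) with no participant shared between sets.

    samples -- list of (participant_id, features, label) tuples (hypothetical layout)
    """
    participants = sorted({pid for pid, _, _ in samples})
    for held_out in participants:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test

def personalized_split(samples, participant, train_fraction=0.8):
    """Train and test on one participant's own samples, in recording order."""
    own = [s for s in samples if s[0] == participant]
    cut = int(len(own) * train_fraction)
    return own[:cut], own[cut:]
```

A participant-inclusive generalized model would instead pool all participants and split within the pool, so the test participant's data also appear in training.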

15.
Biosensors (Basel) ; 14(4)2024 Apr 09.
Article in English | MEDLINE | ID: mdl-38667177

ABSTRACT

The rapid development of biosensing technologies together with the advent of deep learning has marked an era in healthcare and biomedical research where widespread devices like smartphones, smartwatches, and health-specific technologies have the potential to facilitate remote and accessible diagnosis, monitoring, and adaptive therapy in a naturalistic environment. This systematic review focuses on the impact of combining multiple biosensing techniques with deep learning algorithms and the application of these models to healthcare. We explore the key areas that researchers and engineers must consider when developing a deep learning model for biosensing: the data modality, the model architecture, and the real-world use case for the model. We also discuss key ongoing challenges and potential future directions for research in this field. We aim to provide useful insights for researchers who seek to use intelligent biosensing to advance precision healthcare.


Subjects
Artificial Intelligence , Biosensing Techniques , Humans , Delivery of Health Care , Deep Learning , Algorithms
16.
AI (Basel) ; 5(1): 195-207, 2024 Mar.
Article in English | MEDLINE | ID: mdl-38715564

ABSTRACT

Emotion recognition models using audio input data can enable the development of interactive systems with applications in mental healthcare, marketing, gaming, and social media analysis. While the field of affective computing using audio data is rich, a major barrier to achieving consistently high-performance models is the paucity of available training labels. Self-supervised learning (SSL) is a family of methods which can learn despite a scarcity of supervised labels by predicting properties of the data itself. To understand the utility of self-supervised learning for audio-based emotion recognition, we have applied self-supervised learning pre-training to the classification of emotions from the acoustic data of the CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) data set. Unlike prior papers that have experimented with raw acoustic data, our technique has been applied to encoded acoustic data with 74 parameters of distinctive audio features at discrete timesteps. Our model is first pre-trained to uncover the randomly masked timestamps of the acoustic data. The pre-trained model is then fine-tuned using a small sample of annotated data. The performance of the final model is then evaluated via overall mean absolute error (MAE), MAE per emotion, overall four-class accuracy, and four-class accuracy per emotion. These metrics are compared against a baseline deep learning model with an identical backbone architecture. We find that self-supervised learning consistently improves the performance of the model across all metrics, especially when the number of annotated data points in the fine-tuning step is small. Furthermore, we quantify the behaviors of the self-supervised model and its convergence as the amount of annotated data increases. 
This work characterizes the utility of self-supervised learning for affective computing, demonstrating that self-supervised learning is most useful when the number of training examples is small and that the effect is most pronounced for emotions which are easier to classify such as happy, sad, and angry. This work further demonstrates that self-supervised learning still improves performance when applied to the embedded feature representations rather than the traditional approach of pre-training on the raw input space.
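
The masked pre-training step described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the masking probability, the zero mask value, the helper names `mask_timesteps` and `masked_mse`, and the synthetic 20-step sequence are all assumptions; only the 74 features per timestep come from the abstract.

```python
import random

NUM_FEATURES = 74  # per-timestep feature count stated in the abstract


def mask_timesteps(sequence, mask_prob, rng, mask_value=0.0):
    """Blank out whole timesteps; return (masked copy, masked indices)."""
    masked = [list(frame) for frame in sequence]
    masked_idx = [t for t in range(len(sequence)) if rng.random() < mask_prob]
    if not masked_idx and sequence:
        # Guarantee at least one masked timestep so the loss is defined.
        masked_idx = [rng.randrange(len(sequence))]
    for t in masked_idx:
        masked[t] = [mask_value] * len(sequence[t])
    return masked, masked_idx


def masked_mse(prediction, target, masked_idx):
    """Mean squared error computed over the masked timesteps only."""
    total, count = 0.0, 0
    for t in masked_idx:
        for p, y in zip(prediction[t], target[t]):
            total += (p - y) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)
# Synthetic stand-in for one encoded acoustic sequence (20 timesteps).
sequence = [[rng.gauss(0.0, 1.0) for _ in range(NUM_FEATURES)] for _ in range(20)]
masked, masked_idx = mask_timesteps(sequence, mask_prob=0.15, rng=rng)

# A trivial "model" that echoes its masked input; a real encoder would be
# trained to drive this reconstruction loss down.
loss = masked_mse(masked, sequence, masked_idx)
print(len(masked_idx), loss)
```

Restricting the loss to masked positions is one common design choice; it keeps the model from being rewarded for simply copying the visible timesteps.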

17.
JMIR Form Res ; 8: e52660, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38354045

ABSTRACT

BACKGROUND: The increasing use of social media platforms has given rise to an unprecedented surge in user-generated content, with millions of individuals publicly sharing their thoughts, experiences, and health-related information. Social media can serve as a useful means to study and understand public health. Twitter (subsequently rebranded as "X") is one such social media platform that has proven to be a valuable source of rich information for both the general public and health officials. We conducted the first study applying Twitter data mining to autism screening. OBJECTIVE: This study used Twitter as the primary source of data to study the behavioral characteristics and real-time emotional projections of individuals identifying with autism spectrum disorder (ASD). We aimed to improve the rigor of ASD analytics research by using the digital footprint of an individual to study the linguistic patterns of individuals with ASD. METHODS: We developed a machine learning model to distinguish individuals with autism from their neurotypical peers based on the textual patterns from their public communications on Twitter. We collected 6,515,470 tweets from users who self-identified with autism using "#ActuallyAutistic" and from a separate control group to identify linguistic markers associated with ASD traits. To construct the data set, we targeted English-language tweets using the search query "#ActuallyAutistic" posted from January 1, 2014, to December 31, 2022. From these tweets, we identified unique users who used keywords such as "autism" OR "autistic" OR "neurodiverse" in their profile description and collected all the tweets from their timeline. To build the control group data set, we formulated a search query excluding the hashtag, "-#ActuallyAutistic," and collected 1000 tweets per day during the same time period.
We trained a word2vec model and an attention-based, bidirectional long short-term memory model to validate the performance of per-tweet and per-profile classification models. We also illustrate the utility of the data set through common natural language processing tasks such as sentiment analysis and topic modeling. RESULTS: Our tweet classifier reached 73% accuracy, a 0.728 area under the receiver operating characteristic curve score, and a 0.71 F1-score using word2vec representations fed into a logistic regression model, while the user profile classifier achieved a 0.78 area under the receiver operating characteristic curve score and an F1-score of 0.805 using an attention-based, bidirectional long short-term memory model. This is a promising start, demonstrating the potential for effective digital phenotyping studies and large-scale intervention using text data mined from social media. CONCLUSIONS: Textual differences in social media communications can help researchers and clinicians conduct symptomatology studies in natural settings.
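
The per-tweet pipeline (word vectors fed into a logistic regression) can be illustrated with a small pure-Python sketch. The abstract does not say how word vectors are pooled per tweet; averaging is assumed here. The two-dimensional embedding table, the placeholder tokens, the labels, and the training hyperparameters are all hypothetical and unrelated to the study's data.

```python
import math

# Hypothetical 2-dimensional "word vectors"; a real pipeline would use a
# trained word2vec model with far more dimensions.
EMBEDDINGS = {
    "tok_a": [1.0, 0.0],
    "tok_b": [0.9, 0.1],
    "tok_c": [0.0, 1.0],
    "tok_d": [0.1, 0.9],
}
DIM = 2


def average_vector(tokens, embeddings, dim):
    """Represent a tweet as the mean of its known word vectors (an assumption)."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[j] for v in vecs) / len(vecs) for j in range(dim)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, w, b):
    """Probability of the positive class under logistic regression."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(features, labels, lr=0.5, epochs=200):
    """Fit logistic regression with plain stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = predict_proba(x, w, b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


# Toy tokenized "tweets" with made-up binary labels.
tweets = [["tok_a", "tok_b"], ["tok_a"], ["tok_c", "tok_d"], ["tok_d"]]
labels = [1, 1, 0, 0]
features = [average_vector(t, EMBEDDINGS, DIM) for t in tweets]
w, b = train_logreg(features, labels)
predictions = [int(predict_proba(x, w, b) > 0.5) for x in features]
print(predictions)
```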

18.
Sci Rep ; 14(1): 13887, 2024 06 16.
Article in English | MEDLINE | ID: mdl-38880810

ABSTRACT

Dementia is a progressive neurological disorder that affects the daily lives of older adults, impacting their verbal communication and cognitive function. Early diagnosis is important to enhance the lifespan and quality of life for affected individuals. Despite its importance, diagnosing dementia is a complex process. Automated machine learning solutions involving multiple types of data have the potential to improve the process of automated dementia screening. In this study, we build deep learning models to classify dementia cases from controls using the Pitt Cookie Theft dataset from DementiaBank, a database of short participant responses to the structured task of describing a picture of a cookie theft. We fine-tune Wav2vec and Word2vec baseline models to make binary predictions of dementia from audio recordings and text transcripts, respectively. We conduct experiments with four versions of the dataset: (1) the original data, (2) the data with short sentences removed, (3) text-based augmentation of the original data, and (4) text-based augmentation of the data with short sentences removed. Our results indicate that synonym-based text data augmentation generally enhances the performance of models that incorporate the text modality. Without data augmentation, models using the text modality achieve around 60% accuracy and 70% AUROC scores, and with data augmentation, the models achieve around 80% accuracy and 90% AUROC scores. We do not observe significant improvements in performance with the addition of audio or timestamp information into the model. We include a qualitative error analysis of the sentences that are misclassified under each study condition. This study provides preliminary insights into the effects of both text-based data augmentation and multimodal deep learning for automated dementia classification.
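
Synonym-based text augmentation of the kind mentioned in this abstract can be sketched as below. The abstract does not specify the synonym source or replacement rate, so the toy `SYNONYMS` table, the `replace_prob` parameter, and the example sentence are assumptions made purely for illustration.

```python
import random

# Hypothetical synonym table; a real pipeline might draw on a thesaurus.
SYNONYMS = {
    "boy": ["kid", "child"],
    "taking": ["grabbing", "stealing"],
    "cookie": ["biscuit"],
}


def augment(sentence, synonyms, replace_prob, rng):
    """Return a copy of the sentence with some words swapped for synonyms."""
    out = []
    for word in sentence.split():
        options = synonyms.get(word.lower())
        if options and rng.random() < replace_prob:
            out.append(rng.choice(options))
        else:
            out.append(word)
    return " ".join(out)


rng = random.Random(0)
original = "the boy is taking a cookie from the jar"
augmented = [augment(original, SYNONYMS, 0.5, rng) for _ in range(3)]
for sentence in augmented:
    print(sentence)
```

Each augmented sentence keeps the original label and word count, so the transcripts grow in number without changing the classification target.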


Subjects
Deep Learning , Dementia , Humans , Dementia/diagnosis , Dementia/classification , Aged , Female , Male , Aged, 80 and over , Databases, Factual
19.
JMIR Res Protoc ; 13: e55615, 2024 Mar 25.
Article in English | MEDLINE | ID: mdl-38526539

ABSTRACT

BACKGROUND: Referred to as the "silent killer," elevated blood pressure (BP) often goes unnoticed due to the absence of apparent symptoms, resulting in cumulative harm over time. Chronic stress has been consistently linked to increased BP. Prior studies have found that elevated BP often arises due to a stressful lifestyle, although the effect of exact stressors varies drastically between individuals. The heterogeneous nature of both the stress and BP response to a multitude of lifestyle decisions can make it difficult if not impossible to pinpoint the most deleterious behaviors using the traditional mechanism of clinical interviews. OBJECTIVE: The aim of this study is to leverage machine learning (ML) algorithms for real-time predictions of stress-induced BP spikes using consumer wearable devices such as Fitbit, providing actionable insights to both patients and clinicians to improve diagnostics and enable proactive health monitoring. This study also seeks to address the significant challenges in identifying specific deleterious behaviors associated with stress-induced hypertension through the development of personalized artificial intelligence models for individual patients, departing from the conventional approach of using generalized models. METHODS: The study proposes the development of ML algorithms to analyze biosignals obtained from these wearable devices, aiming to make real-time predictions about BP spikes. Given the longitudinal nature of the data set comprising time-series data from wearables (eg, Fitbit) and corresponding time-stamped labels representing stress levels from Ecological Momentary Assessment reports, the adoption of self-supervised learning for pretraining the network and using transformer models for fine-tuning the model on a personalized prediction task is proposed. 
Transformer models, with their self-attention mechanisms, dynamically weigh the importance of different time steps, enabling the model to focus on relevant temporal features and dependencies, facilitating accurate prediction. RESULTS: Supported as a pilot project from the Robert C Perry Fund of the Hawaii Community Foundation, the study team has developed the core study app, CardioMate. CardioMate not only reminds participants to initiate BP readings using an Omron HeartGuide wearable monitor but also prompts them multiple times a day to report stress levels. Additionally, it collects other useful information including medications, environmental conditions, and daily interactions. Through the app's messaging system, efficient contact and interaction between users and study admins ensure smooth progress. CONCLUSIONS: Personalized ML when applied to biosignals offers the potential for real-time digital health interventions for chronic stress and its symptoms. The project's clinical use for Hawaiians with stress-induced high BP combined with its methodological innovation of personalized artificial intelligence models highlights its significance in advancing health care interventions. Through iterative refinement and optimization, the aim is to develop a personalized deep-learning framework capable of accurately predicting stress-induced BP spikes, thereby promoting individual well-being and health outcomes. INTERNATIONAL REGISTERED REPORT IDENTIFIER (IRRID): DERR1-10.2196/55615.
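
How self-attention weighs different time steps can be shown with a minimal scaled dot-product sketch. This is a simplification under stated assumptions: queries, keys, and values are the raw inputs (no learned projections, a single head), and the four two-feature timesteps are made-up numbers rather than wearable data from the protocol.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x (an assumption)."""
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Similarity of this timestep to every timestep, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Output is the attention-weighted average of all timesteps.
        outputs.append([sum(w[t] * x[t][j] for t in range(len(x))) for j in range(d)])
    return outputs, weights


# Hypothetical normalized biosignal features at four timesteps.
x = [[0.2, 0.1], [0.9, 0.8], [0.3, 0.2], [1.0, 0.9]]
outputs, weights = self_attention(x)
for row in weights:
    print([round(v, 3) for v in row])
```

Each row of `weights` sums to 1 and indicates how strongly one timestep attends to every other timestep; in a trained transformer, learned projections would shape these weights.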

20.
JMIR Form Res ; 8: e59794, 2024 Jul 17.
Article in English | MEDLINE | ID: mdl-39018549

ABSTRACT

Digital phenotyping, or personal sensing, is a field of research that seeks to quantify traits and characteristics of people using digital technologies, usually for health care purposes. In this commentary, we discuss emerging ethical issues regarding the use of social media as training data for artificial intelligence (AI) models used for digital phenotyping. In particular, we describe the ethical need for explicit consent from social media users, particularly in cases where sensitive information such as labels related to neurodiversity are scraped. We also advocate for the use of community-based participatory design principles when developing health care AI models using social media data.
