ABSTRACT
BACKGROUND: In the process of finding the causative variant of rare diseases, accurate assessment and prioritization of genetic variants are essential. Previous variant prioritization tools mainly depend on the in-silico prediction of the pathogenicity of variants, which results in low sensitivity and difficulty in interpreting the prioritization result. In this study, we propose an explainable algorithm for variant prioritization, named 3ASC, with higher sensitivity and the ability to annotate evidence used for prioritization. 3ASC annotates each variant with the 28 criteria defined by the ACMG/AMP genome interpretation guidelines and features related to the clinical interpretation of the variants. The system can explain the result based on annotated evidence and feature contributions. RESULTS: We trained various machine learning algorithms using in-house patient data. The performance of variant ranking was assessed using the recall rate of identifying causative variants in the top-ranked variants. The best-performing model was a random forest classifier that showed a top-1 recall of 85.6% and a top-3 recall of 94.4%. 3ASC annotates the ACMG/AMP criteria for each genetic variant of a patient so that clinical geneticists can interpret the result, as in the CAGI6 SickKids challenge. In the challenge, 3ASC identified causal genes for 10 out of 14 patient cases, with evidence of decreased gene expression for 6 cases. Among them, two genes (HDAC8 and CASK) had decreased gene expression profiles confirmed by transcriptome data. CONCLUSIONS: 3ASC can prioritize genetic variants with higher sensitivity compared to previous methods by integrating various features related to clinical interpretation, including features related to false positive risk such as quality control and disease inheritance pattern. The system allows interpretation of each variant based on the ACMG/AMP criteria and feature contribution assessed using explainable AI techniques.
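The top-k recall metric mentioned in this abstract can be illustrated with a minimal sketch. It assumes each patient has exactly one known causative variant and a ranked list of candidate variants; the patient and variant identifiers below are hypothetical and are not taken from the study.

```python
def top_k_recall(ranked_by_patient, causal_by_patient, k):
    """Fraction of patients whose causative variant is within the top k ranks."""
    hits = 0
    for patient_id, ranked_variants in ranked_by_patient.items():
        # A hit means the known causative variant appears among the first k entries.
        if causal_by_patient[patient_id] in ranked_variants[:k]:
            hits += 1
    return hits / len(ranked_by_patient)


# Hypothetical toy data: variants are listed from highest to lowest score.
ranked = {
    "P1": ["v3", "v1", "v2"],
    "P2": ["v5", "v4"],
    "P3": ["v9", "v7", "v8"],
}
causal = {"P1": "v3", "P2": "v4", "P3": "v8"}

top1 = top_k_recall(ranked, causal, k=1)
top3 = top_k_recall(ranked, causal, k=3)
```

In this toy example only P1 has its causative variant ranked first, while all three patients have it within the top 3, so top-1 recall is lower than top-3 recall, mirroring the pattern of the reported figures.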
Subjects
Algorithms, Rare Diseases, Humans, Rare Diseases/diagnosis, Rare Diseases/genetics, Genetic Testing, Machine Learning, Genetic Variation/genetics, Histone Deacetylases/genetics, Repressor Proteins/genetics
ABSTRACT
[This corrects the article DOI: .].
ABSTRACT
BACKGROUND: Policy makers and practitioners in low- and middle-income countries (LMICs) are increasingly focusing on the effectiveness of digital devices in the delivery of medical and educational services to children under resource constraints. It is widely known that digital literacy can be fostered through exposure to and education regarding digital devices, which can improve children's academic performance as well as their search and communication skills in the digital era. However, the correlation between children's cognitive function and both their exposure to digital devices and the intensity of that exposure has rarely been studied, and the association between digital device exposure and the socioeconomic characteristics and cognitive development of children in LMICs is unknown. OBJECTIVE: This study examines the association among exposure to digital devices, socioeconomic status, and cognitive function in children aged 3 to 9 years in Cambodia. METHODS: We used a survey of 232 children that gathered data on familiarity with digital devices, demographic characteristics, and socioeconomic status, as well as a Cambridge Neuropsychological Test Automated Battery test for cognitive function, to examine the association between possible barriers and factors that may influence the cognitive function of children in 2 Cambodian schools from April 22, 2019, to May 4, 2019. A comparative analysis was performed with and without digital exposure, and an association analysis was performed among the variables from the survey and cognitive function. RESULTS: Significant differences were observed in demographic and socioeconomic characteristics such as school location, family type, and family income according to digital device exposure. The results of the Cambridge Neuropsychological Test Automated Battery tests, except for 1 test related to executive function, indicated no significant differences (P>.05) between group A and group B or among the 4 subgroups. 
Pretest digital device experience and amount of time spent using digital devices during the test had no significant impacts on the cognitive development of the children. Conversely, the multivariate analyses showed that cognitive function was associated with educational expenses per child, school (location), family type, and family income. CONCLUSIONS: These results provide evidence to policy makers and practitioners on the importance of improving socioeconomic conditions, leading to investment in education by implementing programs for children's cognitive development through digital devices in LMICs.
Subjects
Developing Countries, Income, Cambodia, Child, Cognition, Cross-Sectional Studies, Humans
ABSTRACT
BACKGROUND: Adverse drug reactions (ADRs) are unintended negative drug-induced responses. Determining the association between drugs and ADRs is crucial, and several methods have been proposed to demonstrate this association. This systematic review aimed to examine the analytical tools by considering original articles that utilized statistical and machine learning methods for detecting ADRs. METHODS: A systematic literature review was conducted based on articles published between 2015 and 2020. The keywords used were statistical, machine learning, and deep learning methods for detecting ADR signals. The study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement. RESULTS: We reviewed 72 articles, of which 51 and 21 addressed statistical and machine learning methods, respectively. Electronic medical record (EMR) data were exclusively analyzed using the regression method. For FDA Adverse Event Reporting System (FAERS) data, components of the disproportionality method were preferable. DrugBank was the most used database for machine learning. Other methods accounted for the highest proportion, and supervised methods accounted for the second highest. CONCLUSIONS: Using the 72 main articles, this review provides guidelines on which databases are frequently utilized and which analysis methods can be connected. For statistical analysis, >90% of the cases were analyzed by disproportionality or regression analysis with either spontaneous reporting system (SRS) data or EMR data; for machine learning research, however, there was a strong tendency to analyze various data combinations. Only half of the DrugBank database was occupied, and the k-nearest neighbor method accounted for the greatest proportion.
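As a hedged illustration of disproportionality analysis on spontaneous report data, the sketch below computes two commonly used measures, the proportional reporting ratio (PRR) and the reporting odds ratio (ROR), from a 2x2 contingency table. The abstract does not state which disproportionality measures the reviewed articles used, so the choice of PRR and ROR is an assumption, and the counts are invented for the example.

```python
def proportional_reporting_ratio(a, b, c, d):
    """PRR = [a / (a + b)] / [c / (c + d)].

    a: reports with the drug of interest and the event of interest
    b: reports with the drug of interest and any other event
    c: reports with other drugs and the event of interest
    d: reports with other drugs and any other event
    """
    return (a / (a + b)) / (c / (c + d))


def reporting_odds_ratio(a, b, c, d):
    """ROR = (a * d) / (b * c), using the same 2x2 cell definitions."""
    return (a * d) / (b * c)


# Hypothetical counts, not taken from FAERS or any reviewed article.
a, b, c, d = 20, 80, 100, 9800
prr = proportional_reporting_ratio(a, b, c, d)
ror = reporting_odds_ratio(a, b, c, d)
```

Values well above 1 suggest that the event is reported disproportionately often with the drug of interest; in practice a signal threshold and a confidence interval would also be applied.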
Subjects
Adverse Drug Reaction Reporting Systems, Drug-Related Side Effects and Adverse Reactions, Databases, Factual, Drug-Related Side Effects and Adverse Reactions/epidemiology, Electronic Health Records, Humans, Machine Learning
ABSTRACT
BACKGROUND: In obesity management, whether patients lose ≥5% of their initial weight is a critical factor in clinical outcomes. However, evaluations that take only this approach are unable to identify and distinguish between individuals whose weight changes vary and those who steadily lose weight. Evaluation of weight loss considering the volatility of weight changes through a mobile-based intervention for obesity can facilitate understanding of an individual's behavior and weight changes from a longitudinal perspective. OBJECTIVE: The aim of this study is to use a machine learning approach to examine weight loss trajectories and explore factors related to behavioral and app use characteristics that induce weight loss. METHODS: We used the lifelog data of 13,140 individuals enrolled in a 16-week obesity management program on the health care app Noom in the United States from August 8, 2013, to August 8, 2019. We performed k-means clustering with dynamic time warping to cluster the weight loss time series and inspected the quality of clusters with the total sum of distance within the clusters. To identify use factors determining clustering assignment, we longitudinally compared weekly use statistics with effect size on a weekly basis. RESULTS: The initial average BMI value for the participants was 33.6 (SD 5.9) kg/m², and it ultimately reached 31.6 (SD 5.7) kg/m². Using the weight log data, we identified five clusters: cluster 1 (sharp decrease) showed the highest proportion of participants who reduced their weight by >5% (7296/11,295, 64.59%), followed by cluster 2 (moderate decrease). In each comparison between clusters 1 and 3 (yo-yo) and clusters 2 and 3, although the effect size of the difference in average meal record adherence and average weight record adherence was not significant in the first week, it peaked within the initial 8 weeks (Cohen d>0.35) and decreased after that. 
CONCLUSIONS: Using a machine learning approach and clustering shape-based time series similarities, we identified 5 weight loss trajectories in a mobile weight management app. Overall adherence and early adherence related to self-monitoring emerged as potential predictors of these trajectories.
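The dynamic time warping (DTW) distance that underlies the clustering described in this abstract can be sketched as follows. This is the textbook dynamic programming formulation with an absolute-difference local cost, not the authors' implementation, and the weekly weight-change series are hypothetical.

```python
def dtw_distance(series_a, series_b):
    """Classic DTW distance between two numeric sequences."""
    n, m = len(series_a), len(series_b)
    inf = float("inf")
    # cost[i][j] = minimal cumulative cost of aligning the first i points
    # of series_a with the first j points of series_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(series_a[i - 1] - series_b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # repeat a point of series_b
                cost[i][j - 1],      # repeat a point of series_a
                cost[i - 1][j - 1],  # advance both series
            )
    return cost[n][m]


# Hypothetical percent weight change per week for three users.
steady = [0.0, -1.0, -2.0, -3.0]
delayed = [0.0, 0.0, -1.0, -2.0, -3.0]   # same shape, one week later
yoyo = [0.0, -2.0, 0.0, -2.0]

d_same_shape = dtw_distance(steady, delayed)
d_diff_shape = dtw_distance(steady, yoyo)
```

Because DTW allows points to be repeated during alignment, the delayed series matches the steady one at zero cost, whereas the yo-yo series does not. This shape sensitivity is why DTW is often paired with k-means for trajectory clustering.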
Subjects
Body-Weight Trajectory, Mobile Applications, Humans, Obesity/therapy, Retrospective Studies, Weight Loss
ABSTRACT
BACKGROUND: Early detection of developmental disabilities in children is essential because early intervention can improve the prognosis of children. Meanwhile, a growing body of evidence has indicated a relationship between developmental disability and motor skill, and thus, motor skill is considered in the early diagnosis of developmental disability. However, there are challenges to assessing motor skill in the diagnosis of developmental disorder, such as a lack of specialists and time constraints, and thus it is commonly conducted through informal questions or surveys to parents. OBJECTIVE: This study sought to evaluate the possibility of using drag-and-drop data as a digital biomarker and to develop a classification model based on drag-and-drop data with which to classify children with developmental disabilities. METHODS: We collected drag-and-drop data from children with typical development and developmental disabilities from May 1, 2018, to May 1, 2020, via a mobile application (DoBrain). We used touch coordinates and extracted kinetic variables from these coordinates. A deep learning algorithm was developed to predict potential developmental disabilities in children. For interpretability of the model results, we identified which coordinates contributed to the classification results by applying gradient-weighted class activation mapping. RESULTS: Of the 370 children in the study, 223 had typical development, and 147 had developmental disabilities. In all games, the number of changes in the acceleration sign based on the direction of progress both in the x- and y-axes showed significant differences between the 2 groups (P<.001; effect size >0.5). The deep learning convolutional neural network model showed that drag-and-drop data can help diagnose developmental disabilities, with an area under the receiver operating characteristic curve of 0.817. 
A gradient class activation map, which can interpret the results of a deep learning model, was visualized with the game results for specific children. CONCLUSIONS: Through the results of the deep learning model, we confirmed that drag-and-drop data can be a new digital biomarker for the diagnosis of developmental disabilities.
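One of the kinetic variables named in this abstract, the number of acceleration sign changes along an axis, can be approximated with finite differences. The sketch below is a simplified illustration that assumes touch coordinates sampled at uniform time intervals and ignores the "direction of progress" adjustment, whose exact definition the abstract does not give. The coordinate traces are hypothetical.

```python
def finite_difference(values):
    """First-order difference between consecutive samples."""
    return [later - earlier for earlier, later in zip(values, values[1:])]


def count_sign_changes(values):
    """Count switches between positive and negative values, skipping zeros."""
    signs = [1 if v > 0 else -1 for v in values if v != 0]
    return sum(1 for prev, cur in zip(signs, signs[1:]) if prev != cur)


def acceleration_sign_changes(coordinates):
    """Sign changes of the second difference (a proxy for acceleration)."""
    velocity = finite_difference(coordinates)
    acceleration = finite_difference(velocity)
    return count_sign_changes(acceleration)


# Hypothetical x-coordinates of a single drag gesture.
smooth_drag = [0, 1, 4, 9, 16]          # steadily accelerating
jerky_drag = [0, 1, 3, 4, 4, 5, 7]      # speeds up, slows, speeds up again

smooth_changes = acceleration_sign_changes(smooth_drag)
jerky_changes = acceleration_sign_changes(jerky_drag)
```

The same function would be applied separately to the x- and y-coordinates of each gesture; a higher count indicates a less smooth movement.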
ABSTRACT
BACKGROUND: Securing the representativeness of study populations is crucial in biomedical research to ensure high generalizability. In this regard, using multi-institutional data has advantages in medicine. However, combining data physically is difficult as the confidential nature of biomedical data causes privacy issues. Therefore, a methodological approach is necessary when using multi-institution medical data for research to develop a model without sharing data between institutions. OBJECTIVE: This study aims to develop a weight-based integrated predictive model of multi-institutional data, which does not require iterative communication between institutions, to improve average predictive performance by increasing the generalizability of the model under privacy-preserving conditions without sharing patient-level data. METHODS: The weight-based integrated model generates a weight for each institutional model and builds an integrated model for multi-institutional data based on these weights. We performed 3 simulations to show the weight characteristics and to determine the number of repetitions of the weight required to obtain stable values. We also conducted an experiment using real multi-institutional data to verify the developed weight-based integrated model. We selected 10 hospitals (2845 intensive care unit [ICU] stays in total) from the electronic intensive care unit Collaborative Research Database to predict ICU mortality with 11 features. To evaluate the validity of our model, compared with a centralized model, which was developed by combining all the data of 10 hospitals, we used proportional overlap (ie, 0.5 or less indicates a significant difference at a level of .05; and 2 indicates 2 CIs overlapping completely). Standard and Firth logistic regression models were applied for the 2 simulations and the experiment. 
RESULTS: The results of these simulations indicate that the weight of each institution is determined by 2 factors (ie, the data size of each institution and how well each institutional model fits into the overall institutional data) and that repeatedly generating 200 weights is necessary per institution. In the experiment, the estimated area under the receiver operating characteristic curve (AUC) and 95% CIs were 81.36% (79.37%-83.36%) and 81.95% (80.03%-83.87%) in the centralized model and weight-based integrated model, respectively. The proportional overlap of the CIs for AUC in both the weight-based integrated model and the centralized model was approximately 1.70, and that of overlap of the 11 estimated odds ratios was over 1, except for 1 case. CONCLUSIONS: In the experiment where real multi-institutional data were used, our model showed similar results to the centralized model without iterative communication between institutions. In addition, our weight-based integrated model provided a weighted average model by integrating 10 models overfitted or underfitted, compared with the centralized model. The proposed weight-based integrated model is expected to provide an efficient distributed research approach as it increases the generalizability of the model and does not require iterative communication.
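The proportional overlap of two confidence intervals can be illustrated with a short sketch. The formula used here, the length of the overlap divided by the average margin of error of the two intervals, is an assumption: the abstract does not spell out its definition, but this form yields 2 for identical intervals and is consistent with the stated thresholds. The function name is hypothetical.

```python
def proportional_overlap(ci_a, ci_b):
    """Overlap length divided by the average margin of error of two CIs.

    Each CI is a (lower, upper) pair. The result is 2.0 when the intervals
    coincide and negative when they do not overlap at all.
    """
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    overlap = min(upper_a, upper_b) - max(lower_a, lower_b)
    margin_a = (upper_a - lower_a) / 2
    margin_b = (upper_b - lower_b) / 2
    return overlap / ((margin_a + margin_b) / 2)


# 95% CIs for AUC (in percent) as reported in the abstract.
centralized_ci = (79.37, 83.36)
integrated_ci = (80.03, 83.87)

auc_overlap = proportional_overlap(centralized_ci, integrated_ci)
```

Under this assumed definition, the reported intervals give a value close to the approximately 1.70 stated in the abstract, well above the 0.5 threshold for a significant difference.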
ABSTRACT
BACKGROUND: In recent years, mobile-based interventions have received more attention as an alternative to on-site obesity management. Despite increased mobile interventions for obesity, there are lost opportunities to achieve better outcomes due to the lack of a predictive model using current existing longitudinal and cross-sectional health data. Noom (Noom Inc) is a mobile app that provides various lifestyle-related logs including food logging, exercise logging, and weight logging. OBJECTIVE: The aim of this study was to develop a weight change predictive model using an interpretable artificial intelligence algorithm for mobile-based interventions and to explore contributing factors to weight loss. METHODS: Lifelog mobile app (Noom) user data of individuals who used the weight loss program for 16 weeks in the United States were used to develop an interpretable recurrent neural network algorithm for weight prediction that considers both time-variant and time-fixed variables. From a total of 93,696 users in the coaching program, we excluded users who did not take part in the 16-week weight loss program or who were not overweight or obese or had not entered weight or meal records for the entire 16-week program. This interpretable model was trained and validated with 5-fold cross-validation (training set: 70%; testing: 30%) using the lifelog data. Mean absolute percentage error between actual weight loss and predicted weight was used to measure model performance. To better understand the behavior factors contributing to weight loss or gain, we calculated contribution coefficients in test sets. RESULTS: A total of 17,867 users' data were included in the analysis. The overall mean absolute percentage error of the model was 3.50%, and the error of the model declined from 3.78% to 3.45% by the end of the program. 
The time-level attention weighting was shown to be equally distributed at 0.0625 each week, but this gradually decreased (from 0.0626 to 0.0624) as it approached 16 weeks. Factors such as usage pattern, weight input frequency, meal input adherence, exercise, and sharp decreases in weight trajectories had negative contribution coefficients of -0.021, -0.032, -0.015, and -0.066, respectively. For time-fixed variables, being male had a contribution coefficient of -0.091. CONCLUSIONS: An interpretable algorithm, with both time-variant and time-fixed data, was used to precisely predict weight loss while preserving model transparency. This week-to-week prediction model is expected to improve weight loss and provide a global explanation of contributing factors, leading to better outcomes.
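The performance measure used in this abstract, mean absolute percentage error (MAPE), can be computed as in the minimal sketch below. The weights are hypothetical values in kilograms, not study data.

```python
def mean_absolute_percentage_error(actual, predicted):
    """MAPE in percent: mean of |actual - predicted| / |actual| times 100."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    relative_errors = [
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ]
    return 100 * sum(relative_errors) / len(relative_errors)


# Hypothetical observed and predicted body weights (kg) for two users.
actual_weights = [80.0, 100.0]
predicted_weights = [82.0, 97.0]

mape = mean_absolute_percentage_error(actual_weights, predicted_weights)
```

Here the individual errors are 2.5% and 3%, so the MAPE is 2.75%. Note also that a uniform attention weight over a 16-week program is 1/16 = 0.0625, which matches the per-week value reported above.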
Subjects
Artificial Intelligence, Weight Reduction Programs, Cross-Sectional Studies, Humans, Male, Neural Networks, Computer, United States, Weight Loss
ABSTRACT
BACKGROUND: Customer churn is the rate at which customers stop doing business with an entity. In the field of digital health care, user churn prediction is important not only in terms of company revenue but also for improving the health of users. Churn prediction has been previously studied, but most studies applied time-invariant model structures and used structured data. However, additional unstructured data have become available; therefore, it has become essential to process daily time-series log data for churn predictions. OBJECTIVE: We aimed to apply a recurrent neural network structure to accept time-series patterns using lifelog data and text message data to predict the churn of digital health care users. METHODS: This study was based on the use data of a digital health care app that provides interactive messages with human coaches regarding food, exercise, and weight logs. Among the users in Korea who enrolled between January 1, 2017 and January 1, 2019, we defined churn users according to the following criteria: users who received a refund before the paid program ended and users who received a refund 7 days after the trial period. We used long short-term memory with a masking layer to receive sequence data with different lengths. We also performed topic modeling to vectorize text messages. To interpret the contributions of each variable to model predictions, we used integrated gradients, which is an attribution method. RESULTS: A total of 1868 eligible users were included in this study. The final performance of churn prediction was an F1 score of 0.89; that score decreased by 0.12 when the data of the final week were excluded (F1 score 0.77). Additionally, when text data were included, the mean predicted performance increased by approximately 0.085 at every time point. Steps per day had the largest contribution (0.1085). 
Among the topic variables, poor habits (eg, drinking alcohol, overeating, and late-night eating) showed the largest contribution (0.0875). CONCLUSIONS: The model with a recurrent neural network architecture that used log data and message data demonstrated high performance for churn classification. Additionally, the analysis of the contribution of the variables is expected to help identify signs of user churn in advance and improve the adherence in digital health care.
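Integrated gradients, the attribution method named in this abstract, can be sketched on a much simpler model than the one used in the study. The example below substitutes a two-feature logistic model for the long short-term memory network so that the gradient can be written analytically; the weights, bias, and feature values are hypothetical. The path integral is approximated with a midpoint Riemann sum.

```python
import math


def predict(features, weights, bias):
    """Toy logistic model standing in for the study's recurrent network."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return 1 / (1 + math.exp(-z))


def gradient(features, weights, bias):
    """Analytic gradient of the logistic output with respect to each feature."""
    p = predict(features, weights, bias)
    return [p * (1 - p) * w for w in weights]


def integrated_gradients(features, baseline, weights, bias, steps=200):
    """Approximate integrated gradients along the straight path from baseline."""
    summed = [0.0] * len(features)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [b + alpha * (x - b) for x, b in zip(features, baseline)]
        grads = gradient(point, weights, bias)
        summed = [s + g for s, g in zip(summed, grads)]
    # Scale the average gradient by the distance moved along each feature.
    return [(x - b) * s / steps for x, b, s in zip(features, baseline, summed)]


# Hypothetical parameters and one user's (already scaled) feature vector.
weights, bias = [0.8, -0.5], -0.2
user = [1.0, 2.0]
baseline = [0.0, 0.0]

attributions = integrated_gradients(user, baseline, weights, bias)
output_change = predict(user, weights, bias) - predict(baseline, weights, bias)
```

A useful sanity check is the completeness property: the attributions should sum to approximately the difference between the model output at the input and at the baseline.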
Subjects
Mobile Applications/standards, Adult, Humans, Retrospective Studies, Telemedicine
ABSTRACT
Although early diagnosis of developmental delay is important, there are challenges in identifying cognitive status in developing countries because of limited human and financial resources to perform diagnostic tests. Moreover, diagnosis stability of developmental delay in children using neuropsychological tests (NPTs) can remain unsettled. The aim of this study is (1) to verify the effectiveness of a serious game (DoBrain), (2) to identify existing inconsistencies between NPTs, and (3) to explore the potential of the serious game as a complement to diagnostic tools. Eligible children who had completed results of NPTs were selected (case: n=119/235; control: n=116/235). With these children's scores, we performed the Mann-Whitney U test to investigate the effectiveness of the serious game by comparing the improvement of scores in both groups. Among the participants, we additionally selected a case group to identify the potential of the serious game for detecting mild developmental delay. Using the results of the CGI-S as a baseline, we defined the participants whose scores indicated more than mild illness (≥2 points) in at least one area as the suspected group. The score improvement related to memory in the case group was greater than that of the control group (p<0.05). Furthermore, four of the NPTs were not inconsistent, and the sensitivity/specificity of DDST-II was the highest score considering CGI-S results as the ground truth (0.43; 0.96). Additionally, games measuring discrimination, velocity, memory, and spatial perception showed statistical significance (p<0.05). This study verifies that the serious game can help specific cognitive areas and suggests that the serious game could be used as a low-cost and unconstrained spatiotemporal alternative to NPTs.
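Sensitivity and specificity against a reference standard, as reported here for DDST-II with CGI-S as the ground truth, can be computed as in the sketch below. The labels are hypothetical and only illustrate the calculation.

```python
def sensitivity_specificity(reference, test_result):
    """Return (sensitivity, specificity) for paired binary labels.

    reference: 1 if the reference standard marks the child as suspected, else 0
    test_result: 1 if the test under evaluation flags the child, else 0
    """
    pairs = list(zip(reference, test_result))
    true_pos = sum(1 for r, t in pairs if r == 1 and t == 1)
    false_neg = sum(1 for r, t in pairs if r == 1 and t == 0)
    true_neg = sum(1 for r, t in pairs if r == 0 and t == 0)
    false_pos = sum(1 for r, t in pairs if r == 0 and t == 1)
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Hypothetical labels for 10 children.
reference_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
screening_labels = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

sens, spec = sensitivity_specificity(reference_labels, screening_labels)
```

A test with low sensitivity and high specificity, as in the reported 0.43 and 0.96, rarely flags unaffected children but misses many of those the reference standard identifies.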
Subjects
Video Games, Cambodia, Child, Humans, Memory, Neuropsychological Tests
ABSTRACT
BACKGROUND: There has been significant effort in attempting to use health care data. However, laws that protect patients' privacy have restricted data use because health care data contain sensitive information. Thus, discussions on privacy laws now focus on the active use of health care data beyond protection. However, current literature does not clarify the obstacles that make data usage and deidentification processes difficult or elaborate on users' needs for data linking from practical perspectives. OBJECTIVE: The objective of this study is to investigate (1) the current status of data use in each medical area, (2) institutional efforts and difficulties in deidentification processes, and (3) users' data linking needs. METHODS: We conducted a cross-sectional online survey. To recruit people who have used health care data, we publicized the promotion campaign and sent official documents to an academic society encouraging participation in the online survey. RESULTS: In total, 128 participants responded to the online survey; 10 participants were excluded for either inconsistent responses or lack of demand for health care data. Finally, 118 participants' responses were analyzed. The majority of participants worked in general hospitals or universities (62/118, 52.5% and 51/118, 43.2%, respectively, multiple-choice answers). More than half of participants responded that they have a need for clinical data (82/118, 69.5%) and public data (76/118, 64.4%). Furthermore, 85.6% (101/118) of respondents conducted deidentification measures when using data, and they considered rigid social culture as an obstacle for deidentification (28/101, 27.7%). In addition, they required data linking (98/118, 83.1%), and they noted deregulation and data standardization to allow access to health care data linking (33/98, 33.7% and 38/98, 38.8%, respectively). 
There were no significant differences in the proportion of responded data needs and linking in groups that used health care data for either public purposes or commercial purposes. CONCLUSIONS: This study provides a cross-sectional view from a practical, user-oriented perspective on the kinds of data users want to utilize, efforts and difficulties in deidentification processes, and the needs for data linking. Most users want to use clinical and public data, and most participants conduct deidentification processes and express a desire to conduct data linking. Our study confirmed that they noted regulation as a primary obstacle whether their purpose is commercial or public. A legal system based on both data utilization and data protection needs is required.