Pesquisa | Biblioteca Virtual em Saúde

1.

Path and Direction Discovery in Individual Dynamic Factor Models: A Regularized Hybrid Unified Structural Equation Modeling with Latent Variable.

Ye, Ai; Bollen, Kenneth A.

Multivariate Behav Res ; : 1-24, 2024 Jul 26.

Artigo em Inglês | MEDLINE | ID: mdl-39058418

RESUMO

There has been an increasing call to model multivariate time series data with measurement error. The combination of latent factors with a vector autoregressive (VAR) model leads to the dynamic factor model (DFM), in which dynamic relations are derived within factor series, among factors and observed time series, or both. However, a few limitations exist in the current DFM representatives and estimation: (1) the dynamic component contains either directed or undirected contemporaneous relations, but not both, (2) selecting the optimal model in exploratory DFM is a challenge, (3) the consequences of structural misspecifications from model selection is barely studied. Our paper serves to advance DFM with a hybrid VAR representations and the utilization of LASSO regularization to select dynamic implied instrumental variable, two-stage least squares (MIIV-2SLS) estimation. Our proposed method highlights the flexibility in modeling the directions of dynamic relations with a robust estimation. We aim to offer researchers guidance on model selection and estimation in person-centered dynamic assessments.

2.

The Distance Between: An Algorithmic Approach to Comparing Stochastic Models to Time-Series Data.

Sherlock, Brock D; Boon, Marko A A; Vlasiou, Maria; Coster, Adelle C F.

Bull Math Biol ; 86(9): 111, 2024 Jul 26.

Artigo em Inglês | MEDLINE | ID: mdl-39060776

RESUMO

While mean-field models of cellular operations have identified dominant processes at the macroscopic scale, stochastic models may provide further insight into mechanisms at the molecular scale. In order to identify plausible stochastic models, quantitative comparisons between the models and the experimental data are required. The data for these systems have small sample sizes and time-evolving distributions. The aim of this study is to identify appropriate distance metrics for the quantitative comparison of stochastic model outputs and time-evolving stochastic measurements of a system. We identify distance metrics with features suitable for driving parameter inference, model comparison, and model validation, constrained by data from multiple experimental protocols. In this study, stochastic model outputs are compared to synthetic data across three scales: that of the data at the points the system is sampled during the time course of each type of experiment; a combined distance across the time course of each experiment; and a combined distance across all the experiments. Two broad categories of comparators at each point were considered, based on the empirical cumulative distribution function (ECDF) of the data and of the model outputs: discrete based measures such as the Kolmogorov-Smirnov distance, and integrated measures such as the Wasserstein-1 distance between the ECDFs. It was found that the discrete based measures were highly sensitive to parameter changes near the synthetic data parameters, but were largely insensitive otherwise, whereas the integrated distances had smoother transitions as the parameters approached the true values. The integrated measures were also found to be robust to noise added to the synthetic data, replicating experimental error. The characteristics of the identified distances provides the basis for the design of an algorithm suitable for fitting stochastic models to real world stochastic data.

Assuntos

Algoritmos , Conceitos Matemáticos , Modelos Biológicos , Processos Estocásticos , Fatores de Tempo , Simulação por Computador

3.

Comment on Martínez-Delgado et al. Using Absorption Models for Insulin and Carbohydrates and Deep Leaning to Improve Glucose Level Predictions. Sensors 2021, 21, 5273.

Misplon, Josiah Z R; Saini, Varun; Sloves, Brianna P; Meerts, Sarah H; Musicant, David R.

Sensors (Basel) ; 24(13)2024 Jul 05.

Artigo em Inglês | MEDLINE | ID: mdl-39001139

RESUMO

The paper "Using Absorption Models for Insulin and Carbohydrates and Deep Leaning to Improve Glucose Level Predictions" (Sensors2021, 21, 5273) proposes a novel approach to predicting blood glucose levels for people with type 1 diabetes mellitus (T1DM). By building exponential models from raw carbohydrate and insulin data to simulate the absorption in the body, the authors reported a reduction in their model's root-mean-square error (RMSE) from 15.5 mg/dL (raw) to 9.2 mg/dL (exponential) when predicting blood glucose levels one hour into the future. In this comment, we demonstrate that the experimental techniques used in that paper are flawed, which invalidates its results and conclusions. Specifically, after reviewing the authors' code, we found that the model validation scheme was malformed, namely, the training and test data from the same time intervals were mixed. This means that the reported RMSE numbers in the referenced paper did not accurately measure the predictive capabilities of the approaches that were presented. We repaired the measurement technique by appropriately isolating the training and test data, and we discovered that their models actually performed dramatically worse than was reported in the paper. In fact, the models presented in the that paper do not appear to perform any better than a naive model that predicts future glucose levels to be the same as the current ones.

Assuntos

Glicemia , Diabetes Mellitus Tipo 1 , Insulina , Insulina/metabolismo , Humanos , Glicemia/metabolismo , Glicemia/análise , Diabetes Mellitus Tipo 1/metabolismo , Carboidratos/química , Modelos Biológicos

4.

Leveraging VQ-VAE tokenization for autoregressive modeling of medical time series.

Lee, Yoonhyung; Chae, Younhyung; Jung, Kyomin.

Artif Intell Med ; 154: 102925, 2024 Jun 28.

Artigo em Inglês | MEDLINE | ID: mdl-38968921

RESUMO

In this work, we present CodeAR, a medical time series generative model for electronic health record (EHR) synthesis. CodeAR employs autoregressive modeling on discrete tokens obtained using a vector quantized-variational autoencoder (VQ-VAE), which addresses key challenges of accurate distribution modeling and patient privacy preservation in the medical domain. The proposed model is trained with next-token prediction instead of a regression problem for more accurate distribution modeling, where the autoregressive property of CodeAR is useful to capture the inherent causality in time series data. In addition, the compressive property of the VQ-VAE prevents CodeAR from memorizing the original training data, which ensures patient privacy. Experimental results demonstrate that CodeAR outperforms the baseline autoregressive-based and GAN-based models in terms of maximum mean discrepancy (MMD) and Train on Synthetic, Test on Real tests. Our results highlight the effectiveness of autoregressive modeling on discrete tokens, the utility of CodeAR in causal modeling, and its robustness against data memorization.

5.

Research on species categorical authentication of accelerants based on flame characteristics analysis.

Zhang, Qianqian; Zang, Zhengzhe; Wang, Peibin; Zhu, Lin; Cao, Yiyue; Jin, Jing; Lu, Lingang.

Forensic Sci Int ; 361: 112125, 2024 Jul 02.

Artigo em Inglês | MEDLINE | ID: mdl-39002411

RESUMO

Species categorical authentication of accelerants has traditionally relied on fire debris analysis. To explore a novel method for identifying the accelerants species, four commonly used accelerants for arson were loaded onto different substrates and ignited at different locations. The entire combustion process was recorded and flame characteristics were analyzed. The results showed that the probability density function (PDF) of flame apex angle counts within a certain period after ignition can be used to distinguish accelerant species, and this method is not affected by accelerant loading amount, ignition location, and substrate, demonstrating strong stability and universality, while the temporal variation of flame area and the value obtained by dividing half of the flame width by the flame height (tangent of flame cone angle) can effectively differentiate gasoline and diesel. The utilization of flame characteristics for identifying accelerants species holds significant implications for arson investigation.

6.

Cross-modal missing time-series imputation using dense spatio-temporal transformer nets.

Qian, Xusheng; Zhang, Teng; Miao, Meng; Xu, Gaojun; Zhang, Xuancheng; Yu, Wenwu; Chen, Duxin.

Math Biosci Eng ; 21(4): 4989-5006, 2024 Mar 01.

Artigo em Inglês | MEDLINE | ID: mdl-38872523

RESUMO

Due to irregular sampling or device failure, the data collected from sensor network has missing value, that is, missing time-series data occurs. To address this issue, many methods have been proposed to impute random or non-random missing data. However, the imputation accuracy of these methods are not accurate enough to be applied, especially in the case of complete data missing (CDM). Thus, we propose a cross-modal method to impute time-series missing data by dense spatio-temporal transformer nets (DSTTN). This model embeds spatial modal data into time-series data by stacked spatio-temporal transformer blocks and deployment of dense connections. It adopts cross-modal constraints, a graph Laplacian regularization term, to optimize model parameters. When the model is trained, it recovers missing data finally by an end-to-end imputation pipeline. Various baseline models are compared by sufficient experiments. Based on the experimental results, it is verified that DSTTN achieves state-of-the-art imputation performance in the cases of random and non-random missing. Especially, the proposed method provides a new solution to the CDM problem.

7.

Tree rings reveal the transient risk of extinction hidden inside climate envelope forecasts.

Evans, Margaret E K; Dey, Sharmila M N; Heilman, Kelly A; Tipton, John R; DeRose, R Justin; Klesse, Stefan; Schultz, Emily L; Shaw, John D.

Proc Natl Acad Sci U S A ; 121(24): e2315700121, 2024 Jun 11.

Artigo em Inglês | MEDLINE | ID: mdl-38830099

RESUMO

Given the importance of climate in shaping species' geographic distributions, climate change poses an existential threat to biodiversity. Climate envelope modeling, the predominant approach used to quantify this threat, presumes that individuals in populations respond to climate variability and change according to species-level responses inferred from spatial occurrence data-such that individuals at the cool edge of a species' distribution should benefit from warming (the "leading edge"), whereas individuals at the warm edge should suffer (the "trailing edge"). Using 1,558 tree-ring time series of an aridland pine (Pinus edulis) collected at 977 locations across the species' distribution, we found that trees everywhere grow less in warmer-than-average and drier-than-average years. Ubiquitous negative temperature sensitivity indicates that individuals across the entire distribution should suffer with warming-the entire distribution is a trailing edge. Species-level responses to spatial climate variation are opposite in sign to individual-scale responses to time-varying climate for approximately half the species' distribution with respect to temperature and the majority of the species' distribution with respect to precipitation. These findings, added to evidence from the literature for scale-dependent climate responses in hundreds of species, suggest that correlative, equilibrium-based range forecasts may fail to accurately represent how individuals in populations will be impacted by changing climate. A scale-dependent view of the impact of climate change on biodiversity highlights the transient risk of extinction hidden inside climate envelope forecasts and the importance of evolution in rescuing species from extinction whenever local climate variability and change exceeds individual-scale climate tolerances.

Assuntos

Mudança Climática , Extinção Biológica , Pinus , Pinus/fisiologia , Árvores , Biodiversidade , Previsões/métodos , Temperatura , Modelos Climáticos

8.

Cleaned Meta Pseudo Labels-Based Pet Behavior Recognition Using Time-Series Sensor Data.

Go, Junhyeok; Moon, Nammee.

Sensors (Basel) ; 24(11)2024 May 24.

Artigo em Inglês | MEDLINE | ID: mdl-38894180

RESUMO

With the increasing number of households owning pets, the importance of sensor data for recognizing pet behavior has grown significantly. However, challenges arise due to the costs and reliability issues associated with data collection. This paper proposes a method for classifying pet behavior using cleaned meta pseudo labels to overcome these issues. The data for this study were collected using wearable devices equipped with accelerometers, gyroscopes, and magnetometers, and pet behaviors were classified into five categories. Utilizing this data, we analyzed the impact of the quantity of labeled data on accuracy and further enhanced the learning process by integrating an additional Distance Loss. This method effectively improves the learning process by removing noise from unlabeled data. Experimental results demonstrated that while the conventional supervised learning method achieved an accuracy of 82.9%, the existing meta pseudo labels method showed an accuracy of 86.2%, and the cleaned meta pseudo labels method proposed in this study surpassed these with an accuracy of 88.3%. These results hold significant implications for the development of pet monitoring systems, and the approach of this paper provides an effective solution for recognizing and classifying pet behavior in environments with insufficient labels.

9.

Multivariate Time Series Change-Point Detection with a Novel Pearson-like Scaled Bregman Divergence.

Si, Tong; Wang, Yunge; Zhang, Lingling; Richmond, Evan; Ahn, Tae-Hyuk; Gong, Haijun.

Stats (Basel) ; 7(2): 462-480, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38827579

RESUMO

Change-point detection is a challenging problem that has a number of applications across various real-world domains. The primary objective of CPD is to identify specific time points where the underlying system undergoes transitions between different states, each characterized by its distinct data distribution. Precise identification of change points in time series omics data can provide insights into the dynamic and temporal characteristics inherent to complex biological systems. Many change-point detection methods have traditionally focused on the direct estimation of data distributions. However, these approaches become unrealistic in high-dimensional data analysis. Density ratio methods have emerged as promising approaches for change-point detection since estimating density ratios is easier than directly estimating individual densities. Nevertheless, the divergence measures used in these methods may suffer from numerical instability during computation. Additionally, the most popular α-relative Pearson divergence cannot measure the dissimilarity between two distributions of data but a mixture of distributions. To overcome the limitations of existing density ratio-based methods, we propose a novel approach called the Pearson-like scaled-Bregman divergence-based (PLsBD) density ratio estimation method for change-point detection. Our theoretical studies derive an analytical expression for the Pearson-like scaled Bregman divergence using a mixture measure. We integrate the PLsBD with a kernel regression model and apply a random sampling strategy to identify change points in both synthetic data and real-world high-dimensional genomics data of Drosophila. Our PLsBD method demonstrates superior performance compared to many other change-point detection methods.

10.

Key drivers of hypoxia revealed by time-series data in the coastal waters of Muping, China.

Zheng, Xiangyang; Liu, Hui; Xing, Qianguo; Li, Yanfang; Guo, Jie; Tang, Cheng; Zou, Tao; Hou, Chawei.

Mar Environ Res ; 199: 106613, 2024 Jul.

Artigo em Inglês | MEDLINE | ID: mdl-38905867

RESUMO

Coastal hypoxia (low dissolved oxygen in seawater) is a cumulative result of many physical and biochemical processes. However, it is often difficult to determine the key drivers of hypoxia due to the lack of frequent observational oceanographic and meteorological data. In this study, high-frequency time-series observational data of dissolved oxygen (DO) and related parameters in the coastal waters of Muping, China, were used to analyze the temporal pattern of hypoxia and its key drivers. Two complete cycles with the formation and destruction of hypoxia were captured over the observational period. Persistent thermal stratification, high winds and phytoplankton blooms are identified as key drivers of hypoxia in this region. Hypoxia largely occurs due to persistent thermal stratification in summer, and hypoxia can be noticeably relieved when strong wind mixing weakens thermal stratification. Furthermore, we found that northerly high winds are more efficient at eroding stratification than southerly winds and thus have a greater ability to relieve hypoxia. This study revealed an episodic hypoxic event driven by a phytoplankton bloom that was probably triggered by terrestrial nutrient loading, confirming the causal relationship between phytoplankton blooms and hypoxia. In addition, we found that the lag time between nutrient loading, phytoplankton blooms and hypoxia can be as short as one week. This study could help better understand the development of hypoxia and forecast phytoplankton and hypoxia, which are beneficial for aquaculture in this region.

Assuntos

Monitoramento Ambiental , Eutrofização , Oxigênio , Fitoplâncton , Água do Mar , China , Fitoplâncton/fisiologia , Água do Mar/química , Oxigênio/análise , Vento , Estações do Ano

11.

ContrAttNet: Contribution and attention approach to multivariate time-series data imputation.

Yin, Yunfei; Huang, Caihao; Bao, Xianjian.

Network ; : 1-24, 2024 Jun 03.

Artigo em Inglês | MEDLINE | ID: mdl-38828665

RESUMO

The imputation of missing values in multivariate time-series data is a basic and popular data processing technology. Recently, some studies have exploited Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs) to impute/fill the missing values in multivariate time-series data. However, when faced with datasets with high missing rates, the imputation error of these methods increases dramatically. To this end, we propose a neural network model based on dynamic contribution and attention, denoted as ContrAttNet. ContrAttNet consists of three novel modules: feature attention module, iLSTM (imputation Long Short-Term Memory) module, and 1D-CNN (1-Dimensional Convolutional Neural Network) module. ContrAttNet exploits temporal information and spatial feature information to predict missing values, where iLSTM attenuates the memory of LSTM according to the characteristics of the missing values, to learn the contributions of different features. Moreover, the feature attention module introduces an attention mechanism based on contributions, to calculate supervised weights. Furthermore, under the influence of these supervised weights, 1D-CNN processes the time-series data by treating them as spatial features. Experimental results show that ContrAttNet outperforms other state-of-the-art models in the missing value imputation of multivariate time-series data, with average 6% MAPE and 9% MAE on the benchmark datasets.

12.

Identification and validation of sepsis subphenotypes using time-series data.

Hao, Chenxiao; Hao, Rui; Zhao, Huiying; Zhang, Yong; Sheng, Ming; An, Youzhong.

Heliyon ; 10(7): e28520, 2024 Apr 15.

Artigo em Inglês | MEDLINE | ID: mdl-38689952

RESUMO

Purpose: The recognition of sepsis as a heterogeneous syndrome necessitates identifying distinct subphenotypes to select targeted treatment. Methods: Patients with sepsis from the MIMIC-IV database (2008-2019) were randomly divided into a development cohort (80%) and an internal validation cohort (20%). Patients with sepsis from the ICU database of Peking University People's Hospital (2008-2022) were included in the external validation cohort. Time-series k-means clustering analysis and dynamic time warping was performed to develop and validate sepsis subphenotypes by analyzing the trends of 21 vital signs and laboratory indicators within 24 h after sepsis onset. Inflammatory biomarkers were compared in the ICU database of Peking University People's Hospital, whereas treatment heterogeneity was compared in the MIMIC-IV database. Findings: Three sub-phenotypes were identified in the development cohort. Type A patients (N = 2525, 47%) exhibited stable vital signs and fair organ function, type B (N = 1552, 29%) was exhibited an obvious inflammatory response and stable organ function, and type C (N = 1251, 24%) exhibited severely impaired organ function with a deteriorating tendency. Type C demonstrated the highest mortality rate (33%) and levels of inflammatory biomarkers, followed by type B (24%), whereas type A exhibited the lowest mortality rate (11%) and levels of inflammatory biomarkers. These subphenotypes were confirmed in both the internal and external cohorts, demonstrating similar features and comparable mortality rates. In type C patients, survivors had significantly lower fluid intake within 24 h after sepsis onset (median 2891 mL, interquartile range (IQR) 1530-5470 mL) than that in non-survivors (median 4342 mL, IQR 2189-7305 mL). For types B and C, survivors showed a higher proportion of indwelling central venous catheters (p < 0.05). Conclusion: Three novel phenotypes of patients with sepsis were identified and validated using time-series data, revealing significant heterogeneity in inflammatory biomarkers, treatments, and consistency across cohorts.

13.

Identifying temporal pathways using biomarkers in the presence of latent non-Gaussian components.

Xie, Shanghong; Zeng, Donglin; Wang, Yuanjia.

Biometrics ; 80(2)2024 Mar 27.

Artigo em Inglês | MEDLINE | ID: mdl-38708763

RESUMO

Time-series data collected from a network of random variables are useful for identifying temporal pathways among the network nodes. Observed measurements may contain multiple sources of signals and noises, including Gaussian signals of interest and non-Gaussian noises, including artifacts, structured noise, and other unobserved factors (eg, genetic risk factors, disease susceptibility). Existing methods, including vector autoregression (VAR) and dynamic causal modeling do not account for unobserved non-Gaussian components. Furthermore, existing methods cannot effectively distinguish contemporaneous relationships from temporal relations. In this work, we propose a novel method to identify latent temporal pathways using time-series biomarker data collected from multiple subjects. The model adjusts for the non-Gaussian components and separates the temporal network from the contemporaneous network. Specifically, an independent component analysis (ICA) is used to extract the unobserved non-Gaussian components, and residuals are used to estimate the contemporaneous and temporal networks among the node variables based on method of moments. The algorithm is fast and can easily scale up. We derive the identifiability and the asymptotic properties of the temporal and contemporaneous networks. We demonstrate superior performance of our method by extensive simulations and an application to a study of attention-deficit/hyperactivity disorder (ADHD), where we analyze the temporal relationships between brain regional biomarkers. We find that temporal network edges were across different brain regions, while most contemporaneous network edges were bilateral between the same regions and belong to a subset of the functional connectivity network.

Assuntos

Algoritmos , Biomarcadores , Simulação por Computador , Modelos Estatísticos , Humanos , Biomarcadores/análise , Distribuição Normal , Transtorno do Deficit de Atenção com Hiperatividade , Fatores de Tempo , Biometria/métodos

14.

Time series (2003-15) analysis of selected physicochemical parameters in Indian Ocean: Cumulative impacts prediction on coral bleaching using machine learning.

Panja, Atanu Kumar; Jaiswal, Sweta; Haldar, Soumya.

Sci Total Environ ; 933: 173002, 2024 Jul 10.

Artigo em Inglês | MEDLINE | ID: mdl-38710398

RESUMO

Coral bleaching is an important ecological threat worldwide, as the coral ecosystem supports a rich marine biodiversity to survive. Sea surface temperature was considered a major culprit; however, later it was observed that other water parameters like pH, tCO2, fCO2, salinity, dissolved oxygen, etc. also play a significant role in bleaching. In the present study, all these parameters of the Indian Ocean area for 15 years (2003-2017) were collected and analysed using machine learning language. The main aim is to see the cumulative impacts of various ocean parameters on coral bleaching. Introducing machine learning in environmental impact assessment studies is a new approach, and the prediction of coral bleaching using simulation of physico-chemical parameters interactions shows 94.4 % accuracy for the prediction of the future bleaching event. This study can be probably the first step in the application of the machine learning language for the prediction of coral bleaching in the field of marine science.

Assuntos

Antozoários , Recifes de Corais , Monitoramento Ambiental , Aprendizado de Máquina , Oceano Índico , Animais , Monitoramento Ambiental/métodos , Água do Mar/química , Temperatura , Ecossistema

15.

Sensor-Based Indoor Fire Forecasting Using Transformer Encoder.

Jeong, Young-Seob; Hwang, JunHa; Lee, SeungDong; Ndomba, Goodwill Erasmo; Kim, Youngjin; Kim, Jeung-Im.

Sensors (Basel) ; 24(7)2024 Apr 08.

Artigo em Inglês | MEDLINE | ID: mdl-38610590

RESUMO

Indoor fires may cause casualties and property damage, so it is important to develop a system that predicts fires in advance. There have been studies to predict potential fires using sensor values, and they mostly exploited machine learning models or recurrent neural networks. In this paper, we propose a stack of Transformer encoders for fire prediction using multiple sensors. Our model takes the time-series values collected from the sensors as input, and predicts the potential fire based on the sequential patterns underlying the time-series data. We compared our model with traditional machine learning models and recurrent neural networks on two datasets. For a simple dataset, we found that the machine learning models are better than ours, whereas our model gave better performance for a complex dataset. This implies that our model has a greater potential for real-world applications that probably have complex patterns and scenarios.

16.

Neural Models for Generating Natural Language Summaries from Temporal Personal Health Data.

Harris, Jonathan; Zaki, Mohammed J.

J Healthc Inform Res ; 8(2): 370-399, 2024 Jun.

Artigo em Inglês | MEDLINE | ID: mdl-38681757

RESUMO

With an increased interest in the production of personal health technologies designed to track user data (e.g., nutrient intake, step counts), there is now more opportunity than ever to surface meaningful behavioral insights to everyday users in the form of natural language. This knowledge can increase their behavioral awareness and allow them to take action to meet their health goals. It can also bridge the gap between the vast collection of personal health data and the summary generation required to describe an individual's behavioral tendencies. Previous work has focused on rule-based time-series data summarization methods designed to generate natural language summaries of interesting patterns found within temporal personal health data. We examine recurrent, convolutional, and Transformer-based encoder-decoder models to automatically generate natural language summaries from numeric temporal personal health data. We showcase the effectiveness of our models on real user health data logged in MyFitnessPal (Weber and Achananuparp 2016) and show that we can automatically generate high-quality natural language summaries. Our work serves as a first step towards the ambitious goal of automatically generating novel and meaningful temporal summaries from personal health data.

17.

A Generative Neighborhood-Based Deep Autoencoder for Robust Imbalanced Classification.

Troullinou, Eirini; Tsagkatakis, Grigorios; Losonczy, Attila; Poirazi, Panayiota; Tsakalides, Panagiotis.

IEEE Trans Artif Intell ; 5(1): 80-91, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-38500544

RESUMO

Deep learning models perform remarkably well on many classification tasks recently. The superior performance of deep neural networks relies on the large number of training data, which at the same time must have an equal class distribution in order to be efficient. However, in most real-world applications, the labeled data may be limited with high imbalance ratios among the classes, and thus, the learning process of most classification algorithms is adversely affected resulting in unstable predictions and low performance. Three main categories of approaches address the problem of imbalanced learning, i.e., data-level, algorithmic level, and hybrid methods, which combine the two aforementioned approaches. Data generative methods are typically based on generative adversarial networks, which require significant amounts of data, while model-level methods entail extensive domain expert knowledge to craft the learning objectives, thereby being less accessible for users without such knowledge. Moreover, the vast majority of these approaches are designed and applied to imaging applications, less to time series, and extremely rare to both of them. To address the above issues, we introduce GENDA, a generative neighborhood-based deep autoencoder, which is simple yet effective in its design and can be successfully applied to both image and time-series data. GENDA is based on learning latent representations that rely on the neighboring embedding space of the samples. Extensive experiments, conducted on a variety of widely-used real datasets demonstrate the efficacy of the proposed method. Impact Statement: Imbalanced data classification is an actual and important issue in many real-world learning applications hampering most classification tasks. Fraud detection, biomedical imaging categorizing healthy people versus patients, and object detection are some indicative domains with an economic, social and technological impact, which are greatly affected by inherent imbalanced data distribution. However, the majority of the existing algorithms that address the imbalanced classification problem are designed with a particular application in mind, and thus they can be used with specific datasets and even hyperparameters. The generative model introduced in this paper overcomes this limitation and produces improved results for a large class of imaging and time series data even under severe imbalance ratios, making it quite competitive.

18.

Analyzing entropy features in time-series data for pattern recognition in neurological conditions.

Huang, Yushan; Zhao, Yuchen; Capstick, Alexander; Palermo, Francesca; Haddadi, Hamed; Barnaghi, Payam.

Artif Intell Med ; 150: 102821, 2024 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-38553161

RESUMO

In the field of medical diagnosis and patient monitoring, effective pattern recognition in neurological time-series data is essential. Traditional methods predominantly based on statistical or probabilistic learning and inference often struggle with multivariate, multi-source, state-varying, and noisy data while also posing privacy risks due to excessive information collection and modeling. Furthermore, these methods often overlook critical statistical information, such as the distribution of data points and inherent uncertainties. To address these challenges, we introduce an information theory-based pipeline that leverages specialized features to identify patterns in neurological time-series data while minimizing privacy risks. We incorporate various entropy methods based on the characteristics of different scenarios and entropy. For stochastic state transition applications, we incorporate Shannon's entropy, entropy rates, entropy production, and the von Neumann entropy of Markov chains. When state modeling is impractical, we select and employ approximate entropy, increment entropy, dispersion entropy, phase entropy, and slope entropy. The pipeline's effectiveness and scalability are demonstrated through pattern analysis in a dementia care dataset and also an epileptic and a myocardial infarction dataset. The results indicate that our information theory-based pipeline can achieve average performance improvements across various models on the recall rate, F1 score, and accuracy by up to 13.08 percentage points, while enhancing inference efficiency by reducing the number of model parameters by an average of 3.10 times. Thus, our approach opens a promising avenue for improved, efficient, and critical statistical information-considered pattern recognition in medical time-series data.

Assuntos

Entropia , Humanos , Cadeias de Markov , Fatores de Tempo

19.

Evaluation of the District Health Information System in District Kotli, Azad Jammu and Kashmir: A Retrospective Analysis.

Saleem Khan, Mohammad; Faizan Ejaz, Khawaja; Adnan, Khan; Ahmed, Sohail; Saleem, Humayun; Jadoon, Sarosh Khan; Akbar, Amna; Tasneem, Sabahat.

Cureus ; 16(1): e53242, 2024 Jan.

Artigo em Inglês | MEDLINE | ID: mdl-38425611

RESUMO

BACKGROUND: It is essential to implement a high-quality electronic database for keeping important information. The District Health Information System (DHIS) is an active data-keeping system in Pakistan. This study aimed to evaluate the patients' data from the DHIS dashboard for the District Headquarters Hospital, Kotli, Azad Jammu and Kashmir (AJK). METHODOLOGY: The data was requested from the hospital administration at District Headquarters Hospital, Kotli, AJK, and the data was analyzed after permission was granted. The data was given in two forms; one was a hard copy of the data for August and September and the other was a comma-separated values file for October and November, 2023. RESULTS: The highest frequency of patients was received in the department of emergency and trauma and the patient's median age was between 15 and 49 years. The second department was medicine with the >50 years of age. Common conditions that needed more attention were chronic obstructive pulmonary disease, acute respiratory infection, diarrhea, pneumonia, diabetes mellitus, hypertension, and ischemic heart disease. CONCLUSION: For nations with constrained healthcare systems and funds, primary health care (PHC) is the only viable approach for managing non-communicable diseases (NCDs). However, PHC systems intended for infectious diseases have not sufficiently adapted to the growing requirement of chronic care for NCD. Research using health information databases offers numerous benefits, such as the evaluation of large data sets and unexpected prevalence of disease in certain populations, such as a higher prevalence of disease in one gender or age group. Health information system-based data analysis or studies are less expensive and faster but lack scientific control over data collection.

20.

A Description of Missing Data in Single-Case Experimental Designs Studies and an Evaluation of Single Imputation Methods.

Aydin, Orhan.

Behav Modif ; 48(3): 312-359, 2024 May.

Artigo em Inglês | MEDLINE | ID: mdl-38374608

RESUMO

Missing data is inevitable in single-case experimental designs (SCEDs) studies due to repeated measures over a period of time. Despite this fact, SCEDs implementers such as researchers, teachers, clinicians, and school psychologists usually ignore missing data in their studies. Performing analyses without considering missing data in an intervention study using SCEDs or a meta-analysis study including SCEDs studies in a topic can lead to biased results and affect the validity of individual or overall results. In addition, missingness can undermine the generalizability of SCEDs studies. Considering these drawbacks, this study aims to give descriptive and advisory information to SCEDs practitioners and researchers about missing data in single-case data. To accomplish this task, the study presents information about missing data mechanisms, item level and unit level missing data, planned missing data designs, drawbacks of ignoring missing data in SCEDs, and missing data handling methods. Since single imputation methods among missing data handling methods do not require complicated statistical knowledge, are easy to use, and hence are more likely to be used by practitioners and researchers, the present study evaluates single imputation methods in terms of intervention effect sizes and missing data rates by using a real and hypothetical data sample. This study encourages SCEDs implementers, and also meta-analysts to use some of the single imputation methods to increase the generalizability and validity of the study results in case they encounter missing data in their studies.

Assuntos

Projetos de Pesquisa , Humanos , Interpretação Estatística de Dados

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

RESUMO

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA