Search | Portal Regional de la BVS

1.

A tabular data generation framework guided by downstream tasks optimization.

Jia, Fengwei; Zhu, Hongli; Jia, Fengyuan; Ren, Xinyue; Chen, Siqi; Tan, Hongming; Chan, Wai Kin Victor.

Sci Rep ; 14(1): 15267, 2024 Jul 03.

Article in English | MEDLINE | ID: mdl-38961107

ABSTRACT

Generative models are increasingly used to extend datasets, where they have shown clear advantages. When generating tabular data, however, these models often fail to satisfy the constraints on numerical columns, and so cannot produce high-quality datasets that accurately represent real-world data and suit the intended downstream applications. To address this challenge, we propose a tabular data generation framework guided by downstream task optimization (TDGGD). It incorporates three indicators into each time step of diffusion generation, using gradient optimization to align the generated synthetic data. Unlike the traditional strategy of separating the downstream task model from the upstream data synthesis model, TDGGD ensures that the generated data respect the column feasibility of the upstream real tabular data. For the downstream task, TDGGD prioritizes the utility of the tabular data over solely pursuing statistical fidelity. Through extensive experiments on real-world tables with and without explicit column constraints, we demonstrate that TDGGD increases data volume while enhancing prediction accuracy. To the best of our knowledge, this is the first instance of deploying downstream information into a diffusion model framework.

2.

Modeling SARS-CoV-2 nucleotide mutations as a stochastic process.

Lim Kai Rong, Maverick; Kuruoglu, Ercan Engin; Chan, Wai Kin Victor.

PLoS One ; 18(4): e0284874, 2023.

Article in English | MEDLINE | ID: mdl-37115784

ABSTRACT

This study analyzes SARS-CoV-2 genome sequence mutations by modeling nucleotide mutations as a stochastic process in both the time-series and spatial domains of the gene sequence. In the time-series model, a Markov-chain-embedded Poisson random process characterizes the mutation rate matrix, while the spatial gene sequence model delineates the distribution of mutation inter-occurrence distances. Our experiment focuses on five key variants of concern that drew global attention because of their high transmissibility and virulence. The time-series results reveal distinct asymmetries in mutation rates and propensities among different nucleotides and across different strains, with a mean mutation rate of approximately 2 mutations per month. In particular, our spatial gene sequence results reveal novel biological insights into the characteristic distribution of mutation inter-occurrence distances, which display a notable pattern similar to other natural diseases. Our findings contribute insights into the underlying biological mechanism of SARS-CoV-2 mutations, bringing us one step closer to improving the accuracy of existing mutation prediction models. This research could also pave the way for future work adopting similar spatial random process models and advanced spatial pattern recognition algorithms to characterize mutations in other virus families.

Subject(s)

COVID-19; SARS-CoV-2; Humans; SARS-CoV-2/genetics; COVID-19/genetics; Mutation; Stochastic Processes; Nucleotides; Spike Glycoprotein, Coronavirus
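The two views described in the abstract of entry 2 (a nucleotide-to-nucleotide mutation count matrix over time, and the distances between successive mutation sites along the sequence) can be illustrated with a minimal sketch. This is not the authors' model: the function names, the toy sequences, and the three-month interval are all hypothetical, and the rate shown is only the plain maximum-likelihood estimate for a Poisson process (events divided by elapsed time).

```python
from collections import Counter

NUCLEOTIDES = "ACGT"


def substitutions(ancestor, descendant):
    """List (position, from_base, to_base) for two aligned sequences.

    Positions holding gaps or ambiguous symbols are ignored.
    """
    if len(ancestor) != len(descendant):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(ancestor, descendant))
        if a != b and a in NUCLEOTIDES and b in NUCLEOTIDES
    ]


def substitution_counts(subs):
    """Count how often each (from_base, to_base) pair occurs."""
    return Counter((a, b) for _, a, b in subs)


def inter_occurrence_distances(subs):
    """Distances between consecutive mutated positions along the sequence."""
    positions = sorted(i for i, _, _ in subs)
    return [later - earlier for earlier, later in zip(positions, positions[1:])]


def poisson_rate(event_count, elapsed_months):
    """Maximum-likelihood Poisson rate: events per month."""
    return event_count / elapsed_months


# Toy, made-up sequences (not real SARS-CoV-2 data).
reference = "ACGTACGTAC"
sample = "ATGTACGCAT"
subs = substitutions(reference, sample)
counts = substitution_counts(subs)
distances = inter_occurrence_distances(subs)
rate = poisson_rate(len(subs), elapsed_months=3.0)  # hypothetical interval
```

Normalizing each row of the count matrix would give empirical substitution propensities per source nucleotide, and a histogram of `distances` is the empirical counterpart of the inter-occurrence distribution mentioned in the abstract.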

3.

Neural Causal Information Extractor for Unobserved Causes.

Leong, Keng-Hou; Xiu, Yuxuan; Chen, Bokui; Chan, Wai Kin Victor.

Entropy (Basel) ; 26(1), 2023 Dec 31.

Article in English | MEDLINE | ID: mdl-38248172

ABSTRACT

Causal inference aims to faithfully depict the causal relationships between given variables. However, in many practical systems, variables are often partially observed, and some unobserved variables could carry significant information and induce causal effects on a target. Identifying these unobserved causes remains a challenge, and existing works have not considered extracting the unobserved causes while retaining the causes that have already been observed and included. In this work, we aim to construct the implicit variables with a generator-discriminator framework named the Neural Causal Information Extractor (NCIE), which can complement the information of unobserved causes and thus provide a complete set of causes with both observed causes and the representations of unobserved causes. By maximizing the mutual information between the targets and the union of observed causes and implicit variables, the implicit variables we generate could complement the information that the unobserved causes should have provided. The synthetic experiments show that the implicit variables preserve the information and dynamics of the unobserved causes. In addition, extensive real-world time series prediction tasks show improved precision after introducing implicit variables, thus indicating their causality to the targets.

4.

Neural Network Structure Optimization by Simulated Annealing.

Kuo, Chun Lin; Kuruoglu, Ercan Engin; Chan, Wai Kin Victor.

Entropy (Basel) ; 24(3), 2022 Feb 28.

Article in English | MEDLINE | ID: mdl-35327859

ABSTRACT

A critical problem in large neural networks is overparameterization: the large number of weight parameters limits their use on edge devices because of prohibitive computational power and memory/storage requirements. To make neural networks more practical on edge devices and in real-time industrial applications, they need to be compressed in advance. Since edge devices cannot train or access trained networks when internet resources are scarce, the preloading of smaller networks is essential. Various works in the literature have shown that redundant branches can be pruned strategically in a fully connected network without significantly sacrificing performance. However, the majority of these methodologies need high computational resources to integrate weight training via the back-propagation algorithm during network compression. In this work, we draw attention to optimizing the network structure so that performance is preserved despite aggressive pruning. The structure optimization is performed using the simulated annealing algorithm only, without utilizing back-propagation for branch weight training. Being a heuristic-based, non-convex optimization method, simulated annealing provides a globally near-optimal solution to this NP-hard problem for a given percentage of branch pruning. Our simulation results show that simulated annealing can significantly reduce the complexity of a fully connected network while maintaining performance without the help of back-propagation.
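The idea in the abstract above, searching over which branches to keep at a fixed pruning percentage with simulated annealing and no weight retraining, can be sketched on a toy problem. The sketch below is an assumption-laden illustration, not the paper's procedure: the "network" is a single linear unit, the move is a swap of one kept and one pruned weight, the cooling schedule is geometric, and every name and parameter value is hypothetical.

```python
import math
import random


def masked_output(weights, mask, x):
    """Output of a single linear unit with pruned weights zeroed out."""
    return sum(w * m * xi for w, m, xi in zip(weights, mask, x))


def loss(weights, mask, inputs, targets):
    """Mean squared error of the pruned unit against the targets."""
    errors = [(masked_output(weights, mask, x) - t) ** 2
              for x, t in zip(inputs, targets)]
    return sum(errors) / len(errors)


def anneal_prune(weights, inputs, targets, keep, steps=2000,
                 t_start=1.0, t_end=1e-3, seed=0):
    """Choose which `keep` weights to retain; the weights are never retrained."""
    n = len(weights)
    if not 0 < keep < n:
        raise ValueError("keep must be strictly between 0 and len(weights)")
    rng = random.Random(seed)
    kept = set(rng.sample(range(n), keep))
    mask = [1 if i in kept else 0 for i in range(n)]
    current = loss(weights, mask, inputs, targets)
    best_mask, best = mask[:], current
    cooling = (t_end / t_start) ** (1.0 / steps)  # geometric schedule
    temperature = t_start
    for _ in range(steps):
        # Swap one kept and one pruned weight so the pruning ratio is fixed.
        i = rng.choice([j for j in range(n) if mask[j] == 1])
        k = rng.choice([j for j in range(n) if mask[j] == 0])
        mask[i], mask[k] = 0, 1
        candidate = loss(weights, mask, inputs, targets)
        delta = candidate - current
        # Metropolis rule: always accept improvements, sometimes accept worse.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current = candidate
            if current < best:
                best, best_mask = current, mask[:]
        else:
            mask[i], mask[k] = 1, 0  # undo the rejected swap
        temperature *= cooling
    return best_mask, best


# Toy problem: the target is the unpruned unit's own output on basis vectors.
toy_weights = [5.0, 0.1, 4.0, 0.1, 0.1, 3.0]
toy_inputs = [[1.0 if i == j else 0.0 for i in range(6)] for j in range(6)]
toy_targets = [masked_output(toy_weights, [1] * 6, x) for x in toy_inputs]
best_mask, best_loss = anneal_prune(toy_weights, toy_inputs, toy_targets, keep=3)
```

Swapping rather than flipping single entries keeps the number of retained weights constant, which matches the "given percentage of branch pruning" framing; a real network would replace `loss` with a forward pass over a validation set.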

5.

Crash Diagnosis and Price Rebound Prediction in NYSE Composite Index Based on Visibility Graph and Time-Evolving Stock Correlation Network.

Xiu, Yuxuan; Wang, Guanying; Chan, Wai Kin Victor.

Entropy (Basel) ; 23(12), 2021 Nov 30.

Article in English | MEDLINE | ID: mdl-34945918

ABSTRACT

This study proposes a framework to diagnose stock market crashes and predict the subsequent price rebounds. Based on the observation of anomalous changes in stock correlation networks during market crashes, we extend the log-periodic power-law model with a metric that measures network anomalies. To calculate this metric, we design a prediction-guided anomaly detection algorithm based on extreme value theory. Finally, we propose a hybrid indicator to predict price rebounds of the stock index by combining the network anomaly metric and the visibility graph-based log-periodic power-law model. Experiments are conducted on the New York Stock Exchange Composite Index from 4 January 1991 to 7 May 2021. Our proposed method outperforms the benchmark log-periodic power-law model in detecting the 12 major crashes and predicting the subsequent price rebounds by reducing the false alarm rate. This study sheds light on combining stock network analysis and financial time series modeling and highlights that anomalous changes in a stock network can be important criteria for detecting crashes and predicting recoveries of the stock market.
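The visibility graph mentioned in the abstract above maps a time series to a network. As a minimal sketch, assuming the standard natural visibility criterion (two samples are linked when every sample between them lies strictly below the straight line joining them), the construction looks as follows. How the paper combines this graph with the log-periodic power-law model is not reproduced here, and the function name and toy series are hypothetical.

```python
def natural_visibility_graph(series):
    """Return the edge set of the natural visibility graph of a time series.

    Samples a < b are connected when, for every c between them,
    series[c] lies strictly below the line through (a, series[a])
    and (b, series[b]). Adjacent samples are always connected.
    """
    n = len(series)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            slope = (series[b] - series[a]) / (b - a)
            if all(series[c] < series[a] + slope * (c - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges


def degrees(edges, n):
    """Degree of each node, a common first summary of the graph."""
    result = [0] * n
    for a, b in edges:
        result[a] += 1
        result[b] += 1
    return result


# Toy series (made-up values, not index data).
toy_series = [1.0, 3.0, 2.0, 4.0]
toy_edges = natural_visibility_graph(toy_series)
toy_degrees = degrees(toy_edges, len(toy_series))
```

This direct construction checks every pair and is cubic in the series length in the worst case, which is fine for short windows; long price histories would call for a faster divide-and-conquer variant.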

6.

Consensus Control With Failure--Wait or Abandon?

Chan, Wai Kin Victor; Chen, C L Philip.

IEEE Trans Cybern ; 46(1): 75-84, 2016 Jan.

Article in English | MEDLINE | ID: mdl-25794406

ABSTRACT

This paper introduces and solves a decision-making problem in the context of consensus control with failure. We study an optimal consensus control problem in which n autonomous agents try to arrive at a target at the same time. One of the agents suddenly fails, and the remaining n - 1 agents can either wait for or abandon the failed agent. If they wait, they must slow down and delay the consensus time. If they abandon the failed agent, they can reach consensus earlier at the cost of losing one agent at consensus. This cost is an added delay to the consensus time. The decision problem is whether to wait or abandon and, if abandoning, when. To solve this problem, we derive analytical expressions and establish structural properties for target distance functions. We use numerical and simulation examples to demonstrate the applications of the derived formulas and results.
