ABSTRACT
Gaussian graphical models are widely used to study the dependence structure among variables. When samples are obtained from multiple conditions or populations, joint analysis of multiple graphical models is desirable because it can borrow strength across populations. Nonetheless, existing methods often overlook the varying levels of similarity between populations, leading to unsatisfactory results. Moreover, in many applications, learning the population-level clustering structure itself is of particular interest. In this article, we develop a novel method, called Simultaneous Clustering and Estimation of Networks via Tensor decomposition (SCENT), that simultaneously clusters and estimates graphical models from multiple populations. Precision matrices from different populations are uniquely organized as a three-way tensor array, and a low-rank sparse model is proposed for joint population clustering and network estimation. We develop a penalized likelihood method and an augmented Lagrangian algorithm for model fitting. We also establish the clustering accuracy and norm consistency of the estimated precision matrices. We demonstrate the efficacy of the proposed method with comprehensive simulation studies. The application to the Genotype-Tissue Expression multi-tissue gene expression data provides important insights into tissue clustering and gene coexpression patterns in multiple brain tissues.
ABSTRACT
N6-methylation (m6A) has recently become a hot topic due to its key role in disease pathogenesis. Identifying disease-related m6A sites aids in the understanding of the molecular mechanisms and biosynthetic pathways underlying m6A-mediated diseases. Existing methods treat the task primarily as a binary classification problem, focusing solely on whether an m6A-disease association exists. Although they achieve good results, they share one common flaw: they ignore the post-transcriptional regulation events during disease pathogenesis, which makes biological interpretation unsatisfactory. Thus, accurate and explainable computational models are required to unveil the post-transcriptional regulation mechanisms of disease pathogenesis mediated by m6A modification, rather than simply inferring whether the m6A sites cause disease. Emerging laboratory experiments have revealed that the interactions between m6A and other post-transcriptional regulation events, such as circular RNA (circRNA) targeting, microRNA (miRNA) targeting, RNA-binding protein binding and alternative splicing, present a diverse landscape during tumorigenesis. Based on these findings, we propose a low-rank tensor completion-based method to infer disease-related m6A sites from a biological standpoint, which can further aid in specifying the post-transcriptional machinery of disease pathogenesis. Notably, our biological analysis results show that coronavirus disease 2019 may play a role, in an m6A- and miRNA-dependent manner, in inducing non-small cell lung cancer.
Subject(s)
COVID-19 , Carcinoma, Non-Small-Cell Lung , Lung Neoplasms , MicroRNAs , Adenosine/metabolism , Alternative Splicing , COVID-19/genetics , Humans , Methylation , MicroRNAs/genetics , MicroRNAs/metabolism , RNA, Circular , RNA-Binding Proteins/metabolism
ABSTRACT
Modular control of the muscles, known as muscle synergy, simplifies control of movement by the central nervous system. The purpose of this study was to explore the synergy in both the frequency and movement domains based on the non-negative Tucker decomposition (NTD) method. Surface electromyography (sEMG) data of 8 upper limb muscles in 10 healthy subjects under wrist flexion (WF) and wrist extension (WE) were recorded. NTD was selected for exploring the multi-domain muscle synergy from the sEMG data. The results showed two synergistic flexor pairs, Palmaris Longus-Flexor Digitorum Superficialis (PL-FDS) and Extensor Carpi Radialis-Flexor Carpi Radialis (ECR-FCR), in the WF stage. Their spectral components are mainly in the bands 0-20 Hz and 25-50 Hz, respectively. The spectral components of two extensor pairs, Extensor Digitorum-Extensor Carpi Ulnaris (ED-ECU) and Extensor Carpi Radialis-Brachioradialis (ECR-B), are mainly in the bands 0-20 Hz and 7-45 Hz, respectively, in the WE stage. Further analysis showed that the Biceps Brachii (BB) muscle was a shared muscle synergy module of the WE and WF stages, while the flexor muscles FCR, PL and FDS were the specific synergy modules of the WF stage, and the extensor muscles ED, ECU, ECR and B were the specific synergy modules of the WE stage. This study showed that NTD is a meaningful method to explore the multi-domain synergistic characteristics of multi-channel sEMG signals. The results can help us to better understand the frequency features of muscle synergy and shared and specific synergies, and expand the study perspective related to motor control in the nervous system.
Subject(s)
Electromyography , Movement , Muscle, Skeletal , Wrist , Humans , Muscle, Skeletal/physiology , Male , Wrist/physiology , Adult , Movement/physiology , Female , Young Adult , Signal Processing, Computer-Assisted
ABSTRACT
In behavioral research, it is very common to have to manage multiple datasets containing information about the same set of individuals, in such a way that one dataset attempts to explain the others. To address this need, this paper proposes the Tucker3-PCovR model. This model is a particular case of PCovR models which focuses on the analysis of a three-way data array and a two-way data matrix where the latter plays the explanatory role. The Tucker3-PCovR model reduces the predictors to a few components and predicts the criterion by using these components and, at the same time, the three-way data is fitted by the Tucker3 model. Both the reduction of the predictors and the prediction of the criterion are done simultaneously. An alternating least squares algorithm is proposed to estimate the Tucker3-PCovR model. A biplot representation is presented to facilitate the interpretation of the results. The model is applied to empirical datasets from the field of psychology.
Subject(s)
Algorithms , Models, Statistical , Humans , Regression Analysis , Data Interpretation, Statistical , Behavioral Research/methods , Least-Squares Analysis
ABSTRACT
BACKGROUND: Complex biological systems are described as a multitude of cell-cell interactions (CCIs). Recent single-cell RNA-sequencing studies focus on CCIs based on ligand-receptor (L-R) gene co-expression but the analytical methods are not appropriate to detect many-to-many CCIs. RESULTS: In this work, we propose scTensor, a novel method for extracting representative triadic relationships (or hypergraphs), which include ligand-expression, receptor-expression, and related L-R pairs. CONCLUSIONS: Through extensive studies with simulated and empirical datasets, we have shown that scTensor can detect some hypergraphs that cannot be detected using conventional CCI detection methods, especially when they include many-to-many relationships. scTensor is implemented as a freely available R/Bioconductor package.
Subject(s)
RNA , Software , Ligands , Sequence Analysis, RNA/methods , Gene Expression , RNA/genetics
ABSTRACT
ECG quality assessment is crucial for reducing false alarms and physician strain in automated diagnosis of cardiovascular diseases. Recent research has focused on constructing automatic mechanisms for rejecting noisy ECG records. This work develops a noisy ECG record rejection system using scalograms and Tucker tensor decomposition. The system can reject ECG records that cannot be analyzed or diagnosed. The scalograms of all 12-lead ECG signals per subject are stacked to form a 3-way tensor. Tucker tensor decomposition is applied with empirical settings to obtain the core tensor. The core tensor is reshaped to form the latent feature set. When tested using the PhysioNet challenge 2011 dataset in five-fold cross-validation settings, the RUSBoost ensemble classifier proved to be a very reliable option, producing an accuracy of 92.4% along with a sensitivity of 87.1% and a specificity of 93.5%. According to the experimental findings, combining the scalogram with Tucker tensor decomposition yields competitive performance and has the potential to be used in actual evaluation of ECG quality.
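For readers less familiar with the three reported figures, the sketch below shows how accuracy, sensitivity and specificity are computed from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the PhysioNet experiment; which class the study treats as "positive" is also not stated in the abstract.

```python
def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)      # fraction of positive records detected
    specificity = tn / (tn + fp)      # fraction of negative records detected
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical counts, chosen only for illustration.
acc, sens, spec = binary_metrics(tp=87, fn=13, tn=187, fp=13)
print(acc, sens, spec)
```

Because accuracy is a prevalence-weighted mix of the other two quantities, it can sit closer to the specificity when negative records dominate the data set.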
Subject(s)
Algorithms , Electrocardiography , Humans , Signal Processing, Computer-Assisted
ABSTRACT
The deployment of Electronic Toll Collection (ETC) gantry systems marks a transformative advancement in the journey toward an interconnected and intelligent highway traffic infrastructure. The integration of these systems signifies a leap forward in streamlining toll collection and minimizing environmental impact through decreased idle times. To solve the problems of missing sensor data in large-volume ETC gantry systems and of insufficient traffic detection among ETC gantries, this study constructs a high-order tensor model based on an analysis of the high-dimensional, sparse, large-volume, and heterogeneous characteristics of ETC gantry data. In addition, a missing data completion method for the ETC gantry data is proposed based on an improved dynamic tensor flow model. This study approximates the decomposition of neighboring tensor blocks in the high-order tensor model of the ETC gantry data based on tensor Tucker decomposition and the Laplacian matrix. This method captures the correlations among space, time, and user information in the ETC gantry data. Case studies demonstrate that our method enhances ETC gantry data quality across various rates of missing data while also reducing computational complexity. For instance, at a missing-data rate below 5%, our approach reduced the RMSE for time vehicle distance by 0.0051, for traffic volume by 0.0056, and for interval speed by 0.0049 compared to the MATRIX method. These improvements not only indicate a potential for more precise traffic data analysis but also add value to the application of ETC systems and contribute to theoretical and practical advancements in the field.
ABSTRACT
Due to the increase in the number of mobile stations in recent years, cooperative relaying systems have emerged as a promising technique for improving the quality of fifth-generation (5G) wireless networks with an extension of the coverage area. In this paper, we propose a two-hop orthogonal frequency division multiplexing and code-division multiple-access (OFDM-CDMA) multiple-input multiple-output (MIMO) relay system, which combines, both at the source and relay nodes, a tensor space-time-frequency (TSTF) coding with a multiple symbol matrices Kronecker product (MSMKron), called TSTF-MSMKron coding, aiming to increase the diversity gain. It is first established that the signals received at the relay and the destination satisfy generalized Tucker models whose core tensors are the coding tensors. Assuming the coding tensors are known at both nodes, tensor models are exploited to derive two semi-blind receivers, composed of two steps, to jointly estimate symbol matrices and individual channels. Necessary conditions for parameter identifiability with each receiver are established. Extensive Monte Carlo simulation results are provided to show the impact of design parameters on the symbol error rate (SER) performance, using the zero-forcing (ZF) receiver. Next, Monte Carlo simulations illustrate the effectiveness of the proposed TSTF-MSMKron coding and semi-blind receivers, highlighting the benefit of exploiting the new coding to increase the diversity gain.
Subject(s)
Algorithms , Computer Simulation , Monte Carlo Method
ABSTRACT
Tensor completion is a fundamental tool for estimating unknown information from observed data and is widely used in many areas, including image and video recovery, traffic data completion and multi-input multi-output problems in information theory. Based on Tucker decomposition, this paper proposes a new algorithm to complete tensors with missing data. In decomposition-based tensor completion methods, underestimation or overestimation of tensor ranks can lead to inaccurate results. To tackle this problem, we design an alternating iterative method that breaks the original problem into several matrix completion subproblems and adaptively adjusts the multilinear rank of the model during optimization. Through numerical experiments on synthetic data and real images, we show that the proposed method can effectively estimate the tensor ranks and predict the missing entries.
ABSTRACT
Cancer progression can be described by continuous-time Markov chains whose state space grows exponentially in the number of somatic mutations. The age of a tumor at diagnosis is typically unknown. Therefore, the quantity of interest is the time-marginal distribution over all possible genotypes of tumors, defined as the transient distribution integrated over an exponentially distributed observation time. It can be obtained as the solution of a large linear system. However, the sheer size of this system renders classical solvers infeasible. We consider Markov chains whose transition rates are separable functions, allowing for an efficient low-rank tensor representation of the linear system's operator. Thus we can reduce the computational complexity from exponential to linear. We derive a convergent iterative method using low-rank formats whose result satisfies the normalization constraint of a distribution. We also perform numerical experiments illustrating that the marginal distribution is well approximated with low rank.
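To make the time-marginal distribution concrete: if the chain has generator Q (columns summing to zero) and the observation time is exponentially distributed with rate 1, integrating the transient distribution over that time gives a vector p that solves (I - Q) p = p_0. The sketch below is a brute-force illustration on a tiny state space, not the low-rank method described above. The unit observation rate, the state-independent mutation rates and the function name are assumptions made for illustration.

```python
def time_marginal(rates):
    """Time-marginal genotype distribution for a toy mutation-accumulation chain.

    Assumptions (not from the abstract): mutation i occurs at constant rate
    rates[i], mutations are never lost, the tumor starts mutation-free, and
    the observation time is exponential with rate 1.
    Genotypes are bitmasks: bit i set means mutation i is present.
    """
    n = len(rates)
    p = [0.0] * (1 << n)
    for y in range(1 << n):
        start = 1.0 if y == 0 else 0.0
        # Probability flowing in from genotypes that lack exactly one mutation of y.
        inflow = sum(rates[i] * p[y ^ (1 << i)] for i in range(n) if (y >> i) & 1)
        # Total rate of leaving y by gaining any mutation not yet present.
        outflow = sum(rates[i] for i in range(n) if not (y >> i) & 1)
        # Row y of (I - Q) p = p0, solved by forward substitution because
        # every predecessor of y has a smaller bitmask.
        p[y] = (start + inflow) / (1.0 + outflow)
    return p


p = time_marginal([1.0, 2.0])
print(p)
```

The result has 2^n entries for n mutations, which is the exponential growth that motivates a low-rank tensor representation once n is no longer tiny.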
Subject(s)
Markov Chains , Genotype
ABSTRACT
In this work, we propose a method for the compression of the coupling matrix in volume-surface integral equation (VSIE) formulations. VSIE methods are used for electromagnetic analysis in magnetic resonance imaging (MRI) applications, for which the coupling matrix models the interactions between the coil and the body. We show that these effects can be represented as independent interactions between remote elements in 3D tensor formats, and subsequently decomposed with the Tucker model. Our method can work in tandem with the adaptive cross approximation technique to provide fast solutions of VSIE problems. We demonstrate that our compression approaches can enable the use of VSIE matrices with otherwise prohibitive memory requirements, by allowing the effective use of modern graphics processing units (GPUs) to accelerate the arising matrix-vector products. This is critical to enable numerical MRI simulations at clinical voxel resolutions in a feasible computation time. In particular, we demonstrate that the VSIE matrix-vector products needed to calculate the electromagnetic field produced by an MRI coil inside a numerical body model with 1 mm³ voxel resolution can be performed in ~33 seconds on a GPU, after compressing the associated coupling matrix from ~80 TB to ~43 MB.
ABSTRACT
The development of deep learning technology has contributed greatly to many artificial intelligence services, but adversarial attack techniques on deep learning models are also becoming more diverse and sophisticated. IoT edge devices adopt cloud-independent on-device DNN (deep neural network) processing to achieve a fast response time. However, if the computational complexity of the denoizer for adversarial noise is high, or if a single embedded GPU is shared by multiple DNN models, adversarial defense at the on-device level is bound to incur long latency. To solve this problem, eDenoizer is proposed in this paper. First, it applies Tucker decomposition to reduce the amount of computation required for convolutional kernel tensors in the denoizer. Second, eDenoizer effectively orchestrates both the denoizer and the model defended by the denoizer simultaneously. In addition, the priority of the CPU side can be projected onto the GPU, which is completely priority-agnostic, so that the delay can be minimized when the denoizer and the defense target model are assigned a high priority. Extensive experiments confirmed that the reduction in classification accuracy was very marginal, up to 1.78%, and that the inference speed with adversarial defense improved by up to 51.72%.
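The saving from decomposing a convolutional kernel can be estimated with simple arithmetic. The abstract does not say which Tucker variant eDenoizer uses; the sketch below assumes the common Tucker-2 scheme, which factorizes only the input- and output-channel modes and replaces one k x k convolution with a 1 x 1, a k x k and another 1 x 1 convolution. The layer sizes, ranks and function names are hypothetical.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def tucker2_conv_params(c_in, c_out, k, r_in, r_out):
    """Weights after a Tucker-2 factorization of the channel modes (bias ignored)."""
    reduce_channels = c_in * r_in        # 1 x 1 convolution: c_in -> r_in
    core = r_in * r_out * k * k          # k x k convolution: r_in -> r_out
    restore_channels = r_out * c_out     # 1 x 1 convolution: r_out -> c_out
    return reduce_channels + core + restore_channels


# Hypothetical layer: 256 -> 256 channels, 3 x 3 kernel, channel ranks of 64.
full = conv_params(256, 256, 3)
small = tucker2_conv_params(256, 256, 3, r_in=64, r_out=64)
print(full, small, full / small)
```

Smaller ranks cut computation further but approximate the original kernel less closely, which is the trade-off behind the small accuracy loss reported above.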
ABSTRACT
The excitation-emission matrix fluorescence (EEMF) spectroscopic technique provides a viable means of analyzing samples from different fields. EEMF spectral data sets are much larger in volume, so that they can only be interpreted using novel data analysis techniques. Here, a novel spectral initialization approach was introduced to fit the Tucker3 model to EEMF spectral data. The suggested method involved variable initialization in a restrained way, yielding initial estimates of EEMF spectra that were comparable with experimentally acquired EEMF profiles. Tucker3 modelling of EEMF spectra with these initial estimates made these analyses fast and computationally economical. The Tucker3 model with the proposed initialization approach was found to yield much purer spectral and concentration profiles. The proposed approach was validated by successfully processing the EEMF spectral data sets of biomolecule mixtures.
Subject(s)
Spectrometry, Fluorescence , Fluorometry
ABSTRACT
The recent worldwide outbreak of the novel coronavirus disease 2019 (COVID-19) opened new challenges for the research community. Machine learning (ML)-guided methods can be useful for feature prediction, involved risk, and the causes of an analogous epidemic. Such predictions can be useful for managing and intercepting the outbreak of such diseases. The foremost advantages of applying ML methods are handling a wide variety of data and easy identification of trends and patterns of an undetermined nature. In this study, we propose a partial derivative regression and nonlinear machine learning (PDR-NML) method for global pandemic prediction of COVID-19. We used a Progressive Partial Derivative Linear Regression model to search for the best parameters in the dataset in a computationally efficient manner. Next, a Nonlinear Global Pandemic Machine Learning model was applied to the normalized features for making accurate predictions. The results show that the proposed ML method outperformed state-of-the-art methods in the Indian population and can also be a convenient tool for making predictions for other countries.
ABSTRACT
In psychology, many studies measure the same variables in different groups. In the case of a large number of variables when a strong a priori idea about the underlying latent construct is lacking, researchers often start by reducing the variables to a few principal components in an exploratory way. Herewith, one often wants to evaluate whether the components represent the same construct in the different groups. To this end, it makes sense to remove outlying variables that have significantly different loadings on the extracted components across the groups, hampering equivalent interpretations of the components. Moreover, identifying such outlying variables is important when testing theories about which variables behave similarly or differently across groups. In this article, we first scrutinize the lower bound congruence method (LBCM; De Roover, Timmerman, & Ceulemans in Behavior Research Methods, 49, 216-229, 2017), which was recently proposed for solving the outlying-variable detection problem. LBCM investigates how Tucker's congruence between the loadings of the obtained cluster-loading matrices improves when specific variables are discarded. We show that LBCM has the tendency to output outlying variables that either are false positives or concern very small, and thus practically insignificant, loading differences. To address this issue, we present a new heuristic: the lower and resampled upper bound congruence method (LRUBCM). This method uses a resampling technique to obtain a sampling distribution for the congruence coefficient, under the hypothesis that no outlying variable is present. In a simulation study, we show that LRUBCM outperforms LBCM. Finally, we illustrate the use of the method by means of empirical data.
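Both methods discussed above build on Tucker's congruence coefficient, the normalized inner product of two loading vectors. The sketch below computes only that coefficient; it does not implement LBCM or LRUBCM, and the loadings are hypothetical values chosen to show how one deviating variable lowers the congruence.

```python
from math import sqrt


def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two loading vectors."""
    if len(x) != len(y):
        raise ValueError("loading vectors must have the same length")
    num = sum(a * b for a, b in zip(x, y))
    den = sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    if den == 0.0:
        raise ValueError("congruence is undefined for an all-zero vector")
    return num / den


# Hypothetical loadings of four variables on one component in two groups.
group_a = [0.8, 0.7, 0.6, 0.1]
group_b = [0.8, 0.7, 0.6, 0.9]
with_outlier = tucker_congruence(group_a, group_b)
# Dropping the last variable, the only one that differs, leaves identical loadings.
without_outlier = tucker_congruence(group_a[:3], group_b[:3])
print(with_outlier, without_outlier)
```

The coefficient ignores the overall scale of the loadings, so proportional loading vectors count as fully congruent.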
Subject(s)
Research Design
ABSTRACT
Real signals are usually contaminated with various types of noise. This phenomenon has a negative impact on the operation of systems that rely on signal processing. In this paper, we propose a tensor-based method for speckle noise reduction in side-scan sonar images. The method is based on the Tucker decomposition with automatically determined ranks of the factoring tensors. As verified experimentally, the proposed method shows very good results, outperforming other types of speckle-noise filters.
ABSTRACT
We prove an extremal result for long Markov chains based on the monotone path argument, generalizing an earlier work by Courtade and Jiao.
ABSTRACT
In many situations, a researcher is interested in the analysis of the scores of a set of observation units on a set of variables. In medicine, however, the information is very frequently replicated on different occasions. The occasions can be time points or refer to different conditions. In such cases, the data can be stored in a 3-way array or tensor. The Candecomp/Parafac and Tucker3 methods are the most common methods for analyzing 3-way tensors. In this work, a review of these methods is provided, and then this class of methods is applied to a 3-way data set concerning hospital care data for a hospital in Rome (Italy) covering 15 years divided into 3 groups of consecutive years (1892-1896, 1940-1944, 1968-1972). The analysis reveals some peculiar aspects of the use of health services and its evolution over time.
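As a reminder of what the Tucker3 model expresses, each entry of the 3-way array is a weighted sum of products of component scores, with the weights held in a small core array. The sketch below rebuilds an array from hypothetical components. The function name and the toy values are illustrative, and the fitting step (usually alternating least squares) is not shown.

```python
def tucker3_reconstruct(core, a, b, c):
    """Rebuild x[i][j][k] = sum_pqr core[p][q][r] * a[i][p] * b[j][q] * c[k][r]."""
    P, Q, R = len(core), len(core[0]), len(core[0][0])
    return [
        [
            [
                sum(
                    core[p][q][r] * a[i][p] * b[j][q] * c[k][r]
                    for p in range(P) for q in range(Q) for r in range(R)
                )
                for k in range(len(c))
            ]
            for j in range(len(b))
        ]
        for i in range(len(a))
    ]


# Hypothetical toy example: 2 units x 2 variables x 3 occasions, core of size 2 x 1 x 1.
a = [[1, 0], [0, 1]]      # units x P
b = [[1], [2]]            # variables x Q
c = [[1], [1], [1]]       # occasions x R
core = [[[3]], [[5]]]     # P x Q x R
x = tucker3_reconstruct(core, a, b, c)
print(x)
```

Candecomp/Parafac is the special case in which all three modes have the same number of components and the core is superdiagonal, so each component in one mode interacts with only its counterpart in the other two.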
Subject(s)
Biostatistics/methods , Health Services/statistics & numerical data , Hospital Records/statistics & numerical data , Data Interpretation, Statistical , Databases, Factual/statistics & numerical data , Humans , Models, Statistical , Principal Component Analysis/methods , Rome , Software
ABSTRACT
We study the nonparametric estimation of a decreasing density function g_0 in a general s-sample biased sampling model with weight (or bias) functions w_i for i = 1, ..., s. The determination of the monotone maximum likelihood estimator g_n and its asymptotic distribution, except for the case when s = 1, has been long missing in the literature due to certain non-standard structures of the likelihood function, such as non-separability and a lack of strictly positive second order derivatives of the negative of the log-likelihood function. The existence, uniqueness, self-characterization, consistency of g_n and its asymptotic distribution at a fixed point are established in this article. To overcome the barriers caused by non-standard likelihood structures, for instance, we show the tightness of g_n via a purely analytic argument instead of an intrinsic geometric one and propose an indirect approach to attain the √n-rate of convergence of the linear functional ∫ w_i g_n.
ABSTRACT
NMR spectroscopy is an emerging analytical tool for measuring complex drug product qualities, e.g., protein higher order structure (HOS) or heparin chemical composition. Most drug NMR spectra have been visually analyzed; however, NMR spectra are inherently quantitative and multivariate and thus suitable for chemometric analysis. Therefore, quantitative measurements derived from chemometric comparisons between spectra could be a key step in establishing acceptance criteria for a new generic drug or a new batch after a manufacturing change. To measure the capability of chemometric methods to differentiate comparator NMR spectra, we calculated inter-spectra difference metrics on 1D/2D spectra of two insulin drugs, Humulin R® and Novolin R®, from different manufacturers. Both insulin drugs have an identical drug substance but differ in formulation. Chemometric methods (i.e., principal component analysis (PCA), 3-way Tucker3 or graph invariant (GI)) were used to calculate the Mahalanobis distance (D_M) between the two brands (inter-brand) and the distance ratio (D_R) among the different lots (intra-brand). The PCA on 1D inter-brand spectral comparison yielded a D_M value of 213. In comparing 2D spectra, the Tucker3 analysis yielded the highest differentiability value (D_M = 305) in the comparisons made, followed by PCA (D_M = 255) and then the GI method (D_M = 40). In conclusion, drug quality comparisons among different lots might benefit from PCA on 1D spectra for rapidly comparing many samples, while higher resolution but more time-consuming 2D-NMR-data-based comparisons using Tucker3 analysis or PCA provide a greater level of assurance for drug structural similarity evaluation between drug brands.
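The Mahalanobis distance measures how far a point lies from the center of a reference group in units of that group's own spread. The abstract does not say how D_M was computed from the component scores, so the sketch below uses only the textbook definition on hypothetical two-dimensional scores. The function name and the data are illustrative.

```python
from math import sqrt


def mahalanobis_2d(point, reference):
    """Mahalanobis distance from a 2-D point to the mean of a 2-D reference cloud."""
    n = len(reference)
    mx = sum(x for x, _ in reference) / n
    my = sum(y for _, y in reference) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]] with n - 1 in the denominator.
    sxx = sum((x - mx) ** 2 for x, _ in reference) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in reference) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in reference) / (n - 1)
    det = sxx * syy - sxy * sxy
    if det <= 0.0:
        raise ValueError("reference covariance is singular")
    dx, dy = point[0] - mx, point[1] - my
    # Quadratic form d^T S^{-1} d using the closed-form inverse of a 2 x 2 matrix.
    d2 = (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det
    return sqrt(d2)


# Hypothetical scores: four lots of one brand, and one lot of another brand.
brand_a_scores = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
brand_b_score = (3.0, 1.0)
d_m = mahalanobis_2d(brand_b_score, brand_a_scores)
print(d_m)
```

A tight reference group yields a large distance even for a modest shift in the scores, which is why the measure suits comparisons between closely controlled drug lots.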