Search | BVS CLAP/SMR-OPS/OMS

1.

Struct2Graph: a graph attention network for structure based predictions of protein-protein interactions.

Baranwal, Mayank; Magner, Abram; Saldinger, Jacob; Turali-Emre, Emine S; Elvati, Paolo; Kozarekar, Shivani; VanEpps, J Scott; Kotov, Nicholas A; Violi, Angela; Hero, Alfred O.

BMC Bioinformatics ; 23(1): 370, 2022 Sep 10.

Article in English | MEDLINE | ID: mdl-36088285

ABSTRACT

BACKGROUND: Development of new methods for analysis of protein-protein interactions (PPIs) at molecular and nanometer scales gives insights into intracellular signaling pathways and will improve understanding of protein functions, as well as other nanoscale structures of biological and abiological origins. Recent advances in computational tools, particularly the ones involving modern deep learning algorithms, have been shown to complement experimental approaches for describing and rationalizing PPIs. However, most of the existing works on PPI predictions use protein-sequence information, and thus have difficulties in accounting for the three-dimensional organization of the protein chains. RESULTS: In this study, we address this problem and describe a PPI analysis based on a graph attention network, named Struct2Graph, for identifying PPIs directly from the structural data of folded protein globules. Our method is capable of predicting the PPI with an accuracy of 98.89% on the balanced set consisting of an equal number of positive and negative pairs. On the unbalanced set with the ratio of 1:10 between positive and negative pairs, Struct2Graph achieves a fivefold cross validation average accuracy of 99.42%. Moreover, Struct2Graph can potentially identify residues that likely contribute to the formation of the protein-protein complex. The identification of important residues is tested for two different interaction types: (a) Proteins with multiple ligands competing for the same binding area, (b) Dynamic protein-protein adhesion interaction. Struct2Graph identifies interacting residues with 30% sensitivity, 89% specificity, and 87% accuracy. CONCLUSIONS: In this manuscript, we address the problem of prediction of PPIs using a first of its kind, 3D-structure-based graph attention network (code available at https://github.com/baranwa2/Struct2Graph ). 
Furthermore, the novel mutual attention mechanism provides insights into likely interaction sites through its unsupervised knowledge selection process. This study demonstrates that a relatively low-dimensional feature embedding learned from graph structures of individual proteins outperforms other modern machine learning classifiers based on global protein features. In addition, through the analysis of single amino acid variations, the attention mechanism shows preference for disease-causing residue variations over benign polymorphisms, demonstrating that it is not limited to interface residues.

Subject(s)

Algorithms , Proteins , Amino Acid Sequence , Amino Acids , Machine Learning , Proteins/chemistry

2.

A Pattern Dictionary Method for Anomaly Detection.

Sabeti, Elyas; Oh, Sehong; Song, Peter X K; Hero, Alfred O.

Entropy (Basel) ; 24(8), 2022 Aug 09.

Article in English | MEDLINE | ID: mdl-36010758

ABSTRACT

In this paper, we propose a compression-based anomaly detection method for time series and sequence data using a pattern dictionary. The proposed method is capable of learning complex patterns in a training data sequence, using these learned patterns to detect potentially anomalous patterns in a test data sequence. The proposed pattern dictionary method uses a measure of complexity of the test sequence as an anomaly score that can be used to perform stand-alone anomaly detection. We also show that when combined with a universal source coder, the proposed pattern dictionary yields a powerful atypicality detector that is equally applicable to anomaly detection. The pattern dictionary-based atypicality detector uses an anomaly score defined as the difference between the complexity of the test sequence data encoded by the trained pattern dictionary (typical) encoder and the universal (atypical) encoder, respectively. We consider two complexity measures: the number of parsed phrases in the sequence, and the length of the encoded sequence (codelength). Specializing to a particular type of universal encoder, the Tree-Structured Lempel-Ziv (LZ78), we obtain a novel non-asymptotic upper bound, in terms of the Lambert W function, on the number of distinct phrases resulting from the LZ78 parser. This non-asymptotic bound determines the range of anomaly score. As a concrete application, we illustrate the pattern dictionary framework for constructing a baseline of health against which anomalous deviations can be detected.
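One of the two complexity measures named in this abstract, the number of parsed phrases, can be illustrated with a plain LZ78 parser. The following is a minimal sketch under assumptions, not the authors' pattern dictionary method: the function name `lz78_phrases` is hypothetical, and a trailing incomplete phrase is counted as one extra phrase even if it repeats an earlier one.

```python
def lz78_phrases(sequence):
    """Parse a string into LZ78 phrases and return them in order.

    Each phrase is the shortest prefix of the remaining input that has
    not been seen as a phrase before. Illustrative helper only.
    """
    seen = set()
    phrases = []
    current = ""
    for symbol in sequence:
        current += symbol
        if current not in seen:
            # New phrase: record it and start a fresh one.
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # Leftover symbols form a final phrase that may repeat an earlier one.
        phrases.append(current)
    return phrases


if __name__ == "__main__":
    # A highly regular sequence parses into few phrases.
    print(len(lz78_phrases("aaaaaaaa")))
    # A sequence with more variety parses into more phrases.
    print(len(lz78_phrases("abababab")))
```

A lower phrase count indicates a more compressible, more regular sequence; how the count is turned into an anomaly score is defined in the paper and is not reproduced here.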

3.

A deep learning architecture for metabolic pathway prediction.

Baranwal, Mayank; Magner, Abram; Elvati, Paolo; Saldinger, Jacob; Violi, Angela; Hero, Alfred O.

Bioinformatics ; 36(8): 2547-2553, 2020 04 15.

Article in English | MEDLINE | ID: mdl-31879763

ABSTRACT

MOTIVATION: Understanding the mechanisms and structural mappings between molecules and pathway classes are critical for design of reaction predictors for synthesizing new molecules. This article studies the problem of prediction of classes of metabolic pathways (series of chemical reactions occurring within a cell) in which a given biochemical compound participates. We apply a hybrid machine learning approach consisting of graph convolutional networks used to extract molecular shape features as input to a random forest classifier. In contrast to previously applied machine learning methods for this problem, our framework automatically extracts relevant shape features directly from input SMILES representations, which are atom-bond specifications of chemical structures composing the molecules. RESULTS: Our method is capable of correctly predicting the respective metabolic pathway class of 95.16% of tested compounds, whereas competing methods only achieve an accuracy of 84.92% or less. Furthermore, our framework extends to the task of classification of compounds having mixed membership in multiple pathway classes. Our prediction accuracy for this multi-label task is 97.61%. We analyze the relative importance of various global physicochemical features to the pathway class prediction problem and show that simple linear/logistic regression models can predict the values of these global features from the shape features extracted using our framework. AVAILABILITY AND IMPLEMENTATION: https://github.com/baranwa2/MetabolicPathwayPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Deep Learning , Neural Networks, Computer , Machine Learning , Metabolic Networks and Pathways , Software

4.

Exponential Strong Converse for Successive Refinement with Causal Decoder Side Information.

Zhou, Lin; Hero, Alfred.

Entropy (Basel) ; 21(4), 2019 Apr 17.

Article in English | MEDLINE | ID: mdl-33267124

ABSTRACT

We consider the k-user successive refinement problem with causal decoder side information and derive an exponential strong converse theorem. The rate-distortion region for the problem can be derived as a straightforward extension of the two-user case by Maor and Merhav (2008). We show that for any rate-distortion tuple outside the rate-distortion region of the k-user successive refinement problem with causal decoder side information, the joint excess-distortion probability approaches one exponentially fast. Our proof follows by judiciously adapting the recently proposed strong converse technique by Oohama using the information spectrum method, the variational form of the rate-distortion region and Hölder's inequality. The lossy source coding problem with causal decoder side information considered by El Gamal and Weissman is a special case (k = 1) of the current problem. Therefore, the exponential strong converse theorem for the El Gamal and Weissman problem follows as a corollary of our result.

5.

Geometric Estimation of Multivariate Dependency.

Yasaei Sekeh, Salimeh; Hero, Alfred O.

Entropy (Basel) ; 21(8), 2019 Aug 12.

Article in English | MEDLINE | ID: mdl-33267500

ABSTRACT

This paper proposes a geometric estimator of dependency between a pair of multivariate random variables. The proposed estimator of dependency is based on a randomly permuted geometric graph (the minimal spanning tree) over the two multivariate samples. This estimator converges to a quantity that we call the geometric mutual information (GMI), which is equivalent to the Henze-Penrose divergence between the joint distribution of the multivariate samples and the product of the marginals. The GMI has many of the same properties as standard MI but can be estimated from empirical data without density estimation, making it scalable to large datasets. The proposed empirical estimator of GMI is simple to implement, involving the construction of a minimal spanning tree (MST) spanning over both the original data and a randomly permuted version of this data. We establish asymptotic convergence of the estimator and convergence rates of the bias and variance for smooth multivariate density functions belonging to a Hölder class. We demonstrate the advantages of our proposed geometric dependency estimator in a series of experiments.
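The MST construction mentioned in this abstract can be sketched as follows: pool two groups of points, build a Euclidean minimal spanning tree, and count the edges that join points from different groups. This is only an illustrative sketch under assumptions. The function names `mst_edges` and `cross_edge_count` are hypothetical, Prim's algorithm is one of several ways to build the tree, and the normalization that converts the count into a GMI estimate is defined in the paper and is not reproduced here.

```python
import math


def mst_edges(points):
    """Return the edges (index pairs) of a Euclidean MST via Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best_dist = [math.inf] * n   # cheapest known connection to the tree
    best_from = [-1] * n         # tree vertex providing that connection
    best_dist[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]),
                key=lambda i: best_dist[i])
        in_tree[u] = True
        if best_from[u] >= 0:
            edges.append((best_from[u], u))
        # Update the cheapest connections of the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best_dist[v]:
                    best_dist[v] = d
                    best_from[v] = u
    return edges


def cross_edge_count(group_a, group_b):
    """Count MST edges that connect a point of group_a to a point of group_b."""
    points = list(group_a) + list(group_b)
    boundary = len(group_a)  # indices below this belong to group_a
    return sum((i < boundary) != (j < boundary) for i, j in mst_edges(points))


if __name__ == "__main__":
    separated_a = [(0, 0), (0, 1), (1, 0)]
    separated_b = [(10, 10), (10, 11), (11, 10)]
    print(cross_edge_count(separated_a, separated_b))  # few cross edges
```

In the dependency setting, one group would hold the original paired samples and the other the pairs after a random permutation of one coordinate; few cross edges suggest the two point clouds are well separated, which corresponds to strong dependence.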

6.

Joint camera blur and pose estimation from aliased data.

LeBlanc, Joel W; Thelen, Brian J; Hero, Alfred O.

J Opt Soc Am A Opt Image Sci Vis ; 35(4): 639-651, 2018 Apr 01.

Article in English | MEDLINE | ID: mdl-29603952

ABSTRACT

A joint-estimation algorithm is presented that enables simultaneous camera blur and pose estimation from a known calibration target in the presence of aliasing. Specifically, a parametric maximum-likelihood (ML) point-spread function estimate is derived for characterizing a camera's optical imperfections through the use of a calibration target in an otherwise loosely controlled environment. The imaging perspective, ambient-light levels, target reflectance, detector gain and offset, quantum efficiency, and read-noise levels are all treated as nuisance parameters. The Cramér-Rao bound is derived, and simulations demonstrate that the proposed estimator achieves near optimal mean squared error performance. The proposed method is applied to experimental data to validate the fidelity of the forward models as well as to establish the utility of the resulting ML estimates for both system identification and subsequent image restoration.

7.

Ensemble Estimation of Information Divergence.

Moon, Kevin R; Sricharan, Kumar; Greenewald, Kristjan; Hero, Alfred O.

Entropy (Basel) ; 20(8), 2018 Jul 27.

Article in English | MEDLINE | ID: mdl-33265649

ABSTRACT

Recent work has focused on the problem of nonparametric estimation of information divergence functionals between two continuous random variables. Many existing approaches require either restrictive assumptions about the density support set or difficult calculations at the support set boundary which must be known a priori. The mean squared error (MSE) convergence rate of a leave-one-out kernel density plug-in divergence functional estimator for general bounded density support sets is derived where knowledge of the support boundary, and therefore, the boundary correction is not required. The theory of optimally weighted ensemble estimation is generalized to derive a divergence estimator that achieves the parametric rate when the densities are sufficiently smooth. Guidelines for the tuning parameter selection and the asymptotic distribution of this estimator are provided. Based on the theory, an empirical estimator of Rényi-α divergence is proposed that greatly outperforms the standard kernel density plug-in estimator in terms of mean squared error, especially in high dimensions. The estimator is shown to be robust to the choice of tuning parameters. We show extensive simulation results that verify the theoretical results of our paper. Finally, we apply the proposed estimator to estimate the bounds on the Bayes error rate of a cell classification problem.

8.

Spectral identification of topological domains.

Chen, Jie; Hero, Alfred O; Rajapakse, Indika.

Bioinformatics ; 32(14): 2151-8, 2016 07 15.

Article in English | MEDLINE | ID: mdl-27153657

ABSTRACT

MOTIVATION: Topological domains have been proposed as the backbone of interphase chromosome structure. They are regions of high local contact frequency separated by sharp boundaries. Genes within a domain often have correlated transcription. In this paper, we present a computationally efficient spectral algorithm to identify topological domains from chromosome conformation data (Hi-C data). We consider the genome as a weighted graph with vertices defined by loci on a chromosome and the edge weights given by interaction frequency between two loci. Laplacian-based graph segmentation is then applied iteratively to obtain the domains at the given compactness level. Comparison with algorithms in the literature shows the advantage of the proposed strategy. RESULTS: An efficient algorithm is presented to identify topological domains from the Hi-C matrix. AVAILABILITY AND IMPLEMENTATION: The Matlab source code and illustrative examples are available at http://bionetworks.ccmb.med.umich.edu/ CONTACT: indikar@med.umich.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Chromosomes/ultrastructure , Transcription, Genetic , Models, Theoretical , Programming Languages

9.

An individualized predictor of health and disease using paired reference and target samples.

Liu, Tzu-Yu; Burke, Thomas; Park, Lawrence P; Woods, Christopher W; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O.

BMC Bioinformatics ; 17: 47, 2016 Jan 22.

Article in English | MEDLINE | ID: mdl-26801061

ABSTRACT

BACKGROUND: Consider the problem of designing a panel of complex biomarkers to predict a patient's health or disease state when one can pair his or her current test sample, called a target sample, with the patient's previously acquired healthy sample, called a reference sample. As contrasted to a population averaged reference this reference sample is individualized. Automated predictor algorithms that compare and contrast the paired samples to each other could result in a new generation of test panels that compare to a person's healthy reference to enhance predictive accuracy. This paper develops such an individualized predictor and illustrates the added value of including the healthy reference for design of predictive gene expression panels. RESULTS: The objective is to predict each subject's state of infection, e.g., neither exposed nor infected, exposed but not infected, pre-acute phase of infection, acute phase of infection, post-acute phase of infection. Using gene microarray data collected in a large scale serially sampled respiratory virus challenge study we quantify the diagnostic advantage of pairing a person's baseline reference with his or her target sample. The full study consists of 2886 microarray chips assaying 12,023 genes of 151 human volunteer subjects under 4 different inoculation regimes (HRV, RSV, H1N1, H3N2). We train (with cross-validation) reference-aided sparse multi-class classifier algorithms on this data to show that inclusion of a subject's reference sample can improve prediction accuracy by as much as 14 %, for the H3N2 cohort, and by at least 6 %, for the H1N1 cohort. Remarkably, these gains in accuracy are achieved by using smaller panels of genes, e.g., 39 % fewer for H3N2 and 31 % fewer for H1N1. 
The biomarkers selected by the predictors fall into two categories: 1) contrasting genes that tend to differentially express between target and reference samples over the population; 2) reinforcement genes that remain constant over the two samples, which function as housekeeping normalization genes. Many of these genes are common to all 4 viruses and their roles in the predictor elucidate the function that they play in differentiating the different states of host immune response. CONCLUSIONS: If one uses a suitable mathematical prediction algorithm, inclusion of a healthy reference in biomarker diagnostic testing can potentially improve accuracy of disease prediction with fewer biomarkers.

Subject(s)

Genetic Markers , Microarray Analysis , Virus Diseases/diagnosis , Algorithms , Gene Expression , Genes, Essential , Humans , Influenza A Virus, H1N1 Subtype , Influenza A Virus, H3N2 Subtype , Models, Molecular , Respiratory Syncytial Viruses , Rhinovirus

10.

Foundational Principles for Large-Scale Inference: Illustrations Through Correlation Mining.

Hero, Alfred O; Rajaratnam, Bala.

Proc IEEE Inst Electr Electron Eng ; 104(1): 93-110, 2016 Jan.

Article in English | MEDLINE | ID: mdl-27087700

ABSTRACT

When can reliable inference be drawn in the "Big Data" context? This paper presents a framework for answering this fundamental question in the context of correlation mining, with implications for general large scale inference. In large scale data applications like genomics, connectomics, and eco-informatics the dataset is often variable-rich but sample-starved: a regime where the number n of acquired samples (statistical replicates) is far fewer than the number p of observed variables (genes, neurons, voxels, or chemical constituents). Much of recent work has focused on understanding the computational complexity of proposed methods for "Big Data". Sample complexity however has received relatively less attention, especially in the setting when the sample size n is fixed, and the dimension p grows without bound. To address this gap, we develop a unified statistical framework that explicitly quantifies the sample complexity of various inferential tasks. Sampling regimes can be divided into several categories: 1) the classical asymptotic regime where the variable dimension is fixed and the sample size goes to infinity; 2) the mixed asymptotic regime where both variable dimension and sample size go to infinity at comparable rates; 3) the purely high dimensional asymptotic regime where the variable dimension goes to infinity and the sample size is fixed. Each regime has its niche but only the latter regime applies to exascale data dimension. We illustrate this high dimensional framework for the problem of correlation mining, where it is the matrix of pairwise and partial correlations among the variables that are of interest. Correlation mining arises in numerous applications and subsumes the regression context as a special case. We demonstrate various regimes of correlation mining based on the unifying perspective of high dimensional learning rates and sample complexity for different structured covariance models and different inference tasks.

11.

Empirically Estimable Classification Bounds Based on a Nonparametric Divergence Measure.

Berisha, Visar; Wisler, Alan; Hero, Alfred O; Spanias, Andreas.

IEEE Trans Signal Process ; 64(3): 580-591, 2016 Feb 01.

Article in English | MEDLINE | ID: mdl-26807014

ABSTRACT

Information divergence functions play a critical role in statistics and information theory. In this paper we show that a non-parametric f-divergence measure can be used to provide improved bounds on the minimum binary classification probability of error for the case when the training and test data are drawn from the same distribution and for the case where there exists some mismatch between training and test distributions. We confirm the theoretical results by designing feature selection algorithms using the criteria from these bounds and by evaluating the algorithms on a series of pathological speech classification tasks.

12.

A Dictionary Approach to Electron Backscatter Diffraction Indexing.

Chen, Yu H; Park, Se Un; Wei, Dennis; Newstadt, Greg; Jackson, Michael A; Simmons, Jeff P; De Graef, Marc; Hero, Alfred O.

Microsc Microanal ; 21(3): 739-52, 2015 Jun.

Article in English | MEDLINE | ID: mdl-26055190

ABSTRACT

We propose a framework for indexing of grain and subgrain structures in electron backscatter diffraction patterns of polycrystalline materials. We discretize the domain of a dynamical forward model onto a dense grid of orientations, producing a dictionary of patterns. For each measured pattern, we identify the most similar patterns in the dictionary, and identify boundaries, detect anomalies, and index crystal orientations. The statistical distribution of these closest matches is used in an unsupervised binary decision tree (DT) classifier to identify grain boundaries and anomalous regions. The DT classifies a pattern as an anomaly if it has an abnormally low similarity to any pattern in the dictionary. It classifies a pixel as being near a grain boundary if the highly ranked patterns in the dictionary differ significantly over the pixel's neighborhood. Indexing is accomplished by computing the mean orientation of the closest matches to each pattern. The mean orientation is estimated using a maximum likelihood approach that models the orientation distribution as a mixture of Von Mises-Fisher distributions over the quaternionic three sphere. The proposed dictionary matching approach permits segmentation, anomaly detection, and indexing to be performed in a unified manner with the additional benefit of uncertainty quantification.

13.

Identification and characterization of Hoxa9 binding sites in hematopoietic cells.

Huang, Yongsheng; Sitwala, Kajal; Bronstein, Joel; Sanders, Daniel; Dandekar, Monisha; Collins, Cailin; Robertson, Gordon; MacDonald, James; Cezard, Timothee; Bilenky, Misha; Thiessen, Nina; Zhao, Yongjun; Zeng, Thomas; Hirst, Martin; Hero, Alfred; Jones, Steven; Hess, Jay L.

Blood ; 119(2): 388-98, 2012 Jan 12.

Article in English | MEDLINE | ID: mdl-22072553

ABSTRACT

The clustered homeobox proteins play crucial roles in development, hematopoiesis, and leukemia, yet the targets they regulate and their mechanisms of action are poorly understood. Here, we identified the binding sites for Hoxa9 and the Hox cofactor Meis1 on a genome-wide level and profiled their associated epigenetic modifications and transcriptional targets. Hoxa9 and the Hox cofactor Meis1 cobind at hundreds of highly evolutionarily conserved sites, most of which are distant from transcription start sites. These sites show high levels of histone H3K4 monomethylation and CBP/P300 binding characteristic of enhancers. Furthermore, a subset of these sites shows enhancer activity in transient transfection assays. Many Hoxa9 and Meis1 binding sites are also bound by PU.1 and other lineage-restricted transcription factors previously implicated in establishment of myeloid enhancers. Conditional Hoxa9 activation is associated with CBP/P300 recruitment, histone acetylation, and transcriptional activation of a network of proto-oncogenes, including Erg, Flt3, Lmo2, Myb, and Sox4. Collectively, this work suggests that Hoxa9 regulates transcription by interacting with enhancers of genes important for hematopoiesis and leukemia.

Subject(s)

Gene Expression Regulation, Leukemic , Hematopoiesis/physiology , Homeodomain Proteins/genetics , Homeodomain Proteins/metabolism , Leukemia/genetics , Acetylation , Animals , Binding Sites , Biomarkers, Tumor/genetics , Biomarkers, Tumor/metabolism , Blotting, Western , Bone Marrow Cells/metabolism , Chromatin Immunoprecipitation , Enhancer Elements, Genetic , Epigenomics , Female , Gene Expression Profiling , Leukemia/metabolism , Mice , Mice, Inbred C57BL , Myeloid Ecotropic Viral Integration Site 1 Protein , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Transcription Factors/genetics , Transcription Factors/metabolism

14.

Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza A infection.

Huang, Yongsheng; Zaas, Aimee K; Rao, Arvind; Dobigeon, Nicolas; Woolf, Peter J; Veldman, Timothy; Øien, N Christine; McClain, Micah T; Varkey, Jay B; Nicholson, Bradley; Carin, Lawrence; Kingsmore, Stephen; Woods, Christopher W; Ginsburg, Geoffrey S; Hero, Alfred O.

PLoS Genet ; 7(8): e1002234, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21901105

ABSTRACT

Exposure to influenza viruses is necessary, but not sufficient, for healthy human hosts to develop symptomatic illness. The host response is an important determinant of disease progression. In order to delineate host molecular responses that differentiate symptomatic and asymptomatic Influenza A infection, we inoculated 17 healthy adults with live influenza (H3N2/Wisconsin) and examined changes in host peripheral blood gene expression at 16 timepoints over 132 hours. Here we present distinct transcriptional dynamics of host responses unique to asymptomatic and symptomatic infections. We show that symptomatic hosts invoke, simultaneously, multiple pattern recognition receptors-mediated antiviral and inflammatory responses that may relate to virus-induced oxidative stress. In contrast, asymptomatic subjects tightly regulate these responses and exhibit elevated expression of genes that function in antioxidant responses and cell-mediated responses. We reveal an ab initio molecular signature that strongly correlates to symptomatic clinical disease and biomarkers whose expression patterns best discriminate early from late phases of infection. Our results establish a temporal pattern of host molecular responses that differentiates symptomatic from asymptomatic infections and reveals an asymptomatic host-unique non-passive response signature, suggesting novel putative molecular targets for both prognostic assessment and ameliorative therapeutic intervention in seasonal and pandemic influenza.

Subject(s)

Asymptomatic Infections , Host-Pathogen Interactions , Influenza A Virus, H3N2 Subtype , Influenza, Human/metabolism , Adolescent , Adult , Cytokines/biosynthesis , Cytokines/metabolism , Gene Expression Profiling , Humans , Influenza, Human/genetics , Influenza, Human/virology , Middle Aged , Oxidative Stress/genetics , Ribosomal Proteins/genetics , Ribosomal Proteins/metabolism , Stress, Physiological

15.

Unsupervised Bayesian linear unmixing of gene expression microarrays.

Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O.

BMC Bioinformatics ; 14: 99, 2013 Mar 19.

Article in English | MEDLINE | ID: mdl-23506672

ABSTRACT

BACKGROUND: This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. RESULTS: Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. We show that the uBLU method significantly outperforms the other methods on the simulated and real data sets considered here. CONCLUSIONS: The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method when compared to other factor decomposition methods from the literature (PCA, NMF, BFRM, and GB-GMF). 
The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.

Subject(s)

Algorithms , Gene Expression Profiling/methods , Microarray Analysis/methods , Bayes Theorem , Humans , Influenza A Virus, H3N2 Subtype , Influenza, Human/genetics , Influenza, Human/metabolism , Male

16.

Ensemble estimators for multivariate entropy estimation.

Sricharan, Kumar; Wei, Dennis; Hero, Alfred O.

IEEE Trans Inf Theory ; 59(7): 4374-4388, 2013 Jul.

Article in English | MEDLINE | ID: mdl-25897177

ABSTRACT

The problem of estimation of density functionals like entropy and mutual information has received much attention in the statistics and information theory communities. A large class of estimators of functionals of the probability density suffer from the curse of dimensionality, wherein the mean squared error (MSE) decays increasingly slowly as a function of the sample size T as the dimension d of the samples increases. In particular, the rate is often glacially slow of order $O(T^{-\gamma/d})$, where $\gamma > 0$ is a rate parameter. Examples of such estimators include kernel density estimators, k-nearest neighbor (k-NN) density estimators, k-NN entropy estimators, intrinsic dimension estimators and other examples. In this paper, we propose a weighted affine combination of an ensemble of such estimators, where optimal weights can be chosen such that the weighted estimator converges at a much faster dimension invariant rate of $O(T^{-1})$. Furthermore, we show that these optimal weights can be determined by solving a convex optimization problem which can be performed offline and does not require training data. We illustrate the superior performance of our weighted estimator for two important applications: (i) estimating the Panter-Dite distortion-rate factor and (ii) estimating the Shannon entropy for testing the probability distribution of a random sample.

17.

Hierarchical network models for exchangeable structured interaction processes.

Dempsey, Walter; Oselio, Brandon; Hero, Alfred.

J Am Stat Assoc ; 117(540): 2056-2073, 2022.

Article in English | MEDLINE | ID: mdl-36908312

ABSTRACT

Network data often arises via a series of structured interactions among a population of constituent elements. E-mail exchanges, for example, have a single sender followed by potentially multiple receivers. Scientific articles, on the other hand, may have multiple subject areas and multiple authors. We introduce a statistical model, termed the Pitman-Yor hierarchical vertex components model (PY-HVCM), that is well suited for structured interaction data. The proposed PY-HVCM effectively models complex relational data by partial pooling of local information via a latent, shared population-level distribution. The PY-HVCM is a canonical example of hierarchical vertex components models - a subfamily of models for exchangeable structured interaction-labeled networks, i.e., networks invariant to interaction relabeling. Theoretical analysis and supporting simulations provide clear model interpretation, and establish global sparsity and power law degree distribution. A computationally tractable Gibbs sampling algorithm is derived for inferring sparsity and power law properties of complex networks. We demonstrate the model on both the Enron e-mail dataset and an ArXiv dataset, showing goodness of fit of the model via posterior predictive validation.

18.

Recurrent neural networks enable design of multifunctional synthetic human gut microbiome dynamics.

Baranwal, Mayank; Clark, Ryan L; Thompson, Jaron; Sun, Zeyu; Hero, Alfred O; Venturelli, Ophelia S.

Elife ; 11, 2022 06 23.

Article in English | MEDLINE | ID: mdl-35736613

ABSTRACT

Predicting the dynamics and functions of microbiomes constructed from the bottom up is a key challenge in exploiting them to our benefit. Current models based on ecological theory fail to capture complex community behaviors arising from higher-order interactions, and do not scale well with increasing complexity or when multiple functions are considered. We develop and apply a long short-term memory (LSTM) framework to advance our understanding of community assembly and health-relevant metabolite production using a synthetic human gut community. A mainstay of recurrent neural networks, the LSTM learns a high-dimensional, data-driven, non-linear dynamical system model. We show that the LSTM model can outperform the widely used generalized Lotka-Volterra model based on ecological theory. We build methods to decipher microbe-microbe and microbe-metabolite interactions from an otherwise black-box model. These methods highlight that Actinobacteria, Firmicutes and Proteobacteria are significant drivers of metabolite production whereas Bacteroides shape community dynamics. We use the LSTM model to navigate a large multidimensional functional landscape to design communities with unique health-relevant metabolite profiles and temporal behaviors. In sum, the accuracy of the LSTM model can be exploited for experimental planning and to guide the design of synthetic microbiomes with target dynamic functions.
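For context on the baseline named in this abstract, the generalized Lotka-Volterra (gLV) model describes each species abundance x_i by dx_i/dt = x_i (r_i + sum_j A_ij x_j), with only pairwise interaction terms. The sketch below integrates that equation with a simple forward-Euler step. The two-species growth rates and interaction matrix are made-up illustrative values, not parameters from the study, and the function name is hypothetical.

```python
import numpy as np

def simulate_glv(x0, r, A, dt=0.01, steps=5000):
    """Forward-Euler integration of dx_i/dt = x_i * (r_i + sum_j A_ij * x_j)."""
    x = np.asarray(x0, dtype=float).copy()
    traj = [x.copy()]
    for _ in range(steps):
        x = x + dt * x * (r + A @ x)
        x = np.maximum(x, 0.0)  # abundances cannot be negative
        traj.append(x.copy())
    return np.array(traj)

# Illustrative two-species community with self-limitation and weak competition.
r = np.array([1.0, 0.8])                     # intrinsic growth rates (assumed)
A = np.array([[-1.0, -0.3], [-0.2, -1.0]])   # pairwise interaction matrix (assumed)
traj = simulate_glv([0.1, 0.1], r, A)
print(traj[-1])
```

Because the interaction term is linear in x, this model cannot represent higher-order interactions, which is the limitation the abstract contrasts with the LSTM.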

Subject(s)

Gastrointestinal Microbiome , Microbiota , Bacteria , Humans , Microbial Interactions , Neural Networks, Computer

19.

Pre-exposure cognitive performance variability is associated with severity of respiratory infection.

Zhai, Yaya; Doraiswamy, P Murali; Woods, Christopher W; Turner, Ronald B; Burke, Thomas W; Ginsburg, Geoffrey S; Hero, Alfred O.

Sci Rep ; 12(1): 22589, 2022 12 30.

Article in English | MEDLINE | ID: mdl-36585416

ABSTRACT

Using data from a longitudinal viral challenge study, we find that post-exposure viral shedding and symptom severity are associated with a novel measure of pre-exposure cognitive performance variability (CPV), defined before viral exposure occurs. Each individual's CPV score is computed from data collected from a repeated NeuroCognitive Performance Test (NCPT) over a 3-day pre-exposure period. Of the 18 NCPT measures reported by the tests, 6 contribute materially to the CPV score, prospectively differentiating the high from the low shedders. Among these 6 are the 4 clinical measures digSym-time, digSym-correct, trail-time, and reaction-time, commonly used for assessing cognitive executive functioning. CPV is found to be correlated with stress and also with several genes previously reported to be associated with cognitive development and dysfunction. A perturbation study over the number and timing of NCPT sessions indicates that as few as 5 sessions are sufficient to maintain high association between the CPV score and viral shedding, as long as the timing of these sessions is balanced over the three pre-exposure days. Our results suggest that variations in cognitive function are closely related to immunity and susceptibility to severe infection. Further study of these relationships may help us better understand the links between neurocognitive and neuroimmune systems, which is timely in the COVID-19 pandemic era.

Subject(s)

COVID-19 , Respiratory Tract Infections , Humans , Pandemics , Cognition , Reaction Time

20.

Penalized ensemble Kalman filters for high dimensional non-linear systems.

Hou, Elizabeth; Lawrence, Earl; Hero, Alfred O.

PLoS One ; 16(3): e0248046, 2021.

Article in English | MEDLINE | ID: mdl-33735201

ABSTRACT

The ensemble Kalman filter (EnKF) is a data assimilation technique that uses an ensemble of models, updated with data, to track the time evolution of a usually non-linear system. It does so by using an empirical approximation to the well-known Kalman filter. However, its performance can suffer when the ensemble size is smaller than the dimension of the state space, as is often necessary for computationally burdensome models. In this scenario, the empirical estimate of the state covariance is not full rank and is possibly quite noisy. To solve this problem in the high-dimensional regime, we propose a computationally fast and easy-to-implement algorithm called the penalized ensemble Kalman filter (PEnKF). Under certain conditions, it can be theoretically proven that the PEnKF will be accurate (the estimation error will converge to zero) despite having fewer ensemble members than state dimensions. Further, in contrast to localization methods, the proposed approach learns the covariance structure associated with the dynamical system. These theoretical results are supported with simulations of several non-linear and high-dimensional systems.
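The rank deficiency described in this abstract is easy to see in a standard stochastic EnKF analysis step. The sketch below is the textbook update with perturbed observations, not the PEnKF; the penalized covariance estimator proposed in the paper is not reproduced here. The state size, ensemble size, observation operator and noise level are illustrative assumptions.

```python
import numpy as np

def enkf_analysis(ensemble, y, H, R, rng):
    """One stochastic EnKF analysis step.

    ensemble: array of shape (n_members, n_state), the forecast ensemble.
    Returns the analysis ensemble and the sample forecast covariance.
    """
    n = ensemble.shape[0]
    anomalies = ensemble - ensemble.mean(axis=0)
    P = anomalies.T @ anomalies / (n - 1)         # sample covariance, rank <= n - 1
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)  # Kalman gain built from P
    # Each member assimilates its own noisy copy of the observation.
    y_pert = y + rng.multivariate_normal(np.zeros(len(y)), R, size=n)
    analysis = ensemble + (y_pert - ensemble @ H.T) @ K.T
    return analysis, P

rng = np.random.default_rng(0)
n_state, n_members = 20, 5                 # fewer members than state dimensions
ensemble = rng.normal(size=(n_members, n_state))
H = np.eye(3, n_state)                     # observe the first three components (assumed)
R = 0.1 * np.eye(3)                        # observation noise covariance (assumed)
y = np.array([1.0, 1.0, 1.0])
analysis, P = enkf_analysis(ensemble, y, H, R, rng)
print(np.linalg.matrix_rank(P), "of", n_state)
```

With 5 members and 20 state dimensions the sample covariance cannot be full rank, so the gain is computed from a noisy, low-rank estimate. This is the regime the abstract targets with a penalized estimate.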

Subject(s)

Models, Theoretical , Nonlinear Dynamics , Algorithms
