Results 1 - 20 of 178
1.
Front Bioinform ; 4: 1280971, 2024.
Article in English | MEDLINE | ID: mdl-38812660

ABSTRACT

Radiation exposure poses a significant threat to human health. Emerging research indicates that even low-dose radiation, once believed to be safe, may have harmful effects. This perception has spurred a growing interest in investigating the potential risks associated with low-dose radiation exposure across various scenarios. To comprehensively explore the health consequences of low-dose radiation, our study employs a robust statistical framework that examines whether specific groups of genes, belonging to known pathways, exhibit coordinated expression patterns that align with the radiation levels. Notably, our findings reveal the existence of intricate yet consistent signatures that reflect the molecular response to radiation exposure, distinguishing between low-dose and high-dose radiation. Moreover, we leverage a pathway-constrained variational autoencoder to capture the nonlinear interactions within gene expression data. By comparing these two analytical approaches, our study aims to gain valuable insights into the impact of low-dose radiation on gene expression patterns, identify pathways that are differentially affected, and harness the potential of machine learning to uncover hidden activity within biological networks. This comparative analysis contributes to a deeper understanding of the molecular consequences of low-dose radiation exposure.

2.
Patterns (N Y) ; 4(11): 100863, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38035192

ABSTRACT

Significant acceleration of the future discovery of novel functional materials requires a fundamental shift from the current materials discovery practice, which is heavily dependent on trial-and-error campaigns and high-throughput screening, to one that builds on knowledge-driven advanced informatics techniques enabled by the latest advances in signal processing and machine learning. In this review, we discuss the major research issues that need to be addressed to expedite this transformation along with the salient challenges involved. We especially focus on Bayesian signal processing and machine learning schemes that are uncertainty aware and physics informed for knowledge-driven learning, robust optimization, and efficient objective-driven experimental design.

3.
Patterns (N Y) ; 4(11): 100875, 2023 Nov 10.
Article in English | MEDLINE | ID: mdl-38035191

ABSTRACT

The need for efficient computational screening of molecular candidates that possess desired properties frequently arises in various scientific and engineering problems, including drug discovery and materials design. However, the enormous search space containing the candidates and the substantial computational cost of high-fidelity property prediction models make screening practically challenging. In this work, we propose a general framework for constructing and optimizing a high-throughput virtual screening (HTVS) pipeline that consists of multi-fidelity models. The central idea is to optimally allocate the computational resources to models with varying costs and accuracy to optimize the return on computational investment. Based on both simulated and real-world data, we demonstrate that the proposed optimal HTVS framework can significantly accelerate virtual screening without any degradation in terms of accuracy. Furthermore, it enables an adaptive operational strategy for HTVS, where one can trade accuracy for efficiency.
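
The resource-allocation idea in this abstract can be illustrated with a minimal two-stage cascade. This is only a sketch under stated assumptions, not the framework proposed in the paper: the cost constants, the cutoffs, the synthetic scores, and the function name `run_cascade` are all hypothetical. A cheap, noisy model filters candidates before the expensive model is run on the survivors.

```python
import random

CHEAP_COST = 1.0     # assumed cost units per low-fidelity evaluation
COSTLY_COST = 100.0  # assumed cost units per high-fidelity evaluation


def run_cascade(candidates, cheap_score, costly_score, cheap_cutoff, final_cutoff):
    """Screen candidates with a cheap model first, then a costly one.

    Returns the list of accepted candidates and the total cost spent.
    """
    cost = 0.0
    hits = []
    for c in candidates:
        cost += CHEAP_COST
        if cheap_score(c) < cheap_cutoff:
            continue  # discarded early, the costly model is never run
        cost += COSTLY_COST
        if costly_score(c) >= final_cutoff:
            hits.append(c)
    return hits, cost


# Synthetic data: the costly model returns the true value,
# the cheap model returns the true value plus noise.
rng = random.Random(0)
candidates = list(range(1000))
true_value = {i: rng.gauss(0.0, 1.0) for i in candidates}
noise = {i: rng.gauss(0.0, 0.5) for i in candidates}


def costly_score(i):
    return true_value[i]


def cheap_score(i):
    return true_value[i] + noise[i]


# Baseline: no early filtering, every candidate reaches the costly model.
full_hits, full_cost = run_cascade(
    candidates, cheap_score, costly_score, float("-inf"), 1.5)
# Cascade: only candidates that pass the cheap cutoff are evaluated further.
fast_hits, fast_cost = run_cascade(
    candidates, cheap_score, costly_score, 0.5, 1.5)

recall = len(fast_hits) / len(full_hits)
print(f"cost {fast_cost:.0f} vs {full_cost:.0f}, recall {recall:.2f}")
```

Raising or lowering `cheap_cutoff` trades recall against cost, which is the kind of accuracy-for-efficiency trade the abstract describes; choosing the cutoffs optimally is the part this sketch leaves out.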

4.
J Comput Biol ; 30(7): 751-765, 2023 07.
Article in English | MEDLINE | ID: mdl-36961389

ABSTRACT

TRIMER, Transcription Regulation Integrated with MEtabolic Regulation, is a genome-scale modeling pipeline targeting metabolic engineering applications. Using TRIMER, regulated metabolic reactions can be effectively predicted by integrative modeling of metabolic reactions with a transcription factor-gene regulatory network (TRN), which is modeled through a Bayesian network (BN). In this article, we focus on sensitivity analysis of metabolic flux prediction for uncertainty quantification of BN structures for TRN modeling in TRIMER. We propose a computational strategy to construct the uncertainty class of TRN models based on the inferred regulatory order uncertainty given transcriptomic expression data. With that, we analyze the prediction sensitivity of the TRIMER pipeline for the metabolite yields of interest. The obtained sensitivity analyses can guide optimal experimental design (OED) to help acquire new data that can enhance TRN modeling and achieve specific metabolic engineering objectives, including metabolite yield alterations. We have performed small- and large-scale simulated experiments, demonstrating the effectiveness of our developed sensitivity analysis strategy for BN structure learning to quantify the edge importance in terms of metabolic flux prediction uncertainty reduction and its potential to effectively guide OED.


Subjects
Metabolic Networks and Pathways, Biological Models, Bayes Theorem, Metabolic Networks and Pathways/genetics, Gene Regulatory Networks, Metabolic Flux Analysis
5.
iScience ; 25(9): 104951, 2022 Sep 16.
Article in English | MEDLINE | ID: mdl-36093045

ABSTRACT

We developed a computational approach to find the best intervention to achieve transcription factor (TF) mediated transdifferentiation. We constructed probabilistic Boolean networks (PBNs) from single-cell RNA sequencing data of two different cell states to model cross-talk among hematopoietic transcription factors. This was achieved by a "sampled network" approach, which enabled us to construct large networks. The interventions to induce transdifferentiation consisted of permanently activating or deactivating each of the TFs and determining the probability mass transfer of steady-state probabilities from the departure to the destination cell type or state. Our findings support the common assumption that TFs that are differentially expressed between the two cell types are the best intervention points to achieve transdifferentiation. Interventions found to transdifferentiate progenitor B cells into monocytes include EBF1 down-regulation, CEBPB up-regulation, TCF3 down-regulation, and STAT3 up-regulation.
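
The notion of steady-state probability mass shifting under a permanent intervention can be sketched on a toy model. The example below is an assumption-laden illustration, not the paper's sampled-network PBN: it uses a single three-gene Boolean network with random perturbation, invented update rules, and an assumed flip probability `P_FLIP`, and it compares the stationary mass of one state with and without clamping gene 0 to 1.

```python
from itertools import product

N_GENES = 3
P_FLIP = 0.05  # assumed per-gene perturbation probability
STATES = list(product((0, 1), repeat=N_GENES))


def update(state):
    """Toy rules (hypothetical): gene A sustains itself and activates B and C."""
    a, _b, _c = state
    return (a, a, a)


def clamp(state, fixed):
    """Force the permanently activated/deactivated genes to their fixed values."""
    return tuple(fixed.get(i, v) for i, v in enumerate(state))


def transition_row(state, fixed):
    """Distribution over next states from `state`.

    With no perturbation the network follows its rules; otherwise the
    perturbed genes are flipped. Clamped genes are reset afterwards.
    """
    row = {s: 0.0 for s in STATES}
    for gamma in product((0, 1), repeat=N_GENES):
        k = sum(gamma)
        prob = (P_FLIP ** k) * ((1 - P_FLIP) ** (N_GENES - k))
        if k == 0:
            nxt = update(state)
        else:
            nxt = tuple(v ^ g for v, g in zip(state, gamma))
        row[clamp(nxt, fixed)] += prob
    return row


def steady_state(fixed=None, iterations=500):
    """Approximate the stationary distribution by power iteration."""
    fixed = fixed or {}
    rows = {s: transition_row(s, fixed) for s in STATES}
    dist = {s: 1.0 / len(STATES) for s in STATES}
    for _ in range(iterations):
        new = {s: 0.0 for s in STATES}
        for s, mass in dist.items():
            for t, p in rows[s].items():
                new[t] += mass * p
        dist = new
    return dist


baseline = steady_state()
intervened = steady_state(fixed={0: 1})  # permanently activate gene 0
target = (1, 1, 1)
shift = intervened[target] - baseline[target]
print(f"mass on {target}: {baseline[target]:.3f} -> {intervened[target]:.3f}")
```

Repeating the comparison for each gene and each clamped value, and ranking interventions by the mass moved toward the destination states, would mirror the search described in the abstract.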

6.
Patterns (N Y) ; 3(3): 100428, 2022 Mar 11.
Article in English | MEDLINE | ID: mdl-35510184

ABSTRACT

Classification has been a major task for building intelligent systems because it enables decision-making under uncertainty. Classifier design aims at building models from training data for representing feature-label distributions, either explicitly or implicitly. In many scientific or clinical settings, training data are typically limited, which impedes the design and evaluation of accurate classifiers. Although transfer learning can improve the learning in target domains by incorporating data from relevant source domains, it has received little attention for performance assessment, notably in error estimation. Here, we investigate knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a class of Bayesian minimum mean-square error estimators for optimal Bayesian transfer learning, which enables rigorous evaluation of classification error under uncertainty in small-sample settings. Using Monte Carlo importance sampling, we illustrate the outstanding performance of the proposed estimator for a broad family of classifiers that span diverse learning capabilities.

7.
Data Brief ; 42: 108113, 2022 Jun.
Article in English | MEDLINE | ID: mdl-35434232

ABSTRACT

Transfer learning (TL) techniques can enable effective learning in data-scarce domains by allowing one to re-purpose data or scientific knowledge available in relevant source domains for predictive tasks in a target domain of interest. In this Data in Brief article, we present a synthetic dataset for binary classification in the context of Bayesian transfer learning, which can be used for the design and evaluation of TL-based classifiers. For this purpose, we consider numerous combinations of classification settings, based on which we simulate a diverse set of feature-label distributions with varying learning complexity. For each set of model parameters, we provide a pair of target and source datasets that have been jointly sampled from the underlying feature-label distributions in the target and source domains, respectively. For both target and source domains, the data in a given class and domain are normally distributed, where the distributions across domains are related to each other through a joint prior. To ensure the consistency of the classification complexity across the provided datasets, we have controlled the Bayes error such that it is maintained within a range of predefined values that mimic realistic classification scenarios across different relatedness levels. The provided datasets may serve as useful resources for designing and benchmarking transfer learning schemes for binary classification as well as the estimation of classification error.
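
To make "controlling the Bayes error" concrete, here is a minimal sketch for the simplest Gaussian case. It assumes two equally likely one-dimensional classes that share a standard deviation, which is a simplification and not necessarily the model used to generate the dataset; the function names are illustrative. Under that assumption the Bayes error is Φ(-Δ/2), where Δ is the distance between the class means in units of the shared standard deviation.

```python
import math


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bayes_error_equal_variance(mean0, mean1, sigma):
    """Bayes error for two equiprobable 1-D Gaussian classes with shared sigma.

    The optimal decision boundary is the midpoint between the means, so each
    class is misclassified with probability Phi(-delta / 2).
    """
    delta = abs(mean1 - mean0) / sigma
    return normal_cdf(-delta / 2.0)


for gap in (0.0, 1.0, 2.0, 4.0):
    err = bayes_error_equal_variance(0.0, gap, 1.0)
    print(f"mean gap {gap:.1f} -> Bayes error {err:.4f}")
```

Sweeping the mean gap until the error falls inside a chosen interval is one simple way to keep classification difficulty within a predefined range.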

8.
STAR Protoc ; 3(1): 101184, 2022 03 18.
Article in English | MEDLINE | ID: mdl-35243375

ABSTRACT

This protocol explains the pipeline for condition-dependent metabolite yield prediction using Transcription Regulation Integrated with MEtabolic Regulation (TRIMER). TRIMER targets metabolic engineering applications via a hybrid model integrating transcription factor (TF)-gene regulatory network (TRN) with a Bayesian network (BN) inferred from transcriptomic expression data to effectively regulate metabolic reactions. For E. coli and yeast, TRIMER achieves reliable knockout phenotype and flux predictions from the deletion of one or more TFs at the genome scale. For complete details on the use and execution of this protocol, please refer to Niu et al. (2021).


Subjects
Escherichia coli, Gene Regulatory Networks, Bayes Theorem, Escherichia coli/genetics, Gene Expression Regulation, Saccharomyces cerevisiae/genetics, Transcription Factors/genetics
9.
Article in English | MEDLINE | ID: mdl-32750876

ABSTRACT

A key objective of studying biological systems is to design therapeutic intervention strategies for beneficially altering cell dynamics. Derivation of control policies is hindered by the high-dimensional state spaces associated with gene regulatory networks. Hence, it is critical to reduce the network complexity and the paper aims to address this issue by focusing on the distribution of the canalizing power (CP) of the genes in the model. Canalizing genes enforce broad corrective actions on cellular processes and play a crucial role in producing optimal reactions to external stimuli. Therefore, it is critical to reduce the network while preserving the canalizing power of genes. We reduce Boolean networks with perturbation by removing genes with the smallest canalizing power consecutively, and evaluate the stability of canalizing power. A systematic empirical study demonstrates that there are two classes of networks, reducible and irreducible with respect to the preservation of canalizing power of the genes. Based on these observations, we introduce the definition of reducible networks and proceed with the problem of selecting the relevant network features that allow for discriminating networks from the two different classes. We demonstrate the efficacy of the selected features on synthetic and real gene regulatory networks.


Subjects
Gene Regulatory Networks, Genetic Models, Gene Regulatory Networks/genetics
10.
iScience ; 24(11): 103218, 2021 Nov 19.
Article in English | MEDLINE | ID: mdl-34761179

ABSTRACT

There has been extensive research in predictive modeling of genome-scale metabolic reaction networks. Living systems involve complex stochastic processes arising from interactions among different biomolecules. For more accurate and robust prediction of target metabolic behavior under different conditions, not only metabolic reactions but also the genetic regulatory relationships involving transcription factors (TFs) affecting these metabolic reactions should be modeled. We have developed a modeling and simulation pipeline enabling the analysis of Transcription Regulation Integrated with Metabolic Regulation: TRIMER. TRIMER utilizes a Bayesian network (BN) inferred from transcriptomes to model the transcription factor regulatory network. TRIMER then infers the probabilities of the gene states relevant to the metabolism of interest, and predicts the metabolic fluxes and their changes that result from the deletion of one or more transcription factors at the genome scale. We demonstrate TRIMER's applicability to both simulated and experimental data and provide performance comparison with other existing approaches.

11.
Bioinformatics ; 37(19): 3212-3219, 2021 Oct 11.
Article in English | MEDLINE | ID: mdl-33822889

ABSTRACT

MOTIVATION: When learning to subtype complex disease based on next-generation sequencing data, the amount of available data is often limited. Recent works have tried to leverage data from other domains to design better predictors in the target domain of interest with varying degrees of success. But they are either limited to the cases requiring the outcome label correspondence across domains or cannot leverage the label information at all. Moreover, the existing methods usually cannot benefit from other information available a priori such as gene interaction networks. RESULTS: In this article, we develop a generative optimal Bayesian supervised domain adaptation (OBSDA) model that can integrate RNA sequencing (RNA-Seq) data from different domains along with their labels for improving prediction accuracy in the target domain. Our model can be applied in cases where different domains share the same labels or have different ones. OBSDA is based on a hierarchical Bayesian negative binomial model with parameter factorization, for which the optimal predictor can be derived by marginalization of likelihood over the posterior of the parameters. We first provide an efficient Gibbs sampler for parameter inference in OBSDA. Then, we leverage the gene-gene network prior information and construct an informed and flexible variational family to infer the posterior distributions of model parameters. Comprehensive experiments on real-world RNA-Seq data demonstrate the superior performance of OBSDA, in terms of accuracy in identifying cancer subtypes by utilizing data from different domains. Moreover, we show that by taking advantage of the prior network information we can further improve the performance. AVAILABILITY AND IMPLEMENTATION: The source code for the implementations of OBSDA and SI-OBSDA is available at https://github.com/SHBLK/BSDA. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Article in English | MEDLINE | ID: mdl-31180899

ABSTRACT

There is often a limited amount of omics data to design predictive models in biomedicine. Knowing that these omics data come from underlying processes that may share common pathways and disease mechanisms, it may be beneficial to leverage available data in relevant source domains when designing a more accurate and reliable predictor in a target domain of interest where labeled data are lacking. Here, we focus on developing Bayesian transfer learning methods for analyzing next-generation sequencing (NGS) data to help improve predictions in the target domain. We formulate transfer learning in a fully Bayesian framework and define the relatedness by a joint prior distribution of the model parameters of the source and target domains. Defining joint priors acts as a bridge across domains, through which the related knowledge of source data is transferred to the target domain. We focus on RNA-seq discrete count data, which are often overdispersed. To appropriately model them, we consider the Negative Binomial model and propose an Optimal Bayesian Transfer Learning (OBTL) classifier that minimizes the expected classification error in the target domain. We evaluate the performance of the OBTL classifier via both synthetic and cancer data from The Cancer Genome Atlas (TCGA).


Subjects
Bayes Theorem, Computational Biology/methods, Machine Learning, Genetic Databases, High-Throughput Nucleotide Sequencing, Humans, Statistical Models, Neoplasms/genetics, Neoplasms/metabolism
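
The remark above that RNA-seq counts are often overdispersed is the usual motivation for a negative binomial model: a Poisson count has variance equal to its mean, whereas a negative binomial count with mean μ and dispersion parameter r has variance μ + μ²/r. The sketch below checks this numerically. The mean/dispersion parameterization and the function names are choices made for this illustration and are not taken from the paper.

```python
import math


def nb_log_pmf(k, mean, r):
    """Log-probability of count k under a negative binomial distribution.

    Parameterized by its mean and dispersion r (smaller r = more overdispersed).
    """
    p = r / (r + mean)  # success probability implied by the mean and r
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_moments(mean, r, k_max=5000):
    """Total probability, mean, and variance from a truncated sum over counts."""
    probs = [math.exp(nb_log_pmf(k, mean, r)) for k in range(k_max + 1)]
    total = sum(probs)
    m1 = sum(k * q for k, q in enumerate(probs))
    m2 = sum(k * k * q for k, q in enumerate(probs))
    return total, m1, m2 - m1 * m1


total, numeric_mean, numeric_var = nb_moments(mean=10.0, r=2.0)
print(f"mean {numeric_mean:.3f}, variance {numeric_var:.3f}")
print("closed-form variance:", 10.0 + 10.0 ** 2 / 2.0)
```

With a mean of 10 the variance is six times the Poisson value, which is the kind of gap that makes a Poisson model a poor fit for such counts.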
13.
BMC Bioinformatics ; 20(Suppl 12): 321, 2019 Jun 20.
Article in English | MEDLINE | ID: mdl-31216989

ABSTRACT

BACKGROUND: Missing values frequently arise in modern biomedical studies due to various reasons, including missing tests or complex profiling technologies for different omics measurements. Missing values can complicate the application of clustering algorithms, whose goals are to group points based on some similarity criterion. A common practice for dealing with missing values in the context of clustering is to first impute the missing values, and then apply the clustering algorithm on the completed data. RESULTS: We consider missing values in the context of optimal clustering, which finds an optimal clustering operator with reference to an underlying random labeled point process (RLPP). We show how the missing-value problem fits neatly into the overall framework of optimal clustering by incorporating the missing value mechanism into the random labeled point process and then marginalizing out the missing-value process. In particular, we demonstrate the proposed framework for the Gaussian model with arbitrary covariance structures. Comprehensive experimental studies on both synthetic and real-world RNA-seq data show the superior performance of the proposed optimal clustering with missing values when compared to various clustering approaches. CONCLUSION: Optimal clustering with missing values obviates the need for imputation-based pre-processing of the data, while at the same time possessing smaller clustering errors.


Subjects
Algorithms, Breast Neoplasms/genetics, Cluster Analysis, Computer Simulation, Female, Gene Expression Profiling, Humans, Theoretical Models, Normal Distribution, Probability
14.
BMC Genomics ; 20(Suppl 6): 435, 2019 Jun 13.
Article in English | MEDLINE | ID: mdl-31189480

ABSTRACT

BACKGROUND: Single-cell gene expression measurements offer opportunities in deriving mechanistic understanding of complex diseases, including cancer. However, due to the complex regulatory machinery of the cell, gene regulatory network (GRN) model inference based on such data still manifests significant uncertainty. RESULTS: The goal of this paper is to develop optimal classification of single-cell trajectories accounting for potential model uncertainty. Partially-observed Boolean dynamical systems (POBDS) are used for modeling gene regulatory networks observed through noisy gene-expression data. We derive the exact optimal Bayesian classifier (OBC) for binary classification of single-cell trajectories. The application of the OBC becomes impractical for large GRNs, due to computational and memory requirements. To address this, we introduce a particle-based single-cell classification method that is highly scalable for large GRNs with much lower complexity than the optimal solution. CONCLUSION: The performance of the proposed particle-based method is demonstrated through numerical experiments using a POBDS model of the well-known T-cell large granular lymphocyte (T-LGL) leukemia network with noisy time-series gene-expression data.


Subjects
Algorithms, Bayes Theorem, Computational Biology/methods, Gene Regulatory Networks, Large Granular Lymphocytic Leukemia/genetics, Single-Cell Analysis/methods, Gene Expression Profiling, Humans, Biological Models, Genetic Models, Uncertainty
15.
Curr Genomics ; 20(1): 16-23, 2019 Jan.
Article in English | MEDLINE | ID: mdl-31015788

ABSTRACT

INTRODUCTION: The most basic aspect of modern engineering is the design of operators to act on physical systems in an optimal manner relative to a desired objective - for instance, designing a control policy to autonomously direct a system or designing a classifier to make decisions regarding the system. These kinds of problems appear in biomedical science, where physical models are created with the intention of using them to design tools for diagnosis, prognosis, and therapy. METHODS: In the classical paradigm, our knowledge regarding the model is certain; however, in practice, especially with complex systems, our knowledge is uncertain and operators must be designed while taking this uncertainty into account. The related concepts of intrinsically Bayesian robust operators and optimal Bayesian operators treat operator design under uncertainty. An objective-based experimental design procedure is naturally related to operator design: We would like to perform an experiment that maximally reduces our uncertainty as it pertains to our objective. RESULTS & DISCUSSION: This paper provides a nonmathematical review of optimal Bayesian operators directed at biomedical scientists. It considers two applications important to genomics, structural intervention in gene regulatory networks and classification. CONCLUSION: The salient point regarding intrinsically Bayesian operators is that uncertainty is quantified relative to the scientific model, and the prior distribution is on the parameters of this model. Optimization has direct physical (biological) meaning. This is opposed to the common method of placing prior distributions on the parameters of the operator, in which case there is a scientific gap between operator design and the phenomena.

16.
Article in English | MEDLINE | ID: mdl-29053466

ABSTRACT

This paper studies classification of gene-expression trajectories coming from two classes, healthy and mutated (cancerous) using Boolean networks with perturbation (BNps) to model the dynamics of each class at the state level. Each class has its own BNp, which is partially known based on gene pathways. We employ a Gaussian model at the observation level to show the expression values of the genes given the hidden binary states at each time point. We use expectation maximization (EM) to learn the BNps and the unknown model parameters, derive closed-form updates for the parameters, and propose a learning algorithm. After learning, a plug-in Bayes classifier is used to classify unlabeled trajectories, which can have missing data. Measuring gene expressions at different times yields trajectories only when measurements come from a single cell. In multiple-cell scenarios, the expression values are averages over many cells with possibly different states. Via the central-limit theorem, we propose another model for expression data in multiple-cell scenarios. Simulations demonstrate that single-cell trajectory data can outperform multiple-cell average expression data relative to classification error, especially in high-noise situations. We also consider data generated via a mammalian cell-cycle network, both the wild-type and with a common mutation affecting p27.


Subjects
Gene Expression Profiling/methods, Gene Regulatory Networks/genetics, Single-Cell Analysis/methods, Algorithms, Animals, Bayes Theorem, Humans, Genetic Models, Statistical Models, Neoplasms/genetics, Neoplasms/metabolism
17.
Article in English | MEDLINE | ID: mdl-29990066

ABSTRACT

Gene-expression-based classification and regression are major concerns in translational genomics. If the feature-label distribution is known, then an optimal classifier can be derived. If the predictor-target distribution is known, then an optimal regression function can be derived. In practice, neither is known, data must be employed, and, for small samples, prior knowledge concerning the feature-label or predictor-target distribution can be used in the learning process. Optimal Bayesian classification and optimal Bayesian regression provide optimality under uncertainty. With optimal Bayesian classification (or regression), uncertainty is treated directly on the feature-label (or predictor-target) distribution. The fundamental engineering problem is prior construction. The Regularized Expected Mean Log-Likelihood Prior (REMLP) utilizes pathway information and provides viable priors for the feature-label distribution, assuming that the training data contain labels. In practice, the labels may not be observed. This paper extends the REMLP methodology to a Gaussian mixture model (GMM) when the labels are unknown. Prior construction bundled with prior update via Bayesian sampling results in Monte Carlo approximations to the optimal Bayesian regression function and optimal Bayesian classifier. Simulations demonstrate that the GMM REMLP prior yields better performance than the EM algorithm for small data sets. We apply it to phenotype classification when the prior knowledge consists of colon cancer pathways.


Subjects
Gene Expression Profiling/methods, Genomics/methods, Statistical Models, Algorithms, Bayes Theorem, Colonic Neoplasms/genetics, Genetic Databases, Humans, Normal Distribution
18.
Bioinformatics ; 35(4): 643-649, 2019 02 15.
Article in English | MEDLINE | ID: mdl-30052771

ABSTRACT

MOTIVATION: Canalizing genes enforce broad corrective actions on cellular processes for the purpose of biological robustness, maintaining a constant phenotype in spite of genetic mutations or environmental perturbations. Despite their central role in biological systems, the observation/detection of canalizing genes is often impeded because the behavior of affected genes is highly varied relative to the inactive canalizer. Therefore, the activity of canalizing genes is difficult to predict to any significant degree by their subject genes under normal cell conditions. RESULTS: We investigate this question and present a quantitative framework that allows for the estimation of the power of canalizing genes in the context of Boolean Networks (BNs) with perturbation. This framework borrows tools from pattern recognition theory and uses the coefficient of determination (CoD) to capture the capacity of the canalizing genes. The canalizing power (CP) of a gene is quantitatively characterized by two terms: regulation power (RP) and incapacitating power (IP). We base this assumption on the idea that canalizing power of a gene should be quantified by the extent of its regulation on the overall network and the extent of control that the gene takes over from other master genes when it is activated, which is equivalent to reduction of the control of other master genes upon its activation. Following this, the CP concept is illustrated with examples in which the goal is to provide preliminary evidence that CP can be used to characterize the ability of canalizing genes. AVAILABILITY AND IMPLEMENTATION: A library of functions written in MATLAB for computing CP is available at http://github.com/eunjikim-angie/CanalizingPower.


Subjects
Gene Regulatory Networks, Genetic Models, Computational Biology
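
The coefficient of determination mentioned in the abstract above measures how much a set of predictor genes reduces the error of predicting a target gene relative to the best prediction made without them: CoD = (ε0 - ε_opt) / ε0. The sketch below estimates it from binary samples by simple counting (a resubstitution estimate). The function name and the toy data are hypothetical, and the code is unrelated to the MATLAB library cited in the abstract.

```python
from collections import Counter, defaultdict


def coefficient_of_determination(predictors, target):
    """Estimate the CoD of `predictors` for `target` from paired samples.

    predictors: list of tuples of binary predictor values, one per sample
    target:     list of binary target values, one per sample
    """
    n = len(target)
    # Error of the best constant guess (no predictors available).
    eps0 = 1.0 - max(Counter(target).values()) / n
    if eps0 == 0.0:
        return 0.0  # convention used here: nothing left to explain

    # Error of the best predictor: majority vote within each input pattern.
    by_pattern = defaultdict(Counter)
    for x, y in zip(predictors, target):
        by_pattern[tuple(x)][y] += 1
    mistakes = sum(sum(c.values()) - max(c.values()) for c in by_pattern.values())
    eps_opt = mistakes / n
    return (eps0 - eps_opt) / eps0


x = [(0,), (0,), (1,), (1,)]
print(coefficient_of_determination(x, [0, 0, 1, 1]))  # target copies the predictor
print(coefficient_of_determination(x, [0, 1, 0, 1]))  # target ignores the predictor
```

A value near 1 means the predictors determine the target almost completely, while a value near 0 means they add nothing over a constant guess.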
19.
BMC Syst Biol ; 12(Suppl 8): 137, 2018 12 21.
Article in English | MEDLINE | ID: mdl-30577732

ABSTRACT

BACKGROUND: A fundamental problem for translational genomics is to find optimal therapies based on gene regulatory intervention. Dynamic intervention involves a control policy that optimally reduces a cost function based on phenotype by externally altering the state of the network over time. When a gene regulatory network (GRN) model is fully known, the problem is addressed using classical dynamic programming based on the Markov chain associated with the network. When the network is uncertain, a Bayesian framework can be applied, where policy optimality is with respect to both the dynamical objective and the uncertainty, as characterized by a prior distribution. In the presence of uncertainty, it is of great practical interest to develop an experimental design strategy and thereby select experiments that optimally reduce a measure of uncertainty. RESULTS: In this paper, we employ mean objective cost of uncertainty (MOCU), which quantifies uncertainty based on the degree to which uncertainty degrades the operational objective, that being the cost owing to undesirable phenotypes. We assume that a number of conditional probabilities characterizing regulatory relationships among genes are unknown in the Markovian GRN. In sum, there is a prior distribution which can be updated to a posterior distribution by observing a regulatory trajectory, and an optimal control policy, known as an "intrinsically Bayesian robust" (IBR) policy. To obtain a better IBR policy, we select an experiment that minimizes the MOCU remaining after applying its output to the network. At this point, we can either stop and find the resulting IBR policy or proceed to determine more unknown conditional probabilities via regulatory observation and find the IBR policy from the resulting posterior distribution. For sequential experimental design this entire process is iterated. 
Owing to the computational complexity of experimental design, which requires computation of many potential IBR policies, we implement an approximate method utilizing mean first passage times (MFPTs) - but only in experimental design, the final policy being an IBR policy. CONCLUSIONS: Comprehensive performance analysis based on extensive simulations on synthetic and real GRNs demonstrates the efficacy of the proposed method, including the accuracy and computational advantage of the approximate MFPT-based design.


Subjects
Computational Biology/methods, Gene Regulatory Networks, Markov Chains, Animals, Cell Cycle/genetics, Mutation, Tumor Suppressor Protein p53/metabolism, Uncertainty
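
The MOCU idea in the abstract above can be sketched on a finite toy problem. Assume a small uncertainty class of candidate models with prior probabilities and a table giving the cost of each action under each model; the numbers and function names below are invented for illustration. The intrinsically Bayesian robust (IBR) action minimizes the prior-expected cost, and MOCU is the expected extra cost of using it instead of the action that is optimal for the true model.

```python
def ibr_action(prior, cost):
    """Index of the action with the lowest expected cost under the prior.

    prior: list of model probabilities
    cost:  cost[m][a] is the cost of action a when model m is the true one
    """
    n_actions = len(cost[0])
    expected = [sum(p * row[a] for p, row in zip(prior, cost))
                for a in range(n_actions)]
    return min(range(n_actions), key=expected.__getitem__)


def mocu(prior, cost):
    """Expected cost increase of the IBR action over each model's own optimum."""
    a_ibr = ibr_action(prior, cost)
    return sum(p * (row[a_ibr] - min(row)) for p, row in zip(prior, cost))


prior = [0.5, 0.5]
cost = [[0.0, 1.0],   # model 0: action 0 is optimal
        [1.0, 0.2]]   # model 1: action 1 is optimal

print("IBR action:", ibr_action(prior, cost))
print("MOCU:", mocu(prior, cost))
print("MOCU once model 0 is confirmed:", mocu([1.0, 0.0], cost))
```

An experiment that would resolve which model is true drives MOCU to zero in this toy case; in the sequential design described in the abstract, the experiment with the smallest expected remaining MOCU is the one chosen at each step.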
20.
PLoS One ; 13(10): e0204627, 2018.
Article in English | MEDLINE | ID: mdl-30278063

ABSTRACT

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.


Subjects
Cluster Analysis, Automated Pattern Recognition/methods, Algorithms, Bayes Theorem, Computer Simulation, Probability, Uncertainty