Search | BVS IEC

1.

Bickel, Peter J; Kur, Gil; Nadler, Boaz.

Proc Natl Acad Sci U S A ; 115(37): 9151-9156, 2018 Sep 11.

Article in English | MEDLINE | ID: mdl-30150379

ABSTRACT

Projection pursuit is a classical exploratory data analysis method to detect interesting low-dimensional structures in multivariate data. Originally, projection pursuit was applied mostly to data of moderately low dimension. Motivated by contemporary applications, we here study its properties in high-dimensional settings. Specifically, we analyze the asymptotic properties of projection pursuit on structureless multivariate Gaussian data with an identity covariance, as both dimension p and sample size n tend to infinity, with [Formula: see text]. Our main results are that (i) if [Formula: see text] then there exist projections whose corresponding empirical cumulative distribution function can approximate any arbitrary distribution; and (ii) if [Formula: see text], not all limiting distributions are possible. However, depending on the value of γ, various non-Gaussian distributions may still be approximated. In contrast, if we restrict to sparse projections, involving only a few of the p variables, then asymptotically all empirical cumulative distribution functions are Gaussian. And (iii) if [Formula: see text], then asymptotically all projections are Gaussian. Some of these results extend to mean-centered sub-Gaussian data and to projections into k dimensions. Hence, in the "small n, large p" setting, unless sparsity is enforced, and regardless of the chosen projection index, projection pursuit may detect an apparent structure that has no statistical significance. Furthermore, our work reveals fundamental limitations on the ability to detect non-Gaussian signals in high-dimensional data, in particular through independent component analysis and related non-Gaussian component analysis.
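A rough numerical illustration of the "small n, large p" warning in this abstract (a sketch only, not the paper's asymptotic analysis): when p is at least n and the data matrix has full row rank, a least-squares solve finds a direction whose projected values reproduce any chosen target vector up to scale, even though the data are pure Gaussian noise. The function name `projection_matching_target`, the sizes n and p, and the bimodal target are illustrative assumptions.

```python
import numpy as np

def projection_matching_target(X, target):
    """Return a unit vector w and a scale c such that X @ w equals c * target.

    Illustrative assumption: X is n x p with p >= n and full row rank,
    so the linear system X w = target has an exact solution.
    """
    w, *_ = np.linalg.lstsq(X, target, rcond=None)  # minimum-norm solution
    norm = np.linalg.norm(w)
    return w / norm, 1.0 / norm

rng = np.random.default_rng(0)
n, p = 50, 500
X = rng.standard_normal((n, p))                       # structureless N(0, I) data
target = np.where(np.arange(n) < n // 2, -1.0, 1.0)   # arbitrary two-cluster "structure"
w, c = projection_matching_target(X, target)
projected = X @ w                                     # proportional to the target
```

The projected sample looks perfectly bimodal although no such structure exists in the population, which is the kind of spurious finding the abstract cautions against.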

2.

Ranking and combining multiple predictors without labeled data.

Parisi, Fabio; Strino, Francesco; Nadler, Boaz; Kluger, Yuval.

Proc Natl Acad Sci U S A ; 111(4): 1253-8, 2014 Jan 28.

Article in English | MEDLINE | ID: mdl-24474744

ABSTRACT

In a broad range of classification and decision-making problems, one is given the advice or predictions of several classifiers, of unknown reliability, over multiple questions or queries. This scenario is different from the standard supervised setting, where each classifier's accuracy can be assessed using available labeled data, and raises two questions: Given only the predictions of several classifiers over a large set of unlabeled test data, is it possible to (i) reliably rank them and (ii) construct a metaclassifier more accurate than most classifiers in the ensemble? Here we present a spectral approach to address these questions. First, assuming conditional independence between classifiers, we show that the off-diagonal entries of their covariance matrix correspond to a rank-one matrix. Moreover, the classifiers can be ranked using the leading eigenvector of this covariance matrix, because its entries are proportional to their balanced accuracies. Second, via a linear approximation to the maximum likelihood estimator, we derive the Spectral Meta-Learner (SML), an unsupervised ensemble classifier whose weights are equal to these eigenvector entries. On both simulated and real data, SML typically achieves a higher accuracy than most classifiers in the ensemble and can provide a better starting point than majority voting for estimating the maximum likelihood solution. Furthermore, SML is robust to the presence of small malicious groups of classifiers designed to veer the ensemble prediction away from the (unknown) ground truth.
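The spectral idea in this abstract can be sketched in a few lines. The code below is a simplified illustration, not the authors' implementation: it takes the leading eigenvector of the full sample covariance of the predictions (rather than working only with the rank-one off-diagonal part), fixes the sign by assuming most classifiers are better than random, and uses the eigenvector entries as voting weights. The function name `spectral_meta_learner` and the simulated accuracies are assumptions.

```python
import numpy as np

def spectral_meta_learner(preds):
    """Rank classifiers and combine them without labels.

    preds: (m, n) array of +/-1 predictions, one row per classifier.
    Simplifying assumptions: classifiers err independently given the true
    label, most are better than random (used to fix the eigenvector sign),
    and the full sample covariance stands in for its rank-one
    off-diagonal part.
    """
    Q = np.cov(preds)                    # m x m sample covariance
    _, eigvecs = np.linalg.eigh(Q)       # eigenvalues in ascending order
    v = eigvecs[:, -1]                   # leading eigenvector
    if v.sum() < 0:                      # resolve the sign ambiguity
        v = -v
    votes = v @ preds                    # eigenvector-weighted vote
    labels = np.where(votes >= 0, 1, -1)
    return v, labels

# Synthetic check with made-up accuracies (not data from the paper).
rng = np.random.default_rng(0)
n = 5000
y = rng.choice([-1, 1], size=n)
acc = np.array([0.9, 0.8, 0.7, 0.6, 0.55])
correct = rng.random((acc.size, n)) < acc[:, None]
preds = np.where(correct, y, -y)
v, labels = spectral_meta_learner(preds)
```

Larger entries of `v` should correspond to more reliable classifiers, and `labels` is the unsupervised ensemble prediction.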

Subjects

Likelihood Functions; Models, Theoretical

3.

Direct phase retrieval in double blind Fourier holography.

Raz, Oren; Leshem, Ben; Miao, Jianwei; Nadler, Boaz; Oron, Dan; Dudovich, Nirit.

Opt Express ; 22(21): 24935-50, 2014 Oct 20.

Article in English | MEDLINE | ID: mdl-25401527

ABSTRACT

Phase measurement is a long-standing challenge in a wide range of applications, from X-ray imaging to astrophysics and spectroscopy. While in some scenarios the phase is resolved by an interferometric measurement, in others it is reconstructed via numerical optimization, based on some a-priori knowledge about the signal. The latter commonly use iterative algorithms, and thus have to deal with their convergence, stagnation, and robustness to noise. Here we combine these two approaches and present a new scheme, termed double blind Fourier holography, providing an efficient solution to the phase problem in two dimensions, by solving a system of linear equations. We present and experimentally demonstrate our approach for the case of lens-less imaging.

Subjects

Fourier Analysis; Holography/methods; Image Processing, Computer-Assisted; Lenses

4.

MINIMAX BOUNDS FOR SPARSE PCA WITH NOISY HIGH-DIMENSIONAL DATA.

Birnbaum, Aharon; Johnstone, Iain M; Nadler, Boaz; Paul, Debashis.

Ann Stat ; 41(3): 1055-1084, 2013 Jun.

Article in English | MEDLINE | ID: mdl-25324581

ABSTRACT

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the l2 loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.

5.

Spectral top-down recovery of latent tree models.

Aizenbud, Yariv; Jaffe, Ariel; Wang, Meng; Hu, Amber; Amsel, Noah; Nadler, Boaz; Chang, Joseph T; Kluger, Yuval.

Inf inference ; 12(3): iaad032, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37593361

ABSTRACT

Modeling the distribution of high-dimensional data by a latent tree graphical model is a prevalent approach in multiple scientific domains. A common task is to infer the underlying tree structure, given only observations of its terminal nodes. Many algorithms for tree recovery are computationally intensive, which limits their applicability to trees of moderate size. For large trees, a common approach, termed divide-and-conquer, is to recover the tree structure in two steps. First, separately recover the structure of multiple, possibly random subsets of the terminal nodes. Second, merge the resulting subtrees to form a full tree. Here, we develop spectral top-down recovery (STDR), a deterministic divide-and-conquer approach to infer large latent tree models. Unlike previous methods, STDR partitions the terminal nodes in a non random way, based on the Fiedler vector of a suitable Laplacian matrix related to the observed nodes. We prove that under certain conditions, this partitioning is consistent with the tree structure. This, in turn, leads to a significantly simpler merging procedure of the small subtrees. We prove that STDR is statistically consistent and bound the number of samples required to accurately recover the tree with high probability. Using simulated data from several common tree models in phylogenetics, we demonstrate that STDR has a significant advantage in terms of runtime, with improved or similar accuracy.
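The partitioning step can be pictured with a generic spectral bisection. The sketch below uses the ordinary unnormalized graph Laplacian of a similarity matrix and splits the nodes by the sign of the Fiedler vector (the eigenvector of the second-smallest eigenvalue). The abstract only says "a suitable Laplacian matrix", so the choice of Laplacian, the function name `fiedler_partition`, and the toy similarity values are assumptions rather than STDR's actual construction.

```python
import numpy as np

def fiedler_partition(W):
    """Split nodes into two groups by the sign of the Fiedler vector.

    W: symmetric nonnegative similarity matrix with zero diagonal
    (hypothetical input; STDR's own similarity measure is not given here).
    """
    L = np.diag(W.sum(axis=1)) - W        # unnormalized graph Laplacian
    _, eigvecs = np.linalg.eigh(L)        # eigenvalues in ascending order
    fiedler = eigvecs[:, 1]               # second-smallest eigenvalue
    return fiedler >= 0

# Toy example: nodes {0, 1} and {2, 3} are strongly similar within pairs
# and only weakly similar across pairs.
W = np.array([
    [0.00, 1.00, 0.01, 0.01],
    [1.00, 0.00, 0.01, 0.01],
    [0.01, 0.01, 0.00, 1.00],
    [0.01, 0.01, 1.00, 0.00],
])
part = fiedler_partition(W)
```

In a divide-and-conquer scheme, each side of such a split would then be handled recursively or passed to a tree-recovery routine for small trees.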

6.

Zero-preserving imputation of single-cell RNA-seq data.

Linderman, George C; Zhao, Jun; Roulis, Manolis; Bielecki, Piotr; Flavell, Richard A; Nadler, Boaz; Kluger, Yuval.

Nat Commun ; 13(1): 192, 2022 Jan 11.

Article in English | MEDLINE | ID: mdl-35017482

ABSTRACT

A key challenge in analyzing single cell RNA-sequencing data is the large number of false zeros, where genes actually expressed in a given cell are incorrectly measured as unexpressed. We present a method based on low-rank matrix approximation which imputes these values while preserving biologically non-expressed genes (true biological zeros) at zero expression levels. We provide theoretical justification for this denoising approach and demonstrate its advantages relative to other methods on simulated and biological datasets.
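A minimal sketch of low-rank imputation with zero preservation, under stated assumptions: the matrix is replaced by its rank-k truncated SVD, and each gene's reconstructed values are then set to zero unless they exceed the magnitude of that gene's most negative reconstructed value. The abstract does not spell out the thresholding rule, so this rule, the function name `low_rank_zero_preserving`, and the simulated dropout rate are illustrative choices, not the published procedure.

```python
import numpy as np

def low_rank_zero_preserving(X, k):
    """Rank-k approximation of X (cells x genes) with per-gene thresholding.

    Assumption: reconstructed values no larger than the magnitude of a
    gene's most negative reconstructed value are treated as noise around
    a true zero and are set back to zero.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    X_k = (U[:, :k] * s[:k]) @ Vt[:k, :]                 # truncated SVD
    thresh = np.abs(np.minimum(X_k.min(axis=0), 0.0))    # one threshold per gene
    return np.where(X_k > thresh, X_k, 0.0)

# Synthetic data (not from the paper): rank-2 expression, gene 0 never expressed.
rng = np.random.default_rng(1)
cells, genes, k = 60, 8, 2
true = rng.random((cells, k)) @ rng.random((k, genes)) * 5
true[:, 0] = 0.0                                    # true biological zero
observed = true * (rng.random(true.shape) > 0.3)    # about 30% false zeros
imputed = low_rank_zero_preserving(observed, k)
```

The never-expressed gene stays at zero, while dropouts in expressed genes can receive positive imputed values from the low-rank structure.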

Subjects

Algorithms; RNA/genetics; Sequence Analysis, RNA/statistics & numerical data; Animals; B-Lymphocytes/cytology; B-Lymphocytes/metabolism; Bronchi/cytology; Bronchi/metabolism; Datasets as Topic; Epithelial Cells/cytology; Epithelial Cells/metabolism; Humans; Killer Cells, Natural/cytology; Killer Cells, Natural/metabolism; Mice; Monocytes/cytology; Monocytes/metabolism; Primary Cell Culture; RNA/metabolism; RNA-Seq; Single-Cell Analysis; T-Lymphocytes/cytology; T-Lymphocytes/metabolism

7.

Global features of neural activity in the olfactory system form a parallel code that predicts olfactory behavior and perception.

Haddad, Rafi; Weiss, Tali; Khan, Rehan; Nadler, Boaz; Mandairon, Nathalie; Bensafi, Moustafa; Schneidman, Elad; Sobel, Noam.

J Neurosci ; 30(27): 9017-26, 2010 Jul 07.

Article in English | MEDLINE | ID: mdl-20610736

ABSTRACT

Odor identity is coded in spatiotemporal patterns of neural activity in the olfactory bulb. Here we asked whether meaningful olfactory information could also be read from the global olfactory neural population response. We applied standard statistical methods of dimensionality-reduction to neural activity from 12 previously published studies using seven different species. Four studies reported olfactory receptor activity, seven reported glomerulus activity, and one reported the activity of projection-neurons. We found two linear axes of neural population activity that accounted for more than half of the variance in neural response across species. The first axis was correlated with the total sum of odor-induced neural activity, and reflected the behavior of approach or withdrawal in animals, and odorant pleasantness in humans. The second and orthogonal axis reflected odorant toxicity across species. We conclude that in parallel with spatiotemporal pattern coding, the olfactory system can use simple global computations to read vital olfactory information from the neural population response.

Subjects

Models, Neurological; Neurons/physiology; Olfactory Bulb/cytology; Olfactory Perception/physiology; Action Potentials; Adult; Computer Simulation; Databases, Factual/statistics & numerical data; Female; Humans; Male; Odorants; Predictive Value of Tests; Smell/physiology; Statistics as Topic; Young Adult

8.

Spectral neighbor joining for reconstruction of latent tree models.

Jaffe, Ariel; Amsel, Noah; Aizenbud, Yariv; Nadler, Boaz; Chang, Joseph T; Kluger, Yuval.

SIAM J Math Data Sci ; 3(1): 113-141, 2021.

Article in English | MEDLINE | ID: mdl-34124606

ABSTRACT

A common assumption in multiple scientific applications is that the distribution of observed data can be modeled by a latent tree graphical model. An important example is phylogenetics, where the tree models the evolutionary lineages of a set of observed organisms. Given a set of independent realizations of the random variables at the leaves of the tree, a key challenge is to infer the underlying tree topology. In this work we develop Spectral Neighbor Joining (SNJ), a novel method to recover the structure of latent tree graphical models. Given a matrix that contains a measure of similarity between all pairs of observed variables, SNJ computes a spectral measure of cohesion between groups of observed variables. We prove that SNJ is consistent, and derive a sufficient condition for correct tree recovery from an estimated similarity matrix. Combining this condition with a concentration of measure result on the similarity matrix, we bound the number of samples required to recover the tree with high probability. We illustrate via extensive simulations that in comparison to several other reconstruction methods, SNJ requires fewer samples to accurately recover trees with a large number of leaves or long edges.

9.

On Detection of Faint Edges in Noisy Images.

Ofir, Nati; Galun, Meirav; Alpert, Sharon; Brandt, Achi; Nadler, Boaz; Basri, Ronen.

IEEE Trans Pattern Anal Mach Intell ; 42(4): 894-908, 2020 Apr.

Article in English | MEDLINE | ID: mdl-30629496

ABSTRACT

A fundamental question for edge detection in noisy images is how faint can an edge be and still be detected. In this paper we offer a formalism to study this question and subsequently introduce computationally efficient multiscale edge detection algorithms designed to detect faint edges in noisy images. In our formalism we view edge detection as a search in a discrete, though potentially large, set of feasible curves. First, we derive approximate expressions for the detection threshold as a function of curve length and the complexity of the search space. We then present two edge detection algorithms, one for straight edges, and the second for curved ones. Both algorithms efficiently search for edges in a large set of candidates by hierarchically constructing difference filters that match the curves traced by the sought edges. We demonstrate the utility of our algorithms in both simulations and applications involving challenging real images. Finally, based on these principles, we develop an algorithm for fiber detection and enhancement. We exemplify its utility to reveal and enhance nerve axons in light microscopy images.

10.

Principal component analysis, hierarchical clustering, and decision tree assessment of plasma mRNA and hormone levels as an early detection strategy for small intestinal neuroendocrine (carcinoid) tumors.

Modlin, Irvin M; Gustafsson, Björn I; Drozdov, Ignat; Nadler, Boaz; Pfragner, Roswitha; Kidd, Mark.

Ann Surg Oncol ; 16(2): 487-98, 2009 Feb.

Article in English | MEDLINE | ID: mdl-19050963

ABSTRACT

Incidence of neuroendocrine tumors (NETs) is increasing (approximately 6%/year), but clinical presentation is nonspecific, resulting in delays in diagnosis (5-7 years; approximately 70% have metastases). This reflects absence of a sensitive plasma marker. The aim of this study is to investigate whether detection of circulating messenger RNA (mRNA) alone or in combination with circulating NET-related hormones and growth factors can detect gastrointestinal NET disease. The small intestinal (SI) NET cell line KRJ-I was used to define the sensitivity of real-time polymerase chain reaction (PCR) for mRNA detection in blood. NSE, Tph-1, and VMAT2 transcripts were identified from one KRJ-I cell/ml blood. mRNA from the tissue and plasma of SI-NETs (n = 12) and gastric NETs (n = 7), and plasma from healthy controls (n = 9) was isolated and real-time PCR performed. Tph-1 was a specific marker of SI-NETs (58%, p < 0.03) whereas CgA transcripts did not differentiate tumors from controls. Patients with metastatic disease expressed more marker transcripts than localized tumors (75% versus 18%, p < 0.02). Plasma 5-hydroxytryptamine (5-HT), chromogranin A (CgA), ghrelin, and connective tissue growth factor (CTGF) fragments were measured, combined with mRNA levels, and a predictive mathematical model for NET diagnosis developed using decision trees. The sensitivity and specificity to diagnose SI-NETs and gastric NETs were 81.2% and 100%, and 71.4% and 55.6%, respectively. We conclude that mRNA from one NET cell/ml blood can be detected. Circulating plasma Tph-1 is a promising marker gene for SI-NET disease (specificity 100%) while an increased number of marker transcripts (>2) correlated with disease spread. Including NET-related circulating hormones and growth factors in the algorithm increased the sensitivity of detection of SI-NETs from 58 to 82%.

Subjects

Biomarkers, Tumor/blood; Carcinoid Tumor/blood; Chromogranin A/blood; Connective Tissue Growth Factor/blood; Decision Trees; Intestinal Neoplasms/blood; Intestine, Small; RNA, Messenger/blood; Adult; Aged; Aged, 80 and over; Biomarkers, Tumor/genetics; Case-Control Studies; Cluster Analysis; Early Diagnosis; Enzyme-Linked Immunosorbent Assay; Female; Humans; Male; Middle Aged; Neoplasm Staging; Principal Component Analysis; Prognosis; RNA, Messenger/genetics; RNA, Neoplasm/blood; RNA, Neoplasm/genetics; Reverse Transcriptase Polymerase Chain Reaction; Sensitivity and Specificity

11.

Roy's largest root under rank-one perturbations: the complex valued case and applications.

Dharmawansa, Prathapasinghe; Nadler, Boaz; Shwartz, Ofer.

J Multivar Anal ; 174, 2019 Nov.

Article in English | MEDLINE | ID: mdl-31474779

ABSTRACT

The largest eigenvalue of a single or a double Wishart matrix, both known as Roy's largest root, plays an important role in a variety of applications. Recently, via a small noise perturbation approach with fixed dimension and degrees of freedom, Johnstone and Nadler derived simple yet accurate approximations to its distribution in the real valued case, under a rank-one alternative. In this paper, we extend their results to the complex valued case for five common single matrix and double matrix settings. In addition, we study the finite sample distribution of the leading eigenvector. We present the utility of our results in several signal detection and communication applications, and illustrate their accuracy via simulations.

12.

NEWTON CORRECTION METHODS FOR COMPUTING REAL EIGENPAIRS OF SYMMETRIC TENSORS.

Jaffe, Ariel; Weiss, Roi; Nadler, Boaz.

SIAM J Matrix Anal Appl ; 39(3): 1071-1094, 2018.

Article in English | MEDLINE | ID: mdl-34295018

ABSTRACT

Real eigenpairs of symmetric tensors play an important role in multiple applications. In this paper we propose and analyze a fast iterative Newton-based method to compute real eigenpairs of symmetric tensors. We derive sufficient conditions for a real eigenpair to be a stable fixed point for our method and prove that given a sufficiently close initial guess, the convergence rate is quadratic. Empirically, our method converges to a significantly larger number of eigenpairs compared with previously proposed iterative methods, and with enough random initializations typically finds all real eigenpairs. In particular, for a generic symmetric tensor, the sufficient conditions for local convergence of our Newton-based method hold simultaneously for all its real eigenpairs.

13.

GeneChip, geNorm, and gastrointestinal tumors: novel reference genes for real-time PCR.

Kidd, Mark; Nadler, Boaz; Mane, Shrikant; Eick, Geeta; Malfertheiner, Maximillian; Champaneria, Manish; Pfragner, Roswitha; Modlin, Irvin.

Physiol Genomics ; 30(3): 363-70, 2007 Aug 20.

Article in English | MEDLINE | ID: mdl-17456737

ABSTRACT

Accurate quantitation of target genes depends on correct normalization. Use of genes with variable tissue transcription (GAPDH) is problematic, particularly in clinical samples, which are derived from different tissue sources. Using a large-scale gene database (Affymetrix U133A) data set of 36 gastrointestinal (GI) tumors and normal tissues, we identified 8 candidate reference genes and established expression levels by real-time RT-PCR in an independent data set (n = 42). A geometric averaging method (geNorm) identified ALG9, TFCP2, and ZNF410 as the most robustly expressed control genes. Examination of raw C(T) values demonstrated that these genes were tightly correlated between themselves (R² > 0.86, P < 0.0001), with low variability [coefficient of variation (CV) <12.7%] and high interassay reproducibility (r = 0.93, P = 0.001). In comparison, the alternative control gene, GAPDH, exhibited the highest variability (CV = 18.1%), was significantly differently expressed between tissue types (P = 0.05), was poorly correlated with the three reference genes (R² < 0.4), and was considered the least stable gene. To illustrate the importance of correct normalization, the target gene, MTA1, was significantly overexpressed (P = 0.0006) in primary GI neuroendocrine tumor (NET) samples (vs. normal GI samples) when normalized by geNorm(ATZ) but not when normalized using GAPDH. The geNorm(ATZ) approach was, in addition, applicable to adenocarcinomas; MTA1 was overexpressed (P < 0.04) in malignant colon, pancreas, and breast tumors compared with normal tissues. We provide a robust basis for the establishment of a reference gene set using GeneChip data and provide evidence for the utility of normalizing a malignancy-associated gene (MTA1) using novel reference genes and the geNorm approach in GI NETs as well as in adenocarcinomas and breast tumors.
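For readers unfamiliar with geNorm, the sketch below computes the commonly described gene-stability measure M (for each candidate gene, the mean over all other candidates of the standard deviation across samples of the pairwise log2 expression ratio) and a per-sample normalization factor taken as the geometric mean of the chosen reference genes. It follows the general geNorm idea rather than the exact software or settings used in this study; the function names and the synthetic expression values are assumptions.

```python
import numpy as np

def genorm_stability(expr):
    """Stability measure M per gene; lower M means more stable expression.

    expr: (samples, genes) array of positive relative quantities.
    """
    log_expr = np.log2(expr)
    n_genes = log_expr.shape[1]
    M = np.empty(n_genes)
    for j in range(n_genes):
        others = np.delete(log_expr, j, axis=1)
        ratios = log_expr[:, [j]] - others          # log2 ratios against each other gene
        M[j] = ratios.std(axis=0, ddof=1).mean()    # average pairwise variation
    return M

def normalization_factor(expr, ref_idx):
    """Per-sample geometric mean of the selected reference genes."""
    return np.exp(np.log(expr[:, ref_idx]).mean(axis=1))

# Synthetic data (not from the paper): three co-varying genes, one erratic gene.
rng = np.random.default_rng(2)
sample_effect = rng.lognormal(0.0, 0.5, size=(30, 1))
stable = sample_effect * np.array([1.0, 2.0, 4.0]) * rng.lognormal(0.0, 0.05, size=(30, 3))
unstable = rng.lognormal(0.0, 1.0, size=(30, 1))
expr = np.hstack([stable, unstable])
M = genorm_stability(expr)
nf = normalization_factor(expr, [0, 1, 2])
```

A target gene's expression would then be divided by `nf` sample by sample; the erratic gene receives the largest M, mirroring the role GAPDH plays in the abstract.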

Subjects

Adenocarcinoma/genetics; Biomarkers, Tumor/genetics; Biomarkers, Tumor/isolation & purification; Gastrointestinal Neoplasms/genetics; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; Genes, Neoplasm; Oligonucleotide Array Sequence Analysis; Reverse Transcriptase Polymerase Chain Reaction/methods; Calibration; Humans; Tumor Cells, Cultured

14.

Direct single-shot phase retrieval from the diffraction pattern of separated objects.

Leshem, Ben; Xu, Rui; Dallal, Yehonatan; Miao, Jianwei; Nadler, Boaz; Oron, Dan; Dudovich, Nirit; Raz, Oren.

Nat Commun ; 7: 10820, 2016 Feb 22.

Article in English | MEDLINE | ID: mdl-26899582

ABSTRACT

The non-crystallographic phase problem arises in numerous scientific and technological fields. An important application is coherent diffractive imaging. Recent advances in X-ray free-electron lasers allow capturing of the diffraction pattern from a single nanoparticle before it disintegrates, in so-called 'diffraction before destruction' experiments. Presently, the phase is reconstructed by iterative algorithms, imposing a non-convex computational challenge, or by Fourier holography, requiring a well-characterized reference field. Here we present a convex scheme for single-shot phase retrieval for two (or more) sufficiently separated objects, demonstrated in two dimensions. In our approach, the objects serve as unknown references to one another, reducing the phase problem to a solvable set of linear equations. We establish our method numerically and experimentally in the optical domain and demonstrate a proof-of-principle single-shot coherent diffractive imaging using X-ray free-electron lasers pulses. Our scheme alleviates several limitations of current methods, offering a new pathway towards direct reconstruction of complex objects.

15.

Dielectric boundary force and its crucial role in gramicidin.

Nadler, Boaz; Hollerbach, Uwe; Eisenberg, R S.

Phys Rev E Stat Nonlin Soft Matter Phys ; 68(2 Pt 1): 021905, 2003 Aug.

Article in English | MEDLINE | ID: mdl-14525004

ABSTRACT

In an electrostatic problem with nonuniform geometry, a charge Q in one region induces surface charges [called dielectric boundary charges (DBC)] at boundaries between different dielectrics. These induced surface charges, in return, exert a force [called dielectric boundary force (DBF)] on the charge Q that induced them. The DBF is often overlooked. It is not present in standard continuum theories of (point) ions in or near membranes and proteins, such as Gouy-Chapman, Debye-Hückel, Poisson-Boltzmann or Poisson-Nernst-Planck. The DBF is important when a charge Q is near dielectric interfaces, for example, when ions permeate through protein channels embedded in biological membranes. In this paper, we define the DBF and calculate it explicitly for a planar dielectric wall and for a tunnel geometry resembling the ionic channel gramicidin. In general, we formulate the DBF in a form useful for continuum theories, namely, as a solution of a partial differential equation with boundary conditions. The DBF plays a crucial role in the permeation of ions through the gramicidin channel. A positive ion in the channel produces a DBF of opposite sign to that of the fixed charge force (FCF) produced by the permanent charge of the gramicidin polypeptide, and so the net force on the positive ion is reduced. A negative ion creates a DBF of the same sign as the FCF and so the net (repulsive) force on the negative ion is increased. Thus, a positive ion can permeate the channel, while a negative ion is excluded from it. In gramicidin, it is this balance between the FCF and DBF that allows only singly charged positive ions to move into and through the channel. The DBF is not directly responsible, however, for selectivity between the alkali metal ions (e.g., Li+, Na+, K+): we prove that the DBF on a mobile spherical ion is independent of the ion's radius.

Subjects

Gramicidin/chemistry; Biophysical Phenomena; Biophysics; Calcium/chemistry; Ions; Models, Chemical

16.

Saturation of conductance in single ion channels: the blocking effect of the near reaction field.

Nadler, Boaz; Schuss, Zeev; Hollerbach, Uwe; Eisenberg, R S.

Phys Rev E Stat Nonlin Soft Matter Phys ; 70(5 Pt 1): 051912, 2004 Nov.

Article in English | MEDLINE | ID: mdl-15600661

ABSTRACT

The ionic current flowing through a protein channel in the membrane of a biological cell depends on the concentration of the permeant ion, as well as on many other variables. As the concentration increases, the rate of arrival of bath ions to the channel's entrance increases, and typically so does the net current. This concentration dependence is part of traditional diffusion and rate models that predict Michaelis-Menten current-concentration relations for a single ion channel. Such models, however, neglect other effects of bath concentrations on the net current. The net current depends not only on the entrance rate of ions into the channel, but also on forces acting on ions inside the channel. These forces, in turn, depend not only on the applied potential and charge distribution of the channel, but also on the long-range Coulombic interactions with the surrounding bath ions. In this paper, we study the effects of bath concentrations on the average force on an ion in a single ion channel. We show that the force of the reaction field on a discrete ion inside a channel embedded in an uncharged lipid membrane contains a blocking (shielding) term that is proportional to the square root of the ionic bath concentration. We then show that different blocking strengths yield different behavior of the current-concentration and conductance-concentration curves. Our theory shows that at low concentrations, when the blocking force is weak, conductance grows linearly with concentration, as in traditional models, e.g., Michaelis-Menten formulations. As the concentration increases to a range of moderate shielding, conductance grows as the square root of concentration, whereas at high concentrations, with high shielding, conductance may actually decrease with increasing concentrations: the conductance-concentration curve can invert. 
Therefore, electrostatic interactions between bath ions and the single ion inside the channel can explain the different regimes of conductance-concentration relations observed in experiments.

Subjects

Cell Membrane/chemistry; Cell Membrane/physiology; Ion Channel Gating/physiology; Ion Channels/chemistry; Ion Channels/physiology; Models, Biological; Models, Chemical; Animals; Computer Simulation; Humans; Membrane Potentials/physiology

17.

Clustering of high-dimensional gene expression data with feature filtering methods and diffusion maps.

Xu, Rui; Damelin, Steven; Nadler, Boaz; Wunsch, Donald C.

Artif Intell Med ; 48(2-3): 91-8, 2010.

Article in English | MEDLINE | ID: mdl-19962867

ABSTRACT

OBJECTIVE: The importance of gene expression data in cancer diagnosis and treatment has become widely known by cancer researchers in recent years. However, one of the major challenges in the computational analysis of such data is the curse of dimensionality because of the overwhelming number of variables measured (genes) versus the small number of samples. Here, we use a two-step method to reduce the dimension of gene expression data and aim to address the problem of high dimensionality. METHODS: First, we extract a subset of genes based on statistical characteristics of their corresponding gene expression levels. Then, for further dimensionality reduction, we apply diffusion maps, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set, in order to obtain efficient representation of data geometric descriptions. Finally, a neural network clustering theory, fuzzy ART, is applied to the resulting data to generate clusters of cancer samples. RESULTS: Experimental results on the small round blue-cell tumor data set, compared with other widely used clustering algorithms, such as the hierarchical clustering algorithm and K-means, show that our proposed method can effectively identify different cancer types and generate high-quality cancer sample clusters. CONCLUSION: The proposed feature selection methods and diffusion maps can achieve useful information from the multidimensional gene expression data and prove effective at addressing the problem of high dimensionality inherent in gene expression data analysis.
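The diffusion-map step in the METHODS can be sketched as follows. This is a generic textbook-style construction, not the authors' code: a Gaussian kernel is row-normalized into a Markov matrix, and the leading nontrivial eigenvectors, scaled by their eigenvalues, serve as new coordinates. The function name `diffusion_map`, the kernel width `eps`, and the toy data are assumptions; the gene filtering and fuzzy ART clustering stages are not shown.

```python
import numpy as np

def diffusion_map(X, eps, n_coords=2):
    """Diffusion-map coordinates from a Gaussian kernel of width eps.

    X: (samples, features) array. Returns (coords, eigenvalues), where
    coords holds the first n_coords nontrivial coordinates.
    """
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    K = np.exp(-sq_dists / eps)                  # Gaussian kernel
    d = K.sum(axis=1)                            # row sums (degrees)
    # Symmetric matrix sharing its eigenvalues with the Markov matrix P = D^-1 K.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]               # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]             # right eigenvectors of P
    coords = psi[:, 1:n_coords + 1] * vals[1:n_coords + 1]
    return coords, vals

# Toy data: two groups of ten one-dimensional points, centered at 0 and 3.
base = np.linspace(-0.3, 0.3, 10)
X = np.concatenate([base, base + 3.0])[:, None]
coords, vals = diffusion_map(X, eps=2.0)
```

In this toy case the sign of the first diffusion coordinate separates the two groups, which is the kind of low-dimensional representation a clustering algorithm can then work on.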

Subjects

Artificial Intelligence; Cluster Analysis; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Models, Statistical; Oligonucleotide Array Sequence Analysis; Systems Biology; Systems Integration; Algorithms; Animals; Fuzzy Logic; Genetic Markers; Genetic Testing; Humans; Markov Chains; Neural Networks, Computer

18.

Predicting neuroendocrine tumor (carcinoid) neoplasia using gene expression profiling and supervised machine learning.

Drozdov, Ignat; Kidd, Mark; Nadler, Boaz; Camp, Robert L; Mane, Shrikant M; Hauso, Oyvind; Gustafsson, Bjorn I; Modlin, Irvin M.

Cancer ; 115(8): 1638-50, 2009 Apr 15.

Article in English | MEDLINE | ID: mdl-19197975

ABSTRACT

BACKGROUND: A more accurate taxonomy of small intestinal (SI) neuroendocrine tumors (NETs) is necessary to accurately predict tumor behavior and prognosis and to define therapeutic strategy. In this study, the authors identified a panel of such markers that have been implicated in tumorigenicity, metastasis, and hormone production and hypothesized that transcript levels of the genes melanoma antigen family D2 (MAGE-D2), metastasis-associated 1 (MTA1), nucleosome assembly protein 1-like (NAP1L1), Ki-67 (a marker of proliferation), survivin, frizzled homolog 7 (FZD7), the Kiss1 metastasis suppressor (Kiss1), neuropilin 2 (NRP2), and chromogranin A (CgA) could be used to define primary SI NETs and to predict the development of metastases. METHODS: Seventy-three clinically and World Health Organization pathologically classified NET samples (primary tumor, n = 44 samples; liver metastases, n = 29 samples) and 30 normal human enterochromaffin (EC) cell preparations were analyzed using real-time polymerase chain reaction. Transcript levels were normalized to 3 NET housekeeping genes (asparagine-linked glycosylation 9 or ALG9, transcription factor CP2 or TFCP2, and zinc finger protein 410 or ZNF410) using geNorm analysis. A predictive gene-based model was constructed using supervised learning algorithms from the transcript expression levels. RESULTS: Primary SI NETs could be differentiated from normal human EC cell preparations with 100% specificity and 92% sensitivity. Well differentiated NETs (WDNETs), well differentiated neuroendocrine carcinomas, and poorly differentiated NETs (PDNETs) were classified with a specificity of 78%, 78%, and 71%, respectively; whereas poorly differentiated neuroendocrine carcinomas were misclassified as either WDNETs or PDNETs. Metastases were predicted in all cases with 100% sensitivity and specificity. 
CONCLUSIONS: The current results indicated that gene expression profiling and supervised machine learning can be used to classify SI NET subtypes and accurately predict metastasis. The authors believe that the application of this technique will facilitate accurate molecular pathologic delineation of NET disease, better define its extent, facilitate the assessment of prognosis, and provide a guide for the identification of appropriate strategies for individualized patient treatment.

Subjects

Carcinoid Tumor/classification; Carcinoid Tumor/genetics; Electronic Data Processing; Gene Expression Profiling; Algorithms; Enterochromaffin Cells/classification; Frozen Sections; Humans; Intestinal Neoplasms/classification; Intestinal Neoplasms/genetics; Neoplasm Metastasis; Predictive Value of Tests; Sensitivity and Specificity; Validation Studies as Topic

19.

Variable-free exploration of stochastic models: a gene regulatory network example.

Erban, Radek; Frewen, Thomas A; Wang, Xiao; Elston, Timothy C; Coifman, Ronald; Nadler, Boaz; Kevrekidis, Ioannis G.

J Chem Phys ; 126(15): 155103, 2007 Apr 21.

Article in English | MEDLINE | ID: mdl-17461667

ABSTRACT

Finding coarse-grained, low-dimensional descriptions is an important task in the analysis of complex, stochastic models of gene regulatory networks. This task involves (a) identifying observables that best describe the state of these complex systems and (b) characterizing the dynamics of the observables. In a previous paper [R. Erban et al., J. Chem. Phys. 124, 084106 (2006)] the authors assumed that good observables were known a priori, and presented an equation-free approach to approximate coarse-grained quantities (i.e., effective drift and diffusion coefficients) that characterize the long-time behavior of the observables. Here we use diffusion maps [R. Coifman et al., Proc. Natl. Acad. Sci. U.S.A. 102, 7426 (2005)] to extract appropriate observables ("reduction coordinates") in an automated fashion; these involve the leading eigenvectors of a weighted Laplacian on a graph constructed from network simulation data. We present lifting and restriction procedures for translating between physical variables and these data-based observables. These procedures allow us to perform equation-free, coarse-grained computations characterizing the long-term dynamics through the design and processing of short bursts of stochastic simulation initialized at appropriate values of the data-based observables.

Subjects

Algorithms; Gene Expression Regulation/physiology; Models, Biological; Proteome/metabolism; Signal Transduction/physiology; Models, Statistical; Stochastic Processes
