Search | BVS Bolivia

Sparse Principal Component Analysis With Preserved Sparsity Pattern.

Seghouane, Abd-Krim; Shokouhi, Navid; Koch, Inge.

IEEE Trans Image Process ; 28(7): 3274-3285, 2019 Jul.

Article in English | MEDLINE | ID: mdl-30703025

ABSTRACT

Principal component analysis (PCA) is widely used for feature extraction and dimension reduction in pattern recognition and data analysis. Despite its popularity, the reduced dimension obtained from the PCA is difficult to interpret due to the dense structure of principal loading vectors. To address this issue, several methods have been proposed for sparse PCA, all of which estimate loading vectors with few non-zero elements. However, when more than one principal component is estimated, the associated loading vectors do not possess the same sparsity pattern. Therefore, it becomes difficult to determine a small subset of variables from the original feature space that have the highest contribution in the principal components. To address this issue, an adaptive block sparse PCA method is proposed. The proposed method is guaranteed to obtain the same sparsity pattern across all principal components. Experiments show that applying the proposed sparse PCA method can help improve the performance of feature selection for image processing applications. We further demonstrate that our proposed sparse PCA method can be used to improve the performance of blind source separation for functional magnetic resonance imaging data.
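The idea of a sparsity pattern shared across components can be illustrated with row-wise (group) soft-thresholding of a loading matrix. This is a minimal sketch of a generic block-sparsity operator, not the adaptive method of the paper; the function names, the matrix values and the threshold `lam` are assumptions made for illustration.

```python
import math

def group_soft_threshold(loadings, lam):
    """Shrink each row of a loading matrix by its Euclidean norm.

    loadings: list of rows, one row per original variable and one
    column per principal component (illustrative layout).
    Rows whose norm is <= lam are set to zero in every column.
    """
    result = []
    for row in loadings:
        norm = math.sqrt(sum(v * v for v in row))
        if norm <= lam:
            result.append([0.0] * len(row))
        else:
            scale = 1.0 - lam / norm
            result.append([scale * v for v in row])
    return result

def support(loadings, component):
    """Indices of variables with a non-zero loading on one component."""
    return [i for i, row in enumerate(loadings) if row[component] != 0.0]

# Hypothetical 4-variable, 2-component loading matrix (made-up values).
A = [[0.9, 0.1], [0.05, 0.02], [0.1, 0.8], [0.03, 0.04]]
B = group_soft_threshold(A, lam=0.2)
print(support(B, 0), support(B, 1))
```

Because the threshold acts on whole rows, a variable that is dropped is dropped from every component at once, which is what makes the retained subset of variables easy to read off.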

Evaluating the Contributions of Individual Variables to a Quadratic Form.

Garthwaite, Paul H; Koch, Inge.

Aust N Z J Stat ; 58(1): 99-119, 2016 Mar.

Article in English | MEDLINE | ID: mdl-27478405

ABSTRACT

Quadratic forms capture multivariate information in a single number, making them useful, for example, in hypothesis testing. When a quadratic form is large and hence interesting, it might be informative to partition the quadratic form into contributions of individual variables. In this paper it is argued that meaningful partitions can be formed, though the precise partition that is determined will depend on the criterion used to select it. An intuitively reasonable criterion is proposed and the partition to which it leads is determined. The partition is based on a transformation that maximises the sum of the correlations between individual variables and the variables to which they transform under a constraint. Properties of the partition, including optimality properties, are examined. The contributions of individual variables to a quadratic form are less clear-cut when variables are collinear, and forming new variables through rotation can lead to greater transparency. The transformation is adapted so that it has an invariance property under such rotation, whereby the assessed contributions are unchanged for variables that the rotation does not affect directly. Application of the partition to Hotelling's one- and two-sample test statistics, Mahalanobis distance and discriminant analysis is described and illustrated through examples. It is shown that bootstrap confidence intervals for the contributions of individual variables to a partition are readily obtained.
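As a hedged illustration of what a partition of a quadratic form can look like, consider a standard construction. It is assumed here for illustration only; the abstract does not confirm that it is the paper's exact transformation. Let A be a symmetric positive definite matrix and x a vector of p variables. Writing w = A^{1/2} x, with A^{1/2} the symmetric square root, gives non-negative terms that sum to the quadratic form:

```latex
Q = x^{\top} A x
  = \bigl(A^{1/2} x\bigr)^{\top} \bigl(A^{1/2} x\bigr)
  = \sum_{i=1}^{p} w_i^{2},
\qquad w = A^{1/2} x .
```

For a Mahalanobis distance, A would be the inverse covariance matrix and x the deviation from the mean, and w_i^2 would be read as the contribution associated with the i-th variable.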

Classification of MALDI-MS imaging data of tissue microarrays using canonical correlation analysis-based variable selection.

Winderbaum, Lyron; Koch, Inge; Mittal, Parul; Hoffmann, Peter.

Proteomics ; 16(11-12): 1731-5, 2016 06.

Article in English | MEDLINE | ID: mdl-27028088

ABSTRACT

Applying MALDI-MS imaging to tissue microarrays (TMAs) provides access to proteomics data from large cohorts of patients in a cost- and time-efficient way, and opens the potential for applying this technology in clinical diagnosis. The complexity of these TMA data (high dimension, low sample size) poses challenges for the statistical analysis, as classical methods typically require a nonsingular covariance matrix, a requirement that cannot be satisfied if the dimension is greater than the sample size. We use TMAs to collect data from endometrial primary carcinomas from 43 patients. Each patient has a lymph node metastasis (LNM) status of positive or negative, which we predict on the basis of the MALDI-MS imaging TMA data. We propose a variable selection approach based on canonical correlation analysis that explicitly uses the LNM information. We apply LDA to the selected variables only. Our method misclassifies 2.3-20.9% of patients by leave-one-out cross-validation and strongly outperforms LDA after reduction of the original data with principal component analysis.

Subject(s)

Endometrial Neoplasms/diagnostic imaging; Proteomics/methods; Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization/methods; Tissue Array Analysis/methods; Endometrial Neoplasms/diagnosis; Endometrial Neoplasms/pathology; Female; Humans; Lymphatic Metastasis; Neoplasm Staging; Principal Component Analysis
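The leave-one-out cross-validation mentioned in the abstract above can be sketched as follows. This is an illustration under stated assumptions: a simple nearest-centroid rule is swapped in for LDA to keep the example dependency-free, the variable selection step is omitted, and the data and function names are made up.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Predict the label whose class mean is closest to x."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(features))
        for j, v in enumerate(features):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    best_label, best_dist = None, float("inf")
    for label, acc in sums.items():
        centroid = [s / counts[label] for s in acc]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loocv_error_rate(xs, ys):
    """Hold out each sample once, train on the rest, count mistakes."""
    errors = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        if nearest_centroid_predict(train_x, train_y, xs[i]) != ys[i]:
            errors += 1
    return errors / len(xs)

# Made-up toy data: two selected variables, six "patients".
xs = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
ys = ["negative", "negative", "negative", "positive", "positive", "positive"]
rate = loocv_error_rate(xs, ys)
print(f"LOOCV misclassification rate: {rate:.1%}")
```

With 43 patients, the reported range of 2.3% to 20.9% corresponds to between 1 and 9 misclassified patients.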

Computationally efficient multidimensional analysis of complex flow cytometry data using second order polynomial histograms.

Zaunders, John; Jing, Junmei; Leipold, Michael; Maecker, Holden; Kelleher, Anthony D; Koch, Inge.

Cytometry A ; 89(1): 44-58, 2016 Jan.

Article in English | MEDLINE | ID: mdl-26097104

ABSTRACT

Many methods have been described for automated clustering analysis of complex flow cytometry data, but so far the goal to efficiently estimate multivariate densities and their modes for a moderate number of dimensions and potentially millions of data points has not been attained. We have devised a novel approach to describing modes using second order polynomial histogram estimators (SOPHE). The method divides the data into multivariate bins and determines the shape of the data in each bin based on second order polynomials, which is an efficient computation. These calculations yield local maxima and allow joining of adjacent bins to identify clusters. The use of second order polynomials also optimally uses wide bins, such that in most cases each parameter (dimension) need only be divided into 4-8 bins, again reducing computational load. We have validated this method using defined mixtures of up to 17 fluorescent beads in 16 dimensions, correctly identifying all populations in data files of 100,000 beads in <10 s, on a standard laptop. The method also correctly clustered granulocytes, lymphocytes, including standard T, B, and NK cell subsets, and monocytes in 9-color stained peripheral blood, within seconds. SOPHE successfully clustered up to 36 subsets of memory CD4 T cells using differentiation and trafficking markers, in 14-color flow analysis, and up to 65 subpopulations of PBMC in 33-dimensional CyTOF data, showing its usefulness in discovery research. SOPHE has the potential to greatly increase efficiency of analysing complex mixtures of cells in higher dimensions.

Subject(s)

Cluster Analysis; Computational Biology/methods; Flow Cytometry/methods; Adult; Algorithms; B-Lymphocytes/cytology; Biomarkers/analysis; Data Interpretation, Statistical; Electronic Data Processing/methods; Granulocytes/cytology; Humans; Killer Cells, Natural/cytology; T-Lymphocyte Subsets/cytology
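The binning step described in the abstract above can be sketched with an ordinary count histogram that stores only occupied bins. This is not SOPHE itself: the second order polynomial fit within each bin and the joining of adjacent bins are not shown, and the function name, ranges and data are assumptions for illustration.

```python
from collections import Counter

def sparse_histogram(points, lows, highs, bins_per_dim):
    """Count points per multivariate bin, storing only occupied bins.

    points: iterable of equal-length coordinate lists.
    lows/highs: per-dimension range; values outside it are clamped
    to the first or last bin.
    """
    counts = Counter()
    for p in points:
        key = []
        for v, lo, hi in zip(p, lows, highs):
            idx = int((v - lo) / (hi - lo) * bins_per_dim)
            key.append(min(max(idx, 0), bins_per_dim - 1))
        counts[tuple(key)] += 1
    return counts

# Made-up 3-dimensional points forming two clumps.
pts = [[0.1, 0.1, 0.1], [0.15, 0.12, 0.05],
       [0.9, 0.8, 0.95], [0.85, 0.9, 0.9]]
hist = sparse_histogram(pts, lows=[0, 0, 0], highs=[1, 1, 1], bins_per_dim=4)
print(dict(hist))
```

Keying a dictionary by bin index keeps memory proportional to the number of occupied bins. Even with only 4 bins per dimension, 16 dimensions give 4^16 (about 4.3 billion) possible bins, far more than 100,000 data points can occupy.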

Alignment of time course gene expression data and the classification of developmentally driven genes with hidden Markov models.

Robinson, Sean; Glonek, Garique; Koch, Inge; Thomas, Mark; Davies, Christopher.

BMC Bioinformatics ; 16: 196, 2015 Jun 18.

Article in English | MEDLINE | ID: mdl-26084333

ABSTRACT

BACKGROUND: We consider data from a time course microarray experiment that was conducted on grapevines over the development cycle of the grape berries at two different vineyards in South Australia. Although the underlying biological process of berry development is the same at both vineyards, there are differences in the timing of the development due to local conditions. We aim to align the data from the two vineyards to enable an integrated analysis of the gene expression and use the alignment of the expression profiles to classify likely developmental function. RESULTS: We present a novel alignment method based on hidden Markov models (HMMs) and use the method to align the motivating grapevine data. We show that our alignment method is robust against subsets of profiles that are not suitable for alignment, investigate alignment diagnostics under the model and demonstrate the classification of developmentally driven genes. CONCLUSIONS: The classification of developmentally driven genes both validates that the alignment we obtain is meaningful and also gives new evidence that can be used to identify the role of genes with unknown function. Using our alignment methodology, we find at least 1279 grapevine probe sets with no current annotated function that are likely to be controlled in a developmental manner.

Subject(s)

Algorithms; Gene Expression Profiling/methods; Gene Expression Regulation, Developmental; Genes, Plant/genetics; Vitis/growth & development; Vitis/genetics; Genome, Plant; Humans; Likelihood Functions; Markov Chains; Time Factors; Wine

Highest density difference region estimation with application to flow cytometric data.

Duong, Tarn; Koch, Inge; Wand, M P.

Biom J ; 51(3): 504-21, 2009 Jun.

Article in English | MEDLINE | ID: mdl-19588456

ABSTRACT

Motivated by the needs of scientists using flow cytometry, we study the problem of estimating the region where two multivariate samples differ in density. We call this problem highest density difference region estimation and recognise it as a two-sample analogue of highest density region or excess set estimation. Flow cytometry sample sizes are typically of the order of 10,000 to 100,000, with dimension ranging from about 3 to 20. The industry standard for the problem being studied is called Frequency Difference Gating, due to Roederer and Hardy (2001). After couching the problem in a formal statistical framework we devise an alternative estimator that draws upon recent statistical developments such as patient rule induction methods. Improved performance is illustrated in simulations. While motivated by flow cytometry, the methodology is suitable for general multivariate random samples where density difference regions are of interest.

Subject(s)

Cell Count/methods; Cells, Cultured/cytology; Cells, Cultured/physiology; Flow Cytometry/methods; Image Interpretation, Computer-Assisted/methods; Data Interpretation, Statistical; Statistical Distributions

Dimension selection for feature selection and dimension reduction with principal and independent component analysis.

Koch, Inge; Naito, Kanta.

Neural Comput ; 19(2): 513-45, 2007 Feb.

Article in English | MEDLINE | ID: mdl-17206873

ABSTRACT

This letter is concerned with the problem of selecting the best or most informative dimension for dimension reduction and feature extraction in high-dimensional data. The dimension of the data is reduced by principal component analysis; subsequent application of independent component analysis to the principal component scores determines the most nongaussian directions in the lower-dimensional space. A criterion for choosing the optimal dimension based on bias-adjusted skewness and kurtosis is proposed. This new dimension selector is applied to real data sets and compared to existing methods. Simulation studies for a range of densities show that the proposed method performs well and is more appropriate for nongaussian data than existing methods.

Subject(s)

Data Interpretation, Statistical; Models, Statistical; Principal Component Analysis; Algorithms; Numerical Analysis, Computer-Assisted
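The skewness and kurtosis on which the criterion in the abstract above is based can be computed from a vector of component scores as sketched below. These are the plain moment estimators, assumed here for illustration; the bias adjustment and the dimension selection rule from the paper are not reproduced, and the function names and data are made up.

```python
def central_moment(xs, k):
    """k-th central moment with divisor n (no bias correction)."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** k for x in xs) / len(xs)

def skewness(xs):
    """Third standardised moment; 0 for a symmetric sample."""
    m2 = central_moment(xs, 2)
    return central_moment(xs, 3) / m2 ** 1.5

def excess_kurtosis(xs):
    """Fourth standardised moment minus 3 (0 for a Gaussian)."""
    m2 = central_moment(xs, 2)
    return central_moment(xs, 4) / m2 ** 2 - 3.0

# Made-up score vectors: one symmetric, one strongly right-skewed.
symmetric = [-2.0, -1.0, 0.0, 1.0, 2.0]
skewed = [0.0, 0.0, 0.0, 0.0, 10.0]
print(skewness(symmetric), skewness(skewed), excess_kurtosis(symmetric))
```

Both quantities are zero for Gaussian data in the population, so large absolute values along a direction indicate nongaussian structure in that direction.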

Identification and quantification of change in Australian illicit drug markets.

Gilmour, Stuart; Koch, Inge; Degenhardt, Louisa; Day, Carolyn.

BMC Public Health ; 6: 200, 2006 Aug 03.

Article in English | MEDLINE | ID: mdl-16884546

ABSTRACT

BACKGROUND: In early 2001 Australia experienced a sudden reduction in the availability of heroin which had widespread effects on illicit drug markets across the country. The consequences of this event, commonly referred to as the Australian 'heroin shortage', have been extensively studied and there has been considerable debate as to the causes of the shortage and its implications for drug policy. This paper aims to investigate the presence of these epidemic patterns, to quantify the scale over which they occur and to estimate the relative importance of the 'heroin shortage' and any epidemic patterns in the drug markets. METHOD: Key indicator data series from the New South Wales illicit drug market were analysed using the statistical methods Principal Component Analysis and SiZer. RESULTS: The 'heroin shortage' represents the single most important source of variation in this illicit drug market. Furthermore the size of the effect of the heroin shortage is more than three times that evidenced by long-term 'epidemic' patterns. CONCLUSION: The 'heroin shortage' was unlikely to have been a simple correction at the end of a long period of reduced heroin availability, and represents a separate non-random shock which strongly affected the markets.

Subject(s)

Amphetamine-Related Disorders/epidemiology; Cocaine-Related Disorders/epidemiology; Drug and Narcotic Control/trends; Heroin Dependence/mortality; Heroin/supply & distribution; Illicit Drugs/supply & distribution; Law Enforcement; Amphetamine/economics; Amphetamine/supply & distribution; Amphetamine-Related Disorders/economics; Cluster Analysis; Cocaine/economics; Cocaine/supply & distribution; Cocaine-Related Disorders/economics; Drug and Narcotic Control/economics; Heroin/economics; Heroin Dependence/economics; Humans; Illicit Drugs/economics; New South Wales/epidemiology; Normal Distribution; Principal Component Analysis; Time Factors

Zur Wirkung von Endoxan auf die Entwicklung des Unterkieferskelets Beim Haushuhn. [On the effect of Endoxan on the development of the mandibular skeleton in the domestic chicken.]

Koch, Inge; Heydecke, Rolf.

Wilhelm Roux Arch Entwickl Mech Org ; 158(2): 195-204, 1967 Jun.

Article in German | MEDLINE | ID: mdl-28304644
