Results 1 - 20 of 41
1.
Biostatistics ; 24(1): 108-123, 2022 12 12.
Article in English | MEDLINE | ID: mdl-34752610

ABSTRACT

Multimorbidity poses a serious challenge to healthcare systems worldwide because of its association with poorer health-related outcomes, more complex clinical management, increased health service utilization and costs, and decreased productivity. However, to date, most evidence on multimorbidity is derived from cross-sectional studies, which have limited capacity to reveal the pathways of multimorbid conditions. In this article, we present an innovative perspective on analyzing longitudinal data within a statistical framework of survival analysis of time-to-event recurrent data. The proposed methodology is based on a joint frailty modeling approach with multivariate random effects to account for the heterogeneous risk of failure and the presence of informative censoring due to a terminal event. We develop a generalized linear mixed model method for the efficient estimation of parameters. We demonstrate the capacity of our approach using a real cancer registry data set on the multimorbidity of melanoma patients and document the performance of the proposed joint frailty model relative to its natural competitor, a standard frailty model, via extensive simulation studies. Our new approach is a timely means of advancing evidence-based knowledge to address the increasingly complex needs related to multimorbidity and of developing interventions that are most effective and viable for helping the large number of individuals with multiple conditions.


Subject(s)
Frailty , Humans , Cross-Sectional Studies , Survival Analysis , Computer Simulation , Linear Models
2.
Biometrics ; 76(3): 753-766, 2020 09.
Article in English | MEDLINE | ID: mdl-31863594

ABSTRACT

In the study of multiple failure time data with recurrent clinical endpoints, the classical independent censoring assumption in survival analysis can be violated when the evolution of the recurrent events is correlated with a censoring mechanism such as death. Moreover, in some situations, a cure fraction appears in the data because a tangible proportion of the study population benefits from treatment and becomes recurrence free and insusceptible to death related to the disease. A bivariate joint frailty mixture cure model is proposed to allow for dependent censoring and cure fraction in recurrent event data. The latency part of the model consists of two intensity functions for the hazard rates of recurrent events and death, wherein a bivariate frailty is introduced by means of the generalized linear mixed model methodology to adjust for dependent censoring. The model allows covariates and frailties in both the incidence and the latency parts, and it further accounts for the possibility of cure after each recurrence. It includes the joint frailty model and other related models as special cases. An expectation-maximization (EM)-type algorithm is developed to provide residual maximum likelihood estimation of model parameters. Through simulation studies, the performance of the model is investigated under different magnitudes of dependent censoring and cure rate. The model is applied to data sets from two colorectal cancer studies to illustrate its practical value.


Subject(s)
Frailty , Computer Simulation , Humans , Statistical Models , Recurrence , Survival Analysis
3.
Stat Med ; 38(6): 1036-1055, 2019 03 15.
Article in English | MEDLINE | ID: mdl-30474216

ABSTRACT

We present a multilevel frailty model for handling serial dependence and simultaneous heterogeneity in survival data with a multilevel structure attributed to clustering of subjects and the presence of multiple failure outcomes. One commonly observes such data, for example, in multi-institutional, randomized placebo-controlled trials in which patients suffer repeated episodes (eg, recurrent migraines) of the disease outcome being measured. The model extends the proportional hazards model by incorporating a random covariate and unobservable random institution effect to respectively account for treatment-by-institution interaction and institutional variation in the baseline risk. Moreover, a random effect term with correlation structure driven by a first-order autoregressive process is attached to the model to facilitate estimation of between patient heterogeneity and serial dependence. By means of the generalized linear mixed model methodology, the random effects distribution is assumed normal and the residual maximum likelihood and the maximum likelihood methods are extended for estimation of model parameters. Simulation studies are carried out to evaluate the performance of the residual maximum likelihood and the maximum likelihood estimators and to assess the impact of misspecifying random effects distribution on the proposed inference. We demonstrate the practical feasibility of the modeling methodology by analyzing real data from a double-blind randomized multi-institutional clinical trial, designed to examine the effect of rhDNase on the occurrence of respiratory exacerbations among patients with cystic fibrosis.
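As a small illustration of the first-order autoregressive structure mentioned in this abstract, the sketch below builds an AR(1) correlation matrix in which the correlation between random effects at occasions i and j is rho**|i - j|. The function name `ar1_correlation` and the example values are assumptions made for illustration; this is not the authors' estimation code.

```python
def ar1_correlation(n_occasions, rho):
    """Return the n x n AR(1) correlation matrix with entries rho**|i - j|."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return [[rho ** abs(i - j) for j in range(n_occasions)]
            for i in range(n_occasions)]


# Hypothetical example: three successive episodes with rho = 0.5.
for row in ar1_correlation(3, 0.5):
    print(row)
```

The correlation decays geometrically with the number of episodes separating two observations, which is what lets a single parameter describe serial dependence.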


Subject(s)
Cluster Analysis , Statistical Models , Survival Analysis , Cystic Fibrosis/complications , Cystic Fibrosis/drug therapy , Statistical Data Interpretation , Deoxyribonuclease I/therapeutic use , Humans , Proportional Hazards Models , Randomized Controlled Trials as Topic/methods , Recombinant Proteins/therapeutic use , Respiratory Tract Diseases/etiology , Respiratory Tract Diseases/prevention & control , Treatment Failure
4.
Neural Comput ; 29(4): 990-1020, 2017 04.
Article in English | MEDLINE | ID: mdl-28095191

ABSTRACT

Mixture of autoregressions (MoAR) models provide a model-based approach to the clustering of time series data. The maximum likelihood (ML) estimation of MoAR models requires evaluating products of large numbers of densities of normal random variables. In practical scenarios, these products converge to zero as the length of the time series increases, and thus the ML estimation of MoAR models becomes infeasible without the use of numerical tricks. We propose a maximum pseudolikelihood (MPL) estimation approach as an alternative to the use of numerical tricks. The MPL estimator is proved to be consistent and can be computed with an EM (expectation-maximization) algorithm. Simulations are used to assess the performance of the MPL estimator against that of the ML estimator in cases where the latter was able to be calculated. An application to the clustering of time series data arising from a resting state fMRI experiment is presented as a demonstration of the methodology.
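The underflow problem described in this abstract can be reproduced in a few lines. The sketch below is only an illustration of why a raw product of normal densities becomes unusable for long series and of the usual log-space workaround (one of the "numerical tricks" the abstract refers to); it is not the MPL estimator proposed in the paper. The function names and the constant series are assumptions chosen for illustration.

```python
import math


def normal_logpdf(x, mean, sd):
    """Log-density of a normal random variable with the given mean and sd."""
    z = (x - mean) / sd
    return -0.5 * z * z - math.log(sd) - 0.5 * math.log(2.0 * math.pi)


def naive_likelihood(xs, mean=0.0, sd=1.0):
    """Product of densities; collapses to 0.0 in floating point for long series."""
    product = 1.0
    for x in xs:
        product *= math.exp(normal_logpdf(x, mean, sd))
    return product


def log_likelihood(xs, mean=0.0, sd=1.0):
    """Sum of log-densities; stays finite where the product underflows."""
    return sum(normal_logpdf(x, mean, sd) for x in xs)


series = [0.5] * 2000  # hypothetical long time series
print(naive_likelihood(series))
print(log_likelihood(series))
```

For a mixture, the same idea requires a log-sum-exp step across components, which is where the bookkeeping becomes more delicate.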

5.
Biostatistics ; 16(1): 98-112, 2015 Jan.
Article in English | MEDLINE | ID: mdl-24963011

ABSTRACT

The detection of differentially expressed (DE) genes, that is, genes whose expression levels vary between two or more classes representing different experimental conditions (say, diseases), is one of the most commonly studied problems in bioinformatics. For example, the identification of DE genes between distinct disease phenotypes is an important first step in understanding and developing treatment drugs for the disease. We present a novel approach to the problem of detecting DE genes that is based on a test statistic formed as a weighted (normalized) cluster-specific contrast in the mixed effects of the mixture model used in the first instance to cluster the gene profiles into a manageable number of clusters. The key factor in the formation of our test statistic is the use of gene-specific mixed effects in the cluster-specific contrast. It thus means that the (soft) assignment of a given gene to a cluster is not crucial. This is because in addition to class differences between the (estimated) fixed effects terms for a cluster, gene-specific class differences also contribute to the cluster-specific contributions to the final form of the test statistic. The proposed test statistic can be used where the primary aim is to rank the genes in order of evidence against the null hypothesis of no DE. We also show how a P-value can be calculated for each gene for use in multiple hypothesis testing where the intent is to control the false discovery rate (FDR) at some desired level. With the use of publicly available and simulated datasets, we show that the proposed contrast-based approach outperforms other methods commonly used for the detection of DE genes both in a ranking context with lower proportion of false discoveries and in a multiple hypothesis testing context with higher power for a specified level of the FDR.


Subject(s)
Cluster Analysis , Statistical Data Interpretation , Gene Expression Profiling/statistics & numerical data , Gene Expression/genetics , Genetic Models , Breast Neoplasms/genetics , Female , Humans
6.
Cytometry A ; 89(1): 30-43, 2016 Jan.
Article in English | MEDLINE | ID: mdl-26492316

ABSTRACT

We present an algorithm for modeling flow cytometry data in the presence of large inter-sample variation. Large-scale cytometry datasets often exhibit some within-class variation due to technical effects such as instrumental differences and variations in data acquisition, as well as subtle biological heterogeneity within the class of samples. Failure to account for such variations in the model may lead to inaccurate matching of populations across a batch of samples and poor performance in classification of unlabeled samples. In this paper, we describe the Joint Clustering and Matching (JCM) procedure for simultaneous segmentation and alignment of cell populations across multiple samples. Under the JCM framework, a multivariate mixture distribution is used to model the distribution of the expressions of a fixed set of markers for each cell in a sample such that the components in the mixture model may correspond to the various populations of cells, which have similar expressions of markers (that is, clusters), in the composition of the sample. For each class of samples, an overall class template is formed by the adoption of random-effects terms to model the inter-sample variation within a class. The construction of a parametric template for each class allows for direct quantification of the differences between the template and each sample, and also between each pair of samples, both within and between classes. The classification of a new unclassified sample is then undertaken by assigning the unclassified sample to the class that minimizes the distance between its fitted mixture density and each class density as provided by the class templates. For illustration, we use a symmetric form of the Kullback-Leibler divergence as a distance measure between two densities, but other distance measures can also be applied. We demonstrate on four real datasets how the JCM procedure can be used to carry out the tasks of automated clustering and alignment of cell populations, and supervised classification of samples.
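The template-based classification rule in this abstract can be sketched in a deliberately simplified setting. The code below assumes each density is a single univariate normal (so the Kullback-Leibler divergence has a closed form) and takes the symmetric form to be the sum of the two directed divergences; the paper works with multivariate mixture densities, and its exact symmetrization is not stated in the abstract. All names and numbers are hypothetical.

```python
import math


def kl_normal(m1, s1, m2, s2):
    """KL divergence KL(N(m1, s1^2) || N(m2, s2^2)) for univariate normals."""
    return math.log(s2 / s1) + (s1 ** 2 + (m1 - m2) ** 2) / (2.0 * s2 ** 2) - 0.5


def symmetric_kl(m1, s1, m2, s2):
    """Symmetrized divergence: sum of the two directed KL divergences (assumed form)."""
    return kl_normal(m1, s1, m2, s2) + kl_normal(m2, s2, m1, s1)


def classify(sample, templates):
    """Assign a sample (mean, sd) to the class whose template is closest.

    templates maps a class label to that class's (mean, sd) template.
    """
    return min(templates, key=lambda label: symmetric_kl(*sample, *templates[label]))


templates = {"A": (0.0, 1.0), "B": (3.0, 1.0)}  # hypothetical class templates
print(classify((0.2, 1.0), templates))
```

With mixture densities the divergence generally has no closed form and would need numerical approximation, but the minimum-distance assignment step stays the same.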


Subject(s)
Biomarkers/blood , Computational Biology/methods , Electronic Data Processing/methods , Flow Cytometry/methods , Membrane Proteins/analysis , Automated Pattern Recognition/methods , Algorithms , Cluster Analysis , Statistical Data Interpretation , Humans , Acute Myeloid Leukemia/diagnosis , Follicular Lymphoma/diagnosis , Theoretical Models , West Nile Fever/diagnosis
7.
Neural Comput ; 28(12): 2585-2593, 2016 12.
Article in English | MEDLINE | ID: mdl-27626962

ABSTRACT

The mixture-of-experts (MoE) model is a popular neural network architecture for nonlinear regression and classification. The class of MoE mean functions is known to be uniformly convergent to any unknown target function, assuming that the target function is from a Sobolev space that is sufficiently differentiable and that the domain of estimation is a compact unit hypercube. We provide an alternative result, which shows that the class of MoE mean functions is dense in the class of all continuous functions over arbitrary compact domains of estimation. Our result can be viewed as a universal approximation theorem for MoE models. The theorem we present allows MoE users to be confident in applying such models for estimation when data arise from nonlinear and nondifferentiable generative processes.
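To make the notion of an MoE mean function concrete, the sketch below evaluates one with softmax gating and linear experts on a scalar input. This parameterization is a common textbook form and is assumed here; the abstract does not specify the gating or expert families used in the paper. The example uses two experts to approximate the continuous but nondifferentiable function |x|.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gating scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_mean(x, gate_params, expert_params):
    """Mean of a mixture of linear experts with softmax gating at scalar x.

    gate_params: (intercept, slope) pairs defining each gating score.
    expert_params: (intercept, slope) pairs defining each expert's mean.
    """
    weights = softmax([a + b * x for a, b in gate_params])
    return sum(w * (c + d * x) for w, (c, d) in zip(weights, expert_params))


# Two experts with means x and -x; a steep gate selects the larger one,
# so the output approximates |x| away from the origin.
gates = [(0.0, 10.0), (0.0, -10.0)]
experts = [(0.0, 1.0), (0.0, -1.0)]
for x in (-3.0, 0.0, 3.0):
    print(x, moe_mean(x, gates, experts))
```

Steepening the gate slopes tightens the approximation near the kink, which gives some intuition for the denseness result stated in the abstract.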

8.
Biometrics ; 72(4): 1255-1265, 2016 12.
Article in English | MEDLINE | ID: mdl-27123964

ABSTRACT

Understanding how aquatic species grow is fundamental in fisheries because stock assessment often relies on growth dependent statistical models. Length-frequency-based methods become important when more applicable data for growth model estimation are either not available or very expensive. In this article, we develop a new framework for growth estimation from length-frequency data using a generalized von Bertalanffy growth model (VBGM) framework that allows for time-dependent covariates to be incorporated. A finite mixture of normal distributions is used to model the length-frequency cohorts of each month with the means constrained to follow a VBGM. The variances of the finite mixture components are constrained to be a function of mean length, reducing the number of parameters and allowing for an estimate of the variance at any length. To optimize the likelihood, we use a minorization-maximization (MM) algorithm with a Nelder-Mead sub-step. This work was motivated by the decline in catches of the blue swimmer crab (BSC) (Portunus armatus) off the east coast of Queensland, Australia. We test the method with a simulation study and then apply it to the BSC fishery data.
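For readers unfamiliar with the growth curve that constrains the mixture means here, the sketch below evaluates the standard von Bertalanffy form L(t) = L_inf * (1 - exp(-k * (t - t0))). The paper uses a generalized version with time-dependent covariates, which is not reproduced; the parameter values are hypothetical and are not estimates for the blue swimmer crab.

```python
import math


def von_bertalanffy_length(age, l_inf, k, t0=0.0):
    """Expected length at a given age under the standard von Bertalanffy model.

    l_inf: asymptotic length; k: growth rate; t0: age at which length is zero.
    """
    return l_inf * (1.0 - math.exp(-k * (age - t0)))


# Hypothetical parameters: asymptotic length 150 (arbitrary units), k = 1.2 per year.
for age in (0.0, 0.5, 1.0, 2.0):
    print(age, round(von_bertalanffy_length(age, 150.0, 1.2), 2))
```

In a length-frequency setting, each monthly cohort's mixture component mean would be tied to a curve of this kind rather than estimated freely.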


Subject(s)
Brachyura/growth & development , Fisheries/statistics & numerical data , Biological Models , Statistical Models , Algorithms , Animals , Normal Distribution , Time Factors
9.
Comput Stat Data Anal ; 104: 79-90, 2016 Dec.
Article in English | MEDLINE | ID: mdl-28496285

ABSTRACT

The statistical matching problem involves the integration of multiple datasets where some variables are not observed jointly. This missing data pattern leaves most statistical models unidentifiable. Statistical inference is still possible when operating under the framework of partially identified models, where the goal is to bound the parameters rather than to estimate them precisely. In many matching problems, developing feasible bounds on the parameters is equivalent to finding the set of positive-definite completions of a partially specified covariance matrix. Existing methods for characterising the set of possible completions do not extend to high-dimensional problems. A Gibbs sampler to draw from the set of possible completions is proposed. The variation in the observed samples gives an estimate of the feasible region of the parameters. The Gibbs sampler extends easily to high-dimensional statistical matching problems.
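The link between bounding parameters and positive-definite completions is easiest to see in the smallest case. Suppose X is observed jointly with Y and with Z, but Y and Z are never observed together. The unknown correlation r_yz must keep the 3 x 3 correlation matrix positive semidefinite, which gives the interval r_xy * r_xz ± sqrt((1 - r_xy^2)(1 - r_xz^2)). The sketch below computes that interval analytically; it is an illustration of the feasible set only, not the Gibbs sampler proposed in the paper, and the function names are assumptions.

```python
import math


def completion_bounds(r_xy, r_xz):
    """Interval of r_yz values that complete the 3x3 correlation matrix validly."""
    center = r_xy * r_xz
    half_width = math.sqrt((1.0 - r_xy ** 2) * (1.0 - r_xz ** 2))
    return center - half_width, center + half_width


def correlation_determinant(r_xy, r_xz, r_yz):
    """Determinant of the 3x3 correlation matrix; positive inside the feasible interval."""
    return 1.0 + 2.0 * r_xy * r_xz * r_yz - r_xy ** 2 - r_xz ** 2 - r_yz ** 2


low, high = completion_bounds(0.6, 0.8)  # hypothetical observed correlations
print(low, high)
print(correlation_determinant(0.6, 0.8, 0.5))
```

In higher dimensions the feasible set no longer has such a simple description, which is the motivation the abstract gives for sampling from it instead.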

10.
Brief Bioinform ; 14(4): 402-10, 2013 Jul.
Article in English | MEDLINE | ID: mdl-22988257

ABSTRACT

We consider the classification of microarray gene-expression data. First, attention is given to the supervised case, where the tissue samples are classified with respect to a number of predefined classes and the intent is to assign a new unclassified tissue to one of these classes. The problems of forming a classifier and estimating its error rate are addressed in the context of there being a relatively small number of observations (tissue samples) compared to the number of variables (that is, the genes, which can number in the tens of thousands). We then proceed to the unsupervised case and consider the clustering of the tissue samples and also the clustering of the gene profiles. Both problems can be viewed as being non-standard ones in statistics and we address some of the key issues involved. The focus is on the use of mixture models to effect the clustering for both problems.


Subject(s)
Gene Expression , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Child , Cluster Analysis , Genetic Databases , Humans , Organ Specificity , Precursor Cell Lymphoblastic Leukemia-Lymphoma/metabolism , Transcriptome
11.
Proc Natl Acad Sci U S A ; 109(16): E944-53, 2012 Apr 17.
Article in English | MEDLINE | ID: mdl-22451944

ABSTRACT

Evolutionary change in gene expression is generally considered to be a major driver of phenotypic differences between species. We investigated innate immune diversification by analyzing interspecies differences in the transcriptional responses of primary human and mouse macrophages to the Toll-like receptor (TLR)-4 agonist lipopolysaccharide (LPS). By using a custom platform permitting cross-species interrogation coupled with deep sequencing of mRNA 5' ends, we identified extensive divergence in LPS-regulated orthologous gene expression between humans and mice (24% of orthologues were identified as "divergently regulated"). We further demonstrate concordant regulation of human-specific LPS target genes in primary pig macrophages. Divergently regulated orthologues were enriched for genes encoding cellular "inputs" such as cell surface receptors (e.g., TLR6, IL-7Rα) and functional "outputs" such as inflammatory cytokines/chemokines (e.g., CCL20, CXCL13). Conversely, intracellular signaling components linking inputs to outputs were typically concordantly regulated. Functional consequences of divergent gene regulation were confirmed by showing LPS pretreatment boosts subsequent TLR6 responses in mouse but not human macrophages, in keeping with mouse-specific TLR6 induction. Divergently regulated genes were associated with a large dynamic range of gene expression, and specific promoter architectural features (TATA box enrichment, CpG island depletion). Surprisingly, regulatory divergence was also associated with enhanced interspecies promoter conservation. Thus, the genes controlled by complex, highly conserved promoters that facilitate dynamic regulation are also the most susceptible to evolutionary change.


Subject(s)
Gene Expression Profiling , Genetic Variation , Macrophages/metabolism , Toll-Like Receptor 4/genetics , Animals , Cell Line , Cultured Cells , Chemokine CCL20/genetics , Chemokine CXCL13/genetics , Molecular Evolution , Female , Gene Expression Regulation/drug effects , Host-Pathogen Interactions , Humans , Lipopolysaccharides/pharmacology , Macrophages/drug effects , Macrophages/microbiology , Male , Mice , Inbred BALB C Mice , Inbred C57BL Mice , Knockout Mice , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Salmonella typhimurium/physiology , Species Specificity , Swine , Toll-Like Receptor 4/agonists
12.
BMC Bioinformatics ; 13: 300, 2012 Nov 14.
Article in English | MEDLINE | ID: mdl-23151154

ABSTRACT

BACKGROUND: Time-course gene expression data such as yeast cell cycle data may be periodically expressed. To cluster such data, currently used Fourier series approximations of periodic gene expressions have been found inadequate to model the complexity of the time-course data, partly because they ignore the dependence between the expression measurements over time and the correlation among gene expression profiles. We further investigate the advantages and limitations of available models in the literature and propose a new mixture model with autoregressive random effects of the first order for the clustering of time-course gene-expression profiles. Some simulations and real examples are given to demonstrate the usefulness of the proposed models. RESULTS: We illustrate the applicability of our new model using synthetic and real time-course datasets. We show that our model outperforms existing models to provide more reliable and robust clustering of time-course data. Our model provides superior results when genetic profiles are correlated. It also gives comparable results when the correlation between the gene profiles is weak. In the applications to real time-course data, relevant clusters of coregulated genes are obtained, which are supported by gene-function annotation databases. CONCLUSIONS: Our new model under our extension of the EMMIX-WIRE procedure is more reliable and robust for clustering time-course data because it adopts a random effects model that allows for the correlation among observations at different time points. It postulates gene-specific random effects with an autocorrelation variance structure that models coregulation within the clusters. The developed R package is flexible in its specification of the random effects through user-input parameters that enable improved modelling and consequent clustering of time-course data.


Subject(s)
Gene Expression Profiling/statistics & numerical data , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Software , Transcriptome , Algorithms , Cell Cycle/genetics , Cluster Analysis , Factual Databases , Gene Expression , Genetic Models , Saccharomyces cerevisiae/genetics
13.
Bioinformatics ; 27(9): 1269-76, 2011 May 01.
Article in English | MEDLINE | ID: mdl-21372081

ABSTRACT

MOTIVATION: Mixtures of factor analyzers enable model-based clustering to be undertaken for high-dimensional microarray data, where the number of observations n is small relative to the number of genes p. Moreover, when the number of clusters is not small, for example, where there are several different types of cancer, there may be the need to reduce further the number of parameters in the specification of the component-covariance matrices. A further reduction can be achieved by using mixtures of factor analyzers with common component-factor loadings (MCFA), which is a more parsimonious model. However, this approach is sensitive to both non-normality and outliers, which are commonly observed in microarray experiments. This sensitivity of the MCFA approach is due to its being based on a mixture model in which the multivariate normal family of distributions is assumed for the component-error and factor distributions. RESULTS: An extension to mixtures of t-factor analyzers with common component-factor loadings is considered, whereby the multivariate t-family is adopted for the component-error and factor distributions. An EM algorithm is developed for the fitting of mixtures of common t-factor analyzers. The model can handle data with tails longer than those of the normal distribution, is robust against outliers and allows the data to be displayed in low-dimensional plots. It is applied here to both synthetic data and some microarray gene expression data for clustering and performs better than several existing methods. AVAILABILITY: The algorithms were implemented in Matlab. The Matlab code is available at http://blog.naver.com/aggie100.


Subject(s)
Algorithms , Cluster Analysis , Statistical Factor Analysis , Oligonucleotide Array Sequence Analysis/methods , Gene Expression Profiling/methods , Humans , Statistical Models , Normal Distribution , Sensitivity and Specificity , Software
14.
Proc Natl Acad Sci U S A ; 106(21): 8519-24, 2009 May 26.
Article in English | MEDLINE | ID: mdl-19443687

ABSTRACT

Flow cytometric analysis allows rapid single cell interrogation of surface and intracellular determinants by measuring fluorescence intensity of fluorophore-conjugated reagents. The availability of new platforms, allowing detection of increasing numbers of cell surface markers, has challenged the traditional technique of identifying cell populations by manual gating and resulted in a growing need for the development of automated, high-dimensional analytical methods. We present a direct multivariate finite mixture modeling approach, using skew and heavy-tailed distributions, to address the complexities of flow cytometric analysis and to deal with high-dimensional cytometric data without the need for projection or transformation. We demonstrate its ability to detect rare populations, to model robustly in the presence of outliers and skew, and to perform the critical task of matching cell populations across samples that enables downstream analysis. This advance will facilitate the application of flow cytometry to new, complex biological and clinical problems.


Subject(s)
Flow Cytometry/methods , Biomarkers , Cell Line , Cell Membrane/metabolism , Innate Immunity/immunology , Immunologic Memory/immunology , Biological Models , Phenotype , Phosphorylation , Statistics as Topic , T-Lymphocytes/cytology , T-Lymphocytes/immunology
15.
Bioinformatics ; 26(9): 1192-8, 2010 May 01.
Article in English | MEDLINE | ID: mdl-20223834

ABSTRACT

MOTIVATION: Microarrays are being increasingly used in cancer research to better characterize and classify tumors by selecting marker genes. However, as very few of these genes have been validated as predictive biomarkers so far, it is mostly conventional clinical and pathological factors that are being used as prognostic indicators of clinical course. Combining clinical data with gene expression data may add valuable information, but it is a challenging task due to their categorical versus continuous characteristics. We have further developed the mixture of experts (ME) methodology, a promising approach to tackle complex non-linear problems. Several variants are proposed in integrative ME as well as the inclusion of various gene selection methods to select a hybrid signature. RESULTS: We show on three cancer studies that prediction accuracy can be improved when combining both types of variables. Furthermore, the selected genes were found to be of high relevance and can be considered as potential biomarkers for the prognostic selection of cancer therapy. AVAILABILITY: Integrative ME is implemented in the R package integrativeME (http://cran.r-project.org/).


Subject(s)
Tumor Biomarkers/metabolism , Computational Biology/methods , Genetic Markers , Medical Oncology/methods , Algorithms , Bayes Theorem , Gene Expression Profiling , Humans , Male , Biological Models , Genetic Models , Statistical Models , Oligonucleotide Array Sequence Analysis , Prostatic Neoplasms/metabolism , Reproducibility of Results
16.
Stat Methods Med Res ; 29(5): 1368-1385, 2020 05.
Article in English | MEDLINE | ID: mdl-31293217

ABSTRACT

Many medical studies yield data on recurrent clinical events from populations which consist of a proportion of cured patients in the presence of those who experience the event at several times (uncured). A frailty mixture cure model has recently been postulated for such data, with an assumption that the random subject effect (frailty) of each uncured patient is constant across successive gap times between recurrent events. We propose two new models in a more general setting, assuming a multivariate time-varying frailty with an AR(1) correlation structure for each uncured patient and addressing multilevel recurrent event data originated from multi-institutional (multi-centre) clinical trials, using extra random effect terms to adjust for institution effect and treatment-by-institution interaction. To solve the difficulties in parameter estimation due to these highly complex correlation structures, we develop an efficient estimation procedure via an EM-type algorithm based on residual maximum likelihood (REML) through the generalised linear mixed model (GLMM) methodology. Simulation studies are presented to assess the performances of the models. Data sets from a colorectal cancer study and rhDNase multi-institutional clinical trial were analyzed to exemplify the proposed models. The results demonstrate a large positive AR(1) correlation among frailties across successive gap times, indicating a constant frailty may not be realistic in some situations. Comparisons of findings with existing frailty models are discussed.


Subject(s)
Frailty , Statistical Models , Humans , Survival Analysis , Computer Simulation , Linear Models
17.
J Appl Stat ; 47(5): 804-826, 2020.
Article in English | MEDLINE | ID: mdl-35707324

ABSTRACT

This paper proposes a new regression model for the analysis of spatial panel data in the case of spatial heterogeneity and non-normality. In empirical economic research, the normality of error components is a routine assumption for the models with continuous responses. However, such an assumption may not be appropriate in many applications. This work relaxes the normality assumption by using a multivariate skew-normal distribution, which includes the normal distribution as a special case. The methodology is illustrated through a simulation study and application to insurance and gasoline demand data sets. In these analyses, a simple Bayesian framework that implements a Markov chain Monte Carlo algorithm is derived for parameter estimation and inference.

18.
Stat Med ; 28(27): 3454-66, 2009 Nov 30.
Article in English | MEDLINE | ID: mdl-19697291

ABSTRACT

The long-term survivor mixture model is commonly applied to analyse survival data when some individuals may never experience the failure event of interest. A score test is presented to assess whether the cured proportion is significant to justify the long-term survivor mixture model. Sampling distribution and power of the test statistic are evaluated by simulation studies. The results confirm that the proposed test statistic performs well in finite sample situations. The test procedure is illustrated using a breast cancer survival data set and the clustered multivariate failure times from a multi-centre clinical trial of carcinoma.
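The long-term survivor mixture model referred to here writes the population survival function as S(t) = pi + (1 - pi) * S_u(t), where pi is the cured proportion and S_u is the survival function of the uncured. The sketch below evaluates it with an exponential S_u, which is an assumption made purely for illustration; the abstract does not state the latency distribution, and the score test itself is not reproduced.

```python
import math


def population_survival(t, cured_proportion, hazard_rate):
    """Survival under a mixture cure model with exponential latency (assumed)."""
    uncured_survival = math.exp(-hazard_rate * t)
    return cured_proportion + (1.0 - cured_proportion) * uncured_survival


# Hypothetical values: 25% cured, hazard 0.5 per year among the uncured.
for t in (0.0, 1.0, 5.0, 50.0):
    print(t, round(population_survival(t, 0.25, 0.5), 4))
```

The curve levels off at the cured proportion instead of falling to zero; testing whether that plateau is significantly above zero is the question the score test addresses.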


Subject(s)
Computer Simulation , Biological Models , Statistical Models , Survivors , Breast Neoplasms/mortality , Female , Histocytochemistry , Humans , Lectins/chemistry
19.
P R Health Sci J ; 28(2): 89-104, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19530550

ABSTRACT

DNA microarray is a technology that simultaneously evaluates quantitative measurements for the expression of thousands of genes. DNA microarrays have been used to assess gene expression between groups of cells of different organs or different populations. In order to understand the role and function of the genes, one needs the complete information about their mRNA transcripts and proteins. Unfortunately, exploring the protein functions is very difficult, due to their unique and complicated 3-dimensional structure. To overcome this difficulty, one may concentrate on the mRNA molecules produced by the gene expression. In this paper, we describe some of the methods for preprocessing data for gene expression and for pairwise comparison from genomic experiments. Previous studies to assess the efficiency of different methods for pairwise comparisons have found little agreement in the lists of significant genes. Finally, we describe the procedures to control false discovery rates, the sample size approach for these experiments, and available software for microarray data analysis. This paper is written for those professionals who are new to microarray data analysis for differential expression and want to have an overview of the specific steps or the different approaches for this sort of analysis.
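As one concrete example of a procedure for controlling the false discovery rate, the sketch below implements the Benjamini-Hochberg step-up rule. The abstract does not say which procedures the paper covers, so the choice of this one is an assumption; the p-values are made up for illustration.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Step-up rule: find the largest rank k with p_(k) <= alpha * k / m and
    reject the k hypotheses with the smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    largest_rank = 0
    for rank, index in enumerate(order, start=1):
        if pvalues[index] <= alpha * rank / m:
            largest_rank = rank
    rejected = [False] * m
    for index in order[:largest_rank]:
        rejected[index] = True
    return rejected


# Hypothetical per-gene p-values.
print(benjamini_hochberg([0.02, 0.03, 0.035, 0.5], alpha=0.05))
```

Because the rule steps up from the largest qualifying rank, a gene can be declared significant even when its own p-value exceeds the threshold for its rank, provided a larger p-value further down the list meets its threshold.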


Subject(s)
Comparative Genomic Hybridization/methods , Gene Expression Profiling/methods , Oligonucleotide Array Sequence Analysis/methods , Algorithms , Humans , Computer-Assisted Image Processing , Software
20.
Bioinformatics ; 23(4): 458-65, 2007 Feb 15.
Article in English | MEDLINE | ID: mdl-17166856

ABSTRACT

MOTIVATION: We present a new approach to the analysis of images for complementary DNA microarray experiments. The image segmentation and intensity estimation are performed simultaneously by adopting a two-component mixture model. One component of this mixture corresponds to the distribution of the background intensity, while the other corresponds to the distribution of the foreground intensity. The intensity measurement is a bivariate vector consisting of red and green intensities. The background intensity component is modeled by the bivariate gamma distribution, whose marginal densities for the red and green intensities are independent three-parameter gamma distributions with different parameters. The foreground intensity component is taken to be the bivariate t distribution, with the constraint that the mean of the foreground is greater than that of the background for each of the two colors. The degrees of freedom of this t distribution are inferred from the data but they could be specified in advance to reduce the computation time. Also, the covariance matrix is not restricted to being diagonal and so it allows for nonzero correlation between R and G foreground intensities. This gamma-t mixture model is fitted by maximum likelihood via the EM algorithm. A final step is executed whereby nonparametric (kernel) smoothing is undertaken of the posterior probabilities of component membership. The main advantages of this approach are: (1) it enjoys the well-known strengths of a mixture model, namely flexibility and adaptability to the data; (2) it considers the segmentation and intensity estimation simultaneously and not separately as in commonly used existing software, and it also works with the red and green intensities in a bivariate framework as opposed to their separate estimation via univariate methods; (3) the use of the three-parameter gamma distribution for the background red and green intensities provides a much better fit than the normal (log normal) or t distributions; (4) the use of the bivariate t distribution for the foreground intensity provides a model that is less sensitive to extreme observations; (5) as a consequence of the aforementioned properties, it allows segmentation to be undertaken for a wide range of spot shapes, including doughnut, sickle shape and artifacts. RESULTS: We apply our method for gridding, segmentation and estimation to real cDNA microarray images and artificial data. Our method provides better segmentation of spot shapes and better intensity estimation than the Spot and spotSegmentation R software packages. It detected blank spots as well as bright artifacts in the real data, and estimated spot intensities with high accuracy for the synthetic data. AVAILABILITY: The algorithms were implemented in Matlab. The Matlab codes implementing both the gridding and segmentation/estimation are available upon request. SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.


Subject(s)
Algorithms , Gene Expression Profiling/methods , Computer-Assisted Image Interpretation/methods , Fluorescence Microscopy/methods , Oligonucleotide Array Sequence Analysis/methods , Photometry/methods , Fluorescence Spectrometry/methods , Statistical Data Interpretation , Fluorescence In Situ Hybridization/methods , Statistical Models