Results 1 - 11 of 11
1.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849574

ABSTRACT

Spatial transcriptomics has emerged as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. Spatially resolved transcriptomic data make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and the mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve clustering performance similar to or better than that of BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found that the mean expression of COVID-19 signature genes was greater in immune than in non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organization of tissues from spatial transcriptomic data.
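
To make the hidden Markov random field idea concrete, the following is a minimal Python sketch of iterated conditional modes (ICM) labeling. It is not the SC-MEB implementation. It assumes a Potts-type smoothing prior, a one-dimensional chain of spots instead of a tissue grid, known cluster means, and unit variance; the function name `icm_potts` and all numbers are hypothetical. SC-MEB additionally estimates model parameters by expectation-maximization and chooses the smoothness parameter and the number of clusters, none of which is shown here.

```python
def icm_potts(y, means, beta, sweeps=10):
    """Label a 1-D chain of observations by iterated conditional modes.

    Assumptions (illustrative only): unit-variance Gaussian clusters with
    known means, and a Potts prior that rewards agreeing with neighbors.
    """
    k = len(means)
    # Start from the nearest-mean label for every spot (no smoothing yet).
    labels = [min(range(k), key=lambda c: (v - means[c]) ** 2) for v in y]
    for _ in range(sweeps):
        changed = False
        for i, v in enumerate(y):
            neighbors = [labels[j] for j in (i - 1, i + 1) if 0 <= j < len(y)]

            def energy(c):
                data_term = 0.5 * (v - means[c]) ** 2
                agree = sum(1 for n in neighbors if n == c)
                return data_term - beta * agree  # lower is better

            best = min(range(k), key=energy)
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # stop once a full sweep changes nothing
            break
    return labels


# Hypothetical noisy expression values along a line of eight spots.
y = [0.1, -0.2, 1.4, 0.0, 2.1, 1.9, 0.7, 2.0]
print(icm_potts(y, means=[0.0, 2.0], beta=0.0))  # [0, 0, 1, 0, 1, 1, 0, 1]
print(icm_potts(y, means=[0.0, 2.0], beta=1.0))  # [0, 0, 0, 0, 1, 1, 1, 1]
```

With `beta = 0` each spot keeps its nearest-mean label; a positive `beta` pulls isolated labels toward their neighbors, which is the role the abstract assigns to the smoothness parameter.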


Subjects
Algorithms , COVID-19/metabolism , Computer Simulation , Gene Expression Profiling , SARS-CoV-2/metabolism , Animals , Colon/metabolism , Colorectal Neoplasms/metabolism , Dorsolateral Prefrontal Cortex/metabolism , Humans , Hypothalamus/metabolism , Markov Chains , Mice
2.
Stat Methods Med Res ; 29(1): 15-28, 2020 01.
Article in English | MEDLINE | ID: mdl-30600776

ABSTRACT

In survival analysis, when a subset of subjects has extremely long survival, the two-part cure rate model has been commonly adopted. In the two-part model, the first part is for a binary response and describes the probability of cure. The second part is for a survival response and describes the probability of survival. Despite their intuitive interconnections, most existing works estimate the two parts without any constraint. The existing works on proportionality promote similarity in magnitudes (i.e. quantitative similarity) and can be too restrictive. In this study, for the two-part cure rate model, we propose imposing a sign-based penalty to promote similarity in signs (i.e. qualitative similarity). The proposed strategy can be more informative than those that neglect the two-part interconnections and less restrictive than the existing proportionality works. A penalty is also imposed to select relevant variables and accommodate high-dimensional data. Numerical studies, including simulation and two data analyses, demonstrate the advantageous performance of the proposed approach.


Subjects
Statistical Models , Survival Analysis , Breast Neoplasms/mortality , Carcinogenesis/genetics , Renal Cell Carcinoma/genetics , Renal Cell Carcinoma/mortality , Computer Simulation , Female , Humans , Kidney Neoplasms/genetics , Kidney Neoplasms/mortality , Linear Models , Male , Probability
3.
Genet Epidemiol ; 41(8): 779-789, 2017 12.
Article in English | MEDLINE | ID: mdl-28913902

ABSTRACT

Gene expression (GE) studies have played a critical role in cancer research. Despite tremendous effort, the analysis results are still often unsatisfactory because of weak signals and high data dimensionality. Analysis is often further challenged by the long-tailed distributions of the outcome variables. In recent multidimensional studies, data have been collected on GEs as well as their regulators (e.g., copy number alterations (CNAs), methylation, and microRNAs), which can provide additional information on the associations between GEs and cancer outcomes. In this study, we develop an ARMI (assisted robust marker identification) approach for analyzing cancer studies with measurements on GEs as well as regulators. The proposed approach borrows information from regulators and can be more effective than analyzing GE data alone. A robust objective function is adopted to accommodate long-tailed distributions. Marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable. Simulation shows its satisfactory performance under a variety of settings. TCGA (The Cancer Genome Atlas) data on melanoma and lung cancer are analyzed, which leads to biologically plausible marker identification and superior prediction.


Subjects
Tumor Biomarkers/genetics , Genetic Models , Neoplasms/genetics , Tumor Biomarkers/metabolism , Neoplastic Gene Expression Regulation , Neoplasm Genes , Humans , Melanoma/genetics , Melanoma/metabolism , Melanoma/pathology , Neoplasms/metabolism , Neoplasms/pathology , Phenotype , Skin Neoplasms/genetics , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
4.
Genomics ; 107(6): 223-30, 2016 06.
Article in English | MEDLINE | ID: mdl-27141884

ABSTRACT

Multiple types of genetic, epigenetic, and genomic changes have been implicated in cutaneous melanoma prognosis. Many of the existing studies are limited to analyzing a single type of omics measurement and cannot comprehensively describe the biological processes underlying prognosis. As a result, the obtained prognostic models may be less satisfactory, and the identified prognostic markers may be less informative. The recently collected TCGA (The Cancer Genome Atlas) data are of high quality and include comprehensive omics measurements, making it possible to model prognosis more comprehensively and more accurately. In this study, we first describe the statistical approaches that can integrate multiple types of omics measurements with the assistance of variable selection and dimension reduction techniques. Data analysis suggests that, for cutaneous melanoma, integrating multiple types of measurements leads to prognostic models with improved prediction performance. Informative individual markers and pathways are identified, which can provide valuable insights into melanoma prognosis.


Subjects
Melanoma/genetics , Prognosis , Transcriptome/genetics , Tumor Biomarkers/genetics , Genomics , Humans , Melanoma/diagnosis , Melanoma/pathology , Proteomics , Skin Neoplasms , Cutaneous Malignant Melanoma
5.
Bioinformatics ; 31(24): 3977-83, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26342102

ABSTRACT

MOTIVATION: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. RESULTS: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. AVAILABILITY AND IMPLEMENTATION: R code is available at http://works.bepress.com/shuangge/49/.
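
As a rough illustration of what a Laplacian shrinkage penalty measures, the Python sketch below evaluates the generic quadratic form beta^T L beta with L = D - A, which equals the sum over pairs of a_jk * (beta_j - beta_k)^2 for a symmetric adjacency matrix A. The unnormalized Laplacian, the function name `laplacian_penalty`, and the toy adjacency values are assumptions; the abstract does not state the exact form used by the authors.

```python
def laplacian_penalty(beta, adjacency):
    """Return beta^T (D - A) beta for a symmetric adjacency matrix A.

    Computed as a sum over pairs j < k of a_jk * (beta_j - beta_k)^2,
    so strongly connected variables are pushed toward similar coefficients.
    """
    p = len(beta)
    total = 0.0
    for j in range(p):
        for k in range(j + 1, p):
            total += adjacency[j][k] * (beta[j] - beta[k]) ** 2
    return total


# Hypothetical network over three variables (for example, three CNAs).
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
print(laplacian_penalty([1.0, 1.0, 1.0], adjacency))   # 0.0
print(laplacian_penalty([1.0, -1.0, 0.0], adjacency))  # 4.5
```

Equal coefficients on connected variables incur no penalty, while coefficients that differ across a strong edge are penalized most. In the setting the abstract describes, one such term would apply to the GE network and another to the CNA network, alongside a sparsity penalty.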


Subjects
Gene Dosage , Gene Expression Regulation , Gene Expression , Theoretical Models , Humans , Neoplasms/genetics
6.
Stat Med ; 34(30): 4016-30, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26239060

ABSTRACT

In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimation approach can automatically determine whether a G factor has an interaction with an E factor, a main effect but no interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms several competing alternatives. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. Copyright © 2015 John Wiley & Sons, Ltd.


Subjects
Gene-Environment Interaction , Genetic Models , Statistical Models , Algorithms , Tumor Biomarkers/genetics , Biostatistics , Computer Simulation , Genetic Databases , Female , Gene Expression , Humans , Linear Models , Lung Neoplasms/etiology , Lung Neoplasms/genetics , Male , Single Nucleotide Polymorphism
7.
Brief Bioinform ; 16(5): 735-44, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25552438

ABSTRACT

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tunings, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures, which describe the overlap from different perspectives, are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
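
The simple counting approach that the abstract describes as common practice can be sketched in a few lines of Python. The function names, the choice of the smaller signature as the percentage denominator, and the example gene lists are all assumptions made for illustration; they are not taken from the paper, and the measures the paper proposes (based on Cox-Lasso estimates and solution paths) are not shown.

```python
def overlap_count(sig_a, sig_b):
    """Number of genes shared by two signatures."""
    return len(set(sig_a) & set(sig_b))


def overlap_percentage(sig_a, sig_b):
    """Shared genes as a percentage of the smaller signature (an assumed convention)."""
    smaller = min(len(set(sig_a)), len(set(sig_b)))
    if smaller == 0:
        return 0.0
    return 100.0 * overlap_count(sig_a, sig_b) / smaller


# Made-up signatures; the gene symbols are real but the lists are hypothetical.
signature_1 = ["TP53", "BRCA1", "EGFR", "MYC"]
signature_2 = ["EGFR", "KRAS", "TP53"]

print(overlap_count(signature_1, signature_2))  # 2
print(round(overlap_percentage(signature_1, signature_2), 1))  # 66.7
```

Note that this count uses only gene membership: it ignores the size and sign of the estimated effects and how the signatures change with the tuning parameter.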


Subjects
Gene Expression Profiling , Neoplasms/genetics , Humans , Prognosis , Proportional Hazards Models
8.
Brief Bioinform ; 16(2): 291-303, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24632304

ABSTRACT

With accumulating research on the interconnections among different types of genomic regulations, researchers have found that multidimensional genomic studies outperform one-dimensional studies in multiple aspects. Among many sources of multidimensional genomic data, The Cancer Genome Atlas (TCGA) provides the public with comprehensive profiling data on >30 cancer types, making it an ideal test bed for conducting and comparing different analyses. In this article, the analysis goal is to apply several existing methods and associate multidimensional genomic measurements with cancer outcomes, in particular prognosis, with special focus on the predictive power of genomic signatures. We exploit clinical data and four types of genomic measurement, including mRNA gene expression, DNA methylation, microRNA and copy number alterations, for breast invasive carcinoma, glioblastoma multiforme, acute myeloid leukemia and lung squamous cell carcinoma collected by TCGA. To accommodate the high dimensionality, we extract important features using Principal Component Analysis, Partial Least Squares and Least Absolute Shrinkage and Selection Operator (Lasso), which are representative of dimension reduction and variable selection techniques and have been extensively adopted, and fit Cox survival models with combined important features. We calibrate the predictive power of each type of genomic measurement for the prognosis of four cancer types and find that the results vary across cancers. Our analysis also suggests that for most of the cancers in our study and the adopted methods, there is no substantial improvement in prediction when adding other genomic measurements after gene expression and clinical covariates have been included in the model. This is consistent with the findings that molecular features measured at the transcription level affect clinical outcomes more directly than those measured at the DNA/epigenetic level.


Subjects
Genomics/statistics & numerical data , Neoplasms/genetics , Brain Neoplasms/genetics , Breast Neoplasms/genetics , Squamous Cell Carcinoma/genetics , Computational Biology , Genetic Databases/statistics & numerical data , Female , Glioblastoma/genetics , Humans , Least-Squares Analysis , Acute Myeloid Leukemia/genetics , Lung Neoplasms/genetics , Male , Neoplasms/mortality , Principal Component Analysis , Prognosis , Proportional Hazards Models
9.
Genet Epidemiol ; 38(3): 220-30, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24616063

ABSTRACT

In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use the significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on a significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications.
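
For readers unfamiliar with rank-based objectives, the Python sketch below evaluates a Gehan-type loss, one commonly used rank-based objective for AFT models with right-censored data. The abstract does not say that this is the exact objective the authors use, and their smoothed version and penalty are not shown; the function name `gehan_loss` and the toy data are hypothetical.

```python
def gehan_loss(log_time, event, x, beta):
    """Gehan-type rank loss for an AFT model (illustrative form).

    Residuals are e_i = log_time_i - x_i . beta. The loss averages
    event_i * max(e_j - e_i, 0) over all ordered pairs (i, j), so it
    depends on pairwise differences of residuals, not on their squares.
    """
    n = len(log_time)
    resid = [
        log_time[i] - sum(xv * b for xv, b in zip(x[i], beta))
        for i in range(n)
    ]
    total = 0.0
    for i in range(n):
        if not event[i]:  # censored subjects contribute only as comparators
            continue
        for j in range(n):
            total += max(resid[j] - resid[i], 0.0)
    return total / n ** 2


# Toy data: three subjects, one covariate, third subject censored.
log_time = [1.0, 2.0, 3.0]
event = [1, 1, 0]
x = [[0.0], [1.0], [2.0]]

print(gehan_loss(log_time, event, x, beta=[1.0]))  # 0.0, all residuals equal
print(gehan_loss(log_time, event, x, beta=[0.0]))  # 4/9, about 0.444
```

Because the loss is built from pairwise differences of residuals rather than squared residuals, a single extreme observation has limited influence, which matches the robustness to contaminated or heavy-tailed data that the abstract emphasizes.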


Subjects
Gene-Environment Interaction , Genetic Models , Environment , Humans , Lung Neoplasms/genetics , Male , Prognosis , Risk Factors , Time Factors
10.
Genet Epidemiol ; 38(2): 144-51, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24395534

ABSTRACT

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances beyond published ones by introducing contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance.


Subjects
Neoplasms/genetics , Algorithms , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Computer Simulation , Female , Genetic Markers , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Genetic Models , Neoplasms/diagnosis , Prognosis
11.
Brief Bioinform ; 15(5): 671-84, 2014 Sep.
Article in English | MEDLINE | ID: mdl-23788798

ABSTRACT

Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.


Subjects
Tumor Biomarkers/metabolism , Gene Expression Profiling , Neoplasms/genetics , Humans , Theoretical Models