Results 1 - 20 of 28
1.
Nucleic Acids Res ; 51(22): e115, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37941153

ABSTRACT

In the analysis of both single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data, classifying cells/spots into cell/domain types is an essential analytic step for many secondary analyses. Most of the existing annotation methods have been developed for scRNA-seq datasets without any consideration of spatial information. Here, we present SpatialAnno, an efficient and accurate annotation method for spatial transcriptomics datasets, with the capability to effectively leverage a large number of non-marker genes as well as 'qualitative' information about marker genes without using a reference dataset. Uniquely, SpatialAnno estimates low-dimensional embeddings for a large number of non-marker genes via a factor model while promoting spatial smoothness among neighboring spots via a Potts model. Using both simulated and four real spatial transcriptomics datasets from the 10x Visium, ST, Slide-seqV1/2, and seqFISH platforms, we showcase the method's improved spatial annotation accuracy, including its robustness to the inclusion of marker genes for irrelevant cell/domain types and to various degrees of marker gene misspecification. SpatialAnno is computationally scalable and applicable to SRT datasets from different platforms. Furthermore, the estimated embeddings for cellular biological effects facilitate many downstream analyses.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Software , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Transcriptome
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849574

ABSTRACT

Spatial transcriptomics has been emerging as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. These spatially resolved transcriptomic data make it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve clustering performance similar to or better than that of BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found that the mean expression of COVID-19 signature genes was greater in immune than in non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organizations of tissues from spatial transcriptomic data.
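
To make the role of the smoothness parameter concrete, the following is a minimal Python sketch of an iterated-conditional-modes (ICM) label update under a Potts-type prior. It is not the SC-MEB implementation: the one-dimensional chain of spots, the fixed cluster means, the unit variance, and the names `icm_potts`, `values`, `means`, and `beta` are all illustrative assumptions, and the expectation-maximization step that would re-estimate the parameters is omitted.

```python
def label_energy(value, cluster, means, neighbour_labels, beta, sigma):
    """Negative log-posterior (up to a constant) of one label at one spot."""
    data_term = (value - means[cluster]) ** 2 / (2 * sigma ** 2)
    # Potts prior: reward each neighbour that carries the same label.
    smooth_term = -beta * sum(1 for n in neighbour_labels if n == cluster)
    return data_term + smooth_term


def icm_potts(values, means, beta, sigma=1.0, n_iter=10):
    """ICM labelling of spots on a line (assumed layout: neighbours are i-1, i+1).

    values: one observed scalar per spot (a stand-in for an expression embedding).
    means:  assumed known cluster means; a full EM would update these.
    beta:   smoothness parameter; beta = 0 reduces to nearest-mean labelling.
    """
    k = len(means)
    # Initialise every spot with its nearest cluster mean.
    labels = [min(range(k), key=lambda c: (v - means[c]) ** 2) for v in values]
    for _ in range(n_iter):
        changed = False
        for i, v in enumerate(values):
            neighbours = [labels[j] for j in (i - 1, i + 1) if 0 <= j < len(values)]
            best = min(
                range(k),
                key=lambda c: label_energy(v, c, means, neighbours, beta, sigma),
            )
            if best != labels[i]:
                labels[i] = best
                changed = True
        if not changed:  # stop once a full sweep leaves the labels unchanged
            break
    return labels


toy_values = [0.1, -0.2, 0.6, 0.0, 1.1, 0.8, 1.2]  # made-up data
print(icm_potts(toy_values, means=[0.0, 1.0], beta=0.0))
print(icm_potts(toy_values, means=[0.0, 1.0], beta=1.0))
```

In this toy example the third spot is labelled differently from its neighbours when `beta=0` and is absorbed into the surrounding cluster when `beta=1`, which is the kind of spatial smoothing the abstract attributes to the hidden Markov random field.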


Subject(s)
Algorithms , COVID-19/metabolism , Computer Simulation , Gene Expression Profiling , SARS-CoV-2/metabolism , Animals , Colon/metabolism , Colorectal Neoplasms/metabolism , Dorsolateral Prefrontal Cortex/metabolism , Humans , Hypothalamus/metabolism , Markov Chains , Mice
3.
Nucleic Acids Res ; 50(12): e72, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35349708

ABSTRACT

Dimension reduction and (spatial) clustering are usually performed sequentially; however, the low-dimensional embeddings estimated in the dimension-reduction step may not be relevant to the class labels inferred in the clustering step. We therefore developed a computational method, Dimension-Reduction Spatial-Clustering (DR-SC), that can simultaneously perform dimension reduction and (spatial) clustering within a unified framework. Joint analysis by DR-SC produces accurate (spatial) clustering results and ensures the effective extraction of biologically informative low-dimensional features. DR-SC is applicable to spatial clustering in spatial transcriptomics that characterizes the spatial organization of the tissue by segregating it into multiple tissue structures. Here, DR-SC relies on a latent hidden Markov random field model to encourage the spatial smoothness of the detected spatial cluster boundaries. Underlying DR-SC is an efficient expectation-maximization algorithm based on an iterative conditional mode. As such, DR-SC is scalable to large sample sizes and can optimize the spatial smoothness parameter in a data-driven manner. With comprehensive simulations and real data applications, we show that DR-SC outperforms existing clustering and spatial clustering methods: it extracts more biologically relevant features than conventional dimension reduction methods, improves clustering performance, and offers improved visualization and downstream trajectory inference.


Subject(s)
Algorithms , Transcriptome , Cluster Analysis , RNA-Seq , Single-Cell Analysis/methods , Transcriptome/genetics , Exome Sequencing
4.
Bioinformatics ; 38(2): 303-310, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34499127

ABSTRACT

MOTIVATION: Mendelian randomization (MR) is a valuable tool to examine the causal relationships between health risk factors and outcomes from observational studies. Along with the proliferation of genome-wide association studies, a variety of two-sample MR methods for summary data have been developed to account for horizontal pleiotropy (HP), primarily based on the assumption that the effects of variants on exposure (γ) and HP (α) are independent. In practice, this assumption is too strict and can be easily violated because of correlated HP. RESULTS: To account for this correlated HP, we propose a Bayesian approach, MR-Corr2, that uses the orthogonal projection to reparameterize the bivariate normal distribution for γ and α, and a spike-slab prior to mitigate the impact of correlated HP. We have also developed an efficient algorithm with parallel Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies comparing both type-I error control and point estimates in various scenarios. By applying MR-Corr2 to study the relationships between exposure-outcome pairs in complex traits, we did not identify the contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective of the causal network among complex traits. AVAILABILITY AND IMPLEMENTATION: The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Bayes Theorem , Risk Factors , Computer Simulation
5.
Nucleic Acids Res ; 48(19): e109, 2020 11 04.
Article in English | MEDLINE | ID: mdl-32978944

ABSTRACT

Transcriptome-wide association studies (TWASs) integrate expression quantitative trait loci (eQTLs) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. Several statistical methods have been recently proposed to improve the performance of TWASs in gene prioritization by integrating the expression regulatory information imputed from multiple tissues, and made significant achievements in improving the ability to detect gene-trait associations. Unfortunately, most existing multi-tissue methods focus on prioritization of candidate genes, and cannot directly infer the specific functional effects of candidate genes across different tissues. Here, we propose a tissue-specific collaborative mixed model (TisCoMM) for TWASs, leveraging the co-regulation of genetic variations across different tissues explicitly via a unified probabilistic model. TisCoMM not only performs hypothesis testing to prioritize gene-trait associations, but also detects the tissue-specific role of candidate target genes in complex traits. To make full use of widely available GWASs summary statistics, we extend TisCoMM to use summary-level data, namely, TisCoMM-S2. Using extensive simulation studies, we show that type I error is controlled at the nominal level, the statistical power of identifying associated genes is greatly improved, and the false-positive rate (FPR) for non-causal tissues is well controlled at decent levels. We further illustrate the benefits of our methods in applications to summary-level GWASs data of 33 complex traits. Notably, apart from better identifying potential trait-associated genes, we can elucidate the tissue-specific role of candidate target genes. The follow-up pathway analysis from tissue-specific genes for asthma shows that the immune system plays an essential function for asthma development in both thyroid and lung tissues.


Subject(s)
Genome-Wide Association Study , Models, Statistical , Quantitative Trait Loci , Transcriptome , Asthma/genetics , Asthma/immunology , Genetic Predisposition to Disease , Humans , Lung/immunology , Multifactorial Inheritance/genetics , Organ Specificity , Thyroid Gland/immunology
6.
Bioinformatics ; 36(7): 2009-2016, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31755899

ABSTRACT

MOTIVATION: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. RESULTS: In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, it has performance comparable to that of CoMM, which uses individual-level GWAS data. AVAILABILITY AND IMPLEMENTATION: The implementation of CoMM-S2 is included in the CoMM package that can be downloaded from https://github.com/gordonliu810822/CoMM.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
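
The two-model structure described in this abstract can be illustrated with a deliberately naive two-stage sketch in Python. This is not CoMM or CoMM-S2, which combine the two models in one probabilistic framework and account for the uncertainty of the first stage; the single-variant setup, the toy data, and the helper names `ols_fit` and `two_stage_effect` are assumptions made purely for illustration.

```python
def ols_fit(x, y):
    """Simple least-squares fit y ~ intercept + slope * x; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def two_stage_effect(eqtl_genotype, eqtl_expression, gwas_genotype, gwas_phenotype):
    """Naive two-stage estimate of the expression-to-phenotype effect.

    Stage 1: learn expression ~ genotype in the eQTL panel.
    Stage 2: regress the phenotype on expression predicted for the GWAS samples.
    """
    slope1, intercept1 = ols_fit(eqtl_genotype, eqtl_expression)
    predicted = [intercept1 + slope1 * g for g in gwas_genotype]
    slope2, _ = ols_fit(predicted, gwas_phenotype)
    return slope2


# Made-up, noise-free toy data: expression = 1 + 0.5 * genotype,
# phenotype = 1 + 2 * expression.
effect = two_stage_effect(
    eqtl_genotype=[0, 1, 2, 0, 1, 2],
    eqtl_expression=[1.0, 1.5, 2.0, 1.0, 1.5, 2.0],
    gwas_genotype=[0, 1, 2, 2, 1, 0],
    gwas_phenotype=[3.0, 4.0, 5.0, 5.0, 4.0, 3.0],
)
print(effect)
```

The sketch uses individual-level data in both stages; replacing the second stage with GWAS summary statistics is the step the abstract attributes to CoMM-S2 and is not attempted here.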


Subject(s)
Genome-Wide Association Study , Transcriptome , Genotype , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
7.
Bioinformatics ; 35(19): 3693-3700, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30851102

ABSTRACT

MOTIVATION: In genome-wide association studies (GWASs) where multiple correlated traits have been measured on participants, a joint analysis strategy, whereby the traits are analyzed jointly, can improve statistical power over a single-trait analysis strategy. There are two questions of interest to be addressed when conducting a joint GWAS analysis with multiple traits. The first question examines whether a genetic locus is significantly associated with any of the traits being tested. The second question focuses on identifying the specific trait(s) associated with the genetic locus. Since existing methods primarily focus on the first question, this article seeks to provide a complementary method that addresses the second question. RESULTS: We propose a novel method, Variational Inference for Multiple Correlated Outcomes (VIMCO), that focuses on identifying the specific trait that is associated with the genetic locus when performing a joint GWAS analysis of multiple traits, while accounting for correlation among the multiple traits. We performed extensive numerical studies and also applied VIMCO to analyze two datasets. The numerical studies and real data analysis demonstrate that VIMCO improves statistical power over single-trait analysis strategies when the multiple traits are correlated and has comparable performance when the traits are not correlated. AVAILABILITY AND IMPLEMENTATION: The VIMCO software can be downloaded from: https://github.com/XingjieShi/VIMCO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genetic Loci , Phenotype , Polymorphism, Single Nucleotide , Research Design
8.
Genet Epidemiol ; 41(8): 779-789, 2017 12.
Article in English | MEDLINE | ID: mdl-28913902

ABSTRACT

Gene expression (GE) studies have been playing a critical role in cancer research. Despite tremendous effort, the analysis results are still often unsatisfactory, because of the weak signals and high data dimensionality. Analysis is often further challenged by the long-tailed distributions of the outcome variables. In recent multidimensional studies, data have been collected on GEs as well as their regulators (e.g., copy number alterations (CNAs), methylation, and microRNAs), which can provide additional information on the associations between GEs and cancer outcomes. In this study, we develop an ARMI (assisted robust marker identification) approach for analyzing cancer studies with measurements on GEs as well as regulators. The proposed approach borrows information from regulators and can be more effective than analyzing GE data alone. A robust objective function is adopted to accommodate long-tailed distributions. Marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable. Simulation shows its satisfactory performance under a variety of settings. TCGA (The Cancer Genome Atlas) data on melanoma and lung cancer are analyzed, which leads to biologically plausible marker identification and superior prediction.


Subject(s)
Biomarkers, Tumor/genetics , Models, Genetic , Neoplasms/genetics , Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic , Genes, Neoplasm , Humans , Melanoma/genetics , Melanoma/metabolism , Melanoma/pathology , Neoplasms/metabolism , Neoplasms/pathology , Phenotype , Skin Neoplasms/genetics , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
9.
Comput Stat Data Anal ; 124: 235-251, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30319163

ABSTRACT

Penalization is a popular tool for multi- and high-dimensional data. Most of the existing computational algorithms have been developed for convex loss functions. Nonconvex loss functions can sometimes generate more robust results and have important applications. Motivated by the BLasso algorithm, this study develops the Forward and Backward Stagewise (Fabs) algorithm for nonconvex loss functions with the adaptive Lasso (aLasso) penalty. It is shown that each point along the Fabs paths is a δ-approximate solution to the aLasso problem and the Fabs paths converge to the stationary points of the aLasso problem when δ goes to zero, given that the loss function has second-order derivatives bounded from above. This study exemplifies Fabs with an application to the penalized smooth partial rank (SPR) estimation, for which an effective algorithm is still lacking. Extensive numerical studies are conducted to demonstrate the benefit of penalized SPR estimation using Fabs, especially under high-dimensional settings. Application to the smoothed 0-1 loss in binary classification is introduced to demonstrate its capability to work with other differentiable nonconvex loss functions.

10.
Genet Epidemiol ; 40(5): 382-93, 2016 07.
Article in English | MEDLINE | ID: mdl-27247027

ABSTRACT

Genome-wide association studies (GWAS) have led to the identification of many genetic variants associated with complex diseases in the past 10 years. Penalization methods, with significant numerical and statistical advantages, have been extensively adopted in analyzing GWAS. This study has been partly motivated by the analysis of Genetic Analysis Workshop (GAW) 18 data, which have two notable characteristics. First, the subjects are from a small number of pedigrees and hence related. Second, for each subject, multiple correlated traits have been measured. Most of the existing penalization methods assume independence between subjects and traits and can be suboptimal. There are a few methods in the literature based on mixed modeling that can accommodate correlations. However, they cannot fully accommodate the two types of correlations while conducting effective marker selection. In this study, we develop a penalized multitrait mixed modeling approach. It accommodates the two different types of correlations and includes several existing methods as special cases. Effective penalization is adopted for marker selection. Simulation demonstrates its satisfactory performance. The GAW 18 data are analyzed using the proposed method.


Subject(s)
Genome-Wide Association Study , Models, Genetic , Pedigree , Quantitative Trait, Heritable , Area Under Curve , Computer Simulation , Humans , Phenotype , Polymorphism, Single Nucleotide/genetics , ROC Curve
11.
Brief Bioinform ; 16(5): 735-44, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25552438

ABSTRACT

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tunings, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures, which describe the overlap from different perspectives, are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
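
As a point of reference, the simple counting approach that the abstract describes as limited can be written in a few lines of Python. The function name `overlap_counts` and the gene lists are hypothetical, and the Jaccard index is included only as a familiar normalization, not as one of the measures proposed in the study.

```python
def overlap_counts(signature_a, signature_b):
    """Count-based overlap between two gene signatures given as lists of gene symbols."""
    a, b = set(signature_a), set(signature_b)
    shared = a & b
    union = a | b
    return {
        "shared": len(shared),
        # Share of the smaller signature that is also found in the other one.
        "pct_of_smaller": len(shared) / min(len(a), len(b)) if a and b else 0.0,
        # Jaccard index: shared genes over all genes in either signature.
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Illustrative gene lists only; they are not signatures from the study.
print(overlap_counts(["TP53", "BRCA1", "EGFR", "MYC"], ["EGFR", "MYC", "KRAS"]))
```

Counts of this kind treat every selected gene equally and ignore effect sizes and the tuning level at which each gene was selected, which is one plausible reading of why the study turns to measures based on the Cox-Lasso estimates and solution paths.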


Subject(s)
Gene Expression Profiling , Neoplasms/genetics , Humans , Prognosis , Proportional Hazards Models
12.
Brief Bioinform ; 16(2): 291-303, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24632304

ABSTRACT

With accumulating research on the interconnections among different types of genomic regulations, researchers have found that multidimensional genomic studies outperform one-dimensional studies in multiple aspects. Among many sources of multidimensional genomic data, The Cancer Genome Atlas (TCGA) provides the public with comprehensive profiling data on >30 cancer types, making it an ideal test bed for conducting and comparing different analyses. In this article, the analysis goal is to apply several existing methods and associate multidimensional genomic measurements with cancer outcomes, in particular prognosis, with special focus on the predictive power of genomic signatures. We exploit clinical data and four types of genomic measurements, including mRNA gene expression, DNA methylation, microRNA and copy number alterations, for breast invasive carcinoma, glioblastoma multiforme, acute myeloid leukemia and lung squamous cell carcinoma collected by TCGA. To accommodate the high dimensionality, we extract important features using Principal Component Analysis, Partial Least Squares and Least Absolute Shrinkage and Selection Operator (Lasso), which are representative of dimension reduction and variable selection techniques and have been extensively adopted, and fit Cox survival models with combined important features. We calibrate the predictive power of each type of genomic measurement for the prognosis of four cancer types and find that the results vary across cancers. Our analysis also suggests that for most of the cancers in our study and the adopted methods, there is no substantial improvement in prediction when adding other genomic measurements after gene expression and clinical covariates have been included in the model. This is consistent with the findings that molecular features measured at the transcription level affect clinical outcomes more directly than those measured at the DNA/epigenetic level.


Subject(s)
Genomics/statistics & numerical data , Neoplasms/genetics , Brain Neoplasms/genetics , Breast Neoplasms/genetics , Carcinoma, Squamous Cell/genetics , Computational Biology , Databases, Genetic/statistics & numerical data , Female , Glioblastoma/genetics , Humans , Least-Squares Analysis , Leukemia, Myeloid, Acute/genetics , Lung Neoplasms/genetics , Male , Neoplasms/mortality , Principal Component Analysis , Prognosis , Proportional Hazards Models
13.
Genomics ; 107(6): 223-30, 2016 06.
Article in English | MEDLINE | ID: mdl-27141884

ABSTRACT

Multiple types of genetic, epigenetic, and genomic changes have been implicated in cutaneous melanoma prognosis. Many of the existing studies are limited in analyzing a single type of omics measurement and cannot comprehensively describe the biological processes underlying prognosis. As a result, the obtained prognostic models may be less satisfactory, and the identified prognostic markers may be less informative. The recently collected TCGA (The Cancer Genome Atlas) data have a high quality and comprehensive omics measurements, making it possible to more comprehensively and more accurately model prognosis. In this study, we first describe the statistical approaches that can integrate multiple types of omics measurements with the assistance of variable selection and dimension reduction techniques. Data analysis suggests that, for cutaneous melanoma, integrating multiple types of measurements leads to prognostic models with an improved prediction performance. Informative individual markers and pathways are identified, which can provide valuable insights into melanoma prognosis.


Subject(s)
Melanoma/genetics , Prognosis , Transcriptome/genetics , Biomarkers, Tumor/genetics , Genomics , Humans , Melanoma/diagnosis , Melanoma/pathology , Proteomics , Skin Neoplasms , Melanoma, Cutaneous Malignant
14.
Brief Bioinform ; 15(5): 671-84, 2014 Sep.
Article in English | MEDLINE | ID: mdl-23788798

ABSTRACT

Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.


Subject(s)
Biomarkers, Tumor/metabolism , Gene Expression Profiling , Neoplasms/genetics , Humans , Models, Theoretical
15.
Bioinformatics ; 31(24): 3977-83, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26342102

ABSTRACT

MOTIVATION: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. RESULTS: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. AVAILABILITY AND IMPLEMENTATION: R code is available at http://works.bepress.com/shuangge/49/.
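
A Laplacian shrinkage penalty can be illustrated with a small Python sketch. It assumes a symmetric, non-negative adjacency matrix with zero diagonal and the unnormalized Laplacian L = D - A, which may differ from the exact form used in the paper (for example in normalization or in the handling of negative correlations); the function names are hypothetical. The sketch checks numerically that the quadratic form β'Lβ equals the weighted sum of squared differences between connected coefficients.

```python
def graph_laplacian(adjacency):
    """Unnormalized Laplacian L = D - A for a symmetric adjacency matrix (list of lists)."""
    n = len(adjacency)
    return [
        [(sum(adjacency[i]) if i == j else 0.0) - adjacency[i][j] for j in range(n)]
        for i in range(n)
    ]


def quadratic_form(matrix, beta):
    """Compute beta' * matrix * beta."""
    n = len(beta)
    return sum(beta[i] * matrix[i][j] * beta[j] for i in range(n) for j in range(n))


def pairwise_penalty(adjacency, beta):
    """Sum over edges i<j of a_ij * (beta_i - beta_j)^2."""
    n = len(beta)
    return sum(
        adjacency[i][j] * (beta[i] - beta[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


# Made-up network: node 0 and 1 strongly connected, node 1 and 2 weakly connected.
adjacency = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.5],
    [0.0, 0.5, 0.0],
]
beta = [1.0, 1.0, 0.0]
print(quadratic_form(graph_laplacian(adjacency), beta), pairwise_penalty(adjacency, beta))
```

Because the penalty grows with the squared difference between coefficients of connected nodes, minimizing it alongside a loss pulls the effects of strongly connected GEs or CNAs toward each other.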


Subject(s)
Gene Dosage , Gene Expression Regulation , Gene Expression , Models, Theoretical , Humans , Neoplasms/genetics
16.
Genet Epidemiol ; 38(3): 220-30, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24616063

ABSTRACT

In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt the rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications.
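
The idea of smoothing a rank-based objective can be sketched in Python as follows. The sketch assumes a maximum-rank-correlation type objective for uncensored outcomes, with the indicator on the linear predictors replaced by a sigmoid; the exact objective, the handling of censoring under the AFT model, and the penalty used in the study are not reproduced, and the names `smoothed_rank_objective` and `sigma` are illustrative.

```python
import math


def sigmoid(u):
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def smoothed_rank_objective(x, y, beta, sigma=0.1):
    """Smoothed rank agreement between outcomes y and linear scores x @ beta.

    For every ordered pair with y[i] > y[j], the hard indicator
    1(score_i > score_j) is replaced by sigmoid((score_i - score_j) / sigma),
    which makes the objective differentiable in beta (assumed smoothing form).
    """
    n = len(y)
    scores = [sum(b * v for b, v in zip(beta, row)) for row in x]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and y[i] > y[j]:
                total += sigmoid((scores[i] - scores[j]) / sigma)
    return total / (n * (n - 1))


# Made-up data in which the outcome increases with the single covariate.
x = [[0.0], [1.0], [2.0]]
y = [0.0, 1.0, 2.0]
print(smoothed_rank_objective(x, y, beta=[1.0]))   # concordant direction
print(smoothed_rank_objective(x, y, beta=[-1.0]))  # discordant direction
```

Since the objective depends on the outcomes only through their ordering, a few extreme outcome values cannot dominate it, which is consistent with the robustness to contaminated or heavy-tailed data reported in the abstract. A smaller `sigma` tracks the hard indicator more closely at the cost of a less smooth surface.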


Subject(s)
Gene-Environment Interaction , Models, Genetic , Environment , Humans , Lung Neoplasms/genetics , Male , Prognosis , Risk Factors , Time Factors
17.
Genet Epidemiol ; 38(2): 144-51, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24395534

ABSTRACT

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by introducing the contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance.


Subject(s)
Neoplasms/genetics , Algorithms , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Computer Simulation , Female , Genetic Markers , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Models, Genetic , Neoplasms/diagnosis , Prognosis
18.
Stat Med ; 34(30): 4016-30, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26239060

ABSTRACT

In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimation approach can automatically determine whether a G factor has an interaction with an E factor, a main effect but no interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms several competing alternatives. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. Copyright © 2015 John Wiley & Sons, Ltd.


Subject(s)
Gene-Environment Interaction , Models, Genetic , Models, Statistical , Algorithms , Biomarkers, Tumor/genetics , Biostatistics , Computer Simulation , Databases, Genetic , Female , Gene Expression , Humans , Linear Models , Lung Neoplasms/etiology , Lung Neoplasms/genetics , Male , Polymorphism, Single Nucleotide
19.
Front Genet ; 15: 1333855, 2024.
Article in English | MEDLINE | ID: mdl-38313677

ABSTRACT

Background: Cerebral aneurysms (CAs) are a significant cerebrovascular ailment with a multifaceted etiology influenced by various factors including heredity and environment. This study aimed to explore the possible link between different types of immune cells and the occurrence of CAs. Methods: We analyzed the connection between 731 immune cell signatures and the risk of CAs by using publicly available genetic data. The analysis included four immune features, specifically median brightness levels (MBL), proportionate cell (PC), definite cell (DC), and morphological attributes (MA). Mendelian randomization (MR) analysis was conducted using the instrumental variables (IVs) derived from the genetic variation linked to CAs. Results: After multiple test adjustment based on the FDR method, the inverse variance weighted (IVW) method revealed that 3 immune cell phenotypes were linked to the risk of CAs. These included CD45 on HLA DR+NK (odds ratio (OR), 1.116; 95% confidence interval (CI), 1.001-1.244; p = 0.0489), CX3CR1 on CD14- CD16- (OR, 0.973; 95% CI, 0.948-0.999; p = 0.0447). An immune cell phenotype CD16- CD56 on NK was found to have a significant association with the risk of CAs in reverse MR study (OR, 0.950; 95% CI, 0.911-0.990; p = 0.0156). Conclusion: Our investigation has yielded findings that support a substantial genetic link between immune cells and CAs, thereby suggesting possible implications for future clinical interventions.
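
The inverse variance weighted (IVW) estimate and the FDR adjustment named in the abstract above can be sketched as follows. This is a minimal illustration of the standard fixed-effect IVW formula and the Benjamini-Hochberg procedure, not the authors' pipeline; the abstract says only "the FDR method", so Benjamini-Hochberg is an assumption, and all function names and input values are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW: weighted regression through the origin of
    SNP-outcome effects on SNP-exposure effects, weights 1/se_outcome^2."""
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_with_ci(beta, se, z=1.96):
    """Convert a log-odds estimate to an odds ratio with a 95% CI."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def two_sided_p(beta, se):
    """Two-sided p-value from a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical instruments: each SNP implies the same causal ratio of 0.1
beta, se = ivw_estimate([0.1, 0.2], [0.01, 0.02], [0.01, 0.01])
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(beta, se)
p_value = two_sided_p(beta, se)
adjusted = benjamini_hochberg([0.01, 0.04, 0.03])
```

The odds ratios and confidence intervals in the abstract are the exponentiated form of such a log-odds estimate and its interval.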

20.
Inflammation ; 46(3): 1047-1060, 2023 Jun.
Article in English | MEDLINE | ID: mdl-36801996

ABSTRACT

Primary Sjogren's syndrome (pSS) is a systemic autoimmune disease that causes dysfunction of secretory glands; its specific pathogenesis is still unknown. The CXCL9, 10, 11/CXCR3 axis and G protein-coupled receptor kinase 2 (GRK2) are involved in many inflammatory and immune processes. We used NOD/Ltj mice, a spontaneous SS animal model, to elucidate the pathological mechanism by which the CXCL9, 10, 11/CXCR3 axis promotes T lymphocyte migration by activating GRK2 in pSS. We found that CD4 + GRK2 and Th17 + CXCR3 were apparently increased and Treg + CXCR3 was significantly decreased in the spleen of 4W NOD mice without sicca symptoms compared to ICR mice (control group). When sicca symptoms occurred, the protein levels of IFN-γ and CXCL9, 10, 11 increased in submandibular gland (SG) tissue, accompanied by obvious lymphocytic infiltration, with Th17 cells infiltrating overwhelmingly relative to Treg cells; in the spleen, the proportion of Th17 cells was increased, whereas that of Treg cells was decreased. In vitro, we used IFN-γ to stimulate human salivary gland epithelial cells (HSGECs) co-cultured with Jurkat cells, and the results showed that CXCL9, 10, 11 were increased by IFN-γ activating the JAK2/STAT1 signaling pathway and that Jurkat cell migration increased as cell membrane GRK2 expression rose. Treating HSGECs with tofacitinib or Jurkat cells with GRK2 siRNA reduced the migration of Jurkat cells. The results indicate that CXCL9, 10, 11 significantly increased in SG tissue through IFN-γ stimulating HSGECs, and that the CXCL9, 10, 11/CXCR3 axis contributes to the progression of pSS by activating GRK2 to promote T lymphocyte migration.


Subject(s)
Sjogren's Syndrome , Mice , Animals , Humans , Sjogren's Syndrome/metabolism , Mice, Inbred ICR , Mice, Inbred NOD , T-Lymphocytes, Regulatory/metabolism , Cell Movement , Chemokine CXCL9 , Receptors, CXCR3/metabolism