Results 1 - 20 of 28
1.
Nucleic Acids Res ; 51(22): e115, 2023 Dec 11.
Article in English | MEDLINE | ID: mdl-37941153

ABSTRACT

In the analysis of both single-cell RNA sequencing (scRNA-seq) and spatially resolved transcriptomics (SRT) data, classifying cells/spots into cell/domain types is an essential analytic step for many secondary analyses. Most of the existing annotation methods have been developed for scRNA-seq datasets without any consideration of spatial information. Here, we present SpatialAnno, an efficient and accurate annotation method for spatial transcriptomics datasets, with the capability to effectively leverage a large number of non-marker genes as well as 'qualitative' information about marker genes without using a reference dataset. Uniquely, SpatialAnno estimates low-dimensional embeddings for a large number of non-marker genes via a factor model while promoting spatial smoothness among neighboring spots via a Potts model. Using both simulated and four real spatial transcriptomics datasets from the 10x Visium, ST, Slide-seqV1/2, and seqFISH platforms, we showcase the method's improved spatial annotation accuracy, including its robustness to the inclusion of marker genes for irrelevant cell/domain types and to various degrees of marker gene misspecification. SpatialAnno is computationally scalable and applicable to SRT datasets from different platforms. Furthermore, the estimated embeddings for cellular biological effects facilitate many downstream analyses.


Subject(s)
Gene Expression Profiling , Single-Cell Analysis , Software , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Transcriptome
2.
Brief Bioinform ; 23(1)2022 01 17.
Article in English | MEDLINE | ID: mdl-34849574

ABSTRACT

Spatial transcriptomics has emerged as a powerful technique for resolving gene expression profiles while retaining tissue spatial information. Spatially resolved transcriptomics makes it feasible to examine the complex multicellular systems of different microenvironments. To answer scientific questions with spatial transcriptomics and expand our understanding of how cell types and states are regulated by the microenvironment, the first step is to identify cell clusters by integrating the available spatial information. Here, we introduce SC-MEB, an empirical Bayes approach for spatial clustering analysis using a hidden Markov random field. We have also derived an efficient expectation-maximization algorithm based on an iterative conditional mode for SC-MEB. In contrast to BayesSpace, a recently developed method, SC-MEB is not only computationally efficient and scalable to large sample sizes but is also capable of choosing the smoothness parameter and the number of clusters. We performed comprehensive simulation studies to demonstrate the superiority of SC-MEB over some existing methods. We applied SC-MEB to analyze the spatial transcriptome of human dorsolateral prefrontal cortex tissues and the mouse hypothalamic preoptic region. Our analysis results showed that SC-MEB can achieve clustering performance similar to or better than that of BayesSpace, which uses the true number of clusters and a fixed smoothness parameter. Moreover, SC-MEB is scalable to large sample sizes. We then employed SC-MEB to analyze a colon dataset from a patient with colorectal cancer (CRC) and COVID-19, and further performed differential expression analysis to identify signature genes related to the clustering results. The heatmap of identified signature genes showed that the clusters identified using SC-MEB were more separable than those obtained with BayesSpace. Using pathway analysis, we identified three immune-related clusters, and in a further comparison, found that the mean expression of COVID-19 signature genes was greater in immune than in non-immune regions of colon tissue. SC-MEB provides a valuable computational tool for investigating the structural organization of tissues from spatial transcriptomic data.
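The combination of a hidden Markov random field with an iterative-conditional-mode (ICM) step can be illustrated with a minimal sketch. This is not the SC-MEB implementation: the function name `icm_potts_update`, the shared isotropic Gaussian variance `sigma2`, and the neighbor-list format are assumptions made for illustration. Each spot is reassigned to the cluster that maximizes its Gaussian log-likelihood plus a Potts term, `beta` times the number of neighbors that currently carry that label.

```python
import numpy as np

def icm_potts_update(y, labels, means, sigma2, neighbors, beta):
    """One ICM sweep over all spots (illustrative sketch, hypothetical API).

    y         : (n, d) array of low-dimensional expression features
    labels    : (n,) integer array of current cluster labels
    means     : (K, d) array of cluster means
    sigma2    : shared isotropic variance (a simplifying assumption)
    neighbors : list of lists; neighbors[i] holds indices of spots adjacent to i
    beta      : Potts smoothness parameter (0 means no spatial smoothing)
    """
    K = means.shape[0]
    new_labels = labels.copy()
    for i in range(len(y)):
        # Gaussian log-likelihood of spot i under each cluster, up to a constant
        loglik = -0.5 * np.sum((y[i] - means) ** 2, axis=1) / sigma2
        # Potts term: number of neighbors currently assigned to each cluster
        agree = np.bincount(new_labels[neighbors[i]], minlength=K)
        new_labels[i] = int(np.argmax(loglik + beta * agree))
    return new_labels
```

In a full EM-style procedure, a sweep like this would alternate with re-estimating the cluster means and variances; how the smoothness parameter and the number of clusters are selected is specific to the method and is not reproduced here.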


Subject(s)
Algorithms , COVID-19/metabolism , Computer Simulation , Gene Expression Profiling , SARS-CoV-2/metabolism , Animals , Colon/metabolism , Colorectal Neoplasms/metabolism , Dorsolateral Prefrontal Cortex/metabolism , Humans , Hypothalamus/metabolism , Markov Chains , Mice
3.
Nucleic Acids Res ; 50(12): e72, 2022 07 08.
Article in English | MEDLINE | ID: mdl-35349708

ABSTRACT

Dimension reduction and (spatial) clustering are usually performed sequentially; however, the low-dimensional embeddings estimated in the dimension-reduction step may not be relevant to the class labels inferred in the clustering step. We therefore developed a computational method, Dimension-Reduction Spatial-Clustering (DR-SC), that can simultaneously perform dimension reduction and (spatial) clustering within a unified framework. Joint analysis by DR-SC produces accurate (spatial) clustering results and ensures the effective extraction of biologically informative low-dimensional features. DR-SC is applicable to spatial clustering in spatial transcriptomics, which characterizes the spatial organization of the tissue by segregating it into multiple tissue structures. Here, DR-SC relies on a latent hidden Markov random field model to encourage the spatial smoothness of the detected spatial cluster boundaries. Underlying DR-SC is an efficient expectation-maximization algorithm based on an iterative conditional mode. As such, DR-SC is scalable to large sample sizes and can optimize the spatial smoothness parameter in a data-driven manner. With comprehensive simulations and real data applications, we show that DR-SC outperforms existing clustering and spatial clustering methods: it extracts more biologically relevant features than conventional dimension reduction methods, improves clustering performance, and offers improved visualization and downstream trajectory inference.


Subject(s)
Algorithms , Transcriptome , Cluster Analysis , RNA-Seq , Single-Cell Analysis/methods , Transcriptome/genetics , Exome Sequencing
4.
Bioinformatics ; 38(2): 303-310, 2022 01 03.
Article in English | MEDLINE | ID: mdl-34499127

ABSTRACT

MOTIVATION: Mendelian randomization (MR) is a valuable tool to examine the causal relationships between health risk factors and outcomes from observational studies. Along with the proliferation of genome-wide association studies, a variety of two-sample MR methods for summary data have been developed to account for horizontal pleiotropy (HP), primarily based on the assumption that the effects of variants on exposure (γ) and HP (α) are independent. In practice, this assumption is too strict and can be easily violated because of correlated HP. RESULTS: To account for this correlated HP, we propose a Bayesian approach, MR-Corr2, that uses the orthogonal projection to reparameterize the bivariate normal distribution for γ and α, and a spike-slab prior to mitigate the impact of correlated HP. We have also developed an efficient algorithm with parallel Gibbs sampling. To demonstrate the advantages of MR-Corr2 over existing methods, we conducted comprehensive simulation studies to compare both type-I error control and point estimates in various scenarios. By applying MR-Corr2 to study the relationships between exposure-outcome pairs in complex traits, we did not identify the contradictory causal relationship between HDL-c and CAD. Moreover, the results provide a new perspective on the causal network among complex traits. AVAILABILITY AND IMPLEMENTATION: The developed R package and code to reproduce all the results are available at https://github.com/QingCheng0218/MR.Corr2. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Mendelian Randomization Analysis , Mendelian Randomization Analysis/methods , Bayes Theorem , Risk Factors , Computer Simulation
5.
Nucleic Acids Res ; 48(19): e109, 2020 11 04.
Article in English | MEDLINE | ID: mdl-32978944

ABSTRACT

Transcriptome-wide association studies (TWASs) integrate expression quantitative trait loci (eQTL) studies with genome-wide association studies (GWASs) to prioritize candidate target genes for complex traits. Several statistical methods have recently been proposed to improve the performance of TWASs in gene prioritization by integrating the expression regulatory information imputed from multiple tissues, and they have substantially improved the ability to detect gene-trait associations. Unfortunately, most existing multi-tissue methods focus on prioritization of candidate genes, and cannot directly infer the specific functional effects of candidate genes across different tissues. Here, we propose a tissue-specific collaborative mixed model (TisCoMM) for TWASs, leveraging the co-regulation of genetic variations across different tissues explicitly via a unified probabilistic model. TisCoMM not only performs hypothesis testing to prioritize gene-trait associations, but also detects the tissue-specific role of candidate target genes in complex traits. To make full use of widely available GWAS summary statistics, we extend TisCoMM to use summary-level data, namely, TisCoMM-S2. Using extensive simulation studies, we show that type I error is controlled at the nominal level, the statistical power of identifying associated genes is greatly improved, and the false-positive rate (FPR) for non-causal tissues is well controlled. We further illustrate the benefits of our methods in applications to summary-level GWAS data of 33 complex traits. Notably, apart from better identifying potential trait-associated genes, we can elucidate the tissue-specific role of candidate target genes. The follow-up pathway analysis of tissue-specific genes for asthma shows that the immune system plays an essential role in asthma development in both thyroid and lung tissues.


Subject(s)
Genome-Wide Association Study , Statistical Models , Quantitative Trait Loci , Transcriptome , Asthma/genetics , Asthma/immunology , Genetic Predisposition to Disease , Humans , Lung/immunology , Multifactorial Inheritance/genetics , Organ Specificity , Thyroid Gland/immunology
6.
Bioinformatics ; 36(7): 2009-2016, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31755899

ABSTRACT

MOTIVATION: Although genome-wide association studies (GWAS) have deepened our understanding of the genetic architecture of complex traits, the mechanistic links that underlie how genetic variants cause complex traits remain elusive. To advance our understanding of the underlying mechanistic links, various consortia have collected a vast volume of genomic data that enable us to investigate the role that genetic variants play in gene expression regulation. Recently, a collaborative mixed model (CoMM) was proposed to jointly interrogate the genome on complex traits by integrating both the GWAS dataset and the expression quantitative trait loci (eQTL) dataset. Although CoMM is a powerful approach that leverages regulatory information while accounting for the uncertainty in using an eQTL dataset, it requires individual-level GWAS data and cannot fully make use of widely available GWAS summary statistics. Therefore, statistically efficient methods that leverage transcriptome information using only summary statistics from GWAS data are required. RESULTS: In this study, we propose a novel probabilistic model, CoMM-S2, to examine the mechanistic role that genetic variants play, by using only GWAS summary statistics instead of individual-level GWAS data. Similar to CoMM, which uses individual-level GWAS data, CoMM-S2 combines two models: the first model examines the relationship between gene expression and genotype, while the second model examines the relationship between the phenotype and the predicted gene expression from the first model. Distinct from CoMM, CoMM-S2 requires only GWAS summary statistics. Using both simulation studies and real data analysis, we demonstrate that even though CoMM-S2 utilizes GWAS summary statistics, its performance is comparable to that of CoMM, which uses individual-level GWAS data. AVAILABILITY AND IMPLEMENTATION: The implementation of CoMM-S2 is included in the CoMM package that can be downloaded from https://github.com/gordonliu810822/CoMM. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
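The two-model structure described in this abstract can be written schematically as follows. This is an illustrative sketch only: the symbols ($\mathbf{y}$ for gene expression, $\mathbf{z}$ for the phenotype, $\mathbf{W}_1$ and $\mathbf{W}_2$ for genotype matrices in the eQTL and GWAS samples, $\boldsymbol{\gamma}$ for variant effects on expression, $\alpha$ for the effect of predicted expression on the phenotype) are assumed notation that does not come from the abstract. Covariates and the summary-statistics likelihood that CoMM-S2 actually uses are omitted.

```latex
\begin{align}
\mathbf{y} &= \mathbf{W}_1 \boldsymbol{\gamma} + \mathbf{e}_1
  && \text{(eQTL data: gene expression on genotype)} \\
\mathbf{z} &= \alpha \, \mathbf{W}_2 \boldsymbol{\gamma} + \mathbf{e}_2
  && \text{(GWAS data: phenotype on predicted expression)}
\end{align}
```

Under this sketch, asking whether a gene is associated with the trait would amount to testing $\alpha = 0$, with $\boldsymbol{\gamma}$ shared between the two equations so that uncertainty in the expression model carries over to the phenotype model.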


Subject(s)
Genome-Wide Association Study , Transcriptome , Genotype , Phenotype , Single Nucleotide Polymorphism , Quantitative Trait Loci
7.
Bioinformatics ; 35(19): 3693-3700, 2019 10 01.
Article in English | MEDLINE | ID: mdl-30851102

ABSTRACT

MOTIVATION: In genome-wide association studies (GWASs) where multiple correlated traits have been measured on participants, a joint analysis strategy, whereby the traits are analyzed jointly, can improve statistical power over a single-trait analysis strategy. There are two questions of interest to be addressed when conducting a joint GWAS analysis with multiple traits. The first question examines whether a genetic locus is significantly associated with any of the traits being tested. The second question focuses on identifying the specific trait(s) associated with the genetic locus. Since existing methods primarily focus on the first question, this article seeks to provide a complementary method that addresses the second question. RESULTS: We propose a novel method, Variational Inference for Multiple Correlated Outcomes (VIMCO), that focuses on identifying the specific trait that is associated with the genetic locus when performing a joint GWAS analysis of multiple traits, while accounting for correlation among the multiple traits. We performed extensive numerical studies and also applied VIMCO to analyze two datasets. The numerical studies and real data analysis demonstrate that VIMCO improves statistical power over single-trait analysis strategies when the multiple traits are correlated and has comparable performance when the traits are not correlated. AVAILABILITY AND IMPLEMENTATION: The VIMCO software can be downloaded from: https://github.com/XingjieShi/VIMCO. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genome-Wide Association Study , Software , Genetic Loci , Phenotype , Single Nucleotide Polymorphism , Research Design
8.
Genet Epidemiol ; 41(8): 779-789, 2017 12.
Article in English | MEDLINE | ID: mdl-28913902

ABSTRACT

Gene expression (GE) studies have been playing a critical role in cancer research. Despite tremendous effort, the analysis results are still often unsatisfactory because of weak signals and high data dimensionality. Analysis is often further challenged by the long-tailed distributions of the outcome variables. In recent multidimensional studies, data have been collected on GEs as well as their regulators (e.g., copy number alterations (CNAs), methylation, and microRNAs), which can provide additional information on the associations between GEs and cancer outcomes. In this study, we develop an ARMI (assisted robust marker identification) approach for analyzing cancer studies with measurements on GEs as well as regulators. The proposed approach borrows information from regulators and can be more effective than analyzing GE data alone. A robust objective function is adopted to accommodate long-tailed distributions. Marker identification is effectively realized using penalization. The proposed approach has an intuitive formulation and is computationally affordable. Simulation shows its satisfactory performance under a variety of settings. TCGA (The Cancer Genome Atlas) data on melanoma and lung cancer are analyzed, which leads to biologically plausible marker identification and superior prediction.


Subject(s)
Tumor Biomarkers/genetics , Genetic Models , Neoplasms/genetics , Tumor Biomarkers/metabolism , Neoplastic Gene Expression Regulation , Neoplasm Genes , Humans , Melanoma/genetics , Melanoma/metabolism , Melanoma/pathology , Neoplasms/metabolism , Neoplasms/pathology , Phenotype , Skin Neoplasms/genetics , Skin Neoplasms/metabolism , Skin Neoplasms/pathology
9.
Comput Stat Data Anal ; 124: 235-251, 2018 Aug.
Article in English | MEDLINE | ID: mdl-30319163

ABSTRACT

Penalization is a popular tool for multi- and high-dimensional data. Most of the existing computational algorithms have been developed for convex loss functions. Nonconvex loss functions can sometimes generate more robust results and have important applications. Motivated by the BLasso algorithm, this study develops the Forward and Backward Stagewise (Fabs) algorithm for nonconvex loss functions with the adaptive Lasso (aLasso) penalty. It is shown that each point along the Fabs paths is a δ-approximate solution to the aLasso problem and that the Fabs paths converge to the stationary points of the aLasso problem when δ goes to zero, given that the loss function has second-order derivatives bounded from above. This study exemplifies Fabs with an application to penalized smooth partial rank (SPR) estimation, for which effective algorithms are still lacking. Extensive numerical studies are conducted to demonstrate the benefit of penalized SPR estimation using Fabs, especially under high-dimensional settings. An application to the smoothed 0-1 loss in binary classification is introduced to demonstrate its capability to work with other differentiable nonconvex loss functions.
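For orientation, the adaptive Lasso problem referred to in this abstract is commonly written as below. The notation is assumed rather than taken from the abstract: $L(\boldsymbol{\beta})$ is the (possibly nonconvex) loss, $\lambda \ge 0$ the tuning parameter, $\tilde{\beta}_j$ an initial estimate, and $\nu > 0$ a weight exponent. The specific weights used in the Fabs paper are not stated in the abstract, so the inverse-power form shown here is the standard textbook choice, not a confirmed detail.

```latex
\begin{equation}
\hat{\boldsymbol{\beta}}
  = \arg\min_{\boldsymbol{\beta}}
    \; L(\boldsymbol{\beta}) + \lambda \sum_{j=1}^{p} w_j \, |\beta_j| ,
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\nu}} .
\end{equation}
```

Coefficients with a large initial estimate receive a small weight and are penalized lightly, while coefficients with a small initial estimate are penalized heavily.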

10.
Genet Epidemiol ; 40(5): 382-93, 2016 07.
Article in English | MEDLINE | ID: mdl-27247027

ABSTRACT

Genome-wide association studies (GWAS) have led to the identification of many genetic variants associated with complex diseases in the past 10 years. Penalization methods, with significant numerical and statistical advantages, have been extensively adopted in analyzing GWAS. This study has been partly motivated by the analysis of Genetic Analysis Workshop (GAW) 18 data, which have two notable characteristics. First, the subjects are from a small number of pedigrees and hence related. Second, for each subject, multiple correlated traits have been measured. Most of the existing penalization methods assume independence between subjects and traits and can be suboptimal. There are a few methods in the literature based on mixed modeling that can accommodate correlations. However, they cannot fully accommodate the two types of correlations while conducting effective marker selection. In this study, we develop a penalized multitrait mixed modeling approach. It accommodates the two different types of correlations and includes several existing methods as special cases. Effective penalization is adopted for marker selection. Simulation demonstrates its satisfactory performance. The GAW 18 data are analyzed using the proposed method.


Subject(s)
Genome-Wide Association Study , Genetic Models , Pedigree , Heritable Quantitative Trait , Area Under Curve , Computer Simulation , Humans , Phenotype , Single Nucleotide Polymorphism/genetics , ROC Curve
11.
Brief Bioinform ; 16(5): 735-44, 2015 Sep.
Article in English | MEDLINE | ID: mdl-25552438

ABSTRACT

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple independent studies on the same cancer type and outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapping genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrative example, we consider cancer prognosis data under the Cox model. Lasso, which is representative of a large number of regularization methods, is adopted for generating gene signatures. We examine two families of measures for quantifying the degree of overlap. The first family is based on the Cox-Lasso estimates at the optimal tunings, and the second family is based on estimates across the whole solution paths. Within each family, multiple measures, which describe the overlap from different perspectives, are introduced. The analysis of TCGA (The Cancer Genome Atlas) data on five cancer types shows that the degree of overlap varies across measures, cancer types and types of (epi)genetic measurements. More investigations are needed to better describe and understand the overlaps among gene signatures.
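The simple counting approach that this abstract describes as limited can be sketched in a few lines. The function name `signature_overlap` and the example gene lists are hypothetical, and the Jaccard index is included here only as a familiar set-based summary; it is not one of the measures proposed in the paper, which the abstract does not spell out.

```python
def signature_overlap(sig_a, sig_b):
    """Count-based overlap between two gene signatures (illustrative sketch)."""
    a, b = set(sig_a), set(sig_b)
    shared = a & b
    union = a | b
    return {
        # number of genes present in both signatures
        "n_shared": len(shared),
        # shared genes as a fraction of each signature
        "frac_of_a": len(shared) / len(a) if a else 0.0,
        "frac_of_b": len(shared) / len(b) if b else 0.0,
        # Jaccard index: shared genes over all distinct genes (an added summary)
        "jaccard": len(shared) / len(union) if union else 0.0,
    }

# Hypothetical signatures, for illustration only
example = signature_overlap(["TP53", "BRCA1", "MYC", "EGFR"], ["MYC", "EGFR", "KRAS"])
```

Counts of this kind ignore the sizes and signs of the estimated effects and depend on a single tuning choice, which is presumably part of why the abstract turns to measures based on the Cox-Lasso estimates and on the whole solution paths.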


Subject(s)
Gene Expression Profiling , Neoplasms/genetics , Humans , Prognosis , Proportional Hazards Models
12.
Brief Bioinform ; 16(2): 291-303, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24632304

ABSTRACT

With accumulating research on the interconnections among different types of genomic regulations, researchers have found that multidimensional genomic studies outperform one-dimensional studies in multiple aspects. Among many sources of multidimensional genomic data, The Cancer Genome Atlas (TCGA) provides the public with comprehensive profiling data on >30 cancer types, making it an ideal test bed for conducting and comparing different analyses. In this article, the analysis goal is to apply several existing methods and associate multidimensional genomic measurements with cancer outcomes, in particular prognosis, with special focus on the predictive power of genomic signatures. We exploit clinical data and four types of genomic measurement, including mRNA gene expression, DNA methylation, microRNA and copy number alterations, for breast invasive carcinoma, glioblastoma multiforme, acute myeloid leukemia and lung squamous cell carcinoma collected by TCGA. To accommodate the high dimensionality, we extract important features using Principal Component Analysis, Partial Least Squares and Least Absolute Shrinkage and Selection Operator (Lasso), which are representative of dimension reduction and variable selection techniques and have been extensively adopted, and fit Cox survival models with the combined important features. We calibrate the predictive power of each type of genomic measurement for the prognosis of four cancer types and find that the results vary across cancers. Our analysis also suggests that for most of the cancers in our study and the adopted methods, there is no substantial improvement in prediction when adding other genomic measurements after gene expression and clinical covariates have been included in the model. This is consistent with the findings that molecular features measured at the transcription level affect clinical outcomes more directly than those measured at the DNA/epigenetic level.


Subject(s)
Genomics/statistics & numerical data , Neoplasms/genetics , Brain Neoplasms/genetics , Breast Neoplasms/genetics , Squamous Cell Carcinoma/genetics , Computational Biology , Genetic Databases/statistics & numerical data , Female , Glioblastoma/genetics , Humans , Least-Squares Analysis , Acute Myeloid Leukemia/genetics , Lung Neoplasms/genetics , Male , Neoplasms/mortality , Principal Component Analysis , Prognosis , Proportional Hazards Models
13.
Genomics ; 107(6): 223-30, 2016 06.
Article in English | MEDLINE | ID: mdl-27141884

ABSTRACT

Multiple types of genetic, epigenetic, and genomic changes have been implicated in cutaneous melanoma prognosis. Many of the existing studies are limited to analyzing a single type of omics measurement and cannot comprehensively describe the biological processes underlying prognosis. As a result, the obtained prognostic models may be less satisfactory, and the identified prognostic markers may be less informative. The recently collected TCGA (The Cancer Genome Atlas) data are of high quality and contain comprehensive omics measurements, making it possible to model prognosis more comprehensively and more accurately. In this study, we first describe the statistical approaches that can integrate multiple types of omics measurements with the assistance of variable selection and dimension reduction techniques. Data analysis suggests that, for cutaneous melanoma, integrating multiple types of measurements leads to prognostic models with improved prediction performance. Informative individual markers and pathways are identified, which can provide valuable insights into melanoma prognosis.


Subject(s)
Melanoma/genetics , Prognosis , Transcriptome/genetics , Tumor Biomarkers/genetics , Genomics , Humans , Melanoma/diagnosis , Melanoma/pathology , Proteomics , Skin Neoplasms , Cutaneous Malignant Melanoma
14.
Brief Bioinform ; 15(5): 671-84, 2014 Sep.
Article in English | MEDLINE | ID: mdl-23788798

ABSTRACT

Gene expression profiling has been extensively conducted in cancer research. The analysis of multiple independent cancer gene expression datasets may provide additional information and complement single-dataset analysis. In this study, we conduct multi-dataset analysis and are interested in evaluating the similarity of cancer-associated genes identified from different datasets. The first objective of this study is to briefly review some statistical methods that can be used for such evaluation. Both marginal analysis and joint analysis methods are reviewed. The second objective is to apply those methods to 26 Gene Expression Omnibus (GEO) datasets on five types of cancers. Our analysis suggests that for the same cancer, the marker identification results may vary significantly across datasets, and different datasets share few common genes. In addition, datasets on different cancers share few common genes. The shared genetic basis of datasets on the same or different cancers, which has been suggested in the literature, is not observed in the analysis of GEO data.


Subject(s)
Tumor Biomarkers/metabolism , Gene Expression Profiling , Neoplasms/genetics , Humans , Theoretical Models
15.
Bioinformatics ; 31(24): 3977-83, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26342102

ABSTRACT

MOTIVATION: Both gene expression levels (GEs) and copy number alterations (CNAs) have important biological implications. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The regulation analysis is challenging with one gene expression possibly regulated by multiple CNAs and one CNA potentially regulating the expressions of multiple genes. The correlations among GEs and among CNAs make the analysis even more complicated. The existing methods have limitations and cannot comprehensively describe the regulation. RESULTS: A sparse double Laplacian shrinkage method is developed. It jointly models the effects of multiple CNAs on multiple GEs. Penalization is adopted to achieve sparsity and identify the regulation relationships. Network adjacency is computed to describe the interconnections among GEs and among CNAs. Two Laplacian shrinkage penalties are imposed to accommodate the network adjacency measures. Simulation shows that the proposed method outperforms the competing alternatives with more accurate marker identification. The Cancer Genome Atlas data are analysed to further demonstrate advantages of the proposed method. AVAILABILITY AND IMPLEMENTATION: R code is available at http://works.bepress.com/shuangge/49/.
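A Laplacian shrinkage penalty of the kind mentioned in this abstract can be illustrated with the plain graph Laplacian. This is a sketch under assumptions: the function name `laplacian_penalty` is hypothetical, the adjacency matrix is taken to be symmetric and non-negative, and the unnormalized Laplacian L = D - A is used, whereas the paper may use a normalized or signed variant that the abstract does not specify. The quadratic form equals the sum over connected pairs of a_ij (beta_i - beta_j)^2, so coefficients of strongly connected variables are pulled toward each other.

```python
import numpy as np

def laplacian_penalty(beta, adjacency):
    """Quadratic Laplacian penalty beta^T (D - A) beta (illustrative sketch).

    beta      : (p,) coefficient vector
    adjacency : (p, p) symmetric, non-negative network adjacency matrix
    """
    beta = np.asarray(beta, dtype=float)
    A = np.asarray(adjacency, dtype=float)
    # Unnormalized graph Laplacian: degree matrix minus adjacency
    L = np.diag(A.sum(axis=1)) - A
    return float(beta @ L @ beta)
```

A "double" Laplacian scheme would presumably apply one such penalty along the GE network and another along the CNA network, in addition to a sparsity penalty, but the exact objective is not given in the abstract.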


Subject(s)
Gene Dosage , Gene Expression Regulation , Gene Expression , Theoretical Models , Humans , Neoplasms/genetics
16.
Genet Epidemiol ; 38(3): 220-30, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24616063

ABSTRACT

In high-throughput studies, an important objective is to identify gene-environment interactions associated with disease outcomes and phenotypes. Many commonly adopted methods assume specific parametric or semiparametric models, which may be subject to model misspecification. In addition, they usually use significance level as the criterion for selecting important interactions. In this study, we adopt rank-based estimation, which is much less sensitive to model specification than some of the existing methods and includes several commonly encountered data and models as special cases. Penalization is adopted for the identification of gene-environment interactions. It achieves simultaneous estimation and identification and does not rely on significance level. For computational feasibility, a smoothed rank estimation is further proposed. Simulation shows that under certain scenarios, for example, with contaminated or heavy-tailed data, the proposed method can significantly outperform the existing alternatives with more accurate identification. We analyze a lung cancer prognosis study with gene expression measurements under the AFT (accelerated failure time) model. The proposed method identifies interactions different from those using the alternatives. Some of the identified genes have important implications.
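The idea of smoothing a rank-based objective can be sketched as follows. This is a generic illustration, not the estimator from the paper: it ignores censoring and the penalty term, and the function name `smoothed_rank_objective` and bandwidth `sigma` are assumptions. The rank objective counts pairs in which the ordering of the responses agrees with the ordering of the linear scores; replacing the indicator on the scores with a sigmoid makes the objective differentiable in `beta`.

```python
import numpy as np

def smoothed_rank_objective(beta, X, y, sigma=0.1):
    """Sigmoid-smoothed rank correlation objective (illustrative sketch).

    Averages I(y_i > y_j) * sigmoid((x_i'beta - x_j'beta) / sigma)
    over all ordered pairs i != j. Larger values mean better concordance.
    """
    score = X @ beta
    # Pairwise score differences, scaled by the smoothing bandwidth
    diff = (score[:, None] - score[None, :]) / sigma
    # Numerically stable sigmoid written via tanh
    smooth = 0.5 * (1.0 + np.tanh(0.5 * diff))
    # Indicator that response i exceeds response j (diagonal is always 0)
    concordant = (y[:, None] > y[None, :]).astype(float)
    n = len(y)
    return float((concordant * smooth).sum() / (n * (n - 1)))
```

As `sigma` shrinks, the sigmoid approaches the indicator function and the smoothed objective approaches the original rank objective, at the cost of a less well-behaved optimization surface.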


Subject(s)
Gene-Environment Interaction , Genetic Models , Environment , Humans , Lung Neoplasms/genetics , Male , Prognosis , Risk Factors , Time Factors
17.
Genet Epidemiol ; 38(2): 144-51, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24395534

ABSTRACT

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by introducing the contrast penalties, which can accommodate the within- and across-dataset structures of covariates/regression coefficients and, by doing so, further improve marker selection performance. Specifically, we develop a penalization method that accommodates the across-dataset structures by smoothing over regression coefficients. An effective iterative algorithm, which calls an inner coordinate descent iteration, is developed. Simulation shows that the proposed method outperforms the benchmark with more accurate marker identification. The analysis of breast cancer and lung cancer prognosis studies with gene expression measurements shows that the proposed method identifies genes different from those using the benchmark and has better prediction performance.


Subject(s)
Neoplasms/genetics , Algorithms , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Computer Simulation , Female , Genetic Markers , Humans , Lung Neoplasms/diagnosis , Lung Neoplasms/genetics , Genetic Models , Neoplasms/diagnosis , Prognosis
18.
Stat Med ; 34(30): 4016-30, 2015 Dec 30.
Article in English | MEDLINE | ID: mdl-26239060

ABSTRACT

In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to accommodate possible data contamination. Penalization, which has been extensively used with high-dimensional data, is adopted for selection. The proposed penalized estimation approach can automatically determine if a G factor has an interaction with an E factor, main effect but not interaction, or no effect at all. The proposed approach can be effectively realized using a coordinate descent algorithm. Simulation shows that it has satisfactory performance and outperforms several competing alternatives. The proposed approach is used to analyze a lung cancer study with gene expression measurements and clinical variables. Copyright © 2015 John Wiley & Sons, Ltd.


Subject(s)
Gene-Environment Interaction , Models, Genetic , Models, Statistical , Algorithms , Biomarkers, Tumor/genetics , Biostatistics , Computer Simulation , Databases, Genetic , Female , Gene Expression , Humans , Linear Models , Lung Neoplasms/etiology , Lung Neoplasms/genetics , Male , Polymorphism, Single Nucleotide
19.
Front Genet ; 15: 1333855, 2024.
Article in English | MEDLINE | ID: mdl-38313677

ABSTRACT

Background: Cerebral aneurysms (CAs) are a significant cerebrovascular ailment with a multifaceted etiology influenced by various factors including heredity and environment. This study aimed to explore the possible link between different types of immune cells and the occurrence of CAs. Methods: We analyzed the connection between 731 immune cell signatures and the risk of CAs by using publicly available genetic data. The analysis included four immune features, specifically median brightness levels (MBL), proportionate cell (PC), definite cell (DC), and morphological attributes (MA). Mendelian randomization (MR) analysis was conducted using the instrumental variables (IVs) derived from the genetic variation linked to CAs. Results: After multiple test adjustment based on the FDR method, the inverse variance weighted (IVW) method revealed that 3 immune cell phenotypes were linked to the risk of CAs. These included CD45 on HLA DR+NK (odds ratio (OR), 1.116; 95% confidence interval (CI), 1.001-1.244; p = 0.0489), CX3CR1 on CD14- CD16- (OR, 0.973; 95% CI, 0.948-0.999; p = 0.0447). An immune cell phenotype CD16- CD56 on NK was found to have a significant association with the risk of CAs in reverse MR study (OR, 0.950; 95% CI, 0.911-0.990; p = 0.0156). Conclusion: Our investigation has yielded findings that support a substantial genetic link between immune cells and CAs, thereby suggesting possible implications for future clinical interventions.
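
For readers unfamiliar with the inverse variance weighted (IVW) method named above, the following is a minimal sketch of the standard fixed-effect IVW estimator and its conversion to an odds ratio with a 95% interval. The input numbers are made up for illustration and are not taken from the study, and the abstract does not say whether a fixed-effect or random-effects variant was used, so the fixed-effect form is an assumption. The function names are hypothetical.

```python
import math


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance-weighted Mendelian randomization estimate.

    Each list holds one entry per genetic instrument (SNP):
      beta_exposure: SNP-exposure associations
      beta_outcome:  SNP-outcome associations (log-odds scale for a binary outcome)
      se_outcome:    standard errors of the SNP-outcome associations
    Returns (estimate, standard error) on the log-odds scale.
    """
    num = sum(bx * by / se ** 2
              for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    den = sum(bx ** 2 / se ** 2
              for bx, se in zip(beta_exposure, se_outcome))
    return num / den, math.sqrt(1.0 / den)


def odds_ratio_with_ci(log_or, se, z=1.96):
    """Convert a log-odds estimate into an odds ratio with a 95% Wald interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Illustrative inputs only (not data from the study).
bx = [0.10, 0.20]
by = [0.01, 0.02]
se = [0.05, 0.05]
estimate, std_err = ivw_estimate(bx, by, se)
odds_ratio, ci_low, ci_high = odds_ratio_with_ci(estimate, std_err)
```

An interval that excludes 1, as in the results reported above, corresponds to p < 0.05 under this Wald approximation.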

20.
Nat Commun ; 14(1): 296, 2023 01 18.
Article in English | MEDLINE | ID: mdl-36653349

ABSTRACT

Spatially resolved transcriptomics involves a set of emerging technologies that enable the transcriptomic profiling of tissues with the physical location of expressions. Although a variety of methods have been developed for data integration, most of them are for single-cell RNA-seq datasets without consideration of spatial information. Thus, methods that can integrate spatial transcriptomics data from multiple tissue slides, possibly from multiple individuals, are needed. Here, we present PRECAST, a data integration method for multiple spatial transcriptomics datasets with complex batch effects and/or biological effects between slides. PRECAST unifies spatial factor analysis simultaneously with spatial clustering and embedding alignment, while requiring only partially shared cell/domain clusters across datasets. Using both simulated and four real datasets, we show improved cell/domain detection with outstanding visualization, and the estimated aligned embeddings and cell/domain labels facilitate many downstream analyses. We demonstrate that PRECAST is computationally scalable and applicable to spatial transcriptomics datasets from different platforms.


Subject(s)
Gene Expression Profiling , Transcriptome , Humans , Transcriptome/genetics , Gene Expression Profiling/methods , Cluster Analysis , Spatial Analysis , Exome Sequencing , Single-Cell Analysis/methods