Results 1 - 19 of 19
1.
Front Genet ; 14: 1282824, 2023.
Article in English | MEDLINE | ID: mdl-38028629

ABSTRACT

Background: Pancreatic ductal adenocarcinoma (PDAC) is a lethal disease characterized by a diverse tumor microenvironment. The heterogeneous cellular composition of PDAC makes it challenging to study molecular features of tumor cells using extracts from bulk tumor. The metabolic features in tumor cells from clinical samples are poorly understood, and their impact on clinical outcomes is unknown. Our objective was to identify the metabolic features in the tumor compartment that are most clinically impactful. Methods: A computational deconvolution approach using the DeMixT algorithm was applied to bulk RNASeq data from The Cancer Genome Atlas to determine the proportion of each gene's expression that was attributable to the tumor compartment. A machine learning algorithm designed to identify features most closely associated with survival outcomes was used to identify the most clinically impactful metabolic genes. Results: Two metabolic subtypes (M1 and M2) were identified, based on the pattern of expression of the 26 most important metabolic genes. The M2 phenotype had significantly worse survival, which was replicated in three external PDAC cohorts. This PDAC subtype was characterized by net glycogen catabolism, accelerated glycolysis, and increased proliferation and cellular migration. Single cell data demonstrated substantial intercellular heterogeneity in the metabolic features that typified this aggressive phenotype. Conclusion: By focusing on features within the tumor compartment, two novel and clinically impactful metabolic subtypes of PDAC were identified. Our study emphasizes the challenges of defining tumor phenotypes in the face of the significant intratumoral heterogeneity that typifies PDAC. Further studies are required to understand the microenvironmental factors that drive the appearance of the metabolic features characteristic of the aggressive M2 PDAC phenotype.

2.
Nat Struct Mol Biol ; 30(12): 1878-1892, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37932451

ABSTRACT

Emerging evidence suggests that cryptic translation beyond the annotated translatome produces proteins with developmental or physiological functions. However, functions of cryptic non-canonical open reading frames (ORFs) in cancer remain largely unknown. To fill this gap and systematically identify colorectal cancer (CRC) dependency on non-canonical ORFs, we apply an integrative multiomic strategy, combining ribosome profiling and a CRISPR-Cas9 knockout screen with large-scale analysis of molecular and clinical data. Many such ORFs are upregulated in CRC compared to normal tissues and are associated with clinically relevant molecular subtypes. We confirm the in vivo tumor-promoting function of the microprotein SMIMP, encoded by a primate-specific, long noncoding RNA, the expression of which is associated with poor prognosis in CRC, is low in normal tissues and is specifically elevated in CRC and several other cancer types. Mechanistically, SMIMP interacts with the ATPase-forming domains of SMC1A, the core subunit of the cohesin complex, and facilitates SMC1A binding to cis-regulatory elements to promote epigenetic repression of the tumor-suppressive cell cycle regulators encoded by CDKN1A and CDKN2B. Thus, our study reveals a cryptic microprotein as an important component of cohesin-mediated gene regulation and suggests that the 'dark' proteome, encoded by cryptic non-canonical ORFs, may contain potential therapeutic or diagnostic targets.


Subjects
CRISPR-Cas Systems , Neoplasms , Animals , Humans , Open Reading Frames/genetics , CRISPR-Cas Systems/genetics , Neoplasms/genetics , Proteome/genetics
3.
J Immunother Cancer ; 11(8)2023 08.
Article in English | MEDLINE | ID: mdl-37604640

ABSTRACT

BACKGROUND: TP53, the most mutated gene in solid cancers, has a profound impact on most hallmarks of cancer. Somatic TP53 mutations occur in high frequencies in head and neck cancers, including oral squamous cell carcinoma (OSCC). Our study aims to understand the role of TP53 gain-of-function mutation in modulating the tumor immune microenvironment (TIME) in OSCC. METHODS: Short hairpin RNA knockdown of mutant p53R172H in syngeneic oral tumors demonstrated changes in tumor growth between immunocompetent and immunodeficient mice. HTG EdgeSeq targeted messenger RNA sequencing was used to analyze cytokine and immune cell markers in tumors with inactivated mutant p53R172H. Flow cytometry and multiplex immunofluorescence (mIF) confirmed the role of mutant p53R172H in the TIME. The gene expression of patients with OSCC was analyzed by CIBERSORT and mIF was used to validate the immune landscape at the protein level. RESULTS: Mutant p53R172H contributes to a cytokine transcriptome network that inhibits the infiltration of cytotoxic CD8+ T cells and promotes intratumoral recruitment of regulatory T cells and M2 macrophages. Moreover, p53R172H also regulates the spatial distribution of immunocyte populations, and their distribution between central and peripheral intratumoral locations. Interestingly, p53R172H-mutated tumors are infiltrated with CD8+ and CD4+ T cells expressing programmed cell death protein 1, and these tumors responded to immune checkpoint inhibitor and stimulator of interferon gene 1 agonist therapy. CIBERSORT analysis of human OSCC samples revealed associations between immune cell populations and the TP53R175H mutation, which paralleled the findings from our syngeneic mouse tumor model. CONCLUSIONS: These findings demonstrate that syngeneic tumors bearing the TP53R172H gain-of-function mutation modulate the TIME to evade tumor immunity, leading to tumor progression and decreased survival.


Subjects
Squamous Cell Carcinoma , Head and Neck Neoplasms , Mouth Neoplasms , Tumor Microenvironment , Tumor Suppressor Protein p53 , Animals , Humans , Mice , Squamous Cell Carcinoma/genetics , CD8-Positive T-Lymphocytes , Cytokines , Animal Disease Models , Gain of Function Mutation , Mouth Neoplasms/genetics , Mutation , Squamous Cell Carcinoma of Head and Neck/genetics , Tumor Suppressor Protein p53/genetics
4.
BMC Genomics ; 24(1): 228, 2023 May 02.
Article in English | MEDLINE | ID: mdl-37131143

ABSTRACT

BACKGROUND: Single-cell RNA sequencing is a state-of-the-art technology to understand gene expression in complex tissues. With the growing amount of data being generated, the standardization and automation of data analysis are critical to generating hypotheses and discovering biological insights. RESULTS: Here, we present scRNASequest, a semi-automated single-cell RNA-seq (scRNA-seq) data analysis workflow which allows (1) preprocessing from raw UMI count data, (2) harmonization by one or multiple methods, (3) reference-dataset-based cell type label transfer and embedding projection, (4) multi-sample, multi-condition single-cell level differential gene expression analysis, and (5) seamless integration with cellxgene VIP for visualization and with CellDepot for data hosting and sharing by generating compatible h5ad files. CONCLUSIONS: We developed scRNASequest, an end-to-end pipeline for single-cell RNA-seq data analysis, visualization, and publishing. The source code under MIT open-source license is provided at https://github.com/interactivereport/scRNASequest . We also prepared a bookdown tutorial for the installation and detailed usage of the pipeline: https://interactivereport.github.io/scRNAsequest/tutorial/docs/ . Users have the option to run it on a local computer with a Linux/Unix system including MacOS, or interact with SGE/Slurm schedulers on high-performance computing (HPC) clusters.


Subjects
Ecosystem , Gene Expression Profiling , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis , RNA Sequence Analysis/methods , Single-Cell Analysis/methods , Software , Publishing
6.
iScience ; 25(7): 104551, 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35747385

ABSTRACT

Whole-organ mapping was used to study molecular changes in the evolution of bladder cancer from field effects. We identified more than 100 dysregulated pathways, involving immunity, differentiation, and transformation, as initiators of carcinogenesis. Dysregulation of interleukins signified the involvement of inflammation in the incipient phases of the process. An aberrant methylation/expression of multiple HOX genes signified dysregulation of the differentiation program. We identified three types of mutations based on their geographic distribution. The most common were mutations restricted to individual mucosal samples that targeted uroprogenitor cells. Two types of mutations were associated with clonal expansion and involved large areas of mucosa. The α mutations occurred at low frequencies while the β mutations increased in frequency with disease progression. Modeling revealed that bladder carcinogenesis spans 10-15 years and can be divided into dormant and progressive phases. The progressive phase lasted 1-2 years and was driven by β mutations.

7.
Nat Biotechnol ; 40(11): 1624-1633, 2022 11.
Article in English | MEDLINE | ID: mdl-35697807

ABSTRACT

Single-cell RNA sequencing studies have suggested that total mRNA content correlates with tumor phenotypes. Technical and analytical challenges, however, have so far impeded at-scale pan-cancer examination of total mRNA content. Here we present a method to quantify tumor-specific total mRNA expression (TmS) from bulk sequencing data, taking into account tumor transcript proportion, purity and ploidy, which are estimated through transcriptomic/genomic deconvolution. We estimate and validate TmS in 6,590 patient tumors across 15 cancer types, identifying significant inter-tumor variability. Across cancers, high TmS is associated with increased risk of disease progression and death. TmS is influenced by cancer-specific patterns of gene alteration and intra-tumor genetic heterogeneity as well as by pan-cancer trends in metabolic dysregulation. Taken together, our results indicate that measuring cell-type-specific total mRNA expression in tumor cells predicts tumor phenotypes and clinical outcomes.


Subjects
Neoplasms , Humans , Neoplasms/genetics , Neoplasms/metabolism , Genetic Heterogeneity , Genomics , Messenger RNA/genetics , Disease Progression
9.
PPAR Res ; 2021: 5525091, 2021.
Article in English | MEDLINE | ID: mdl-34054937

ABSTRACT

Our previous study showed that the upregulation of peroxisome proliferator-activated receptor gamma (PPARG) could promote chemosensitivity of hypopharyngeal squamous cell carcinoma (HSCC) in chemotherapeutic treatments. Here, we acquired two more independent expression data of PPARG to validate the expression levels of PPARG in chemotherapy-sensitive patients (CSP) and its individualized variations compared to chemotherapy-non-sensitive patients (CNSP). Our results showed that overall PPARG expression was mildly downregulated (log fold change = -0.55; p value = 0.42; overexpression in three CSPs and reduced expression in four CSPs), which was not consistent with previous results (log fold change = 0.50; p = 0.22; overexpression in nine CSPs and reduced expression in three CSPs). Both studies indicated that PPARG expression variation was significantly associated with the Tumor-Node-Metastasis (TNM) stage (p = 7.45e-7 and 6.50e-4, for the first and second studies, respectively), which was used as one of the predictors of chemosensitivity. The new dataset analysis revealed 51 genes with significant gene expression changes in CSPs (LFC > 1 or < -1; p value < 0.01), and two of them (TMEM45A and RBP1) demonstrated strong coexpression with PPARG (Pearson correlation coefficient > 0.6 or < -0.6). There were 21 significant genes in the data from the first study, with no significant association with PPARG and no overlap with the 51 genes revealed in this study. Our results support the connection between PPARG and chemosensitivity in HSCC tumor cells. However, significant PPARG variation exists in CSPs, which may be influenced by multiple factors, including the TNM stage.
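
The selection thresholds quoted in this abstract (|LFC| > 1 with p < 0.01, and |Pearson r| > 0.6) can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the function names are hypothetical, and any input vectors a reader supplies stand in for per-sample expression values that the abstract does not list.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def is_significant(log_fold_change, p_value, lfc_cutoff=1.0, p_cutoff=0.01):
    """Apply the |LFC| > 1 and p < 0.01 filter quoted in the abstract."""
    return abs(log_fold_change) > lfc_cutoff and p_value < p_cutoff


def is_coexpressed(r, r_cutoff=0.6):
    """Apply the |r| > 0.6 coexpression filter quoted in the abstract."""
    return abs(r) > r_cutoff
```

Under these cutoffs, the overall PPARG change reported above (log fold change = -0.55, p = 0.42) does not pass `is_significant`, which matches the abstract's description of a mild, non-significant shift.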

10.
Nat Commun ; 11(1): 1008, 2020 02 21.
Article in English | MEDLINE | ID: mdl-32081846

ABSTRACT

Limited clinical activity has been seen in osteosarcoma (OS) patients treated with immune checkpoint inhibitors (ICI). To gain insights into the immunogenic potential of these tumors, we conducted whole genome, RNA, and T-cell receptor sequencing, immunohistochemistry and reverse phase protein array profiling (RPPA) on OS specimens from 48 pediatric and adult patients with primary, relapsed, and metastatic OS. Median immune infiltrate level was lower than in other tumor types where ICI are effective, with concomitant low T-cell receptor clonalities. Neoantigen expression in OS was lacking and significantly associated with high levels of nonsense-mediated decay (NMD). Samples with low immune infiltrate had a higher number of deleted genes, while those with high immune infiltrate expressed higher levels of adaptive resistance pathways. PARP2 expression levels were significantly negatively associated with the immune infiltrate. Together, these data reveal multiple immunosuppressive features of OS and suggest immunotherapeutic opportunities in OS patients.


Subjects
Bone Neoplasms/genetics , Bone Neoplasms/immunology , Osteosarcoma/genetics , Osteosarcoma/immunology , Adolescent , Adult , Aged , Aged 80 and over , Bone Neoplasms/pathology , Child , Preschool Child , Cohort Studies , Female , Humans , Immunogenetic Phenomena , Male , Middle Aged , Mutation , Osteosarcoma/secondary , RNA-Seq , T-Cell Antigen Receptors/genetics , Whole Genome Sequencing , Young Adult
11.
Sci Rep ; 9(1): 10863, 2019 07 26.
Article in English | MEDLINE | ID: mdl-31350445

ABSTRACT

Differential network analysis investigates how the network of connected genes changes from one condition to another and has become a prevalent tool to provide a deeper and more comprehensive understanding of the molecular etiology of complex diseases. Based on the asymptotically normal estimation of a large Gaussian graphical model (GGM) in the high-dimensional setting, we developed a computationally efficient test for differential network analysis through testing the equality of two precision matrices, which summarize the conditional dependence network structures of the genes. Additionally, we applied a multiple testing procedure to infer the differential network structure with false discovery rate (FDR) control. Through extensive simulation studies with different combinations of parameters including sample size, number of vertices, level of heterogeneity and graph structure, we demonstrated that our method performed much better than the currently available methods in terms of accuracy and computational time. In real data analysis on lung adenocarcinoma, we revealed a differential network with 3503 nodes and 2550 edges, which consisted of 50 clusters with an FDR threshold at 0.05. Many of the top gene pairs in the differential network have been reported relevant to human cancers. Our method represents a powerful tool of network analysis for high-dimensional biological data.
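
The abstract does not say which multiple testing procedure it uses. As an illustration of edge-wise FDR control at a threshold such as 0.05, the sketch below uses the Benjamini-Hochberg step-up procedure as a stand-in; that choice, and the function name, are assumptions rather than the authors' method. Each p-value is imagined to come from a test of one candidate edge.

```python
def benjamini_hochberg(pvalues, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, aligned with `pvalues`, marking the
    hypotheses (here: candidate differential edges) that are rejected
    at FDR level `alpha`.
    """
    m = len(pvalues)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest 1-based rank k with p_(k) <= k * alpha / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * alpha / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected
```

The step-up rule rejects everything ranked at or below the cutoff, so a p-value that misses its own threshold can still be rejected when a larger p-value further down the ordering meets its threshold.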


Subjects
Adenocarcinoma of Lung/genetics , Computational Biology/methods , Neoplastic Gene Expression Regulation , Gene Regulatory Networks , Lung Neoplasms/genetics , Statistical Models , Algorithms , Correlation of Data , Data Accuracy , Humans , Normal Distribution , RNA-Seq
12.
iScience ; 9: 451-460, 2018 Nov 30.
Article in English | MEDLINE | ID: mdl-30469014

ABSTRACT

Transcriptome deconvolution in cancer and other heterogeneous tissues remains challenging. Available methods lack the ability to estimate both component-specific proportions and expression profiles for individual samples. We present DeMixT, a new tool to deconvolve high-dimensional data from mixtures of more than two components. DeMixT implements an iterated conditional mode algorithm and a novel gene-set-based component merging approach to improve accuracy. In a series of experimental validation studies and application to TCGA data, DeMixT showed high accuracy. Improved deconvolution is an important step toward linking tumor transcriptomic data with clinical outcomes. An R package, scripts, and data are available: https://github.com/wwylab/DeMixTallmaterials.

13.
IEEE/ACM Trans Comput Biol Bioinform ; 15(4): 1066-1078, 2018.
Article in English | MEDLINE | ID: mdl-29990279

ABSTRACT

The method of Sorted L-One Penalized Estimation, or SLOPE, is a sparse regression method recently introduced by Bogdan et al. [1]. It can be used to identify significant predictor variables in a linear model that may have more unknown parameters than observations. When the correlations between predictor variables are small, the SLOPE method is shown to successfully control the false discovery rate (the expected proportion of the irrelevant among all selected predictors) at a user-specified level. However, the requirement for nearly uncorrelated predictors is too restrictive for genomic data, as demonstrated in our recent study [2] by an application of SLOPE to realistic simulated DNA sequence data. A possible solution is to divide the predictor variables into nearly uncorrelated groups, and to modify the procedure to select entire groups with an overall significant group effect, rather than individual predictors. Following this motivation, we extend SLOPE in the spirit of Group LASSO to Group SLOPE, a method that can handle group structures between the predictor variables, which are ubiquitous in real genomic data. Our theoretical results show that Group SLOPE controls the group-wise false discovery rate (gFDR) when groups are orthogonal to each other. For use in non-orthogonal settings, we propose two types of Monte Carlo based heuristics, which lead to gFDR control with Group SLOPE in simulations based on real SNP data. As an illustration of the merits of this method, an application of Group SLOPE to a dataset from the Framingham Heart Study results in the identification of some known DNA sequence regions associated with bone health, as well as some new candidate regions. The novel methods are implemented in the R package grpSLOPEMC, which is publicly available at https://github.com/agisga/grpSLOPEMC.
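
The abstract names the sorted L1 penalty without writing it out. In the SLOPE literature it is defined as the sum of lambda_i times the i-th largest absolute coefficient, with a non-negative, non-increasing lambda sequence. The sketch below evaluates only that penalty; the function name is hypothetical, it is not taken from the grpSLOPEMC package, and it does not implement the Group SLOPE extension or any solver.

```python
def sorted_l1_penalty(beta, lambdas):
    """Sorted L1 norm: sum_i lambdas[i] * |beta|_(i).

    |beta|_(1) >= |beta|_(2) >= ... are the absolute coefficients in
    decreasing order, so the largest coefficient is paired with the
    largest penalty weight.
    """
    if len(beta) != len(lambdas):
        raise ValueError("beta and lambdas must have the same length")
    if any(l < 0 for l in lambdas):
        raise ValueError("lambdas must be non-negative")
    if any(a < b for a, b in zip(lambdas, lambdas[1:])):
        raise ValueError("lambdas must be non-increasing")
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(l * m for l, m in zip(lambdas, magnitudes))
```

With all weights equal the expression reduces to the ordinary L1 (LASSO) penalty scaled by that weight, which is one way to see SLOPE as a rank-dependent generalization of LASSO.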


Subjects
Computational Biology/methods , Regression Analysis , Algorithms , Factual Databases , Humans , Machine Learning
14.
IEEE Trans Biomed Eng ; 65(2): 390-399, 2018 02.
Article in English | MEDLINE | ID: mdl-29364120

ABSTRACT

Finding correlations across multiple data sets in imaging and (epi)genomics is a common challenge. Sparse multiple canonical correlation analysis (SMCCA) is a multivariate model widely used to extract contributing features from each data set while maximizing the cross-modality correlation. The model is achieved by using the combination of pairwise covariances between any two data sets. However, the scales of different pairwise covariances could be quite different and the direct combination of pairwise covariances in SMCCA is unfair. The problem of "unfair combination of pairwise covariances" restricts the power of SMCCA for feature selection. In this paper, we propose a novel formulation of SMCCA, called adaptive SMCCA, to overcome the problem by introducing adaptive weights when combining pairwise covariances. Both simulation and real-data analysis show that adaptive SMCCA outperforms conventional SMCCA and SMCCA with fixed weights in feature selection. Large-scale numerical experiments show that adaptive SMCCA converges as fast as conventional SMCCA. When applying it to an imaging (epi)genetics study of schizophrenia subjects, we can detect significant (epi)genetic variants and brain regions, which are consistent with other existing reports. In addition, several significant brain-development related pathways, e.g., neural tube development, are detected by our model, demonstrating that imaging epigenetic associations may be overlooked by conventional SMCCA. All these results demonstrate that adaptive SMCCA is well suited for detecting three-way or multiway correlations and thus can find widespread applications in multiple omics and imaging data integration.


Subjects
Epigenomics/methods , Neuroimaging/methods , Schizophrenia/diagnostic imaging , Schizophrenia/genetics , Algorithms , Computer Simulation , DNA Methylation , Humans , Computer-Assisted Image Processing , Statistical Models , ROC Curve
15.
Sci Rep ; 7(1): 1799, 2017 05 11.
Article in English | MEDLINE | ID: mdl-28496128

ABSTRACT

To explore novel molecular mechanisms underlying obesity, we applied a systems genetics framework to integrate risk genetic loci from the largest body mass index (BMI) genome-wide association studies (GWAS) meta-analysis with mRNA and microRNA profiling in adipose tissue from 200 subjects. One module was identified to be most significantly associated with obesity and other metabolic traits. We identified eight hub genes which likely play important roles in obesity metabolism and identified microRNAs that significantly negatively correlated with hub genes. This module was preserved in three other test gene expression datasets, and all hub genes were consistently downregulated in obese subjects through the meta-analysis. Gene GPD1L had the highest connectivity and was identified as a key causal regulator in the module. Gene GPD1L was significantly negatively correlated with the expression of miR-210, and experiments validated that miR-210 regulated the GPD1L protein level through direct interaction with the three prime untranslated region (3'-UTR) of its mRNA. GPD1L was found to be upregulated during weight loss and weight maintenance induced by low calorie diet (LCD), while downregulated during weight gain induced by high-fat diet (HFD). The results indicated that increased GPD1L in adipose tissue may have a significant therapeutic potential in reducing obesity and insulin resistance.


Subjects
Adipose Tissue/metabolism , Genetic Association Studies , Genetic Predisposition to Disease , Glycerolphosphate Dehydrogenase/genetics , Obesity/genetics , Obesity/metabolism , Genetic Databases , Gene Expression Profiling , Gene Expression Regulation , Gene Regulatory Networks , Humans , MicroRNAs/genetics , Molecular Models , Messenger RNA/genetics , Reproducibility of Results
16.
IEEE/ACM Trans Comput Biol Bioinform ; 14(5): 1147-1153, 2017.
Article in English | MEDLINE | ID: mdl-28113675

ABSTRACT

In this study, in order to take advantage of complementary information from different types of data for better disease status diagnosis, we combined gene expression with DNA methylation data and generated a fused network, based on which the stages of Kidney Renal Cell Carcinoma (KIRC) can be better identified. It is well recognized that a network is important for investigating the connectivity of disease groups. We exploited the potential of the network's features to identify the KIRC stage. We first constructed a patient network from each type of data. We then built a fused network based on network fusion method. Based on the link weights of patients, we used a generalized linear model to predict the group of KIRC subjects. Finally, the group prediction method was applied to test the power of network-based features. The performance (e.g., the accuracy of identifying cancer stages) when using the fused network from two types of data is shown to be superior to that when using two patient networks from only one data type. The work provides a good example for using network based features from multiple data types for a more comprehensive diagnosis.


Subjects
Tumor Biomarkers/genetics , Renal Cell Carcinoma , Computational Biology/methods , Gene Expression Profiling/methods , Kidney Neoplasms , Tumor Biomarkers/analysis , Renal Cell Carcinoma/classification , Renal Cell Carcinoma/diagnosis , Renal Cell Carcinoma/genetics , DNA Methylation/genetics , Genetic Databases , Humans , Kidney Neoplasms/classification , Kidney Neoplasms/diagnosis , Kidney Neoplasms/genetics
17.
PLoS One ; 11(1): e0147475, 2016.
Article in English | MEDLINE | ID: mdl-26808152

ABSTRACT

BACKGROUND: Existing microarray studies of bone mineral density (BMD) have been critical for understanding the pathophysiology of osteoporosis, and have identified a number of candidate genes. However, these studies were limited by their relatively small sample sizes and were usually analyzed individually. Here, we propose a novel network-based meta-analysis approach that combines data across six microarray studies to identify functional modules from human protein-protein interaction (PPI) data, and highlight several differentially expressed genes (DEGs) and a functional module that may play an important role in BMD regulation in women. METHODS: Expression profiling studies were identified by searching PubMed, Gene Expression Omnibus (GEO) and ArrayExpress. Two meta-analysis methods were applied across different gene expression profiling studies. The first, a nonparametric Fisher's method, combined p-values from individual experiments to identify genes with large effect sizes. The second method combined effect sizes from individual datasets into a meta-effect size to gain a higher precision of effect size estimation across all datasets. Genes with Q-test p-values < 0.05 or I² values > 50% were assessed by a random effects model and the remainder by a fixed effects model. Using Fisher's combined p-values, functional modules were identified through an integrated analysis of microarray data in the context of large protein-protein interaction (PPI) networks. Two previously published meta-analysis studies of genome-wide association (GWA) datasets were used to determine whether these module genes were genetically associated with BMD. Pathway enrichment analysis was performed with a hypergeometric test. RESULTS: Six gene expression datasets were identified, which included a total of 249 (129 high BMD and 120 low BMD) female subjects. Using a network-based meta-analysis, a consensus module containing 58 genes (nodes) and 83 edges was detected.
Pathway enrichment analysis of the 58 module genes revealed that these genes were enriched in several important KEGG pathways including Osteoclast differentiation, B cell receptor signaling pathway, MAPK signaling pathway, Chemokine signaling pathway and Insulin signaling pathway. The importance of module genes was replicated by demonstrating that most module genes were genetically associated with BMD in the GWAS data sets. Meta-analyses were performed at the individual gene level by combining p-values and effect sizes. Five candidate genes (ESR1, MAP3K3, PYGM, RAC1 and SYK) were identified based on gene expression meta-analysis, and their associations with BMD were also replicated by two BMD meta-analysis studies. CONCLUSIONS: In summary, our network-based meta-analysis not only identified important differentially expressed genes but also discovered biologically meaningful functional modules for BMD determination. Our study may provide novel therapeutic targets for osteoporosis in women.
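
Fisher's method, which this abstract uses to combine per-study p-values for each gene, can be sketched in a few lines. The statistic is X = -2 * sum(ln p_i), which under the null hypothesis follows a chi-square distribution with 2k degrees of freedom for k independent studies. The function name below is hypothetical, and the sketch assumes independent p-values; it is not the authors' code and covers neither the effect-size meta-analysis nor the network step.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns (statistic, combined_p), where statistic = -2 * sum(ln p_i)
    is referred to a chi-square distribution with 2k degrees of freedom.
    """
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    # With an even number of degrees of freedom (2k), the chi-square
    # survival function has the closed form
    #   exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    half = statistic / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return statistic, math.exp(-half) * total
```

A quick sanity check on the closed form: with a single study the combined p-value equals the input p-value, and with two studies it equals p1 * p2 * (1 - ln(p1 * p2)).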


Subjects
Bone Density/genetics , Gene Expression Profiling , Osteoporosis/genetics , Female , Humans
18.
Bioinformatics ; 32(3): 330-7, 2016 Feb 01.
Article in English | MEDLINE | ID: mdl-26458888

ABSTRACT

MOTIVATION: In searching for genetic variants for complex diseases with deep sequencing data, genomic marker sets of high-dimensional genotypic data and sparse functional variants are quite common. Existing sequence association tests are incapable of identifying such marker sets or individual causal loci, although they appear powerful for identifying small marker sets with dense functional variants. In sequence association studies of admixed individuals, cryptic relatedness and population structure are known to confound the association analyses. METHOD: We here propose a unified marker-wise test (uFineMap) to accurately localize causal loci and a unified high-dimensional set-based test (uHDSet) to identify high-dimensional sparse associations in deep sequencing genomic data of multi-ethnic individuals with random relatedness. These two novel tests are based on scaled sparse linear mixed regressions with Lp (0 < p < 1) norm regularization. They jointly adjust for cryptic relatedness, population structure and other confounders to prevent false discoveries and improve statistical power for identifying promising individual markers and marker sets that harbor functional genetic variants of a complex trait. RESULTS: With large-scale simulation data and real data analyses, the proposed tests appropriately controlled Type I error rates and appeared to be more powerful than several prominent methods. We illustrated their practical utility by applying them to DNA sequence data from the Framingham Heart Study for osteoporosis. The proposed tests identified 11 novel significant genes that were missed by the prominent famSKAT and GEMMA. In particular, four out of the six most significant pathways identified by uHDSet but missed by famSKAT have been reported to be related to BMD or osteoporosis in the literature.
AVAILABILITY AND IMPLEMENTATION: The computational toolkit is available for academic use: https://sites.google.com/site/shaolongscode/home/uhdset CONTACT: wyp@tulane.edu SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Genetic Variation , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing/methods , DNA Sequence Analysis/methods , Chromosome Mapping , Genomics/methods , Genotyping Techniques , Humans , Linear Models , Osteoporosis/genetics , Phenotype
19.
Genet Epidemiol ; 38(8): 671-9, 2014 Dec.
Article in English | MEDLINE | ID: mdl-25195875

ABSTRACT

Joint adjustment of cryptic relatedness and population structure is necessary to reduce bias in DNA sequence analysis; however, existing sparse regression methods model these two confounders separately. Incorporating prior biological information has great potential to enhance statistical power, but such information is often overlooked in existing sparse regression models. We developed a unified sparse regression (USR) to incorporate prior information and jointly adjust for cryptic relatedness, population structure, and other environmental covariates. Our USR models cryptic relatedness as a random effect and population structure as a fixed effect, and uses weighted penalties to incorporate prior knowledge. As demonstrated by extensive simulations, our USR algorithm can discover more true causal variants and maintain a lower false discovery rate than do several commonly used feature selection methods. It can handle both rare and common variants simultaneously. Applying our USR algorithm to DNA sequence data of Mexican Americans from GAW18, we replicated three hypertension pathways, demonstrating its effectiveness in identifying susceptibility genetic variants.


Subjects
Genetic Variation , DNA Sequence Analysis/methods , Algorithms , Genetic Loci , Genome-Wide Association Study , Humans , Genetic Models , Regression Analysis