Results 1 - 20 of 33
1.
PLoS Genet ; 19(12): e1011074, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38109434

ABSTRACT

Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures the rare-variant LD between genetic regions and reflects biological interactions between variants in addition to lack of recombination. We derived the theoretical variance of cLD using delta methods to demonstrate its higher stability than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find that cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon recently reported negatively by calculating standard LD between common variants. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
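
As a rough illustration of the idea in this abstract, the Python sketch below computes the standard r² between two binary haplotype vectors and then a region-level value obtained by collapsing the rare variants of each region into one carrier indicator. The abstract does not give the cLD formula, so the collapsing rule, the function names `r_squared`, `collapse_region` and `collapsed_ld`, and the toy data are all assumptions made for illustration and are not the authors' statistic.

```python
from typing import List, Sequence


def r_squared(x: Sequence[int], y: Sequence[int]) -> float:
    """Squared correlation (LD r^2) between two 0/1 haplotype indicator vectors."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    p_x = sum(x) / n
    p_y = sum(y) / n
    p_xy = sum(a * b for a, b in zip(x, y)) / n
    d = p_xy - p_x * p_y  # classical LD coefficient D
    denom = p_x * (1 - p_x) * p_y * (1 - p_y)
    return 0.0 if denom == 0 else d * d / denom


def collapse_region(haplotypes: Sequence[Sequence[int]]) -> List[int]:
    """Assumed rule: a haplotype scores 1 if it carries any rare allele in the region."""
    return [1 if any(h) else 0 for h in haplotypes]


def collapsed_ld(region_a: Sequence[Sequence[int]],
                 region_b: Sequence[Sequence[int]]) -> float:
    """r^2 between the collapsed carrier indicators of two regions."""
    return r_squared(collapse_region(region_a), collapse_region(region_b))


# Toy data (hypothetical): rows are haplotypes, columns are rare variants.
region_a = [[1, 0], [0, 0], [0, 1], [0, 0]]
region_b = [[0, 1], [0, 0], [1, 0], [0, 0]]

# LD between one rare variant from each region versus the collapsed regions.
single_pair = r_squared([h[0] for h in region_a], [h[0] for h in region_b])
region_level = collapsed_ld(region_a, region_b)
```

In this toy example the two individual rare variants never co-occur, so their pairwise r² is small, while the collapsed indicators coincide on every haplotype. That contrast is the kind of signal a region-level statistic is meant to pick up.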


Subjects
Genomics , Polymorphism, Single Nucleotide , Linkage Disequilibrium , Polymorphism, Single Nucleotide/genetics
2.
PLoS Comput Biol ; 19(10): e1011476, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37782668

ABSTRACT

Machine Learning models have been frequently used in transcriptome analyses. Particularly, Representation Learning (RL) models, e.g., autoencoders, are effective in learning critical representations in noisy data. However, learned representations, e.g., the "latent variables" in an autoencoder, are difficult to interpret, not to mention prioritizing essential genes for functional follow-up. In contrast, in traditional analyses, one may identify important genes such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, the complex gene-gene interactions may be beyond the capture of marginal effects (DiffEx) or correlations (DiffCoEx and Hub), indicating the need for powerful RL models. However, the lack of interpretability and individual target genes is an obstacle for RL's broad use in practice. To facilitate interpretable analysis and gene identification using RL, we propose "Critical genes", defined as genes that contribute highly to learned representations (e.g., latent variables in an autoencoder). As a proof-of-concept, supported by eXplainable Artificial Intelligence (XAI), we implemented eXplainable Autoencoder for Critical genes (XA4C) that quantifies each gene's contribution to latent variables, based on which Critical genes are prioritized. Applying XA4C to gene expression data in six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes but show higher enrichment in a comprehensive disease gene database (DisGeNET) and a cancer-specific database (COSMIC), evidencing their potential to disclose massive unknown biology. As an example, we discovered five Critical genes sitting in the center of the Lysine degradation (hsa00310) pathway, displaying distinct interaction patterns in tumor and normal tissues. In conclusion, XA4C facilitates explainable analysis using RL, and Critical genes discovered by explainable RL empower the study of complex interactions.


Subjects
Artificial Intelligence , Neoplasms , Humans , Genes, Essential , Databases, Factual , Gene Expression Profiling
3.
Am J Pathol ; 189(9): 1732-1743, 2019 09.
Article in English | MEDLINE | ID: mdl-31199922

ABSTRACT

Approximately 15% to 20% of colorectal cancers develop through the serrated pathway of tumorigenesis, which is associated with BRAF mutation, CpG island methylation phenotype, and MLH1 methylation. However, the detailed process of progression from sessile serrated adenoma (SSA) to dysplasia and carcinoma has not been elucidated. To further characterize mechanisms involved in the dysplastic progression of SSA, we investigated differential expression of mRNAs between areas with and without dysplasia within the same SSA polyps. Significantly dysregulated genes in paired samples were analyzed for functional annotation and biological significance. The same lysates from a subset of matched samples were subjected to miRNA expression profiling. Differentially expressed miRNAs were determined, and their targeted mRNAs were compared in parallel to the list of differentially expressed mRNAs from an RNA sequencing study. Fourteen common mRNA targets were identified, which include AXIN2, a known indicator of WNT/β-catenin pathway activation. Together, in this study, different genes, pathways, and biological processes involved in the initiation and progression of dysplasia in the serrated pathway are documented. One of the most significant findings is the involvement of the WNT/β-catenin pathway in the dysplastic progression of SSAs with different genes being targeted in early versus advanced dysplasia.


Subjects
Adenoma/pathology , Adenomatous Polyps/pathology , Mutation , Wnt Signaling Pathway , Adenoma/genetics , Adenoma/metabolism , Adenomatous Polyps/genetics , Adenomatous Polyps/metabolism , Aged , Disease Progression , Female , Gene Expression Profiling , Humans , Male
4.
Nat Genet ; 39(5): 605-13, 2007 May.
Article in English | MEDLINE | ID: mdl-17450141

ABSTRACT

Caspases are important in the life and death of immune cells and therefore influence immune surveillance of malignancies. We tested whether genetic variants in CASP8, CASP10 and CFLAR, three genes important for death receptor-induced cell killing residing in tandem order on chromosome 2q33, are associated with cancer susceptibility. Using a haplotype-tagging SNP approach, we identified a six-nucleotide deletion (-652 6N del) variant in the CASP8 promoter associated with decreased risk of lung cancer. The deletion destroys a stimulatory protein 1 binding site and decreases CASP8 transcription. Biochemical analyses showed that T lymphocytes with the deletion variant had lower caspase-8 activity and activation-induced cell death upon stimulation with cancer cell antigens. Case-control analyses of 4,995 individuals with cancer and 4,972 controls in a Chinese population showed that this genetic variant is associated with reduced susceptibility to multiple cancers, including lung, esophageal, gastric, colorectal, cervical and breast cancers, acting in an allele dose-dependent manner. These results support the hypothesis that genetic variants influencing immune status modify cancer susceptibility.


Subjects
Caspase 8/genetics , Chromosomes, Human, Pair 2/genetics , Genetic Predisposition to Disease/genetics , INDEL Mutation/genetics , Neoplasms/genetics , Neoplasms/immunology , Promoter Regions, Genetic/genetics , Asian People , Binding Sites/genetics , Caspase 8/metabolism , China , Chromatin Immunoprecipitation , Electrophoretic Mobility Shift Assay , Flow Cytometry , Humans , Luciferases , Polymorphism, Single Nucleotide/genetics , T-Lymphocytes/immunology , T-Lymphocytes/metabolism , Transfection
5.
PLoS Comput Biol ; 10(6): e1003627, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24901472

ABSTRACT

Identifying gene-gene interaction is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH). (2) GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in disease pathology of AMD. By applying AprioriGWAS to Bipolar disorder in WTCCC data, we found that variants without marginal effect show significant interactions, for example, multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. GRIA1 and GABRB2 are relevant to mental disorders, as supported by multiple lines of evidence.
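
The pattern-growth strategy mentioned in this abstract can be sketched with the generic Apriori algorithm applied to genotype items. The Python code below is a minimal textbook-style version, not the AprioriGWAS implementation: the item encoding as (SNP id, genotype) pairs, the function name `apriori`, the SNP names and the support threshold are hypothetical, and the conditional permutation test for epistasis described in the abstract is not included.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Item = Tuple[str, int]  # (hypothetical SNP id, genotype coded 0/1/2)


def apriori(transactions: List[Set[Item]], min_support: int,
            max_size: int = 3) -> Dict[FrozenSet[Item], int]:
    """Return every itemset of up to max_size items seen in >= min_support individuals."""

    def support(itemset: FrozenSet[Item]) -> int:
        return sum(1 for t in transactions if itemset <= t)

    frequent: Dict[FrozenSet[Item], int] = {}
    level = [frozenset([i]) for i in {i for t in transactions for i in t}]
    size = 1
    while level and size <= max_size:
        kept = {c: s for c in level if (s := support(c)) >= min_support}
        frequent.update(kept)
        size += 1
        # Grow patterns by one item; keep a candidate only if all of its
        # (size - 1)-subsets are frequent (downward-closure pruning).
        candidates = set()
        for a, b in combinations(list(kept), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in kept for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        level = list(candidates)
    return frequent


# Toy data (hypothetical): each set holds one individual's genotypes at three SNPs.
cases = [
    {("rs1", 2), ("rs2", 1), ("rs3", 0)},
    {("rs1", 2), ("rs2", 1), ("rs3", 1)},
    {("rs1", 2), ("rs2", 1), ("rs3", 2)},
    {("rs1", 0), ("rs2", 1), ("rs3", 0)},
]
patterns = apriori(cases, min_support=3)
```

With these four individuals and a support threshold of 3, only two single-genotype items and their pair survive; every pattern involving the less common genotypes is pruned before it is ever combined with others, which is where the reduction in search space comes from.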


Subjects
Algorithms , Epistasis, Genetic , Genetic Variation , Genome-Wide Association Study/statistics & numerical data , Bipolar Disorder/genetics , Complement Factor H/genetics , Computational Biology , Computer Simulation , Data Mining/statistics & numerical data , Databases, Genetic , Genetic Predisposition to Disease , Humans , Linkage Disequilibrium , Logistic Models , Macular Degeneration/genetics , Models, Genetic
6.
Bioinformatics ; 29(9): 1220-2, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23479353

ABSTRACT

SUMMARY: We present JAWAMix5, an out-of-core open-source toolkit for association mapping using high-throughput sequence data. Taking advantage of its HDF5-based implementation, JAWAMix5 stores genotype data on disk and accesses them as though stored in main memory. Therefore, it offers a scalable and fast analysis without concerns about memory usage, whatever the size of the dataset. We have implemented eight functions for association studies, including standard methods (linear models, linear mixed models, rare variants test, analysis in nested association mapping design and local variance component analysis), as well as a novel Bayesian local variance component analysis. Application to real data demonstrates that JAWAMix5 is reasonably fast compared with traditional solutions that load the complete dataset into memory, and that the memory usage is efficient regardless of the dataset size. AVAILABILITY: The source code, a 'batteries-included' executable and user manual can be freely downloaded from http://code.google.com/p/jawamix5/.


Subjects
Genome-Wide Association Study/methods , Software , Bayes Theorem , Genotype , High-Throughput Nucleotide Sequencing , Linear Models
7.
Genetics ; 226(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38001381

ABSTRACT

Toward the identification of the genetic basis of complex traits, transcriptome-wide association study (TWAS) is successful in integrating transcriptome data. However, TWAS is only applicable to common variants, excluding rare variants in exome or whole-genome sequences. This is partly because of the inherent limitation of TWAS protocols that rely on predicting gene expressions. Our previous research has revealed an insight into TWAS: the 2 steps in TWAS, building and applying the expression prediction models, are essentially genetic feature selection and aggregation that do not have to involve predictions. Based on this insight disentangling TWAS, rare variants' inability to predict expression traits is no longer an obstacle. Herein, we developed "rare variant TWAS," or rvTWAS, that first uses a Bayesian model to conduct expression-directed feature selection and then uses a kernel machine to carry out feature aggregation, forming a model leveraging expressions for association mapping including rare variants. We demonstrated the performance of rvTWAS by thorough simulations and real data analysis in 3 psychiatric disorders, namely schizophrenia, bipolar disorder, and autism spectrum disorder. We confirmed that rvTWAS outperforms existing TWAS protocols and revealed additional genes underlying psychiatric disorders. Particularly, we formed a hypothetical mechanism in which zinc finger genes impact all 3 disorders through transcriptional regulation. rvTWAS will open a door for sequence-based association mappings integrating gene expressions.


Subjects
Autism Spectrum Disorder , Transcriptome , Humans , Autism Spectrum Disorder/genetics , Bayes Theorem , Phenotype , Quantitative Trait Loci , Genome-Wide Association Study/methods , Genetic Predisposition to Disease , Polymorphism, Single Nucleotide
8.
Cancer Epidemiol Biomarkers Prev ; 33(5): 712-720, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38393316

ABSTRACT

BACKGROUND: Microsatellite instability (MSI) and tumor mutational burden (TMB) are predictive biomarkers for pan-cancer immunotherapy. The interrelationship between MSI-high (MSI-H) and TMB-high (TMB-H) in human cancers and their predictive value for immunotherapy in lung cancer remain unclear. METHODS: We analyzed somatic mutation data from the Genomics Evidence Neoplasia Information Exchange (n = 46,320) to determine the relationship between MSI-H and TMB-H in human cancers using adjusted multivariate regression models. Patient survival was examined using the Cox proportional hazards model. The association between MSI and genetic mutations was assessed. RESULTS: Patients (31-89%) with MSI-H had TMB-low phenotypes across 22 cancer types. Colorectal and stomach cancers showed the strongest association between TMB and MSI. TMB-H patients with lung cancer who received immunotherapy exhibited significantly higher overall survival [HR, 0.61; 95% confidence interval (CI), 0.44-0.86] and progression-free survival (HR, 0.65; 95% CI, 0.47-0.91) compared to the TMB-low group; no significant benefit was observed in the MSI-H group. Patients with TMB and MSI phenotypes showed further improvement in overall survival and PFS. We identified several mutated genes associated with MSI-H phenotypes, including known mismatch repair genes and novel mutated genes, such as ARID1A and ARID1B. CONCLUSIONS: Our results demonstrate that TMB-H and/or a combination of MSI-H can serve as biomarkers for immunotherapies in lung cancer. IMPACT: These findings suggest that distinct or combined biomarkers should be considered for immunotherapy in human cancers because notable discrepancies exist between MSI-H and TMB-H across different cancer types.


Subjects
Biomarkers, Tumor , Microsatellite Instability , Mutation , Humans , Female , Male , Biomarkers, Tumor/genetics , Neoplasms/genetics , Neoplasms/mortality , Neoplasms/therapy , Genomics/methods , Middle Aged , Aged
9.
Nat Neurosci ; 27(9): 1708-1720, 2024 Sep.
Article in English | MEDLINE | ID: mdl-39103557

ABSTRACT

Astrocyte diversity is greatly influenced by local environmental modulation. Here we report that the majority of astrocytes across the mouse brain possess a singular primary cilium localized to the cell soma. Comparative single-cell transcriptomics reveals that primary cilia mediate canonical SHH signaling to modulate astrocyte subtype-specific core features in synaptic regulation, intracellular transport, energy and metabolism. Independent of canonical SHH signaling, primary cilia are important regulators of astrocyte morphology and intracellular signaling balance. Dendritic spine analysis and transcriptomics reveal that perturbation of astrocytic cilia leads to disruption of neuronal development and global intercellular connectomes in the brain. Mice with primary ciliary-deficient astrocytes show behavioral deficits in sensorimotor function, sociability, learning and memory. Our results uncover a critical role for primary cilia in transmitting local cues that drive the region-specific diversification of astrocytes within the developing brain.


Subjects
Astrocytes , Cilia , Hedgehog Proteins , Signal Transduction , Animals , Cilia/metabolism , Cilia/physiology , Astrocytes/metabolism , Mice , Signal Transduction/physiology , Hedgehog Proteins/metabolism , Hedgehog Proteins/genetics , Brain/metabolism , Brain/growth & development , Neurogenesis/physiology , Mice, Inbred C57BL , Male
10.
Cancer Res ; 84(16): 2707-2719, 2024 Aug 15.
Article in English | MEDLINE | ID: mdl-38759092

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3'-untranslated regions (3' UTR), affecting mRNA stability and translation efficiency. Research into genetically regulated APA has the potential to provide insights into cancer risk. In this study, we conducted large APA-wide association studies to investigate associations between APA levels and cancer risk. Genetic models were built to predict APA levels in multiple tissues using genotype and RNA sequencing data from 1,337 samples from the Genotype-Tissue Expression project. Associations of genetically predicted APA levels with cancer risk were assessed by applying the prediction models to data from large genome-wide association studies of six common cancers among European ancestry populations: breast, ovarian, prostate, colorectal, lung, and pancreatic cancers. A total of 58 risk genes (corresponding to 76 APA sites) were associated with at least one type of cancer, including 25 genes previously not linked to cancer susceptibility. Of the identified risk APAs, 97.4% and 26.3% were supported by 3'-UTR APA quantitative trait loci and colocalization analyses, respectively. Luciferase reporter assays for four selected putative regulatory 3'-UTR variants demonstrated that the risk alleles of 3'-UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38), and rs145220637 (LDHA), significantly increased the posttranscriptional activities of their target genes compared with reference alleles. Furthermore, knockdown of the target genes confirmed their ability to promote proliferation and migration. Overall, this study provides insights into the role of APA in the genetic susceptibility to common cancers. Significance: Systematic evaluation of associations of alternative polyadenylation with cancer risk reveals 58 putative susceptibility genes, highlighting the contribution of genetically regulated alternative polyadenylation of 3'UTRs to genetic susceptibility to cancer.


Subjects
3' Untranslated Regions , Genetic Predisposition to Disease , Genome-Wide Association Study , Neoplasms , Polyadenylation , Humans , Neoplasms/genetics , 3' Untranslated Regions/genetics , Quantitative Trait Loci , Polymorphism, Single Nucleotide , Female , Male , Gene Expression Regulation, Neoplastic , RNA, Messenger/genetics , RNA, Messenger/metabolism , Cell Line, Tumor
11.
Nature ; 449(7164): 851-61, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17943122

ABSTRACT

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.


Subjects
Haplotypes/genetics , Polymorphism, Single Nucleotide/genetics , Female , Homozygote , Humans , Linkage Disequilibrium/genetics , Male , Racial Groups/genetics , Recombination, Genetic/genetics , Selection, Genetic
12.
Nature ; 449(7164): 913-8, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17943131

ABSTRACT

With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.


Subjects
Genome, Human/genetics , Selection, Genetic , Antiporters/genetics , Edar Receptor/chemistry , Edar Receptor/genetics , Gene Frequency , Genetics, Population , Geography , Haplotypes/genetics , Humans , Models, Molecular , Polymorphism, Single Nucleotide/genetics , Protein Structure, Tertiary
13.
Front Genet ; 14: 1222517, 2023.
Article in English | MEDLINE | ID: mdl-37693313

ABSTRACT

To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.

14.
Eur J Cell Biol ; 102(3): 151341, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37459799

ABSTRACT

ING1 is a chromatin targeting subunit of the Sin3a histone deacetylase (HDAC) complex that alters chromatin structure to subsequently regulate gene expression. We find that ING1 knockdown increases expression of Twist1, Zeb 1&2, Snai1, Bmi1 and TSHZ1 drivers of EMT, promoting EMT and cell motility. ING1 expression had the opposite effect, promoting epithelial cell morphology and inhibiting basal and TGF-β-induced motility in 3D organoid cultures. ING1 binds the Twist1 promoter and Twist1 was largely responsible for the ability of ING1 to reduce cell migration. Consistent with ING1 inhibiting Twist1 expression in vivo, an inverse relationship between ING1 and Twist1 levels was seen in breast cancer samples from The Cancer Genome Atlas (TCGA). The HDAC inhibitor vorinostat is approved for treatment of multiple myeloma and cutaneous T cell lymphoma and is in clinical trials for solid tumours as adjuvant therapy. One molecular target of vorinostat is INhibitor of Growth 2 (ING2), which together with ING1 serves as a targeting subunit of the Sin3a HDAC complex. Treatment with sublethal (LD25-LD50) levels of vorinostat promoted breast cancer cell migration several-fold, which increased further upon ING1 knockout. These observations indicate that correct targeting of the Sin3a HDAC complex, and HDAC activity in general, decreases luminal and basal breast cancer cell motility, suggesting that use of HDAC inhibitors as adjuvant therapies in breast cancers that are prone to metastasize may not be optimal and requires further investigation.


Subjects
Breast Neoplasms , Histone Deacetylase Inhibitors , Female , Humans , Breast Neoplasms/drug therapy , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Cell Line, Tumor , Chromatin , Epithelial-Mesenchymal Transition , Gene Expression Regulation, Neoplastic , Histone Deacetylase Inhibitors/pharmacology , Vorinostat/pharmacology
15.
medRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37986797

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3'UTR), which affect mRNA stability and translation efficiency. Here, we build genetic models to predict APA levels in multiple tissues using sequencing data of 1,337 samples from the Genotype-Tissue Expression, and apply these models to assess associations between genetically predicted APA levels and cancer risk with data from large genome-wide association studies of six common cancers, including breast, ovary, prostate, colorectum, lung, and pancreas among European-ancestry populations. At a Bonferroni-corrected P < 0.05, we identify 58 risk genes, including seven in newly identified loci. Using luciferase reporter assays, we demonstrate that risk alleles of 3'UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38) and rs145220637 (LDAH), could significantly increase post-transcriptional activities of their target genes compared to reference alleles. Further gene knockdown experiments confirm their oncogenic roles. Our study provides additional insight into the genetic susceptibility of these common cancers.

16.
Sci Adv ; 8(51): eabo2846, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36542714

ABSTRACT

Approaches systematically characterizing interactions via transcriptomic data usually follow two systems: (i) coexpression network analyses focusing on correlations between genes and (ii) linear regressions (usually regularized) to select multiple genes jointly. Both suffer from the problem of stability: A slight change of parameterization or dataset could lead to marked alterations of outcomes. Here, we propose Stabilized COre gene and Pathway Election (SCOPE), a tool integrating bootstrapped least absolute shrinkage and selection operator and coexpression analysis, leading to robust outcomes insensitive to variations in data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD, and THCA) in The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stabilized investigations toward complex interactions using transcriptome data.

17.
Genetics ; 220(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34849857

ABSTRACT

The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps, feature selection and feature aggregation, which can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.
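
The two separable steps described in this abstract can be sketched in plain Python as follows. This is a toy illustration under stated assumptions, not the authors' protocol: the marker test is taken to be an absolute Pearson correlation ranking, the kernel is assumed linear, significance is assessed by a simple permutation of phenotypes, and the names `select_snps`, `kernel_score` and `permutation_p` as well as all data are hypothetical.

```python
import random
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns 0.0 when either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def select_snps(genotypes: List[List[int]], expression: Sequence[float],
                top_k: int) -> List[int]:
    """Step 1, feature selection: rank SNP columns by |correlation| with expression."""
    ranked = sorted(range(len(genotypes)),
                    key=lambda j: abs(pearson(genotypes[j], expression)),
                    reverse=True)
    return ranked[:top_k]


def kernel_score(genotypes: List[List[int]], selected: Sequence[int],
                 phenotype: Sequence[float]) -> float:
    """Step 2, aggregation: Q = r'Kr with linear kernel K = GG', i.e. sum_j (g_j . r)^2."""
    mean = sum(phenotype) / len(phenotype)
    resid = [p - mean for p in phenotype]
    return sum(sum(g * r for g, r in zip(genotypes[j], resid)) ** 2
               for j in selected)


def permutation_p(genotypes: List[List[int]], selected: Sequence[int],
                  phenotype: Sequence[float], n_perm: int = 999,
                  seed: int = 0) -> float:
    """Permutation p-value for Q, obtained by shuffling phenotypes."""
    rng = random.Random(seed)
    observed = kernel_score(genotypes, selected, phenotype)
    shuffled = list(phenotype)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if kernel_score(genotypes, selected, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Toy reference panel (hypothetical): two SNP columns and one expression trait.
ref_genotypes = [[0, 1, 2, 0, 1, 2], [1, 0, 1, 0, 1, 0]]
expression = [0.1, 1.0, 2.1, 0.0, 0.9, 2.0]
# Toy association cohort (hypothetical) genotyped at the same SNPs.
gwas_genotypes = [[0, 1, 2, 0, 1, 2], [1, 0, 1, 0, 1, 0]]
phenotype = [0, 0, 1, 0, 0, 1]

chosen = select_snps(ref_genotypes, expression, top_k=1)
q_stat = kernel_score(gwas_genotypes, chosen, phenotype)
p_val = permutation_p(gwas_genotypes, chosen, phenotype)
```

Because selection and aggregation are separate functions, either one can be swapped out (for instance, a different selector or a different kernel) without touching the other, which is the flexibility the abstract argues for.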


Subjects
Genome-Wide Association Study , Transcriptome , Genetic Predisposition to Disease , Genome-Wide Association Study/methods , Humans , Phenotype , Polymorphism, Single Nucleotide , Quantitative Trait Loci
18.
Front Genet ; 12: 705708, 2021.
Article in English | MEDLINE | ID: mdl-34322159

ABSTRACT

DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations out of a large number of potential polymorphic DNA methylation sites is challenging. This high-dimensional data brings two obstacles: first, many established statistical models are not scalable to so many features; second, multiple testing and overfitting become serious problems. To this end, a method to quickly filter candidate sites to narrow down targets for downstream analyses is urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters non-informative sites (i.e., non-differential) with flat methylation pattern levels across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with what we achieved by BACkPAy. Then, Cox proportional hazards regression models were utilized to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment and the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
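
The two FDR notions contrasted in this abstract can be illustrated with a short Python sketch. The first function is the standard Benjamini-Hochberg adjustment of p-values. The second uses one common definition of a Bayesian FDR rule (select the largest set of sites whose average posterior null probability stays at or below a threshold); whether BACkPAy uses exactly this rule is an assumption, and the function names and input numbers are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def bayesian_fdr_select(posterior_null: Sequence[float], alpha: float) -> List[int]:
    """Indices of the largest set whose mean posterior null probability is <= alpha."""
    order = sorted(range(len(posterior_null)), key=lambda i: posterior_null[i])
    selected: List[int] = []
    total = 0.0
    for k, i in enumerate(order, start=1):
        total += posterior_null[i]
        # The running mean never decreases on sorted input, so stopping is safe.
        if total / k > alpha:
            break
        selected.append(i)
    return selected


# Toy inputs (hypothetical), one value per methylation site.
bh_adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
bayes_hits = bayesian_fdr_select([0.01, 0.5, 0.02, 0.9], alpha=0.05)
```

The frequentist adjustment works on p-values from per-site tests, whereas the Bayesian rule needs a model that yields posterior null probabilities, so the two can disagree substantially when the sample size is very small.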

19.
Genes (Basel) ; 12(8)2021 07 28.
Article in English | MEDLINE | ID: mdl-34440333

ABSTRACT

Some genetic diseases ("digenic traits") are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
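
The permutation test for a length-two genotype pattern described in this abstract can be sketched in Python as below. This is a minimal illustration under assumptions: the test statistic is taken to be the absolute difference in pattern frequency between cases and controls, labels are permuted uniformly at random, and the names `frequency_difference` and `permutation_p_value` and the toy data are hypothetical rather than taken from the authors' software.

```python
import random
from typing import List, Sequence, Tuple

# (index of SNP A, genotype at A, index of SNP B, genotype at B)
Pattern = Tuple[int, int, int, int]


def frequency_difference(genotypes: Sequence[Sequence[int]],
                         labels: Sequence[int], pattern: Pattern) -> float:
    """Absolute difference in pattern frequency between cases (1) and controls (0)."""
    snp_a, g_a, snp_b, g_b = pattern
    n_case = sum(1 for lab in labels if lab == 1)
    n_ctrl = len(labels) - n_case
    if n_case == 0 or n_ctrl == 0:
        raise ValueError("need at least one case and one control")
    case_hits = ctrl_hits = 0
    for row, lab in zip(genotypes, labels):
        if row[snp_a] == g_a and row[snp_b] == g_b:
            if lab == 1:
                case_hits += 1
            else:
                ctrl_hits += 1
    return abs(case_hits / n_case - ctrl_hits / n_ctrl)


def permutation_p_value(genotypes: Sequence[Sequence[int]],
                        labels: Sequence[int], pattern: Pattern,
                        n_perm: int = 999, seed: int = 0) -> float:
    """P-value under the null of equal pattern frequencies in cases and controls."""
    rng = random.Random(seed)
    observed = frequency_difference(genotypes, labels, pattern)
    shuffled: List[int] = list(labels)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if frequency_difference(genotypes, shuffled, pattern) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Toy data (hypothetical): rows are individuals, columns are two variants.
genotypes = [[2, 1]] * 4 + [[0, 0]] * 4
labels = [1, 1, 1, 1, 0, 0, 0, 0]
pattern = (0, 2, 1, 1)  # genotype 2 at variant 0 together with genotype 1 at variant 1

observed_diff = frequency_difference(genotypes, labels, pattern)
p_value = permutation_p_value(genotypes, labels, pattern)
```

Adding 1 to both the count and the number of permutations keeps the estimated p-value strictly positive, a common convention for Monte Carlo permutation tests.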


Subjects
DNA/genetics , Data Mining , Genetic Diseases, Inborn/genetics , Genotype , Case-Control Studies , Datasets as Topic , Humans , Polymorphism, Single Nucleotide
20.
BMC Bioinformatics ; 10 Suppl 1: S75, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208180

ABSTRACT

BACKGROUND: In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and efficient approaches for detecting effects other than main effects due to single variants. RESULTS: We developed a method for jointly detecting disease-causing single-locus effects and gene-gene interactions. Our method is based on finding differences of genotype pattern frequencies between case and control individuals. Those single-nucleotide polymorphism markers with largest single-locus association test statistics are included in a pattern. For a logistic regression model comprising three disease variants exerting main and epistatic interaction effects, we demonstrate that our method is vastly superior to the traditional approach of looking for single-locus effects. In addition, our method is suitable for estimating the number of disease variants in a dataset. We successfully apply our approach to data on Parkinson Disease and heroin addiction. CONCLUSION: Our approach is suitable and powerful for detecting disease susceptibility variants with potentially small main effects and strong interaction effects. It can be applied to large numbers of genetic markers.


Subjects
Genetic Predisposition to Disease/genetics , Genotype , Computer Simulation , Genetic Markers/genetics , Humans , Polymorphism, Single Nucleotide