Results 1 - 20 of 32
1.
PLoS Genet ; 19(12): e1011074, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38109434

ABSTRACT

Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures the rare-variant LD between genetic regions, which reflects more biological interactions between variants, in addition to lack of recombination. We derived the theoretical variance of cLD using delta methods to demonstrate its higher stability than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find that cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon for which a negative result was recently reported using standard LD between common variants. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.


Subject(s)
Genomics; Polymorphism, Single Nucleotide; Linkage Disequilibrium; Polymorphism, Single Nucleotide/genetics
2.
PLoS Comput Biol ; 19(10): e1011476, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37782668

ABSTRACT

Machine learning models have been frequently used in transcriptome analyses. In particular, Representation Learning (RL) models, e.g., autoencoders, are effective in learning critical representations in noisy data. However, learned representations, e.g., the "latent variables" in an autoencoder, are difficult to interpret, let alone use to prioritize essential genes for functional follow-up. In contrast, in traditional analyses, one may identify important genes such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, complex gene-gene interactions may be beyond the capture of marginal effects (DiffEx) or correlations (DiffCoEx and Hub), indicating the need for powerful RL models. However, the lack of interpretability and of individual target genes is an obstacle to RL's broad use in practice. To facilitate interpretable analysis and gene identification using RL, we propose "Critical genes", defined as genes that contribute highly to learned representations (e.g., latent variables in an autoencoder). As a proof of concept, supported by eXplainable Artificial Intelligence (XAI), we implemented eXplainable Autoencoder for Critical genes (XA4C), which quantifies each gene's contribution to latent variables, based on which Critical genes are prioritized. Applying XA4C to gene expression data in six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes but show higher enrichment in a comprehensive disease gene database (DisGeNET) and a cancer-specific database (COSMIC), evidencing their potential to disclose extensive unknown biology. As an example, we discovered five Critical genes sitting at the center of the Lysine degradation (hsa00310) pathway, displaying distinct interaction patterns in tumor and normal tissues. In conclusion, XA4C facilitates explainable analysis using RL, and Critical genes discovered by explainable RL empower the study of complex interactions.


Subject(s)
Artificial Intelligence; Neoplasms; Humans; Genes, Essential; Databases, Factual; Gene Expression Profiling
3.
Am J Pathol ; 189(9): 1732-1743, 2019 09.
Article in English | MEDLINE | ID: mdl-31199922

ABSTRACT

Approximately 15% to 20% of colorectal cancers develop through the serrated pathway of tumorigenesis, which is associated with BRAF mutation, CpG island methylation phenotype, and MLH1 methylation. However, the detailed process of progression from sessile serrated adenoma (SSA) to dysplasia and carcinoma has not been elucidated. To further characterize mechanisms involved in the dysplastic progression of SSA, we investigated differential expression of mRNAs between areas with and without dysplasia within the same SSA polyps. Significantly dysregulated genes in paired samples were used for functional annotation and assessment of biological significance. The same lysates from a subset of matched samples were subjected to miRNA expression profiling. Differentially expressed miRNAs were determined, and their targeted mRNAs were compared in parallel to the list of differentially expressed mRNAs from an RNA sequencing study. Fourteen common mRNA targets were identified, which include AXIN2, a known indicator of WNT/β-catenin pathway activation. Together, in this study, different genes, pathways, and biological processes involved in the initiation and progression of dysplasia in the serrated pathway are documented. One of the most significant findings is the involvement of the WNT/β-catenin pathway in the dysplastic progression of SSAs, with different genes being targeted in early versus advanced dysplasia.


Subject(s)
Adenoma/pathology; Adenomatous Polyps/pathology; Mutation; Wnt Signaling Pathway; Adenoma/genetics; Adenoma/metabolism; Adenomatous Polyps/genetics; Adenomatous Polyps/metabolism; Aged; Disease Progression; Female; Gene Expression Profiling; Humans; Male
4.
Nat Genet ; 39(5): 605-13, 2007 May.
Article in English | MEDLINE | ID: mdl-17450141

ABSTRACT

Caspases are important in the life and death of immune cells and therefore influence immune surveillance of malignancies. We tested whether genetic variants in CASP8, CASP10 and CFLAR, three genes important for death receptor-induced cell killing residing in tandem order on chromosome 2q33, are associated with cancer susceptibility. Using a haplotype-tagging SNP approach, we identified a six-nucleotide deletion (-652 6N del) variant in the CASP8 promoter associated with decreased risk of lung cancer. The deletion destroys a stimulatory protein 1 binding site and decreases CASP8 transcription. Biochemical analyses showed that T lymphocytes with the deletion variant had lower caspase-8 activity and activation-induced cell death upon stimulation with cancer cell antigens. Case-control analyses of 4,995 individuals with cancer and 4,972 controls in a Chinese population showed that this genetic variant is associated with reduced susceptibility to multiple cancers, including lung, esophageal, gastric, colorectal, cervical and breast cancers, acting in an allele dose-dependent manner. These results support the hypothesis that genetic variants influencing immune status modify cancer susceptibility.


Subject(s)
Caspase 8/genetics; Chromosomes, Human, Pair 2/genetics; Genetic Predisposition to Disease/genetics; INDEL Mutation/genetics; Neoplasms/genetics; Neoplasms/immunology; Promoter Regions, Genetic/genetics; Asian People; Binding Sites/genetics; Caspase 8/metabolism; China; Chromatin Immunoprecipitation; Electrophoretic Mobility Shift Assay; Flow Cytometry; Humans; Luciferases; Polymorphism, Single Nucleotide/genetics; T-Lymphocytes/immunology; T-Lymphocytes/metabolism; Transfection
5.
PLoS Comput Biol ; 10(6): e1003627, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24901472

ABSTRACT

Identifying gene-gene interactions is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); (2) the GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS on AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that the interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder data from the WTCCC, we found that variants without marginal effects show significant interactions, for example, multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. Multiple lines of evidence support the relevance of GRIA1 and GABRB2 to mental disorders.
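
The level-wise pruning idea behind Apriori, on which the abstract above builds, can be sketched as follows. This is a minimal illustration of generic frequent itemset mining and should not be read as the AprioriGWAS implementation. The item encoding (hypothetical strings such as `"rs1=AA"`), the function name `apriori`, and the `min_support` threshold are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets.

    transactions: list of sets of hashable items (here, hypothetical
    "variant=genotype" strings, one set per individual).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Candidate generation: join frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k:
                # Apriori pruning: every (k-1)-subset must itself be frequent.
                if all(frozenset(sub) in frequent
                       for sub in combinations(union, k - 1)):
                    candidates.add(union)

        # Support counting for the surviving candidates only.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1

    return result
```

The pruning step is what shrinks the search space: a pattern of k items is only counted if all of its (k-1)-item sub-patterns were already frequent, so most of the possible combinations are never enumerated.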


Subject(s)
Algorithms; Epistasis, Genetic; Genetic Variation; Genome-Wide Association Study/statistics & numerical data; Bipolar Disorder/genetics; Complement Factor H/genetics; Computational Biology; Computer Simulation; Data Mining/statistics & numerical data; Databases, Genetic; Genetic Predisposition to Disease; Humans; Linkage Disequilibrium; Logistic Models; Macular Degeneration/genetics; Models, Genetic
6.
Bioinformatics ; 29(9): 1220-2, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23479353

ABSTRACT

SUMMARY: We present JAWAMix5, an out-of-core open-source toolkit for association mapping using high-throughput sequence data. Taking advantage of its HDF5-based implementation, JAWAMix5 stores genotype data on disk and accesses them as though stored in main memory. Therefore, it offers a scalable and fast analysis without concerns about memory usage, whatever the size of the dataset. We have implemented eight functions for association studies, including standard methods (linear models, linear mixed models, rare variants test, analysis in nested association mapping design and local variance component analysis), as well as a novel Bayesian local variance component analysis. Application to real data demonstrates that JAWAMix5 is reasonably fast compared with traditional solutions that load the complete dataset into memory, and that the memory usage is efficient regardless of the dataset size. AVAILABILITY: The source code, a 'batteries-included' executable and user manual can be freely downloaded from http://code.google.com/p/jawamix5/.


Subject(s)
Genome-Wide Association Study/methods; Software; Bayes Theorem; Genotype; High-Throughput Nucleotide Sequencing; Linear Models
7.
Genetics ; 226(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38001381

ABSTRACT

Toward the identification of the genetic basis of complex traits, the transcriptome-wide association study (TWAS) is successful in integrating transcriptome data. However, TWAS is only applicable to common variants, excluding rare variants in exome or whole-genome sequences. This is partly because of the inherent limitation of TWAS protocols that rely on predicting gene expression. Our previous research revealed an insight into TWAS: the 2 steps in TWAS, building and applying the expression prediction models, are essentially genetic feature selection and aggregation that do not have to involve predictions. Based on this insight disentangling TWAS, the inability of rare variants to predict expression traits is no longer an obstacle. Herein, we developed "rare variant TWAS," or rvTWAS, that first uses a Bayesian model to conduct expression-directed feature selection and then uses a kernel machine to carry out feature aggregation, forming a model leveraging expression for association mapping including rare variants. We demonstrated the performance of rvTWAS by thorough simulations and real data analysis in 3 psychiatric disorders, namely schizophrenia, bipolar disorder, and autism spectrum disorder. We confirmed that rvTWAS outperforms existing TWAS protocols and revealed additional genes underlying psychiatric disorders. In particular, we formed a hypothetical mechanism in which zinc finger genes impact all 3 disorders through transcriptional regulation. rvTWAS will open a door for sequence-based association mapping that integrates gene expression.


Subject(s)
Autism Spectrum Disorder; Transcriptome; Humans; Autism Spectrum Disorder/genetics; Bayes Theorem; Phenotype; Quantitative Trait Loci; Genome-Wide Association Study/methods; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide
8.
Cancer Epidemiol Biomarkers Prev ; 33(5): 712-720, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38393316

ABSTRACT

BACKGROUND: Microsatellite instability (MSI) and tumor mutational burden (TMB) are predictive biomarkers for pan-cancer immunotherapy. The interrelationship between MSI-high (MSI-H) and TMB-high (TMB-H) in human cancers and their predictive value for immunotherapy in lung cancer remain unclear. METHODS: We analyzed somatic mutation data from the Genomics Evidence Neoplasia Information Exchange (n = 46,320) to determine the relationship between MSI-H and TMB-H in human cancers using adjusted multivariate regression models. Patient survival was examined using the Cox proportional hazards model. The association between MSI and genetic mutations was assessed. RESULTS: Patients (31-89%) with MSI-H had TMB-low phenotypes across 22 cancer types. Colorectal and stomach cancers showed the strongest association between TMB and MSI. TMB-H patients with lung cancer who received immunotherapy exhibited significantly higher overall survival [HR, 0.61; 95% confidence interval (CI), 0.44-0.86] and progression-free survival (HR, 0.65; 95% CI, 0.47-0.91) compared to the TMB-low group; no significant benefit was observed in the MSI-H group. Patients with TMB and MSI phenotypes showed further improvement in overall survival and PFS. We identified several mutated genes associated with MSI-H phenotypes, including known mismatch repair genes and novel mutated genes, such as ARID1A and ARID1B. CONCLUSIONS: Our results demonstrate that TMB-H and/or a combination of MSI-H can serve as biomarkers for immunotherapies in lung cancer. IMPACT: These findings suggest that distinct or combined biomarkers should be considered for immunotherapy in human cancers because notable discrepancies exist between MSI-H and TMB-H across different cancer types.


Subject(s)
Biomarkers, Tumor; Microsatellite Instability; Mutation; Humans; Female; Male; Biomarkers, Tumor/genetics; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/therapy; Genomics/methods; Middle Aged; Aged
9.
Cancer Res ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38759092

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3' UTR), affecting mRNA stability and translation efficiency. Research into genetically regulated APA has the potential to provide insights into cancer risk. Herein, we conducted large alternative polyadenylation-wide association studies (APA-WAS) to investigate associations of APA levels with cancer risk. Genetic models were built to predict APA levels in multiple tissues using genotype and RNA-sequencing data from 1,337 samples from the Genotype-Tissue Expression Project. Associations of genetically predicted APA levels with cancer risk were assessed by applying the prediction models to data from large genome-wide association studies of six common cancers among European-ancestry populations, including breast, ovary, prostate, colorectum, lung, and pancreas. A total of 58 risk genes (corresponding to 76 APA sites) were associated with at least one type of cancer, including 25 genes previously not linked to cancer susceptibility. Of the identified risk APAs, 97.4% and 26.3% were supported by 3' UTR APA quantitative trait loci and co-localization analyses, respectively. Luciferase reporter assays for four selected putative regulatory 3' UTR variants demonstrated that the risk alleles of 3' UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38), and rs145220637 (LDHA), significantly increased the post-transcriptional activities of their target genes compared to reference alleles. Furthermore, knockdown of the target genes confirmed their ability to promote proliferation and migration. Overall, this study provides insights into the role of APA in the genetic susceptibility to common cancers.

10.
Nature ; 449(7164): 851-61, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17943122

ABSTRACT

We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
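
The r² statistic quoted in the abstract above is the standard pairwise LD measure. As a minimal sketch, assuming phased haplotypes with alleles coded 0/1 at two biallelic sites, it can be computed from D = p_AB - p_A p_B and r² = D² / (p_A (1 - p_A) p_B (1 - p_B)). The function name `ld_r2` and the input layout are assumptions for this illustration and are not taken from the HapMap analysis pipeline.

```python
def ld_r2(haplotypes):
    """Compute D and r^2 between two biallelic sites.

    haplotypes: list of (a, b) tuples, each allele coded 0 or 1,
    one tuple per phased chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")

    # Frequencies of allele 1 at each site, and of the 1-1 haplotype.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    # D measures the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a site is monomorphic")
    return d, d * d / denom
```

For example, `ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])` describes two sites that always co-occur, while `ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])` describes two independent sites.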


Subject(s)
Haplotypes/genetics; Polymorphism, Single Nucleotide/genetics; Female; Homozygote; Humans; Linkage Disequilibrium/genetics; Male; Racial Groups/genetics; Recombination, Genetic/genetics; Selection, Genetic
11.
Nature ; 449(7164): 913-8, 2007 Oct 18.
Article in English | MEDLINE | ID: mdl-17943131

ABSTRACT

With the advent of dense maps of human genetic variation, it is now possible to detect positive natural selection across the human genome. Here we report an analysis of over 3 million polymorphisms from the International HapMap Project Phase 2 (HapMap2). We used 'long-range haplotype' methods, which were developed to identify alleles segregating in a population that have undergone recent selection, and we also developed new methods that are based on cross-population comparisons to discover alleles that have swept to near-fixation within a population. The analysis reveals more than 300 strong candidate regions. Focusing on the strongest 22 regions, we develop a heuristic for scrutinizing these regions to identify candidate targets of selection. In a complementary analysis, we identify 26 non-synonymous, coding, single nucleotide polymorphisms showing regional evidence of positive selection. Examination of these candidates highlights three cases in which two genes in a common biological process have apparently undergone positive selection in the same population: LARGE and DMD, both related to infection by the Lassa virus, in West Africa; SLC24A5 and SLC45A2, both involved in skin pigmentation, in Europe; and EDAR and EDA2R, both involved in development of hair follicles, in Asia.


Subject(s)
Genome, Human/genetics; Selection, Genetic; Antiporters/genetics; Edar Receptor/chemistry; Edar Receptor/genetics; Gene Frequency; Genetics, Population; Geography; Haplotypes/genetics; Humans; Models, Molecular; Polymorphism, Single Nucleotide/genetics; Protein Structure, Tertiary
12.
Front Genet ; 14: 1222517, 2023.
Article in English | MEDLINE | ID: mdl-37693313

ABSTRACT

To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.

13.
Eur J Cell Biol ; 102(3): 151341, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37459799

ABSTRACT

ING1 is a chromatin targeting subunit of the Sin3a histone deacetylase (HDAC) complex that alters chromatin structure to subsequently regulate gene expression. We find that ING1 knockdown increases expression of Twist1, Zeb 1&2, Snai1, Bmi1 and TSHZ1, drivers of EMT, promoting EMT and cell motility. ING1 expression had the opposite effect, promoting epithelial cell morphology and inhibiting basal and TGF-β-induced motility in 3D organoid cultures. ING1 binds the Twist1 promoter, and Twist1 was largely responsible for the ability of ING1 to reduce cell migration. Consistent with ING1 inhibiting Twist1 expression in vivo, an inverse relationship between ING1 and Twist1 levels was seen in breast cancer samples from The Cancer Genome Atlas (TCGA). The HDAC inhibitor vorinostat is approved for treatment of multiple myeloma and cutaneous T cell lymphoma and is in clinical trials for solid tumours as adjuvant therapy. One molecular target of vorinostat is INhibitor of Growth 2 (ING2), which together with ING1 serves as a targeting subunit of the Sin3a HDAC complex. Treatment with sublethal (LD25-LD50) levels of vorinostat promoted breast cancer cell migration several-fold, which increased further upon ING1 knockout. These observations indicate that correct targeting of the Sin3a HDAC complex, and HDAC activity in general, decreases luminal and basal breast cancer cell motility, suggesting that use of HDAC inhibitors as adjuvant therapies in breast cancers that are prone to metastasize may not be optimal and requires further investigation.


Subject(s)
Breast Neoplasms; Histone Deacetylase Inhibitors; Female; Humans; Breast Neoplasms/drug therapy; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Cell Line, Tumor; Chromatin; Epithelial-Mesenchymal Transition; Gene Expression Regulation, Neoplastic; Histone Deacetylase Inhibitors/pharmacology; Vorinostat/pharmacology
14.
medRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37986797

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3'UTR), which affects mRNA stability and translation efficiency. Here, we build genetic models to predict APA levels in multiple tissues using sequencing data of 1,337 samples from the Genotype-Tissue Expression project, and apply these models to assess associations between genetically predicted APA levels and cancer risk with data from large genome-wide association studies of six common cancers, including breast, ovary, prostate, colorectum, lung, and pancreas, among European-ancestry populations. At a Bonferroni-corrected P < 0.05, we identify 58 risk genes, including seven in newly identified loci. Using luciferase reporter assays, we demonstrate that risk alleles of 3'UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38) and rs145220637 (LDAH), could significantly increase post-transcriptional activities of their target genes compared to reference alleles. Further gene knockdown experiments confirm their oncogenic roles. Our study provides additional insight into the genetic susceptibility of these common cancers.

15.
Sci Adv ; 8(51): eabo2846, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36542714

ABSTRACT

Approaches systematically characterizing interactions via transcriptomic data usually follow two systems: (i) coexpression network analyses focusing on correlations between genes and (ii) linear regressions (usually regularized) to select multiple genes jointly. Both suffer from the problem of stability: A slight change of parameterization or dataset could lead to marked alterations of outcomes. Here, we propose Stabilized COre gene and Pathway Election (SCOPE), a tool integrating bootstrapped least absolute shrinkage and selection operator and coexpression analysis, leading to robust outcomes insensitive to variations in data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD, and THCA) in The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stabilized investigations toward complex interactions using transcriptome data.

16.
Genetics ; 220(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34849857

ABSTRACT

The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component of genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expressions) and the linear combination of genotypes (as the predicted expressions) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps (feature selection and feature aggregation) that can be independently conducted. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adapting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.


Subject(s)
Genome-Wide Association Study; Transcriptome; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
17.
Front Genet ; 12: 705708, 2021.
Article in English | MEDLINE | ID: mdl-34322159

ABSTRACT

DNA methylations in critical regions are highly involved in cancer pathogenesis and drug response. However, identifying causal methylations among a large number of potential polymorphic DNA methylation sites is challenging. These high-dimensional data bring two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious. To this end, a method to quickly filter candidate sites to narrow down targets for downstream analyses is urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with small sample sizes. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites with flat methylation pattern levels across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those achieved by BACkPAy. Then, Cox proportional hazards regression models were utilized to visualize prognostically significant markers with The Cancer Genome Atlas (TCGA) data for survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.

18.
Genes (Basel) ; 12(8)2021 07 28.
Article in English | MEDLINE | ID: mdl-34440333

ABSTRACT

Some genetic diseases ("digenic traits") are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
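
The permutation test described in the abstract above can be sketched as follows, under the stated null hypothesis of equal pattern frequencies in cases and controls. This is a minimal illustration and not the authors' freely available implementation. The function name `pattern_permutation_pvalue`, the data layout (one dict of genotype codes per person), the absolute frequency difference used as the test statistic, and the add-one p-value estimate are all assumptions made for the example.

```python
import random


def pattern_permutation_pvalue(genotypes, labels, pattern, n_perm=1000, seed=0):
    """Permutation p-value for one length-two genotype pattern.

    genotypes: list of dicts mapping variant name -> genotype code, one per person.
    labels: list of 1 (case) / 0 (control), in the same order as genotypes.
    pattern: ((variant1, genotype1), (variant2, genotype2)).
    """
    (v1, g1), (v2, g2) = pattern
    # 1 if the person carries both genotypes of the pattern, else 0.
    carries = [int(g[v1] == g1 and g[v2] == g2) for g in genotypes]
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    if n_cases == 0 or n_controls == 0:
        raise ValueError("need both cases and controls")

    def freq_diff(lab):
        # Absolute difference in pattern frequency between cases and controls.
        in_cases = sum(c for c, y in zip(carries, lab) if y == 1)
        in_controls = sum(c for c, y in zip(carries, lab) if y == 0)
        return abs(in_cases / n_cases - in_controls / n_controls)

    observed = freq_diff(labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        # Shuffling labels breaks any genotype-phenotype association.
        rng.shuffle(shuffled)
        if freq_diff(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_perm + 1)
```

Shuffling the case/control labels while keeping genotypes fixed preserves the correlation structure among variants, which is one reason permutation is a common choice when many overlapping patterns are tested.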


Subject(s)
DNA/genetics , Data Mining , Genetic Diseases, Inborn/genetics , Genotype , Case-Control Studies , Datasets as Topic , Humans , Polymorphism, Single Nucleotide
19.
BMC Bioinformatics ; 10 Suppl 1: S75, 2009 Jan 30.
Article in English | MEDLINE | ID: mdl-19208180

ABSTRACT

BACKGROUND: In addition to single-locus (main) effects of disease variants, there is a growing consensus that gene-gene and gene-environment interactions may play important roles in disease etiology. However, for the very large numbers of genetic markers currently in use, it has proven difficult to develop suitable and efficient approaches for detecting effects other than main effects due to single variants. RESULTS: We developed a method for jointly detecting disease-causing single-locus effects and gene-gene interactions. Our method is based on finding differences in genotype pattern frequencies between case and control individuals. The single-nucleotide polymorphism markers with the largest single-locus association test statistics are included in a pattern. For a logistic regression model comprising three disease variants exerting main and epistatic interaction effects, we demonstrate that our method is vastly superior to the traditional approach of looking for single-locus effects. In addition, our method is suitable for estimating the number of disease variants in a dataset. We successfully apply our approach to data on Parkinson Disease and heroin addiction. CONCLUSION: Our approach is suitable and powerful for detecting disease susceptibility variants with potentially small main effects and strong interaction effects. It can be applied to large numbers of genetic markers.


Subject(s)
Genetic Predisposition to Disease/genetics , Genotype , Computer Simulation , Genetic Markers/genetics , Humans , Polymorphism, Single Nucleotide
20.
G3 (Bethesda) ; 9(1): 13-19, 2019 01 09.
Article in English | MEDLINE | ID: mdl-30482799

ABSTRACT

Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses, including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices provided by modern biobanks are too large to be stored in active memory. To accommodate current and future "bigger-data", we develop a disk-based tool, Out-of-Core Matrices Analyzer (OCMA), using state-of-the-art computational techniques that can nimbly perform eigen-decomposition and Singular Value Decomposition (SVD) analyses. By integrating memory mapping (mmap) and the latest matrix factorization libraries, our tool is fast and memory-efficient. To demonstrate the performance of OCMA, we test it on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative to full eigen-decomposition in genomic analyses, OCMA solves the top 200 singular values (SVs) in half an hour, the top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr based on a very large genotype matrix (N = 1,000,000, M = 5,000) on the same personal computer. OCMA also supports multi-threading when running on a desktop or HPC cluster. Our OCMA tool can thus alleviate the computing bottleneck of classical analyses on large genomic matrices, and make it possible to scale up current and emerging analytical methods to big genomics data using lightweight computing resources.
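
To make the out-of-core idea concrete, the sketch below memory-maps a row-major matrix file, streams over its rows to accumulate the small M x M matrix G^T G, and then estimates the top singular value by power iteration. It is an illustration under stated assumptions, not OCMA's code: it uses only the Python standard library (`mmap`, `struct`), the file layout (raw float64, row-major) and all function names are hypothetical, and a real tool would rely on optimized linear algebra libraries instead of pure-Python loops.

```python
import math
import mmap
import struct


def write_matrix(path, rows):
    """Store a row-major float64 matrix on disk (stand-in for a genotype file)."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"{len(row)}d", *row))


def gram_from_disk(path, n_rows, n_cols):
    """Accumulate G^T G while reading G one row at a time through a memory map."""
    gram = [[0.0] * n_cols for _ in range(n_cols)]
    row_bytes = struct.calcsize("d") * n_cols
    with open(path, "rb") as fh, \
            mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for r in range(n_rows):
            # Only the pages backing this row need to be resident in memory.
            row = struct.unpack_from(f"{n_cols}d", mm, r * row_bytes)
            for i in range(n_cols):
                for j in range(n_cols):
                    gram[i][j] += row[i] * row[j]
    return gram


def top_singular_value(gram, iters=200):
    """Power iteration on G^T G; the square root of its top eigenvalue is returned."""
    n = len(gram)
    v = [1.0 / math.sqrt(n)] * n
    lam = 0.0
    for _ in range(iters):
        w = [sum(gram[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = norm
    return math.sqrt(lam)
```

A possible usage, with a tiny matrix written to a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "genotypes.bin")
write_matrix(path, [[3.0, 0.0], [0.0, 4.0], [0.0, 0.0], [0.0, 0.0]])
print(top_singular_value(gram_from_disk(path, n_rows=4, n_cols=2)))
```

The design point is that the N x M matrix never has to fit in memory when N is much larger than M; only the M x M accumulator does.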


Subject(s)
Genome/genetics , Genomics , Models, Genetic , Algorithms , Animals , Biological Specimen Banks/trends , Breeding , Computer Simulation , Genotype , Humans , Polymorphism, Single Nucleotide/genetics , Principal Component Analysis , Software