1.
Cancer Res ; 2024 May 17.
Article in English | MEDLINE | ID: mdl-38759092

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3' UTR), affecting mRNA stability and translation efficiency. Research into genetically regulated APA has the potential to provide insights into cancer risk. Herein, we conducted large alternative polyadenylation-wide association studies (APA-WAS) to investigate associations of APA levels with cancer risk. Genetic models were built to predict APA levels in multiple tissues using genotype and RNA-sequencing data from 1,337 samples from the Genotype-Tissue Expression Project. Associations of genetically predicted APA levels with cancer risk were assessed by applying the prediction models to data from large genome-wide association studies of six common cancers among European-ancestry populations, including breast, ovary, prostate, colorectum, lung, and pancreas. A total of 58 risk genes (corresponding to 76 APA sites) were associated with at least one type of cancer, including 25 genes previously not linked to cancer susceptibility. Of the identified risk APAs, 97.4% and 26.3% were supported by 3' UTR APA quantitative trait loci and co-localization analyses, respectively. Luciferase reporter assays for four selected putative regulatory 3' UTR variants demonstrated that the risk alleles of 3' UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38), and rs145220637 (LDAH), significantly increased the post-transcriptional activities of their target genes compared to reference alleles. Furthermore, knockdown of the target genes confirmed their ability to promote proliferation and migration. Overall, this study provides insights into the role of APA in the genetic susceptibility to common cancers.

2.
Cancer Epidemiol Biomarkers Prev ; 33(5): 712-720, 2024 May 01.
Article in English | MEDLINE | ID: mdl-38393316

ABSTRACT

BACKGROUND: Microsatellite instability (MSI) and tumor mutational burden (TMB) are predictive biomarkers for pan-cancer immunotherapy. The interrelationship between MSI-high (MSI-H) and TMB-high (TMB-H) in human cancers and their predictive value for immunotherapy in lung cancer remain unclear. METHODS: We analyzed somatic mutation data from the Genomics Evidence Neoplasia Information Exchange (n = 46,320) to determine the relationship between MSI-H and TMB-H in human cancers using adjusted multivariate regression models. Patient survival was examined using the Cox proportional hazards model. The association between MSI and genetic mutations was assessed. RESULTS: Across 22 cancer types, 31-89% of patients with MSI-H had TMB-low phenotypes. Colorectal and stomach cancers showed the strongest association between TMB and MSI. TMB-H patients with lung cancer who received immunotherapy exhibited significantly longer overall survival [hazard ratio (HR), 0.61; 95% confidence interval (CI), 0.44-0.86] and progression-free survival (PFS; HR, 0.65; 95% CI, 0.47-0.91) compared to the TMB-low group; no significant benefit was observed in the MSI-H group. Patients with combined TMB-H and MSI-H phenotypes showed further improvement in overall survival and PFS. We identified several mutated genes associated with MSI-H phenotypes, including known mismatch repair genes and novel mutated genes, such as ARID1A and ARID1B. CONCLUSIONS: Our results demonstrate that TMB-H, alone or in combination with MSI-H, can serve as a biomarker for immunotherapy in lung cancer. IMPACT: These findings suggest that distinct or combined biomarkers should be considered for immunotherapy in human cancers because notable discrepancies exist between MSI-H and TMB-H across different cancer types.


Subject(s)
Biomarkers, Tumor; Microsatellite Instability; Mutation; Humans; Female; Male; Biomarkers, Tumor/genetics; Neoplasms/genetics; Neoplasms/mortality; Neoplasms/therapy; Genomics/methods; Middle Aged; Aged
3.
Genetics ; 226(2)2024 Feb 07.
Article in English | MEDLINE | ID: mdl-38001381

ABSTRACT

To identify the genetic basis of complex traits, transcriptome-wide association studies (TWAS) have successfully integrated transcriptome data. However, TWAS is applicable only to common variants and excludes rare variants in exome or whole-genome sequences. This is partly because of an inherent limitation of TWAS protocols, which rely on predicting gene expression. Our previous research provided an insight into TWAS: its 2 steps, building and applying the expression prediction models, are essentially genetic feature selection and aggregation, neither of which has to involve prediction. Once TWAS is disentangled in this way, the inability of rare variants to predict expression traits is no longer an obstacle. Herein, we developed "rare variant TWAS," or rvTWAS, which first uses a Bayesian model to conduct expression-directed feature selection and then uses a kernel machine to carry out feature aggregation, forming a model that leverages expression for association mapping that includes rare variants. We demonstrated the performance of rvTWAS through thorough simulations and real data analysis of 3 psychiatric disorders, namely schizophrenia, bipolar disorder, and autism spectrum disorder. We confirmed that rvTWAS outperforms existing TWAS protocols and revealed additional genes underlying psychiatric disorders. In particular, we formed a hypothetical mechanism in which zinc finger genes impact all 3 disorders through transcriptional regulation. rvTWAS opens a door to sequence-based association mapping that integrates gene expression.


Subject(s)
Autism Spectrum Disorder; Transcriptome; Humans; Autism Spectrum Disorder/genetics; Bayes Theorem; Phenotype; Quantitative Trait Loci; Genome-Wide Association Study/methods; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide
4.
PLoS Genet ; 19(12): e1011074, 2023 Dec.
Article in English | MEDLINE | ID: mdl-38109434

ABSTRACT

Linkage disequilibrium (LD) is a fundamental concept in genetics, critical for studying genetic associations and molecular evolution. However, LD measurements are only reliable for common genetic variants, leaving low-frequency variants unanalyzed. In this work, we introduce cumulative LD (cLD), a stable statistic that captures the rare-variant LD between genetic regions, which reflects more biological interactions between variants, in addition to lack of recombination. We derived the theoretical variance of cLD using delta methods to demonstrate its higher stability than LD for rare variants. This property is also verified by bootstrapped simulations using real data. In application, we find that cLD reveals an increased genetic association between genes in 3D chromatin interactions, a phenomenon for which negative results were recently reported when standard LD was calculated between common variants. Additionally, we show that cLD is higher between gene pairs reported in interaction databases, identifies unreported protein-protein interactions, and reveals interacting genes distinguishing case/control samples in association studies.
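
For readers unfamiliar with the baseline statistic, the following is a minimal sketch of standard pairwise LD (the coefficient D and the squared correlation r²) between two biallelic loci, computed from phased haplotypes. It is not the cLD statistic from the abstract, whose definition is not given here; the function name `pairwise_ld` and the input layout are assumptions made for illustration.

```python
from collections import Counter


def pairwise_ld(haplotypes):
    """Return (D, r2) for two biallelic loci.

    `haplotypes` is a list of (a, b) tuples, one per phased chromosome,
    where a and b are 0/1 alleles at the first and second locus.
    (Hypothetical input layout, chosen for illustration.)
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at locus A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at locus B
    p_ab = counts[(1, 1)] / n                 # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                      # classical LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")
    return d, r2


# Two loci whose alleles always travel together.
d, r2 = pairwise_ld([(1, 1)] * 3 + [(0, 0)] * 7)
print(d, r2)
```

When the allele frequencies are very small, the denominator of r² is close to zero and a single extra carrier changes the estimate substantially, which is the instability for rare variants that the abstract refers to.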


Subject(s)
Genomics; Polymorphism, Single Nucleotide; Linkage Disequilibrium; Polymorphism, Single Nucleotide/genetics
5.
medRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37986797

ABSTRACT

Alternative polyadenylation (APA) modulates mRNA processing in the 3' untranslated regions (3' UTR), which affects mRNA stability and translation efficiency. Here, we build genetic models to predict APA levels in multiple tissues using sequencing data from 1,337 samples from the Genotype-Tissue Expression project, and apply these models to assess associations between genetically predicted APA levels and cancer risk with data from large genome-wide association studies of six common cancers, including breast, ovary, prostate, colorectum, lung, and pancreas, among European-ancestry populations. At a Bonferroni-corrected P < 0.05, we identify 58 risk genes, including seven in newly identified loci. Using luciferase reporter assays, we demonstrate that risk alleles of 3' UTR variants, rs324015 (STAT6), rs2280503 (DIP2B), rs1128450 (FBXO38) and rs145220637 (LDAH), could significantly increase post-transcriptional activities of their target genes compared to reference alleles. Further gene knockdown experiments confirm their oncogenic roles. Our study provides additional insight into the genetic susceptibility of these common cancers.
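
The "Bonferroni-corrected P < 0.05" criterion can be sketched as follows: each raw p-value is multiplied by the number of tests (capped at 1) and compared with the significance level. The function name and the example p-values are hypothetical; the abstract does not state how many tests were performed.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (adjusted p-values, significance flags) under Bonferroni correction."""
    m = len(p_values)
    # Multiplying by the number of tests controls the family-wise error rate.
    adjusted = [min(1.0, p * m) for p in p_values]
    significant = [p_adj < alpha for p_adj in adjusted]
    return adjusted, significant


# Hypothetical p-values for three tested APA sites.
adjusted, significant = bonferroni([0.001, 0.02, 0.04])
print(adjusted, significant)
```

Equivalently, a raw p-value is declared significant when it falls below alpha divided by the number of tests.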

6.
PLoS Comput Biol ; 19(10): e1011476, 2023 Oct.
Article in English | MEDLINE | ID: mdl-37782668

ABSTRACT

Machine Learning models have been frequently used in transcriptome analyses. In particular, Representation Learning (RL) models, e.g., autoencoders, are effective in learning critical representations in noisy data. However, learned representations, e.g., the "latent variables" in an autoencoder, are difficult to interpret, let alone use to prioritize essential genes for functional follow-up. In contrast, in traditional analyses, one may identify important genes such as Differentially Expressed (DiffEx), Differentially Co-Expressed (DiffCoEx), and Hub genes. Intuitively, complex gene-gene interactions may be beyond the capture of marginal effects (DiffEx) or correlations (DiffCoEx and Hub), indicating the need for powerful RL models. However, the lack of interpretability and of individual target genes is an obstacle to RL's broad use in practice. To facilitate interpretable analysis and gene identification using RL, we propose "Critical genes", defined as genes that contribute highly to learned representations (e.g., latent variables in an autoencoder). As a proof of concept, supported by eXplainable Artificial Intelligence (XAI), we implemented eXplainable Autoencoder for Critical genes (XA4C), which quantifies each gene's contribution to latent variables, based on which Critical genes are prioritized. Applying XA4C to gene expression data in six cancers showed that Critical genes capture essential pathways underlying cancers. Remarkably, Critical genes have little overlap with Hub or DiffEx genes, yet show higher enrichment in a comprehensive disease gene database (DisGeNET) and a cancer-specific database (COSMIC), evidencing their potential to disclose massive unknown biology. As an example, we discovered five Critical genes sitting in the center of the Lysine degradation (hsa00310) pathway, displaying distinct interaction patterns in tumor and normal tissues. In conclusion, XA4C facilitates explainable analysis using RL, and Critical genes discovered by explainable RL empower the study of complex interactions.


Subject(s)
Artificial Intelligence; Neoplasms; Humans; Genes, Essential; Databases, Factual; Gene Expression Profiling
7.
Front Genet ; 14: 1222517, 2023.
Article in English | MEDLINE | ID: mdl-37693313

ABSTRACT

To locate disease-causing DNA variants on the human gene map, the customary approach has been to carry out a genome-wide association study for one variant after another by testing for genotype frequency differences between individuals affected and unaffected with disease. So-called digenic traits are due to the combined effects of two variants, often on different chromosomes, while individual variants may have little or no effect on disease. Machine learning approaches have been developed to find variant pairs underlying digenic traits. However, many of these methods have large memory requirements so that only small datasets can be analyzed. The increasing availability of desktop computers with large numbers of processors and suitable programming to distribute the workload evenly over all processors in a machine make a new and relatively straightforward approach possible, that is, to evaluate all existing variant and genotype pairs for disease association. We present a prototype of such a method with two components, Vpairs and Gpairs, and demonstrate its advantages over existing implementations of such well-known algorithms as Apriori and FP-growth. We apply these methods to published case-control datasets on age-related macular degeneration and Parkinson disease and construct an ROC curve for a large set of genotype patterns.

9.
Eur J Cell Biol ; 102(3): 151341, 2023 Sep.
Article in English | MEDLINE | ID: mdl-37459799

ABSTRACT

ING1 is a chromatin-targeting subunit of the Sin3a histone deacetylase (HDAC) complex that alters chromatin structure to regulate gene expression. We find that ING1 knockdown increases expression of Twist1, Zeb1, Zeb2, Snai1, Bmi1 and TSHZ1, drivers of epithelial-mesenchymal transition (EMT), promoting EMT and cell motility. ING1 expression had the opposite effect, promoting epithelial cell morphology and inhibiting basal and TGF-β-induced motility in 3D organoid cultures. ING1 binds the Twist1 promoter, and Twist1 was largely responsible for the ability of ING1 to reduce cell migration. Consistent with ING1 inhibiting Twist1 expression in vivo, an inverse relationship between ING1 and Twist1 levels was seen in breast cancer samples from The Cancer Genome Atlas (TCGA). The HDAC inhibitor vorinostat is approved for treatment of multiple myeloma and cutaneous T cell lymphoma and is in clinical trials for solid tumours as adjuvant therapy. One molecular target of vorinostat is INhibitor of Growth 2 (ING2), which together with ING1 serves as a targeting subunit of the Sin3a HDAC complex. Treatment with sublethal (LD25-LD50) levels of vorinostat promoted breast cancer cell migration several-fold, which increased further upon ING1 knockout. These observations indicate that correct targeting of the Sin3a HDAC complex, and HDAC activity in general, decrease luminal and basal breast cancer cell motility, suggesting that use of HDAC inhibitors as adjuvant therapies in breast cancers that are prone to metastasize may not be optimal and requires further investigation.


Subject(s)
Breast Neoplasms; Histone Deacetylase Inhibitors; Female; Humans; Breast Neoplasms/drug therapy; Breast Neoplasms/genetics; Breast Neoplasms/metabolism; Cell Line, Tumor; Chromatin; Epithelial-Mesenchymal Transition; Gene Expression Regulation, Neoplastic; Histone Deacetylase Inhibitors/pharmacology; Vorinostat/pharmacology
10.
Sci Adv ; 8(51): eabo2846, 2022 Dec 21.
Article in English | MEDLINE | ID: mdl-36542714

ABSTRACT

Approaches systematically characterizing interactions via transcriptomic data usually follow two systems: (i) coexpression network analyses focusing on correlations between genes and (ii) linear regressions (usually regularized) to select multiple genes jointly. Both suffer from the problem of stability: A slight change of parameterization or dataset could lead to marked alterations of outcomes. Here, we propose Stabilized COre gene and Pathway Election (SCOPE), a tool integrating bootstrapped least absolute shrinkage and selection operator and coexpression analysis, leading to robust outcomes insensitive to variations in data. By applying SCOPE to six cancer expression datasets (BRCA, COAD, KIRC, LUAD, PRAD, and THCA) in The Cancer Genome Atlas, we identified core genes capturing interaction effects in crucial pan-cancer pathways related to genome instability and DNA damage response. Moreover, we highlighted the pivotal role of CD63 as an oncogenic driver and a potential therapeutic target in kidney cancer. SCOPE enables stabilized investigations toward complex interactions using transcriptome data.

11.
Genetics ; 220(2)2022 02 04.
Article in English | MEDLINE | ID: mdl-34849857

ABSTRACT

The success of transcriptome-wide association studies (TWAS) has led to substantial research toward improving the predictive accuracy of its core component, genetically regulated expression (GReX). GReX links expression information with genotype and phenotype by playing two roles simultaneously: it acts as both the outcome of the genotype-based predictive models (for predicting expression) and the linear combination of genotypes (as the predicted expression) for association tests. From the perspective of machine learning (considering SNPs as features), these are actually two separable steps (feature selection and feature aggregation), which can be conducted independently. In this study, we show that the single approach of GReX limits the adaptability of TWAS methodology and practice. By conducting simulations and real data analysis, we demonstrate that disentangled protocols adopting straightforward approaches for feature selection (e.g., simple marker test) and aggregation (e.g., kernel machines) outperform the standard TWAS protocols that rely on GReX. Our development provides more powerful novel tools for conducting TWAS. More importantly, our characterization of the exact nature of TWAS suggests that, instead of questionably binding two distinct steps into the same statistical form (GReX), methodological research focusing on optimal combinations of feature selection and aggregation approaches will bring higher power to TWAS protocols.


Subject(s)
Genome-Wide Association Study; Transcriptome; Genetic Predisposition to Disease; Genome-Wide Association Study/methods; Humans; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci
12.
Genes (Basel) ; 12(8)2021 07 28.
Article in English | MEDLINE | ID: mdl-34440333

ABSTRACT

Some genetic diseases ("digenic traits") are due to the interaction between two DNA variants, which presumably reflects biochemical interactions. For example, certain forms of Retinitis Pigmentosa, a type of blindness, occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting variant pairs underlying digenic traits by standard genetic methods is difficult and is downright impossible when individual variants alone have minimal effects. Frequent pattern mining (FPM) methods are known to detect patterns of items. We make use of FPM approaches to find pairs of genotypes (from different variants) that can discriminate between cases and controls. Our method is based on genotype patterns of length two, and permutation testing allows assigning p-values to genotype patterns, where the null hypothesis refers to equal pattern frequencies in cases and controls. We compare different interaction search approaches and their properties on the basis of published datasets. Our implementation of FPM to case-control studies is freely available.
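
The permutation test described in this abstract can be sketched as follows, under stated assumptions: a genotype pattern of length two is a pair of genotypes at two variants, the test statistic is the absolute difference in pattern frequency between cases and controls, and the null distribution is obtained by shuffling case/control labels. The function names, the genotype coding (0/1/2), and the choice of statistic are illustrative and are not taken from the authors' implementation.

```python
import random


def pattern_frequency_diff(carriers, labels):
    """Pattern frequency in cases (label 1) minus frequency in controls (label 0)."""
    cases = [c for c, y in zip(carriers, labels) if y == 1]
    controls = [c for c, y in zip(carriers, labels) if y == 0]
    return sum(cases) / len(cases) - sum(controls) / len(controls)


def permutation_p_value(geno_a, geno_b, labels, pattern, n_perm=2000, seed=0):
    """Permutation p-value for one genotype pattern of length two.

    geno_a, geno_b : genotypes (e.g. 0/1/2) of each individual at two variants
    labels         : 1 for cases, 0 for controls
    pattern        : the (genotype_a, genotype_b) pair being tested
    """
    carriers = [int((a, b) == pattern) for a, b in zip(geno_a, geno_b)]
    observed = abs(pattern_frequency_diff(carriers, labels))
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between pattern and disease status
        if abs(pattern_frequency_diff(carriers, shuffled)) >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


# Toy data: the pattern (1, 2) occurs in every case and in no control.
geno_a = [1] * 20 + [0] * 20
geno_b = [2] * 20 + [0] * 20
labels = [1] * 20 + [0] * 20
print(permutation_p_value(geno_a, geno_b, labels, pattern=(1, 2)))
```

In a genome-wide search the same procedure would be repeated for many patterns, so the resulting p-values would still need a multiple-testing adjustment.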


Subject(s)
DNA/genetics; Data Mining; Genetic Diseases, Inborn/genetics; Genotype; Case-Control Studies; Datasets as Topic; Humans; Polymorphism, Single Nucleotide
13.
Front Genet ; 12: 705708, 2021.
Article in English | MEDLINE | ID: mdl-34322159

ABSTRACT

DNA methylation in critical regions is highly involved in cancer pathogenesis and drug response. However, identifying causal methylation sites among a large number of potential polymorphic DNA methylation sites is challenging. Such high-dimensional data bring two obstacles: first, many established statistical models do not scale to so many features; second, multiple testing and overfitting become serious. A method to quickly filter candidate sites and narrow down targets for downstream analyses is therefore urgently needed. BACkPAy is a pre-screening Bayesian approach to detect biologically meaningful patterns of potential differential methylation levels with small sample size. BACkPAy prioritizes potentially important biomarkers by the Bayesian false discovery rate (FDR) approach. It filters out non-informative (i.e., non-differential) sites with flat methylation levels across experimental conditions. In this work, we applied BACkPAy to a genome-wide methylation dataset with three tissue types, each containing three gastric cancer samples. We also applied LIMMA (Linear Models for Microarray and RNA-Seq Data) to compare its results with those obtained by BACkPAy. Then, Cox proportional hazards regression models were used with The Cancer Genome Atlas (TCGA) data to identify prognostically significant markers in survival analysis. Using BACkPAy, we identified eight biologically meaningful patterns/groups of differential probes from the DNA methylation dataset. Using TCGA data, we also identified five prognostic genes (i.e., predictive of the progression of gastric cancer) that contain some differential methylation probes, whereas no significant results were identified using the Benjamini-Hochberg FDR in LIMMA. We showed the importance of using BACkPAy for the analysis of DNA methylation data with extremely small sample size in gastric cancer. We revealed that RDH13, CLDN11, TMTC1, UCHL1, and FOXP2 can serve as predictive biomarkers for gastric cancer treatment, and that the promoter methylation level of these five genes in serum could have prognostic and diagnostic functions in gastric cancer patients.
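
For context on the frequentist comparison mentioned above, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. It illustrates only the standard BH rule, not the Bayesian FDR used by BACkPAy and not LIMMA's own code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking hypotheses rejected at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    rejected = [False] * m
    # Every hypothesis ranked at or below the cutoff is rejected.
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values for five methylation probes.
print(benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.9]))
```

With only a handful of samples per condition, raw p-values rarely become small enough to pass this rule across many thousands of probes, which is consistent with the abstract's report of no significant results from the BH-based analysis.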

14.
Am J Pathol ; 189(9): 1732-1743, 2019 09.
Article in English | MEDLINE | ID: mdl-31199922

ABSTRACT

Approximately 15% to 20% of colorectal cancers develop through the serrated pathway of tumorigenesis, which is associated with BRAF mutation, CpG island methylation phenotype, and MLH1 methylation. However, the detailed process of progression from sessile serrated adenoma (SSA) to dysplasia and carcinoma has not been elucidated. To further characterize mechanisms involved in the dysplastic progression of SSA, we investigated differential expression of mRNAs between areas with and without dysplasia within the same SSA polyps. Significantly dysregulated genes in paired samples were analyzed for functional annotation and biological significance. The same lysates from a subset of matched samples were subjected to miRNA expression profiling. Differentially expressed miRNAs were determined, and their targeted mRNAs were compared in parallel with the list of differentially expressed mRNAs from an RNA sequencing study. Fourteen common mRNA targets were identified, which include AXIN2, a known indicator of WNT/β-catenin pathway activation. Together, this study documents different genes, pathways, and biological processes involved in the initiation and progression of dysplasia in the serrated pathway. One of the most significant findings is the involvement of the WNT/β-catenin pathway in the dysplastic progression of SSAs, with different genes being targeted in early versus advanced dysplasia.


Subject(s)
Adenoma/pathology; Adenomatous Polyps/pathology; Mutation; Wnt Signaling Pathway; Adenoma/genetics; Adenoma/metabolism; Adenomatous Polyps/genetics; Adenomatous Polyps/metabolism; Aged; Disease Progression; Female; Gene Expression Profiling; Humans; Male
15.
G3 (Bethesda) ; 9(1): 13-19, 2019 01 09.
Article in English | MEDLINE | ID: mdl-30482799

ABSTRACT

Matrices representing genetic relatedness among individuals (i.e., Genomic Relationship Matrices, GRMs) play a central role in genetic analysis. The eigen-decomposition of GRMs (or its alternative that generates fewer top singular values using genotype matrices) is a necessary step for many analyses including estimation of SNP-heritability, Principal Component Analysis (PCA), and genomic prediction. However, the GRMs and genotype matrices provided by modern biobanks are too large to be stored in active memory. To accommodate the current and future "bigger-data", we develop a disk-based tool, Out-of-Core Matrices Analyzer (OCMA), using state-of-the-art computational techniques that can nimbly perform eigen and Singular Value Decomposition (SVD) analyses. By integrating memory mapping (mmap) and the latest matrix factorization libraries, our tool is fast and memory-efficient. To demonstrate the impressive performance of OCMA, we test it on a personal computer. For full eigen-decomposition, it solves an ordinary GRM (N = 10,000) in 55 sec. For SVD, a commonly used faster alternative of full eigen-decomposition in genomic analyses, OCMA solves the top 200 singular values (SVs) in half an hour, top 2,000 SVs in 0.95 hr, and all 5,000 SVs in 1.77 hr based on a very large genotype matrix (N = 1,000,000, M = 5,000) on the same personal computer. OCMA also supports multi-threading when running in a desktop or HPC cluster. Our OCMA tool can thus alleviate the computing bottleneck of classical analyses on large genomic matrices, and make it possible to scale up current and emerging analytical methods to big genomics data using lightweight computing resources.
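
The memory-mapping idea can be illustrated with a small standard-library sketch: a symmetric matrix is stored on disk as raw float64 values, mapped with `mmap`, and its largest eigenvalue is estimated by power iteration while only one row is decoded at a time. This is a toy stand-in that assumes a positive semi-definite matrix; it is not OCMA's implementation, which relies on optimized factorization libraries, and the file layout and function names are assumptions.

```python
import mmap
import os
import struct
import tempfile


def write_matrix(path, rows):
    """Store a dense matrix row by row as little-endian float64 values."""
    with open(path, "wb") as fh:
        for row in rows:
            fh.write(struct.pack(f"<{len(row)}d", *row))


def top_eigenvalue_on_disk(path, n, iterations=200):
    """Power iteration on an n x n symmetric PSD matrix that stays on disk."""
    row_bytes = 8 * n
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        v = [1.0 / n ** 0.5] * n  # unit-length starting vector
        eigenvalue = 0.0
        for _ in range(iterations):
            w = []
            for i in range(n):
                # Decode a single row; the OS pages the bytes in on demand.
                row = struct.unpack_from(f"<{n}d", mm, i * row_bytes)
                w.append(sum(a * b for a, b in zip(row, v)))
            norm = sum(x * x for x in w) ** 0.5
            v = [x / norm for x in w]
            eigenvalue = norm  # ||A v|| approaches the top eigenvalue
        return eigenvalue


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grm.bin")
    write_matrix(path, [[4.0, 1.0, 0.0], [1.0, 3.0, 0.0], [0.0, 0.0, 1.0]])
    print(top_eigenvalue_on_disk(path, 3))
```

The design point is that the working set in memory is one row plus two vectors, regardless of how large the file is; a production tool would process blocks of rows and call a tuned linear-algebra library instead of pure Python loops.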


Subject(s)
Genome/genetics; Genomics; Models, Genetic; Algorithms; Animals; Biological Specimen Banks/trends; Breeding; Computer Simulation; Genotype; Humans; Polymorphism, Single Nucleotide/genetics; Principal Component Analysis; Software
16.
Stat Interface ; 8(4): 447-456, 2015 Oct 01.
Article in English | MEDLINE | ID: mdl-26681995

ABSTRACT

To identify evolutionary events from the footprints left in the patterns of genetic variation in a population, people use many statistical frameworks, including neutrality tests. In datasets from current high throughput sequencing and genotyping platforms, it is common to have missing data and low-confidence SNP calls at many segregating sites. However, the traditional statistical framework for neutrality tests does not allow for these possibilities; therefore the usual way of treating missing data is to ignore segregating sites with missing/low confidence calls, regardless of the good SNP calls at these sites in other individuals. In this work, we propose a modified neutrality test, Extended Tajima's D, which incorporates missing data and SNP-calling uncertainties. Because we do not specify any particular error-generating mechanism, this approach is robust and widely applicable. Simulations show that in most cases the power of the new test is better than the original Tajima's D, given the same type I error. Applications to real data show that it detects fewer outliers associated with low quality data.
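
As a reference point, the classical Tajima's D (the statistic that the abstract extends) can be computed as in the sketch below. It assumes complete data with no missing calls, so it does not reproduce the Extended Tajima's D proposed by the authors; the input layout (equal-length strings of 0/1 alleles) and the function name are assumptions.

```python
from math import sqrt


def tajimas_d(haplotypes):
    """Classical Tajima's D for equal-length 0/1 strings with no missing data."""
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Count allele "1" in each column and keep only segregating sites.
    counts = [sum(int(h[j]) for h in haplotypes) for j in range(n_sites)]
    seg = [k for k in counts if 0 < k < n]
    s = len(seg)
    if s == 0:
        return float("nan")
    # Average number of pairwise differences, summed over sites.
    pi = sum(2.0 * k * (n - k) / (n * (n - 1)) for k in seg)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    # Difference between the two diversity estimators, scaled by its standard deviation.
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


# Six sequences in which every variant is a singleton (excess of rare alleles).
singletons = ["100000", "010000", "001000", "000100", "000010", "000001"]
print(tajimas_d(singletons))
```

An excess of rare alleles gives a negative D and an excess of intermediate-frequency alleles gives a positive D. Dropping every site with a missing call, as described in the abstract, reduces both the number of segregating sites and the pairwise diversity that enter this formula.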

17.
PLoS Comput Biol ; 10(6): e1003627, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24901472

ABSTRACT

Identifying gene-gene interactions is a hot topic in genome-wide association studies. Two fundamental challenges are: (1) how to smartly identify combinations of variants that may be associated with the trait from the astronomical number of all possible combinations; and (2) how to test epistatic interaction when all potential combinations are available. We developed AprioriGWAS, which brings two innovations. (1) Based on Apriori, a successful method in the field of Frequent Itemset Mining (FIM) in which a pattern growth strategy is leveraged to effectively and accurately reduce the search space, AprioriGWAS can efficiently identify genetically associated genotype patterns. (2) To test the hypotheses of epistasis, we adopt a new conditional permutation procedure to obtain reliable statistical inference of Pearson's chi-square test for the [Formula: see text] contingency table generated by associated variants. By applying AprioriGWAS to age-related macular degeneration (AMD) data, we found that: (1) angiopoietin 1 (ANGPT1) and four retinal genes interact with Complement Factor H (CFH); (2) the GO term "glycosaminoglycan biosynthetic process" was enriched in AMD interacting genes. The epistatic interactions newly found by AprioriGWAS in AMD data are likely true interactions, since genes interacting with CFH are retinal genes, and GO term enrichment also verified that interaction between glycosaminoglycans (GAGs) and CFH plays an important role in the disease pathology of AMD. By applying AprioriGWAS to bipolar disorder in WTCCC data, we found that variants without marginal effects show significant interactions. Examples include multiple-SNP genotype patterns inside the genes GABRB2 and GRIA1 (AMPA subunit 1 receptor gene). AMPARs are found in many parts of the brain and are the most commonly found receptor in the nervous system. GABRB2 mediates the fastest inhibitory synaptic transmission in the central nervous system. Multiple lines of evidence support the relevance of GRIA1 and GABRB2 to mental disorders.
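
The search-space reduction attributed to Apriori can be sketched with a minimal level-wise frequent itemset miner: an itemset is extended only if all of its smaller subsets are already frequent. Here each "transaction" would be one individual's set of (variant, genotype) items. This is the generic Apriori idea only; it is not the AprioriGWAS code, and the item labels below are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)
    k = 2
    while current:
        # Join step: unions of frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical individuals, each described by (variant, genotype) items.
people = [
    {("snp1", "AA"), ("snp2", "GT"), ("snp3", "CC")},
    {("snp1", "AA"), ("snp2", "GT")},
    {("snp1", "AA"), ("snp3", "CC")},
    {("snp2", "GT"), ("snp3", "CC")},
    {("snp1", "AA"), ("snp2", "GT"), ("snp3", "CC")},
]
for itemset, support in apriori(people, min_support=3).items():
    print(sorted(itemset), support)
```

The pruning relies on the fact that a pattern cannot be more frequent than any of its sub-patterns, so most of the combinatorial space is never counted.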


Subject(s)
Algorithms; Epistasis, Genetic; Genetic Variation; Genome-Wide Association Study/statistics & numerical data; Bipolar Disorder/genetics; Complement Factor H/genetics; Computational Biology; Computer Simulation; Data Mining/statistics & numerical data; Databases, Genetic; Genetic Predisposition to Disease; Humans; Linkage Disequilibrium; Logistic Models; Macular Degeneration/genetics; Models, Genetic
18.
Nat Genet ; 45(8): 884-890, 2013 Aug.
Article in English | MEDLINE | ID: mdl-23793030

ABSTRACT

Despite advances in sequencing, the goal of obtaining a comprehensive view of genetic variation in populations is still far from reached. We sequenced 180 lines of A. thaliana from Sweden to obtain as complete a picture as possible of variation in a single region. Whereas simple polymorphisms in the unique portion of the genome are readily identified, other polymorphisms are not. The massive variation in genome size identified by flow cytometry seems largely to be due to 45S rDNA copy number variation, with lines from northern Sweden having particularly large numbers of copies. Strong selection is evident in the form of long-range linkage disequilibrium (LD), as well as in LD between nearby compensatory mutations. Many footprints of selective sweeps were found in lines from northern Sweden, and a massive global sweep was shown to have involved a 700-kb transposition.


Subject(s)
Arabidopsis/genetics; Genetic Variation; Genome, Plant; Selection, Genetic; Chromosome Mapping; Chromosomes, Plant; DNA Copy Number Variations; Evolution, Molecular; Genetics, Population; Genome-Wide Association Study; High-Throughput Nucleotide Sequencing; INDEL Mutation; Linkage Disequilibrium; Polymorphism, Single Nucleotide; Sweden
19.
Bioinformatics ; 29(9): 1220-2, 2013 May 01.
Article in English | MEDLINE | ID: mdl-23479353

ABSTRACT

SUMMARY: We present JAWAMix5, an out-of-core open-source toolkit for association mapping using high-throughput sequence data. Taking advantage of its HDF5-based implementation, JAWAMix5 stores genotype data on disk and accesses them as though stored in main memory. Therefore, it offers a scalable and fast analysis without concerns about memory usage, whatever the size of the dataset. We have implemented eight functions for association studies, including standard methods (linear models, linear mixed models, rare variants test, analysis in nested association mapping design and local variance component analysis), as well as a novel Bayesian local variance component analysis. Application to real data demonstrates that JAWAMix5 is reasonably fast compared with traditional solutions that load the complete dataset into memory, and that the memory usage is efficient regardless of the dataset size. AVAILABILITY: The source code, a 'batteries-included' executable and user manual can be freely downloaded from http://code.google.com/p/jawamix5/.


Subject(s)
Genome-Wide Association Study/methods; Software; Bayes Theorem; Genotype; High-Throughput Nucleotide Sequencing; Linear Models
20.
PLoS One ; 6(1): e15292, 2011 Jan 05.
Article in English | MEDLINE | ID: mdl-21264334

ABSTRACT

With the advance of next-generation sequencing (NGS) technologies, increasingly ambitious applications are becoming feasible. A particularly powerful one is the sequencing of polymorphic, pooled samples. The pool can be naturally occurring, as in the case of multiple pathogen strains in a blood sample, multiple types of cells in a cancerous tissue sample, or multiple isoforms of mRNA in a cell. In these cases, it is difficult or impossible to partition the subtypes experimentally before sequencing, and the subtype frequencies must hence be inferred. In addition, investigators may occasionally want to artificially pool the samples of a large number of individuals for reasons of cost-efficiency, e.g., when carrying out genetic mapping using bulked segregant analysis. Here we describe PoolHap, a computational tool for inferring haplotype frequencies from pooled samples when haplotypes are known. The key insight into why PoolHap works is that the large number of SNPs that come with genome-wide coverage can compensate for the uneven coverage across the genome. The performance of PoolHap is illustrated and discussed using simulated and real data. We show that PoolHap is able to accurately estimate the proportions of haplotypes with less than 2% error for 34-strain mixtures with 2X total coverage Arabidopsis thaliana whole genome polymorphism data. This method should facilitate greater biological insight into heterogeneous samples that are difficult or impossible to isolate experimentally. Software and user's manual are freely available at http://arabidopsis.gmi.oeaw.ac.at/quan/poolhap/.
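
The inference problem described here (known haplotypes, unknown proportions, read counts at many SNPs) can be illustrated with a small expectation-maximization sketch. This is a generic mixture-proportion EM chosen for illustration and is not claimed to be PoolHap's algorithm; the function name, the per-read error rate, and the toy counts are all assumptions.

```python
def estimate_haplotype_frequencies(haplotypes, site_counts, error=1e-3, iterations=200):
    """EM estimate of known-haplotype proportions in a sequenced pool.

    haplotypes  : list of 0/1 lists, one per known haplotype
    site_counts : list of (alt_reads, ref_reads) per SNP, in column order
    error       : assumed probability that a read shows the wrong allele
    """
    h = len(haplotypes)
    freqs = [1.0 / h] * h  # start from equal proportions
    total_reads = sum(alt + ref for alt, ref in site_counts)
    for _ in range(iterations):
        expected = [0.0] * h
        for j, (alt, ref) in enumerate(site_counts):
            for allele, n_reads in ((1, alt), (0, ref)):
                if n_reads == 0:
                    continue
                # P(read shows `allele` | read came from haplotype k)
                like = [(1 - error) if hap[j] == allele else error for hap in haplotypes]
                weights = [f * l for f, l in zip(freqs, like)]
                norm = sum(weights)
                # E-step: share the reads among haplotypes by posterior weight.
                for k in range(h):
                    expected[k] += n_reads * weights[k] / norm
        # M-step: new proportions are the expected share of all reads.
        freqs = [e / total_reads for e in expected]
    return freqs


# Toy pool: two known haplotypes mixed roughly 70/30, three SNPs.
known = [[1, 0, 1], [0, 1, 1]]
counts = [(70, 30), (30, 70), (100, 0)]
print(estimate_haplotype_frequencies(known, counts))
```

The third SNP in the toy data carries no information because both haplotypes share the same allele there; the estimate is driven by the SNPs that distinguish the haplotypes, which reflects the abstract's point that many informative SNPs can compensate for uneven coverage at any single site.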


Subject(s)
Gene Frequency; Haplotypes; Sequence Analysis, DNA/methods; Arabidopsis/genetics; Computational Biology; Genome; Internet; Methods; Polymorphism, Single Nucleotide; Software