1.
Article in English | MEDLINE | ID: mdl-38781064

ABSTRACT

Semi-supervised learning (SSL) aims to train a machine learning (ML) model using both labeled and unlabeled data. While the unlabeled data have been used in various ways to improve the prediction accuracy, the reason why unlabeled data could help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms (ICM) principle, the unlabeled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real-world applications. In this article, we propose an SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of unlabeled data. The learned causal generative model can generate synthetic labeled data for training a more accurate predictive model. We verify the effectiveness of our proposed method by empirical studies on both simulated and real data.

2.
bioRxiv ; 2023 Nov 07.
Article in English | MEDLINE | ID: mdl-37986738

ABSTRACT

The var multigene family encodes the P. falciparum erythrocyte membrane protein 1 (PfEMP1), which is important in host-parasite interaction as a virulence factor and major surface antigen of the blood stages of the parasite, responsible for maintaining chronic infection. Whilst important in the biology of P. falciparum, these genes (50 to 60 genes per parasite genome) are routinely excluded from whole genome analyses due to their hyper-diversity, achieved primarily through recombination. The PfEMP1 head structure almost always consists of a DBLα-CIDR tandem. Categorised into different groups (upsA, upsB, upsC), different head structures have been associated with different ligand-binding affinities and disease severities. We study how conserved individual DBLα types are at the country, regional, and local scales in Sub-Saharan Africa. Using publicly-available sequence datasets and a novel ups classification algorithm, cUps, we performed an in silico exploration of DBLα conservation through time and space in Africa. In all three ups groups, the population structure of DBLα types in Africa consists of variants occurring at rare, low, moderate, and high frequencies. Non-rare variants were found to be temporally stable in a local area in endemic Ghana. When inspected across different geographical scales, we report different levels of conservation; while some DBLα types were consistently found in high frequencies in multiple African countries, others were conserved only locally, signifying local preservation of specific types. Underlying this population pattern is the composition of DBLα types within each isolate DBLα repertoire, revealed to also consist of a mix of types found at rare, low, moderate, and high frequencies in the population. We further discuss the adaptive forces and balancing selection, including host genetic factors, potentially shaping the evolution and diversity of DBLα types in Africa.

3.
Proc Natl Acad Sci U S A ; 120(33): e2203828120, 2023 08 15.
Article in English | MEDLINE | ID: mdl-37549298

ABSTRACT

Cellular omics such as single-cell genomics, proteomics, and microbiomics allow the characterization of tissue and microbial community composition, which can be compared between conditions to identify biological drivers. This strategy has been critical to revealing markers of disease progression, such as cancer and pathogen infection. A dedicated statistical method for differential variability analysis is lacking for cellular omics data, and existing methods for differential composition analysis do not model some compositional data properties, suggesting there is room to improve model performance. Here, we introduce sccomp, a method for differential composition and variability analyses that jointly models data count distribution, compositionality, group-specific variability, and proportion mean-variability association, being aware of outliers. sccomp provides a comprehensive analysis framework that offers realistic data simulation and cross-study knowledge transfer. Here, we demonstrate that mean-variability association is ubiquitous across technologies, highlighting the inadequacy of the very popular Dirichlet-multinomial distribution. We show that sccomp accurately fits experimental data, significantly improving performance over state-of-the-art algorithms. Using sccomp, we identified differential constraints and composition in the microenvironment of primary breast cancer.


Subject(s)
Genomics , Microbiota , Proteomics/methods , Computer Simulation , Algorithms
4.
Genome Biol ; 24(1): 66, 2023 04 06.
Article in English | MEDLINE | ID: mdl-37024980

ABSTRACT

Long-read single-cell RNA sequencing (scRNA-seq) enables the quantification of RNA isoforms in individual cells. However, long-read scRNA-seq using the Oxford Nanopore platform has largely relied upon matched short-read data to identify cell barcodes. We introduce BLAZE, which accurately and efficiently identifies 10x cell barcodes using only nanopore long-read scRNA-seq data. BLAZE outperforms the existing tools and provides an accurate representation of the cells present in long-read scRNA-seq when compared to matched short reads. BLAZE simplifies long-read scRNA-seq while improving the results, is compatible with downstream tools accepting a cell barcode file, and is available at https://github.com/shimlab/BLAZE .


Subject(s)
RNA Isoforms , Single-Cell Gene Expression Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Software , Gene Expression Profiling/methods
5.
bioRxiv ; 2023 Dec 17.
Article in English | MEDLINE | ID: mdl-38168317

ABSTRACT

The human lung is structurally complex, with a diversity of specialized epithelial, stromal and immune cells playing specific functional roles in anatomically distinct locations, and large-scale changes in the structure and cellular makeup of the distal lung are a hallmark of pulmonary fibrosis (PF) and other progressive chronic lung diseases. Single-cell transcriptomic studies have revealed numerous disease-emergent/enriched cell types/states in PF lungs, but the spatial contexts wherein these cells contribute to disease pathogenesis have remained uncertain. Using sub-cellular resolution image-based spatial transcriptomics, we analyzed the gene expression of more than 1 million cells from 19 unique lungs. Through complementary cell-based and innovative cell-agnostic analyses, we characterized the localization of PF-emergent cell types, established the cellular and molecular basis of classical PF histopathologic disease features, and identified a diversity of distinct molecularly defined spatial niches in control and PF lungs. Using machine-learning and trajectory analysis methods to segment and rank airspaces on a gradient from normal to most severely remodeled, we identified a sequence of compositional and molecular changes that associate with progressive distal lung pathology, beginning with alveolar epithelial dysregulation and culminating with changes in macrophage polarization. Together, these results provide a unique, spatially resolved characterization of the cellular and molecular programs of PF and control lungs, provide new insights into the heterogeneous pathobiology of PF, and establish analytical approaches which should be broadly applicable to other imaging-based spatial transcriptomic studies.

6.
BMC Bioinformatics ; 23(1): 460, 2022 Nov 03.
Article in English | MEDLINE | ID: mdl-36329399

ABSTRACT

BACKGROUND: Single-cell RNA sequencing (scRNA-seq) technology has contributed significantly to diverse research areas in biology, from cancer to development. Since scRNA-seq data is high-dimensional, a common strategy is to learn low-dimensional latent representations to better understand the overall structure in the data. In this work, we build upon scVI, a powerful deep generative model which can learn biologically meaningful latent representations, but which has limited explicit control of batch effects. Rather than prioritizing batch effect removal over conservation of biological variation, or vice versa, our goal is to provide a bird's eye view of the trade-offs between these two conflicting objectives. Specifically, using the well-established concept of Pareto front from economics and engineering, we seek to learn the entire trade-off curve between conservation of biological variation and removal of batch effects. RESULTS: A multi-objective optimisation technique known as Pareto multi-task learning (Pareto MTL) is used to obtain the Pareto front between conservation of biological variation and batch effect removal. Our results indicate Pareto MTL can obtain a better Pareto front than the naive scalarization approach typically encountered in the literature. In addition, we propose to measure batch effect by applying a neural-network based estimator called Mutual Information Neural Estimation (MINE) and show benefits over the more standard maximum mean discrepancy measure. CONCLUSION: The Pareto front between conservation of biological variation and batch effect removal is a valuable tool for researchers in computational biology. Our results demonstrate the efficacy of applying Pareto MTL to estimate the Pareto front in conjunction with applying MINE to measure the batch effect.


Subject(s)
Algorithms , Transcriptome , Computational Biology/methods , Single-Cell Analysis
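
Note on entry 6: the abstract relies on the notion of a Pareto front between two competing objectives. As a minimal sketch of that notion only (not of Pareto MTL, scVI, or MINE), the Python snippet below filters a set of candidate models down to the non-dominated ones. The function name `pareto_front` and the score pairs are hypothetical and chosen purely for illustration; both scores are assumed to be "higher is better".

```python
from typing import List, Tuple

Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both coordinates are maximised."""
    front = []
    for p in points:
        # p is dominated if some q is at least as good on both objectives
        # and strictly better on at least one of them.
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


# Hypothetical (biological conservation, batch removal) scores for six models.
candidates = [
    (0.90, 0.20),
    (0.80, 0.50),
    (0.60, 0.70),
    (0.70, 0.40),
    (0.30, 0.90),
    (0.50, 0.60),
]
front = pareto_front(candidates)
print(front)
```

In this toy set, `(0.70, 0.40)` and `(0.50, 0.60)` are each beaten on both objectives by another candidate, so they drop out; the four remaining points trace the trade-off curve.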
7.
Nucleic Acids Res ; 50(20): e118, 2022 11 11.
Article in English | MEDLINE | ID: mdl-36107768

ABSTRACT

Profiling gametes of an individual enables the construction of personalised haplotypes and meiotic crossover landscapes, now achievable at larger scale than ever through the availability of high-throughput single-cell sequencing technologies. However, high-throughput single-gamete data commonly have low depth of coverage per gamete, which challenges existing gamete-based haplotype phasing methods. In addition, haplotyping a large number of single gametes from high-throughput single-cell DNA sequencing data and constructing meiotic crossover profiles using existing methods requires intensive processing. Here, we introduce efficient software tools for the essential tasks of generating personalised haplotypes and calling crossovers in gametes from single-gamete DNA sequencing data (sgcocaller), and constructing, visualising, and comparing individualised crossover landscapes from single gametes (comapr). With additional data pre-processing, the tools can also be applied to bulk-sequenced samples. We demonstrate that sgcocaller is able to generate impeccable phasing results for high-coverage datasets, on which it is more accurate and stable than existing methods, and also performs well on low-coverage single-gamete sequencing datasets for which current methods fail. Our tools achieve highly accurate results with user-friendly installation, comprehensive documentation, efficient computation times and minimal memory usage.


Subject(s)
High-Throughput Nucleotide Sequencing , Polymorphism, Single Nucleotide , Sequence Analysis, DNA , Algorithms , Germ Cells , Haplotypes , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Single-Cell Gene Expression Analysis , Software , Crossing Over, Genetic
8.
Bioinformatics ; 38(15): 3741-3748, 2022 08 02.
Article in English | MEDLINE | ID: mdl-35639973

ABSTRACT

MOTIVATION: Long-read sequencing methods have considerable advantages for characterizing RNA isoforms. Oxford Nanopore sequencing records changes in electrical current when nucleic acid traverses through a pore. However, basecalling of this raw signal (known as a squiggle) is error prone, making it challenging to accurately identify splice junctions. Existing strategies include utilizing matched short-read data and/or annotated splice junctions to correct nanopore reads but add expense or limit junctions to known (incomplete) annotations. Therefore, a method that could accurately identify splice junctions solely from nanopore data would have numerous advantages. RESULTS: We developed 'NanoSplicer' to identify splice junctions using raw nanopore signal (squiggles). For each splice junction, the observed squiggle is compared to candidate squiggles representing potential junctions to identify the correct candidate. Measuring squiggle similarity enables us to compute the probability of each candidate junction and find the most likely one. We tested our method using (i) synthetic mRNAs with known splice junctions and (ii) biological mRNAs from a lung-cancer cell-line. The results from both datasets demonstrate NanoSplicer improves splice junction identification, especially when the basecalling error rate near the splice junction is elevated. AVAILABILITY AND IMPLEMENTATION: NanoSplicer is available at https://github.com/shimlab/NanoSplicer and archived at https://doi.org/10.5281/zenodo.6403849. Data is available from ENA: ERS7273757 and ERS7273453. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Nanopore Sequencing , Nanopores , High-Throughput Nucleotide Sequencing , Probability , Sequence Analysis, DNA , Software
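
Note on entry 8: the abstract states that squiggle similarity is used to compute a probability for each candidate junction and to pick the most likely one, without giving the model. The Python sketch below illustrates only the generic last step, under assumptions that are not taken from the paper: each candidate already has a log-likelihood-style score, the prior over candidates is uniform, and scores are normalised with the usual subtract-the-maximum trick for numerical stability. The function name, candidate names, and score values are hypothetical.

```python
import math
from typing import Dict


def candidate_probabilities(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Normalise log-scale scores into probabilities under a uniform prior."""
    peak = max(log_scores.values())
    # Subtracting the maximum before exponentiating avoids underflow.
    weights = {name: math.exp(score - peak) for name, score in log_scores.items()}
    total = sum(weights.values())
    return {name: weight / total for name, weight in weights.items()}


# Hypothetical log-scores for three candidate splice junctions.
scores = {"junction_a": -10.0, "junction_b": -12.0, "junction_c": -15.0}
probs = candidate_probabilities(scores)
best = max(probs, key=probs.get)
print(best, probs)
```

The candidate with the highest score always receives the highest probability; the normalisation matters when a downstream step needs a confidence threshold rather than just the top-ranked junction.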
9.
Bioinformatics ; 38(7): 1823-1829, 2022 03 28.
Article in English | MEDLINE | ID: mdl-35025988

ABSTRACT

MOTIVATION: Recombination is a fundamental process in molecular evolution, and the identification of recombinant sequences is thus of major interest. However, current methods for detecting recombinants are primarily designed for aligned sequences. Thus, they struggle with analyses of highly diverse genes, such as the var genes of the malaria parasite Plasmodium falciparum, which are known to diversify primarily through recombination. RESULTS: We introduce an algorithm to detect recent recombinant sequences from a dataset without a full multiple alignment. Our algorithm can handle thousands of gene-length sequences without the need for a reference panel. We demonstrate the accuracy of our algorithm through extensive numerical simulations; in particular, it maintains its effectiveness in the presence of insertions and deletions. We apply our algorithm to a dataset of 17 335 DBLα types in var genes from Ghana, observing that sequences belonging to the same ups group or domain subclass recombine amongst themselves more frequently, and that non-recombinant DBLα types are more conserved than recombinant ones. AVAILABILITY AND IMPLEMENTATION: Source code is freely available at https://github.com/qianfeng2/detREC_program. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Genetic Variation , Protozoan Proteins , Protozoan Proteins/genetics , Plasmodium falciparum/genetics , Software , Evolution, Molecular
10.
Article in English | MEDLINE | ID: mdl-36998722

ABSTRACT

The enormous diversity and complexity of var genes that diversify rapidly by recombination has led to the exclusion of assembly of these genes from major genome initiatives (e.g., Pf6). A scalable solution in epidemiological surveillance of var genes is to use a small 'tag' region encoding the immunogenic DBLα domain as a marker to estimate var diversity. As var genes diversify by recombination, the extent to which the same tag can appear in multiple var genes is not clear. This relationship between marker and gene has not been investigated in natural populations. Analyses of in vitro recombination within and between var genes have suggested that this relationship would not be exclusive. Using a dataset of publicly available assembled var sequences, we test this hypothesis by studying DBLα-var relationships for four study sites in four countries: Pursat (Cambodia) and Mae Sot (Thailand), representing low malaria transmission, and Navrongo (Ghana) and Chikwawa (Malawi), representing high malaria transmission. In all study sites, DBLα-var relationships were shown to be predominantly 1-to-1, followed by a second-largest proportion of 1-to-2 DBLα-var relationships. This finding indicates that DBLα tags can be used to estimate not just DBLα diversity but var gene diversity when applied in a local endemic area. Epidemiological applications of this result are discussed.
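
Note on entry 10: the 1-to-1 and 1-to-2 relationships described in the abstract amount to counting, for every tag, how many distinct genes it occurs in. A minimal Python sketch of that tally follows; it is an illustration, not the authors' pipeline. The function name and the tag and gene identifiers are hypothetical, and the input is assumed to be a list of (tag, gene) observations that have already been extracted.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple


def tag_to_gene_multiplicity(pairs: Iterable[Tuple[str, str]]) -> Counter:
    """For each k, count how many tags occur in exactly k distinct genes."""
    genes_per_tag: Dict[str, Set[str]] = defaultdict(set)
    for tag, gene in pairs:
        genes_per_tag[tag].add(gene)
    return Counter(len(genes) for genes in genes_per_tag.values())


# Hypothetical (DBLα tag, var gene) observations.
observations = [
    ("tagA", "var1"),
    ("tagB", "var2"),
    ("tagB", "var3"),
    ("tagC", "var4"),
    ("tagA", "var1"),  # repeated observation of the same pair, counted once
]
multiplicity = tag_to_gene_multiplicity(observations)
print(multiplicity)
```

Here two tags are 1-to-1 and one tag is 1-to-2. Using a set per tag means repeated sightings of the same tag in the same gene do not inflate the count.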

11.
Bioinformatics ; 2021 Jan 30.
Article in English | MEDLINE | ID: mdl-33515239

ABSTRACT

MOTIVATION: Alternative splicing removes intronic sequences from pre-mRNAs in alternative ways to produce different forms (isoforms) of mature mRNA. The composition of expressed transcripts gives specific functionalities to cells in a particular condition or developmental stage. In addition, a large fraction of human disease mutations affect splicing and lead to aberrant mRNA and protein products. Current methods that interrogate the transcriptome based on RNA-seq either suffer from short read length when trying to infer full-length transcripts, or are restricted to predefined units of alternative splicing that they quantify from local read evidence. RESULTS: Instead of attempting to quantify individual outcomes of the splicing process such as local splicing events or full-length transcripts, we propose to quantify alternative splicing using a simplified probabilistic model of the underlying splicing process. Our model is based on the usage of individual splice sites and can generate arbitrarily complex types of splicing patterns. In our implementation, McSplicer, we estimate the parameters of our model using all read data at once and we demonstrate in our experiments that this yields more accurate estimates compared to competing methods. Our model is able to describe multiple effects of splicing mutations using few, easy to interpret parameters, as we illustrate in an experiment on RNA-seq data from autism spectrum disorder patients. AVAILABILITY: McSplicer source code is available at https://github.com/canzarlab/McSplicer and has been deposited in archived format at https://doi.org/10.5281/zenodo.4449881. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

12.
Biometrics ; 74(1): 270-279, 2018 03.
Article in English | MEDLINE | ID: mdl-28099991

ABSTRACT

Traditionally, phylogeny and sequence alignment are estimated separately: first estimate a multiple sequence alignment and then infer a phylogeny based on the sequence alignment estimated in the previous step. However, uncertainty in the alignment is ignored, resulting, possibly, in overstated certainty in phylogeny estimates. We develop a joint model for co-estimating phylogeny and sequence alignment which improves estimates from the traditional approach by accounting for uncertainty in the alignment in phylogenetic inferences. Our insertion and deletion (indel) model allows arbitrary-length overlapping indel events and a general distribution for indel fragment size. We employ a Bayesian approach using MCMC to estimate the joint posterior distribution of a phylogenetic tree and a multiple sequence alignment. Our approach has a tree and a complete history of indel events mapped onto the tree as the state space of the Markov Chain while alternative previous approaches have a tree and an alignment. A large state space containing a complete history of indel events makes our MCMC approach more challenging, but it enables us to infer more information about the indel process. The performances of this joint method and traditional sequential methods are compared using simulated data as well as real data. Software named BayesCAT (Bayesian Co-estimation of Alignment and Tree) is available at https://github.com/heejungshim/BayesCAT.


Subject(s)
Bayes Theorem , Phylogeny , Sequence Alignment , Humans , INDEL Mutation , Markov Chains , Software
13.
Nat Genet ; 49(4): 550-558, 2017 Apr.
Article in English | MEDLINE | ID: mdl-28191888

ABSTRACT

Animal promoters initiate transcription either at precise positions (narrow promoters) or dispersed regions (broad promoters), a distinction referred to as promoter shape. Although highly conserved, the functional properties of promoters with different shapes and the genetic basis of their evolution remain unclear. Here we used natural genetic variation across a panel of 81 Drosophila lines to measure changes in transcriptional start site (TSS) usage, identifying thousands of genetic variants affecting transcript levels (strength) or the distribution of TSSs within a promoter (shape). Our results identify promoter shape as a molecular trait that can evolve independently of promoter strength. Broad promoters typically harbor shape-associated variants, with signatures of adaptive selection. Single-cell measurements demonstrate that variants modulating promoter shape often increase expression noise, whereas heteroallelic interactions with other promoter variants alleviate these effects. These results uncover new functional properties of natural promoters and suggest the minimization of expression noise as an important factor in promoter evolution.


Subject(s)
Genetic Variation/genetics , Promoter Regions, Genetic/genetics , Animals , Biological Evolution , Drosophila/genetics , Noise , Transcription Initiation Site/physiology , Transcription, Genetic/genetics
14.
Elife ; 5, 2016 05 27.
Article in English | MEDLINE | ID: mdl-27232982

ABSTRACT

Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry to detect protein expressed at low level, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.


Subject(s)
Molecular Biology/methods , Open Reading Frames , Protein Biosynthesis , Ribosomes/metabolism , Cell Line , Humans , Lymphocytes/physiology
15.
PLoS One ; 10(9): e0138030, 2015.
Article in English | MEDLINE | ID: mdl-26406244

ABSTRACT

Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.


Subject(s)
Algorithms , Gene Expression Regulation/physiology , Models, Genetic , Response Elements/physiology , Transcription Factors/metabolism , Cell Line, Tumor , Humans , Protein Binding
16.
PLoS One ; 10(4): e0120758, 2015.
Article in English | MEDLINE | ID: mdl-25898129

ABSTRACT

We conducted a genome-wide association analysis of 7 subfractions of low density lipoproteins (LDLs) and 3 subfractions of intermediate density lipoproteins (IDLs) measured by gradient gel electrophoresis, and their response to statin treatment, in 1868 individuals of European ancestry from the Pharmacogenomics and Risk of Cardiovascular Disease study. Our analyses identified four previously implicated loci (SORT1, APOE, LPA, and CETP) as containing variants that are very strongly associated with lipoprotein subfractions (log10 Bayes factor > 15). Subsequent conditional analyses suggest that three of these (APOE, LPA and CETP) likely harbor multiple independently associated SNPs. Further, while different variants typically showed different characteristic patterns of association with combinations of subfractions, the two SNPs in CETP show strikingly similar patterns (both in our original data and in a replication cohort), consistent with a common underlying molecular mechanism. Notably, the CETP variants are very strongly associated with LDL subfractions, despite showing no association with total LDLs in our study, illustrating the potential value of the more detailed phenotypic measurements. In contrast with these strong subfraction associations, genetic association analysis of subfraction response to statins showed much weaker signals (none exceeding a log10 Bayes factor of 6). However, two SNPs (in APOE and LPA) previously reported to be associated with LDL statin response do show some modest evidence for association in our data, and the subfraction response profiles at the LPA SNP are consistent with the LPA association, with response likely being due primarily to resistance of Lp(a) particles to statin therapy. An additional important feature of our analysis is that, unlike most previous analyses of multiple related phenotypes, we analyzed the subfractions jointly, rather than one at a time. Comparisons of our multivariate analyses with standard univariate analyses demonstrate that multivariate analyses can substantially increase power to detect associations. Software implementing our multivariate analysis methods is available at http://stephenslab.uchicago.edu/software.html.


Subject(s)
Genome-Wide Association Study , Hydroxymethylglutaryl-CoA Reductase Inhibitors/therapeutic use , Lipoproteins, LDL/genetics , Polymorphism, Single Nucleotide/genetics , Aged , Bayes Theorem , Female , Genome, Human , Humans , Male , Middle Aged , Phenotype , White People
17.
Ann Appl Stat ; 9(2): 655-686, 2015.
Article in English | MEDLINE | ID: mdl-29399242

ABSTRACT

Understanding how genetic variants influence cellular-level processes is an important step toward understanding how they influence important organismal-level traits, or "phenotypes," including human disease susceptibility. To this end, scientists are undertaking large-scale genetic association studies that aim to identify genetic variants associated with molecular and cellular phenotypes, such as gene expression, transcription factor binding, or chromatin accessibility. These studies use high-throughput sequencing assays (e.g., RNA-seq, ChIP-seq, DNase-seq) to obtain high-resolution data on how the traits vary along the genome in each sample. However, typical association analyses fail to exploit these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes, or windows of fixed length. Here we develop and apply statistical methods that better exploit the high-resolution data. The key idea is to treat the sequence data as measuring an underlying "function" that varies along the genome, and then, building on wavelet-based methods for functional data analysis, test for association between genetic variants and the underlying function. Applying these methods to identify genetic variants associated with chromatin accessibility (dsQTLs), we find that they identify substantially more associations than a simpler window-based analysis, and in total we identify 772 novel dsQTLs not identified by the original analysis.
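
Note on entry 17: the key idea in the abstract is to describe a count profile along the genome at several resolutions at once, instead of collapsing it into one window total. As a rough sketch of what a wavelet decomposition of such a profile looks like, the Python snippet below applies an unnormalised Haar transform. The choice of the Haar basis, the function names, and the toy read counts are assumptions made here for illustration; the abstract says only that the methods are wavelet-based, and the association test itself is not shown.

```python
from typing import List, Tuple


def haar_step(signal: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the unnormalised Haar transform: pairwise means and half-differences."""
    averages = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return averages, details


def haar_transform(signal: List[float]) -> List[List[float]]:
    """Return [overall mean, coarsest details, ..., finest details]."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("signal length must be a power of two")
    levels = []
    current = list(signal)
    while len(current) > 1:
        current, details = haar_step(current)
        levels.append(details)  # finest scale is collected first
    return [current] + levels[::-1]


# Hypothetical read counts at 8 consecutive positions.
counts = [4.0, 2.0, 6.0, 8.0, 0.0, 0.0, 2.0, 2.0]
coeffs = haar_transform(counts)
print(coeffs)
```

A window-based analysis would keep only the first coefficient (the mean, here 3.0). The remaining coefficients record where along the region the signal is concentrated, which is the extra information a multi-scale test can draw on.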

18.
Nature ; 502(7471): 377-80, 2013 Oct 17.
Article in English | MEDLINE | ID: mdl-23995691

ABSTRACT

Statins are prescribed widely to lower plasma low-density lipoprotein (LDL) concentrations and cardiovascular disease risk and have been shown to have beneficial effects in a broad range of patients. However, statins are associated with an increased risk, albeit small, of clinical myopathy and type 2 diabetes. Despite evidence for substantial genetic influence on LDL concentrations, pharmacogenomic trials have failed to identify genetic variations with large effects on either statin efficacy or toxicity, and have produced little information regarding mechanisms that modulate statin response. Here we identify a downstream target of statin treatment by screening for the effects of in vitro statin exposure on genetic associations with gene expression levels in lymphoblastoid cell lines derived from 480 participants of a clinical trial of simvastatin treatment. This analysis identified six expression quantitative trait loci (eQTLs) that interacted with simvastatin exposure, including rs9806699, a cis-eQTL for the gene glycine amidinotransferase (GATM) that encodes the rate-limiting enzyme in creatine synthesis. We found this locus to be associated with incidence of statin-induced myotoxicity in two separate populations (meta-analysis odds ratio = 0.60). Furthermore, we found that GATM knockdown in hepatocyte-derived cell lines attenuated transcriptional response to sterol depletion, demonstrating that GATM may act as a functional link between statin-mediated lowering of cholesterol and susceptibility to statin-induced myopathy.


Subject(s)
Amidinotransferases/genetics , Gene Expression Regulation/drug effects , Hydroxymethylglutaryl-CoA Reductase Inhibitors/adverse effects , Muscular Diseases/chemically induced , Quantitative Trait Loci/genetics , Simvastatin/adverse effects , Amidinotransferases/deficiency , Amidinotransferases/metabolism , Cell Line , Cholesterol/deficiency , Cholesterol/metabolism , Cholesterol/pharmacology , Gene Knockdown Techniques , Humans , Hydroxymethylglutaryl-CoA Reductase Inhibitors/pharmacology , Lymphocytes/cytology , Lymphocytes/drug effects , Lymphocytes/metabolism , Muscular Diseases/genetics , Muscular Diseases/metabolism , Polymorphism, Single Nucleotide/genetics , Simvastatin/pharmacology , Sterol Regulatory Element Binding Proteins/metabolism , Transcription, Genetic/drug effects
19.
BMC Proc ; 3 Suppl 7: S35, 2009 Dec 15.
Article in English | MEDLINE | ID: mdl-20018026

ABSTRACT

The high genomic density of the single-nucleotide polymorphism (SNP) sets that are typically surveyed in genome-wide association studies (GWAS) now allows the application of haplotype-based methods. Although the choice of haplotype-based vs. individual-SNP approaches is expected to affect the results of association studies, few empirical comparisons of method performance have been reported on the genome-wide scale in the same set of individuals. To measure the relative ability of the two strategies to detect associations, we used a large dataset from the North American Rheumatoid Arthritis Consortium to: 1) partition the genome into haplotype blocks, 2) associate haplotypes with disease, and 3) compare the results with individual-SNP association mapping. Although some associations were shared across methods, each approach uniquely identified several strong candidate regions. Our results suggest that the application of both haplotype-based and individual-SNP testing to GWAS should be adopted as a routine procedure.

20.
Biostatistics ; 9(1): 51-65, 2008 Jan.
Article in English | MEDLINE | ID: mdl-17533175

ABSTRACT

Identifying binding locations of transcription factors (TFs) within long segments of noncoding DNA is a challenging task. Recent chromatin immunoprecipitation on microarray (ChIP-chip) experiments utilizing tiling arrays are especially promising for this task since they provide high-resolution genome-wide maps of the interactions between the TFs and the DNA. Data from these experiments are invaluable for characterizing DNA recognition profiles (regulatory motifs) of TFs. A 2-step paradigm is commonly used for performing motif searches based on ChIP-chip data. First, candidate bound sequences that are on the order of 500-1000 bp are identified from ChIP-chip data. Then, motif searches are performed among these sequences. These 2 steps are typically carried out in a disconnected fashion in the sense that the quantitative nature of the ChIP-chip information is ignored in the second step. More specifically, all bound regions are assumed to be equally likely to contain the motif(s), and the motifs are assumed to reside at any position of the bound regions with equal probability. We develop a conditional two-component mixture (CTCM) model that relaxes both these common assumptions by adaptively incorporating ChIP-chip information. The performances of the new and existing methods are compared using simulated data and ChIP-chip data from recently available ENCODE studies (Consortium, 2004). These studies indicate that CTCM efficiently utilizes the information available in the ChIP-chip experiments and has superior sensitivity and specificity, especially when the motif of interest has low abundance among the ChIP-chip bound regions and/or low information content.


Subject(s)
Chromatin Immunoprecipitation/methods , DNA/analysis , Oligonucleotide Array Sequence Analysis/methods , Transcription Factors/analysis , Binding Sites , Computer Simulation , Humans , Models, Statistical , Proto-Oncogene Proteins c-jun/analysis , STAT1 Transcription Factor/analysis