Results 1 - 14 of 14
1.
Bioinformatics ; 29(6): 711-6, 2013 Mar 15.
Article in English | MEDLINE | ID: mdl-23297036

ABSTRACT

MOTIVATION: The identification of nucleosomes along the chromatin is key to understanding their role in the regulation of gene expression and other DNA-related processes. However, current experimental methods (MNase-ChIP, MNase-Seq) sample nucleosome positions from a cell population and contain biases, which makes the precise identification of individual nucleosomes difficult. Recent work has focused only on the first point, developing noise reduction approaches to identify nucleosome positions. RESULTS: In this article, we propose a new approach, termed NucleoFinder, that addresses both the positional heterogeneity across cells and experimental biases by seeking nucleosomes consistently positioned in a cell population and showing a significant enrichment relative to a control sample. Despite the absence of a validated dataset, we show that our approach (i) detects fewer false positives than two other nucleosome calling methods and (ii) identifies two important features of the nucleosome organization (the nucleosome spacing downstream of active promoters and the enrichment/depletion of GC/AT dinucleotides at the centre of in vitro nucleosomes) with equal or greater ability than the other two methods.


Subjects
Statistical Models , Nucleosomes/chemistry , Cell Line , Human Genome , High-Throughput Nucleotide Sequencing , Humans , DNA Sequence Analysis
2.
Genet Epidemiol ; 36(5): 451-62, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22549815

ABSTRACT

Significance testing one SNP at a time has proven useful for identifying genomic regions that harbor variants affecting human disease. But after an initial genome scan has identified a "hit region" of association, single-locus approaches can falter. Local linkage disequilibrium (LD) can make both the number of underlying true signals and their identities ambiguous. Simultaneous modeling of multiple loci should help. However, it is typically applied ad hoc: conditioning on the top SNPs, with limited exploration of the model space and no assessment of how sensitive model choice was to sampling variability. Formal alternatives exist but are seldom used. Bayesian variable selection is coherent but requires specifying a full joint model, including priors on parameters and the model space. Penalized regression methods (e.g., LASSO) appear promising but require calibration, and, once calibrated, lead to a choice of SNPs that can be misleadingly decisive. We present a general method for characterizing uncertainty in model choice that is tailored to reprioritizing SNPs within a hit region under strong LD. Our method, LASSO local automatic regularization resample model averaging (LLARRMA), combines LASSO shrinkage with resample model averaging and multiple imputation, estimating for each SNP the probability that it would be included in a multi-SNP model in alternative realizations of the data. We apply LLARRMA to simulations based on case-control genome-wide association studies data, and find that when there are several causal loci and strong LD, LLARRMA identifies a set of candidates that is enriched for true signals relative to single locus analysis and to the recently proposed method of Stability Selection.


Subjects
Genome-Wide Association Study/methods , Algorithms , Bayes Theorem , Calibration , Case-Control Studies , Chromosome Mapping , Computer Simulation , Genotype , Humans , Genetic Models , Statistical Models , Theoretical Models , Molecular Epidemiology/methods , ROC Curve , Regression Analysis
3.
Am J Gastroenterol ; 108(11): 1785-93, 2013 Nov.
Article in English | MEDLINE | ID: mdl-24042191

ABSTRACT

OBJECTIVES: Microsatellite instability (MSI) is an established marker of good prognosis in colorectal cancer (CRC). Chromosomal instability (CIN) is strongly negatively associated with MSI and has been shown to be a marker of poor prognosis in a small number of studies. However, a substantial group of "double-negative" (MSI-/CIN-) CRCs exists. The prognosis of these patients is unclear. Furthermore, MSI and CIN are each associated with specific molecular changes, such as mutations in KRAS and BRAF, that have been associated with prognosis. It is not known which of MSI, CIN, and the specific gene mutations are primary predictors of survival. METHODS: We evaluated the prognostic value (disease-free survival, DFS) of CIN, MSI, mutations in KRAS, NRAS, BRAF, PIK3CA, FBXW7, and TP53, and chromosome 18q loss-of-heterozygosity (LOH) in 822 patients from the VICTOR trial of stage II/III CRC. We followed up promising associations in an Australian community-based cohort (N=375). RESULTS: In the VICTOR patients, no specific mutation was associated with DFS, but individually MSI and CIN showed significant associations after adjusting for stage, age, gender, tumor location, and therapy. A combined analysis of the VICTOR and community-based cohorts showed that MSI and CIN were independent predictors of DFS (for MSI, hazard ratio (HR)=0.58, 95% confidence interval (CI) 0.36-0.93, and P=0.021; for CIN, HR=1.54, 95% CI 1.14-2.08, and P=0.005), and joint CIN/MSI testing significantly improved the prognostic prediction of MSI alone (P=0.028). Higher levels of CIN were monotonically associated with progressively poorer DFS, and a semi-quantitative measure of CIN was a better predictor of outcome than a simple CIN+/- variable. All measures of CIN predicted DFS better than the recently described Watanabe LOH ratio. CONCLUSIONS: MSI and CIN are independent predictors of DFS for stage II/III CRC. 
Prognostic molecular tests for CRC relapse should currently use MSI and a quantitative measure of CIN rather than specific gene mutations.
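The hazard ratios, confidence intervals and P values quoted above can be related to one another with a short calculation. The following is a minimal sketch, assuming the intervals are symmetric Wald intervals on the log hazard scale; the function name `wald_p_from_hr_ci` is hypothetical, and the abstract does not state which test produced the published P values, so the recovered values are only expected to be of the same order as those reported (rounding of the published figures alone introduces some discrepancy).

```python
import math


def wald_p_from_hr_ci(hr, lo, hi, z_crit=1.959964):
    """Approximate two-sided Wald P value from a hazard ratio and its 95% CI.

    Assumes the CI is symmetric on the log scale, i.e.
    log(hi) - log(lo) = 2 * z_crit * SE(log HR).
    """
    # Standard error of log(HR), recovered from the CI width on the log scale.
    se = (math.log(hi) - math.log(lo)) / (2.0 * z_crit)
    # Wald z statistic for the null hypothesis HR = 1.
    z = math.log(hr) / se
    # Two-sided normal tail area: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


# Values quoted in the abstract for the combined analysis.
p_msi = wald_p_from_hr_ci(0.58, 0.36, 0.93)  # MSI (reported P = 0.021)
p_cin = wald_p_from_hr_ci(1.54, 1.14, 2.08)  # CIN (reported P = 0.005)
print(f"MSI: P ~ {p_msi:.3f}; CIN: P ~ {p_cin:.3f}")
```

A hazard ratio below 1 (MSI) indicates better disease-free survival and one above 1 (CIN) indicates poorer disease-free survival; in both cases the interval excludes 1, which is what the small P values reflect.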


Subjects
Colorectal Neoplasms/genetics , Colorectal Neoplasms/mortality , Microsatellite Instability , Mutation , Aged , Aged 80 and over , Cell Cycle Proteins/genetics , Chromosome Aberrations , Class I Phosphatidylinositol 3-Kinases , Colorectal Neoplasms/pathology , DNA Mutational Analysis , Disease-Free Survival , F-Box Proteins/genetics , F-Box-WD Repeat-Containing Protein 7 , Female , GTP Phosphohydrolases/genetics , Humans , Loss of Heterozygosity , Male , Membrane Proteins/genetics , Middle Aged , Neoplasm Staging , Phosphatidylinositol 3-Kinases/genetics , Prognosis , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins B-raf/genetics , Proto-Oncogene Proteins p21(ras) , Survival Rate , Tumor Suppressor Protein p53/genetics , Ubiquitin-Protein Ligases/genetics , ras Proteins/genetics
4.
Bioinformatics ; 26(16): 1999-2003, 2010 Aug 15.
Article in English | MEDLINE | ID: mdl-20554688

ABSTRACT

MOTIVATION: Quantifying differences in linkage disequilibrium (LD) between sub-groups can highlight genetic regions or sites under selection and/or associated with disease, and may have utility in trans-ethnic mapping studies. RESULTS: We present a novel pseudo Bayes factor (PBF) approach that assesses differences in covariance of genotype frequencies from single nucleotide polymorphism (SNP) data from a genome-wide study. The magnitude of the PBF reflects the strength of evidence for a difference, while accounting for the sample size and number of SNPs, without the requirement for permutation testing to establish statistical significance. Application of the PBF to HapMap and Gambian malaria SNP data reveals regional LD differences, some known to be under selection. AVAILABILITY AND IMPLEMENTATION: The PBF approach has been implemented in the BALD (Bayesian analysis of LD differences) C++ software, and is available from http://homepages.lshtm.ac.uk/tgclark/downloads.


Subjects
Linkage Disequilibrium , Single Nucleotide Polymorphism , Bayes Theorem , Genome , Genome-Wide Association Study , Genotype , Humans , Sample Size , Software
5.
Bioinformatics ; 24(19): 2209-14, 2008 Oct 01.
Article in English | MEDLINE | ID: mdl-18653518

ABSTRACT

UNLABELLED: Current genotyping algorithms typically call genotypes by clustering allele-specific intensity data on a single nucleotide polymorphism (SNP) by SNP basis. This approach assumes the availability of a large number of control samples that have been sampled on the same array and platform. We have developed a SNP genotyping algorithm for the Illumina Infinium SNP genotyping assay that is entirely within-sample and requires neither a population of control samples nor parameters derived from such a population. Our algorithm exhibits high concordance with current methods and >99% call accuracy on HapMap samples. The ability to call genotypes using only within-sample information makes the method computationally light and practical for studies involving small sample sizes and provides a valuable independent quality control metric for other population-based approaches. AVAILABILITY: http://www.stats.ox.ac.uk/~giannoul/GenoSNP/.


Subjects
Algorithms , Computational Biology/methods , Single Nucleotide Polymorphism , Cluster Analysis , Genotype , Humans , Statistical Models , Population Groups/genetics
6.
Nucleic Acids Res ; 35(6): 2013-25, 2007.
Article in English | MEDLINE | ID: mdl-17341461

ABSTRACT

Array-based technologies have been used to detect chromosomal copy number changes (aneuploidies) in the human genome. Recent studies identified numerous copy number variants (CNV) and some are common polymorphisms that may contribute to disease susceptibility. We developed, and experimentally validated, a novel computational framework (QuantiSNP) for detecting regions of copy number variation from BeadArray SNP genotyping data using an Objective Bayes Hidden-Markov Model (OB-HMM). Objective Bayes measures are used to set certain hyperparameters in the priors using a novel re-sampling framework to calibrate the model to a fixed Type I (false positive) error rate. Other parameters are set via maximum marginal likelihood to prior training data of known structure. QuantiSNP provides probabilistic quantification of state classifications and significantly improves the accuracy of segmental aneuploidy identification and mapping, relative to existing analytical tools (Beadstudio, Illumina), as demonstrated by validation of breakpoint boundaries. QuantiSNP identified both novel and validated CNVs. QuantiSNP was developed using BeadArray SNP data but it can be adapted to other platforms and we believe that the OB-HMM framework has widespread applicability in genomic research. In conclusion, QuantiSNP is a novel algorithm for high-resolution CNV/aneuploidy detection with application to clinical genetics, cancer and disease association studies.


Subjects
Algorithms , Aneuploidy , Chromosome Mapping/methods , Computational Biology/methods , Genomics/methods , Single Nucleotide Polymorphism , Bayes Theorem , Chromosome Breakage , Human Genome , Genotype , Humans , Loss of Heterozygosity , Markov Chains , Statistical Models
7.
J Am Stat Assoc ; 111(513): 200-215, 2016 Jan 02.
Article in English | MEDLINE | ID: mdl-27226674

ABSTRACT

Hidden Markov models (HMMs) are one of the most widely used statistical methods for analyzing sequence data. However, the reporting of output from HMMs has largely been restricted to the presentation of the most-probable (MAP) hidden state sequence, found via the Viterbi algorithm, or the sequence of most probable marginals using the forward-backward algorithm. In this article, we expand the amount of information we could obtain from the posterior distribution of an HMM by introducing linear-time dynamic programming recursions that, conditional on a user-specified constraint in the number of segments, allow us to (i) find MAP sequences, (ii) compute posterior probabilities, and (iii) simulate sample paths. We collectively call these recursions k-segment algorithms and illustrate their utility using simulated and real examples. We also highlight the prospective and retrospective use of k-segment constraints for fitting HMMs or exploring existing model fits. Supplementary materials for this article are available online.
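To make the idea of a segment-constrained MAP sequence concrete, here is a minimal sketch of a Viterbi-style recursion that tracks the number of segments alongside the hidden state. It is an illustration under stated assumptions rather than the authors' k-segment algorithms: a segment is taken to be a maximal run of one hidden state, the constraint is "exactly k segments", and the names `k_segment_viterbi` and `count_segments` are hypothetical. The cost is linear in the sequence length for fixed k and number of states.

```python
import math

NEG_INF = float("-inf")


def count_segments(path):
    """Number of maximal runs of a single state in a path."""
    return 1 + sum(1 for a, b in zip(path, path[1:]) if a != b)


def k_segment_viterbi(log_init, log_trans, log_emit, k):
    """Most probable state path with exactly k segments.

    log_init[s], log_trans[r][s] and log_emit[t][s] are log-probabilities.
    Returns (log_prob, path); path is None if no such path exists.
    """
    T, S = len(log_emit), len(log_init)
    # best[t][j][s]: best log-prob of a path over times 0..t that ends in
    # state s and contains exactly j + 1 segments.
    best = [[[NEG_INF] * S for _ in range(k)] for _ in range(T)]
    back = [[[None] * S for _ in range(k)] for _ in range(T)]
    for s in range(S):
        best[0][0][s] = log_init[s] + log_emit[0][s]
    for t in range(1, T):
        for j in range(k):
            for s in range(S):
                # Option 1: stay in s, so the segment count is unchanged.
                cand, arg = best[t - 1][j][s] + log_trans[s][s], (j, s)
                # Option 2: arrive from r != s, which opens a new segment.
                if j > 0:
                    for r in range(S):
                        if r != s:
                            v = best[t - 1][j - 1][r] + log_trans[r][s]
                            if v > cand:
                                cand, arg = v, (j - 1, r)
                if cand > NEG_INF:
                    best[t][j][s] = cand + log_emit[t][s]
                    back[t][j][s] = arg
    s = max(range(S), key=lambda q: best[T - 1][k - 1][q])
    score = best[T - 1][k - 1][s]
    if score == NEG_INF:
        return score, None
    # Trace back through the stored (segment index, state) pointers.
    path, j = [s], k - 1
    for t in range(T - 1, 0, -1):
        j, s = back[t][j][s]
        path.append(s)
    return score, path[::-1]


# Toy two-state example: binary observations, each state emits its own
# label with probability 0.8 (all numbers are illustrative).
obs = [0, 0, 1, 0, 0, 1, 1, 1]
log_init = [math.log(0.5)] * 2
log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_emit = [[math.log(0.8 if o == s else 0.2) for s in range(2)] for o in obs]
score2, path2 = k_segment_viterbi(log_init, log_trans, log_emit, k=2)
print(path2)
```

Varying `k` on the same data shows how the decoded path changes as more or fewer change points are allowed, which is the kind of exploration of an existing fit that the abstract describes.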

8.
PLoS One ; 10(5): e0127882, 2015.
Article in English | MEDLINE | ID: mdl-25992607

ABSTRACT

Expression Quantitative Trait Loci (eQTL) analysis enables characterisation of functional genetic variation influencing expression levels of individual genes. In outbred populations, including humans, eQTLs are commonly analysed using the conventional linear model, adjusting for relevant covariates, assuming an allelic dosage model and a Gaussian error term. However, gene expression data generally have noise that induces heavy-tailed errors relative to the Gaussian distribution and often include atypical observations, or outliers. Such departures from modelling assumptions can lead to an increased rate of type II errors (false negatives), and to some extent also type I errors (false positives). Careful model checking can reduce the risk of type I errors but often not type II errors, since it is generally too time-consuming to carefully check all models with a non-significant effect in large-scale and genome-wide studies. Here we propose the application of a robust linear model for eQTL analysis to reduce adverse effects of deviations from the assumption of Gaussian residuals. We present results from a simulation study as well as results from the analysis of real eQTL data sets. Our findings suggest that in many situations robust models have the potential to provide more reliable eQTL results compared to conventional linear models, particularly with respect to reducing type II errors due to non-Gaussian noise. Post-genomic data, such as that generated in genome-wide eQTL studies, are often noisy and frequently contain atypical observations. Robust statistical models have the potential to provide more reliable results and increased statistical power under non-Gaussian conditions. The results presented here suggest that robust models should be considered routinely alongside other commonly used methodologies for eQTL analysis.
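The contrast between a conventional and a robust linear model under an allelic dosage coding can be sketched in a few lines. The example below is a minimal illustration, not the model used in the article: it assumes a Huber M-estimator fitted by iteratively reweighted least squares with a MAD-based scale, a single predictor (dosage 0, 1 or 2) and no covariates. The function names and the toy data, which contain one deliberate outlier, are hypothetical.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = a + b * x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def huber_line_fit(x, y, c=1.345, n_iter=50):
    """Huber M-estimate of y = a + b * x via iteratively reweighted LS."""
    a, b = weighted_line_fit(x, y, [1.0] * len(x))  # start from OLS
    for _ in range(n_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute deviation, rescaled so
        # that it estimates the standard deviation under Gaussian noise.
        med = statistics.median(resid)
        scale = statistics.median([abs(r - med) for r in resid]) / 0.6745
        if scale == 0.0:
            break
        # Huber weights: 1 inside the threshold, shrinking as 1/|r| outside.
        thr = c * scale
        w = [1.0 if abs(r) <= thr else thr / abs(r) for r in resid]
        a, b = weighted_line_fit(x, y, w)
    return a, b


# Toy data: expression = 1 + 0.5 * dosage plus small noise; the fourth
# sample is replaced by a gross outlier.
dosage = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
expr = [1.05, 0.95, 1.02, 6.00, 1.55, 1.45, 1.52, 1.48, 2.05, 1.95, 2.02, 1.98]

a_ols, b_ols = weighted_line_fit(dosage, expr, [1.0] * len(dosage))
a_rob, b_rob = huber_line_fit(dosage, expr)
print(f"OLS slope: {b_ols:.3f}; Huber slope: {b_rob:.3f}")
```

In this toy data set the single outlier is enough to reverse the sign of the ordinary least-squares slope, while the reweighted fit stays close to the generating value of 0.5. That mirrors, on a very small scale, the loss of power from atypical observations that the abstract describes.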


Subjects
Linear Models , Quantitative Trait Loci , Genetic Databases , Gene Expression , Humans , Single Nucleotide Polymorphism
9.
Nat Genet ; 45(2): 136-44, 2013 Feb.
Article in English | MEDLINE | ID: mdl-23263490

ABSTRACT

Many individuals with multiple or large colorectal adenomas or early-onset colorectal cancer (CRC) have no detectable germline mutations in the known cancer predisposition genes. Using whole-genome sequencing, supplemented by linkage and association analysis, we identified specific heterozygous POLE or POLD1 germline variants in several multiple-adenoma and/or CRC cases but in no controls. The variants associated with susceptibility, POLE p.Leu424Val and POLD1 p.Ser478Asn, have high penetrance, and POLD1 mutation was also associated with endometrial cancer predisposition. The mutations map to equivalent sites in the proofreading (exonuclease) domain of DNA polymerases ɛ and δ and are predicted to cause a defect in the correction of mispaired bases inserted during DNA replication. In agreement with this prediction, the tumors from mutation carriers were microsatellite stable but tended to acquire base substitution mutations, as confirmed by yeast functional assays. Further analysis of published data showed that the recently described group of hypermutant, microsatellite-stable CRCs is likely to be caused by somatic POLE mutations affecting the exonuclease domain.


Subjects
Adenoma/genetics , Colorectal Neoplasms/genetics , DNA Mismatch Repair/genetics , DNA Polymerase III/genetics , DNA Polymerase II/genetics , DNA Replication/genetics , Molecular Models , Exodeoxyribonucleases/genetics , Genetic Linkage , Genome-Wide Association Study , Germ-Line Mutation/genetics , Humans , Microsatellite Repeats/genetics , Pedigree , Poly-ADP-Ribose Binding Proteins , Schizosaccharomyces/genetics , DNA Sequence Analysis
10.
J Comput Graph Stat ; 19(4): 769-789, 2010 Dec 01.
Article in English | MEDLINE | ID: mdl-22003276

ABSTRACT

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

11.
Genome Biol ; 11(9): R92, 2010.
Article in English | MEDLINE | ID: mdl-20858232

ABSTRACT

We describe a statistical method for the characterization of genomic aberrations in single nucleotide polymorphism microarray data acquired from cancer genomes. Our approach allows us to model the joint effect of polyploidy, normal DNA contamination and intra-tumour heterogeneity within a single unified Bayesian framework. We demonstrate the efficacy of our method on numerous datasets including laboratory generated mixtures of normal-cancer cell lines and real primary tumours.


Subjects
Statistical Data Interpretation , Genetic Models , Neoplasms/genetics , Single Nucleotide Polymorphism , Algorithms , Bayes Theorem , Tumor Cell Line , DNA Contamination , DNA Copy Number Variations , Genetic Heterogeneity , Human Genome , Genotype , Humans , Microarray Analysis , Mutation , Polyploidy
12.
Nat Genet ; 42(11): 949-60, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20935629

ABSTRACT

Waist-hip ratio (WHR) is a measure of body fat distribution and a predictor of metabolic consequences independent of overall adiposity. WHR is heritable, but few genetic variants influencing this trait have been identified. We conducted a meta-analysis of 32 genome-wide association studies for WHR adjusted for body mass index (comprising up to 77,167 participants), following up 16 loci in an additional 29 studies (comprising up to 113,636 subjects). We identified 13 new loci in or near RSPO3, VEGFA, TBX15-WARS2, NFE2L3, GRB14, DNM3-PIGC, ITPR2-SSPN, LY86, HOXC13, ADAMTS9, ZNRF3-KREMEN1, NISCH-STAB1 and CPEB4 (P = 1.9 × 10⁻⁹ to P = 1.8 × 10⁻⁴⁰) and the known signal at LYPLAL1. Seven of these loci exhibited marked sexual dimorphism, all with a stronger effect on WHR in women than men (P for sex difference = 1.9 × 10⁻³ to P = 1.2 × 10⁻¹³). These findings provide evidence for multiple loci that modulate body fat distribution independent of overall adiposity and reveal strong gene-by-sex interactions.


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Waist-Hip Ratio , Adipose Tissue/anatomy & histology , Age Factors , Chromosome Mapping , Female , Human Genome , Humans , Male , Meta-Analysis as Topic , Sex Characteristics
13.
Genetics ; 182(4): 1263-77, 2009 Aug.
Article in English | MEDLINE | ID: mdl-19474203

ABSTRACT

Highly recombinant populations derived from inbred lines, such as advanced intercross lines and heterogeneous stocks, can be used to map loci far more accurately than is possible with standard intercrosses. However, the varying degrees of relatedness that exist between individuals complicate analysis, potentially leading to many false positive signals. We describe a method to deal with these problems that does not require pedigree information and accounts for model uncertainty through model averaging. In our method, we select multiple quantitative trait loci (QTL) models using forward selection applied to resampled data sets obtained by nonparametric bootstrapping and subsampling. We provide model-averaged statistics about the probability of loci or of multilocus regions being included in model selection, and this leads to more accurate identification of QTL than by single-locus mapping. The generality of our approach means it can potentially be applied to any population of unknown structure.
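The notion of a model-averaged inclusion probability obtained from resampled data sets can be illustrated with a deliberately reduced sketch. It assumes a single step of forward selection per resample (the predictor with the largest absolute correlation with the trait), nonparametric bootstrapping only, and simulated independent predictors; the published method selects multi-locus models and also uses subsampling, neither of which is reproduced here. All names and data are hypothetical.

```python
import random


def corr(u, v):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    if suu == 0.0 or svv == 0.0:
        return 0.0
    return suv / (suu * svv) ** 0.5


def inclusion_probabilities(X, y, n_resamples=200, seed=0):
    """Fraction of bootstrap resamples in which each predictor is selected.

    Selection here is one step of forward selection: the column of X with
    the largest absolute correlation with y in the resampled data.
    """
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        # Nonparametric bootstrap: draw n individuals with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        scores = [abs(corr([X[i][j] for i in idx], yb)) for j in range(p)]
        counts[scores.index(max(scores))] += 1
    return [c / n_resamples for c in counts]


# Simulated data: 40 individuals, 5 loci, only locus 0 affects the trait.
data_rng = random.Random(1)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(40)]
y = [3.0 * row[0] + 0.1 * data_rng.gauss(0.0, 1.0) for row in X]

probs = inclusion_probabilities(X, y)
print(probs)
```

With correlated loci, as in a real mapping population, the inclusion probability would typically be shared among neighbouring markers, which is why the abstract also reports probabilities for multilocus regions rather than single loci only.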


Subjects
Chromosome Mapping , Genetic Models , Statistical Models , Genetic Crosses , Population Genetics , Quantitative Trait Loci
14.
Proc Natl Acad Sci U S A ; 102(47): 16939-44, 2005 Nov 22.
Article in English | MEDLINE | ID: mdl-16287981

ABSTRACT

We present a method for Bayesian model-based hierarchical coclustering of gene expression data and use it to study the temporal transcription responses of an Anopheles gambiae cell line upon challenge with multiple microbial elicitors. The method fits statistical regression models to the gene expression time series for each experiment and performs coclustering on the genes by optimizing a joint probability model, characterizing gene coregulation between multiple experiments. We compute the model using a two-stage Expectation-Maximization-type algorithm, first fixing the cross-experiment covariance structure and using efficient Bayesian hierarchical clustering to obtain a locally optimal clustering of the gene expression profiles and then, conditional on that clustering, carrying out Bayesian inference on the cross-experiment covariance using Markov chain Monte Carlo simulation to obtain an expectation. For the problem of model choice, we use a cross-validatory approach to decide between individual experiment modeling and varying levels of coclustering. Our method successfully generates tightly coregulated clusters of genes that are implicated in related processes and therefore can be used for analysis of global transcript responses to various stimuli and prediction of gene functions.


Subjects
Anopheles/genetics , Anopheles/immunology , Gene Expression/immunology , Algorithms , Animals , Anopheles/drug effects , Anopheles/microbiology , Bayes Theorem , Cell Line , Cluster Analysis , Gene Expression/drug effects , Gene Expression Profiling , Immunity/genetics , Immunity/physiology , Genetic Models , Zymosan/pharmacology