Pesquisa | Portal de Pesquisa da BVS

GWASeq: targeted re-sequencing follow up to GWAS.

Salomon, Matthew P; Li, Wai Lok Sibon; Edlund, Christopher K; Morrison, John; Fortini, Barbara K; Win, Aung Ko; Conti, David V; Thomas, Duncan C; Duggan, David; Buchanan, Daniel D; Jenkins, Mark A; Hopper, John L; Gallinger, Steven; Le Marchand, Loïc; Newcomb, Polly A; Casey, Graham; Marjoram, Paul.

BMC Genomics ; 17: 176, 2016 Mar 03.

Artigo em Inglês | MEDLINE | ID: mdl-26940994

RESUMO

BACKGROUND: For the last decade the conceptual framework of the Genome-Wide Association Study (GWAS) has dominated the investigation of human disease and other complex traits. While GWAS have been successful in identifying a large number of variants associated with various phenotypes, the overall amount of heritability explained by these variants remains small. This raises the question of how best to follow up on a GWAS, localize causal variants accounting for GWAS hits, and as a consequence explain more of the so-called "missing" heritability. Advances in high throughput sequencing technologies now allow for the efficient and cost-effective collection of vast amounts of fine-scale genomic data to complement GWAS. RESULTS: We investigate these issues using a colon cancer dataset. After QC, our data consisted of 1993 cases, 899 controls. Using marginal tests of associations, we identify 10 variants distributed among six targeted regions that are significantly associated with colorectal cancer, with eight of the variants being novel to this study. Additionally, we perform so-called 'SNP-set' tests of association and identify two sets of variants that implicate both common and rare variants in the etiology of colorectal cancer. CONCLUSIONS: Here we present a large-scale targeted re-sequencing resource focusing on genomic regions implicated in colorectal cancer susceptibility previously identified in several GWAS, which aims to 1) provide fine-scale targeted sequencing data for fine-mapping and 2) provide data resources to address methodological questions regarding the design of sequencing-based follow-up studies to GWAS. Additionally, we show that this strategy successfully identifies novel variants associated with colorectal cancer susceptibility and can implicate both common and rare variants.

Assuntos

Estudo de Associação Genômica Ampla , Sequenciamento de Nucleotídeos em Larga Escala/métodos , Análise de Sequência de DNA/métodos , Mapeamento Cromossômico , Neoplasias do Colo/genética , Biologia Computacional , Variação Genética , Humanos

Accurate model selection of relaxed molecular clocks in bayesian phylogenetics.

Baele, Guy; Li, Wai Lok Sibon; Drummond, Alexei J; Suchard, Marc A; Lemey, Philippe.

Mol Biol Evol ; 30(2): 239-43, 2013 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-23090976

RESUMO

Recent implementations of path sampling (PS) and stepping-stone sampling (SS) have been shown to outperform the harmonic mean estimator (HME) and a posterior simulation-based analog of Akaike's information criterion through Markov chain Monte Carlo (AICM), in bayesian model selection of demographic and molecular clock models. Almost simultaneously, a bayesian model averaging approach was developed that avoids conditioning on a single model but averages over a set of relaxed clock models. This approach returns estimates of the posterior probability of each clock model through which one can estimate the Bayes factor in favor of the maximum a posteriori (MAP) clock model; however, this Bayes factor estimate may suffer when the posterior probability of the MAP model approaches 1. Here, we compare these two recent developments with the HME, stabilized/smoothed HME (sHME), and AICM, using both synthetic and empirical data. Our comparison shows reassuringly that MAP identification and its Bayes factor provide similar performance to PS and SS and that these approaches considerably outperform HME, sHME, and AICM in selecting the correct underlying clock model. We also illustrate the importance of using proper priors on a large set of empirical data sets.

Assuntos

Evolução Molecular , Modelos Genéticos , Filogenia , Animais , Teorema de Bayes , Simulação por Computador , Humanos

Model averaging and Bayes factor calculation of relaxed molecular clocks in Bayesian phylogenetics.

Li, Wai Lok Sibon; Drummond, Alexei J.

Mol Biol Evol ; 29(2): 751-61, 2012 Feb.

Artigo em Inglês | MEDLINE | ID: mdl-21940644

RESUMO

We describe a procedure for model averaging of relaxed molecular clock models in Bayesian phylogenetics. Our approach allows us to model the distribution of rates of substitution across branches, averaged over a set of models, rather than conditioned on a single model. We implement this procedure and test it on simulated data to show that our method can accurately recover the true underlying distribution of rates. We applied the method to a set of alignments taken from a data set of 12 mammalian species and uncovered evidence that lognormally distributed rates better describe this data set than do exponentially distributed rates. Additionally, our implementation of model averaging permits accurate calculation of the Bayes factor(s) between two or more relaxed molecular clock models. Finally, we introduce a new computational approach for sampling rates of substitution across branches that improves the convergence of our Markov chain Monte Carlo algorithms in this context. Our methods are implemented under the BEAST 1.6 software package, available at http://beast-mcmc.googlecode.com.

Assuntos

Evolução Molecular , Mamíferos/genética , Modelos Genéticos , Filogenia , Algoritmos , Animais , Teorema de Bayes , Humanos , Modelos Teóricos , Alinhamento de Sequência

Covariation of branch lengths in phylogenies of functionally related genes.

Li, Wai Lok Sibon; Rodrigo, Allen G.

PLoS One ; 4(12): e8487, 2009 Dec 29.

Artigo em Inglês | MEDLINE | ID: mdl-20041191

RESUMO

Recent studies have shown evidence for the coevolution of functionally-related genes. This coevolution is a result of constraints to maintain functional relationships between interacting proteins. The studies have focused on the correlation in gene tree branch lengths of proteins that are directly interacting with each other. We here hypothesize that the correlation in branch lengths is not limited only to proteins that directly interact, but also to proteins that operate within the same pathway. Using generalized linear models as a basis of identifying correlation, we attempted to predict the gene ontology (GO) terms of a gene based on its gene tree branch lengths. We applied our method to a dataset consisting of proteins from ten prokaryotic species. We found that the degree of accuracy to which we could predict the function of the proteins from their gene tree varied substantially with different GO terms. In particular, our model could accurately predict genes involved in translation and certain ribosomal activities with the area of the receiver-operator curve of up to 92%. Further analysis showed that the similarity between the trees of genes labeled with similar GO terms was not limited to genes that physically interacted, but also extended to genes functioning within the same pathway. We discuss the relevance of our findings as it relates to the use of phylogenetic methods in comparative genomics.

Assuntos

Genes/genética , Filogenia , Animais , Bactérias/genética , Bases de Dados Genéticas , Humanos , Modelos Lineares , Curva ROC , Ribossomos/genética

Testing the reliability of genetic methods of species identification via simulation.

Ross, Howard A; Murugan, Sumathi; Li, Wai Lok Sibon.

Syst Biol ; 57(2): 216-30, 2008 Apr.

Artigo em Inglês | MEDLINE | ID: mdl-18398767

RESUMO

Although genetic methods of species identification, especially DNA barcoding, are strongly debated, tests of these methods have been restricted to a few empirical cases for pragmatic reasons. Here we use simulation to test the performance of methods based on sequence comparison (BLAST and genetic distance) and tree topology over a wide range of evolutionary scenarios. Sequences were simulated on a range of gene trees spanning almost three orders of magnitude in tree depth and in coalescent depth; that is, deep or shallow trees with deep or shallow coalescences. When the query's conspecific sequences were included in the reference alignment, the rate of positive identification was related to the degree to which different species were genetically differentiated. The BLAST, distance, and liberal tree-based methods returned higher rates of correct identification than did the strict tree-based requirement that the query was within, but not sister to, a single-species clade. Under this more conservative approach, ambiguous outcomes occurred in inverse proportion to the number of reference sequences per species. When the query's conspecific sequences were not in the reference alignment, only the strict tree-based approach was relatively immune to making false-positive identifications. Thresholds affected the rates at which false-positive identifications were made when the query's species was unrepresented in the reference alignment but did not otherwise influence outcomes. A conservative approach using the strict tree-based method should be used initially in large-scale identification systems, with effort made to maximize sequence sampling within species. Once the genetic variation within a taxonomic group is well characterized and the taxonomy resolved, then the choice of method used should be dictated by considerations of computational efficiency. The requirement for extensive genetic sampling may render these techniques inappropriate in some circumstances.

Assuntos

Simulação por Computador , Especiação Genética , Modelos Genéticos , Animais , DNA Mitocondrial/genética , Fungos/genética , Invertebrados/genética , Sensibilidade e Especificidade , Baleias/genética

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

DETALHE DA PESQUISA