Search | Biblioteca Virtual em Saúde

1.

Genetic distance for a general non-stationary Markov substitution process.

Kaehler, Benjamin D; Yap, Von Bing; Zhang, Rongli; Huttley, Gavin A.

Syst Biol ; 64(2): 281-93, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25503772

ABSTRACT

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.
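As a hedged illustration of the quantity this abstract describes, the expected number of substitutions per site can be written as follows. The notation is an assumption introduced here, not taken from this record: $Q = (q_{ij})$ is the rate matrix on an edge, $t$ is the elapsed time, and $\pi(s)$ is the state distribution at time $s$ along the edge.

```latex
% Stationary case: the state distribution \pi is constant along the edge.
d_{\mathrm{stationary}} = -t \sum_i \pi_i \, q_{ii}

% Nonstationary, time-homogeneous case: the distribution evolves along the edge.
d = -\int_0^t \sum_i \pi_i(s) \, q_{ii} \, ds,
\qquad \pi(s) = \pi(0) \, e^{Qs}
```

If $\pi(0)$ is the stationary distribution of $Q$, then $\pi(s) = \pi(0)$ for all $s$ and the second expression collapses to the first, which matches the abstract's statement that the measure reduces to the standard formulation under stationarity.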

Subjects

Evolution, Molecular; Models, Genetic; Animals; Humans; Mammals/classification; Mammals/genetics; Markov Chains; Phylogeny

2.

Genome analysis of the platypus reveals unique signatures of evolution.

Warren, Wesley C; Hillier, LaDeana W; Marshall Graves, Jennifer A; Birney, Ewan; Ponting, Chris P; Grützner, Frank; Belov, Katherine; Miller, Webb; Clarke, Laura; Chinwalla, Asif T; Yang, Shiaw-Pyng; Heger, Andreas; Locke, Devin P; Miethke, Pat; Waters, Paul D; Veyrunes, Frédéric; Fulton, Lucinda; Fulton, Bob; Graves, Tina; Wallis, John; Puente, Xose S; López-Otín, Carlos; Ordóñez, Gonzalo R; Eichler, Evan E; Chen, Lin; Cheng, Ze; Deakin, Janine E; Alsop, Amber; Thompson, Katherine; Kirby, Patrick; Papenfuss, Anthony T; Wakefield, Matthew J; Olender, Tsviya; Lancet, Doron; Huttley, Gavin A; Smit, Arian F A; Pask, Andrew; Temple-Smith, Peter; Batzer, Mark A; Walker, Jerilyn A; Konkel, Miriam K; Harris, Robert S; Whittington, Camilla M; Wong, Emily S W; Gemmell, Neil J; Buschiazzo, Emmanuel; Vargas Jentzsch, Iris M; Merkel, Angelika; Schmitz, Juergen; Zemann, Anja.

Nature ; 453(7192): 175-83, 2008 May 08.

Article in English | MEDLINE | ID: mdl-18464734

ABSTRACT

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.

Subjects

Evolution, Molecular; Genome/genetics; Platypus/genetics; Animals; Base Composition; Dentition; Female; Genomic Imprinting/genetics; Humans; Immunity/genetics; Male; Mammals/genetics; MicroRNAs/genetics; Milk Proteins/genetics; Phylogeny; Platypus/immunology; Platypus/physiology; Receptors, Odorant/genetics; Repetitive Sequences, Nucleic Acid/genetics; Reptiles/genetics; Sequence Analysis, DNA; Spermatozoa/metabolism; Venoms/genetics; Zona Pellucida/metabolism

3.

Genome of the marsupial Monodelphis domestica reveals innovation in non-coding sequences.

Mikkelsen, Tarjei S; Wakefield, Matthew J; Aken, Bronwen; Amemiya, Chris T; Chang, Jean L; Duke, Shannon; Garber, Manuel; Gentles, Andrew J; Goodstadt, Leo; Heger, Andreas; Jurka, Jerzy; Kamal, Michael; Mauceli, Evan; Searle, Stephen M J; Sharpe, Ted; Baker, Michelle L; Batzer, Mark A; Benos, Panayiotis V; Belov, Katherine; Clamp, Michele; Cook, April; Cuff, James; Das, Radhika; Davidow, Lance; Deakin, Janine E; Fazzari, Melissa J; Glass, Jacob L; Grabherr, Manfred; Greally, John M; Gu, Wanjun; Hore, Timothy A; Huttley, Gavin A; Kleber, Michael; Jirtle, Randy L; Koina, Edda; Lee, Jeannie T; Mahony, Shaun; Marra, Marco A; Miller, Robert D; Nicholls, Robert D; Oda, Mayumi; Papenfuss, Anthony T; Parra, Zuly E; Pollock, David D; Ray, David A; Schein, Jacqueline E; Speed, Terence P; Thompson, Katherine; VandeBerg, John L; Wade, Claire M.

Nature ; 447(7141): 167-77, 2007 May 10.

Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.

Subjects

Evolution, Molecular; Genome/genetics; Genomics; Opossums/genetics; Animals; Base Composition; Conserved Sequence/genetics; DNA Transposable Elements/genetics; Humans; Polymorphism, Single Nucleotide/genetics; Protein Biosynthesis; Synteny/genetics; X Chromosome Inactivation/genetics

4.

Machine Learning Techniques for Classifying the Mutagenic Origins of Point Mutations.

Zhu, Yicheng; Ong, Cheng Soon; Huttley, Gavin A.

Genetics ; 215(1): 25-40, 2020 05.

Article in English | MEDLINE | ID: mdl-32193188

ABSTRACT

There is increasing interest in developing diagnostics that discriminate individual mutagenic mechanisms in a range of applications that include identifying population-specific mutagenesis and resolving distinct mutation signatures in cancer samples. Analyses for these applications assume that mutagenic mechanisms have a distinct relationship with neighboring bases that allows them to be distinguished. Direct support for this assumption is limited to a small number of simple cases, e.g., CpG hypermutability. We have evaluated whether the mechanistic origin of a point mutation can be resolved using only sequence context for a more complicated case. We contrasted single nucleotide variants originating from the multitude of mutagenic processes that normally operate in the mouse germline with those induced by the potent mutagen N-ethyl-N-nitrosourea (ENU). The considerable overlap in the mutation spectra of these two samples makes this a challenging problem. Employing a new, robust log-linear modeling method, we demonstrate that neighboring bases contain information regarding point mutation direction that differs between the ENU-induced and spontaneous mutation variant classes. A logistic regression classifier exhibited strong performance at discriminating between the different mutation classes. Concordance between the feature set of the best classifier and information content analyses suggests our results can be generalized to other mutation classification problems. We conclude that machine learning can be used to build a practical classification tool to identify the mutation mechanism for individual genetic variants. Software implementing our approach is freely available under an open-source license.

Subjects

Machine Learning; Point Mutation; Sequence Analysis, DNA/methods; Animals; Ethylnitrosourea/toxicity; Mice; Mutagens/toxicity; Nucleotide Motifs

5.

Species abundance information improves sequence taxonomy classification accuracy.

Kaehler, Benjamin D; Bokulich, Nicholas A; McDonald, Daniel; Knight, Rob; Caporaso, J Gregory; Huttley, Gavin A.

Nat Commun ; 10(1): 4643, 2019 10 11.

Article in English | MEDLINE | ID: mdl-31604942

ABSTRACT

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
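The effect of the uniform-prior assumption can be illustrated with a minimal sketch. This is not the q2-clawback implementation: the function name `classify`, the species labels, the log-likelihoods, and the abundance weights below are all hypothetical values chosen for illustration.

```python
import math


def classify(log_likelihoods, priors):
    """Return the taxon with the highest log-posterior score.

    log_likelihoods: dict mapping taxon -> log P(sequence | taxon)
    priors: dict mapping taxon -> prior probability of observing that taxon
    """
    scores = {
        taxon: log_lik + math.log(priors[taxon])
        for taxon, log_lik in log_likelihoods.items()
    }
    return max(scores, key=scores.get)


# Hypothetical log-likelihoods for one query read against two reference species.
log_likelihoods = {"species_A": -10.0, "species_B": -10.5}

# Uniform prior: every reference species is assumed equally likely.
uniform = {"species_A": 0.5, "species_B": 0.5}

# Hypothetical environment-specific weights: species_B dominates this habitat.
weighted = {"species_A": 0.05, "species_B": 0.95}

print(classify(log_likelihoods, uniform))   # likelihood alone decides
print(classify(log_likelihoods, weighted))  # abundance information shifts the call
```

With a uniform prior the marginally better likelihood wins, whereas a strongly skewed prior can overturn a small likelihood difference between closely related species.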

Subjects

Microbiota/genetics; Phylogeny; Bacteria/genetics; Classification/methods; Computational Biology; Metagenomics/methods; Population Density; Software

6.

Author Correction: Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2.

Bolyen, Evan; Rideout, Jai Ram; Dillon, Matthew R; Bokulich, Nicholas A; Abnet, Christian C; Al-Ghalith, Gabriel A; Alexander, Harriet; Alm, Eric J; Arumugam, Manimozhiyan; Asnicar, Francesco; Bai, Yang; Bisanz, Jordan E; Bittinger, Kyle; Brejnrod, Asker; Brislawn, Colin J; Brown, C Titus; Callahan, Benjamin J; Caraballo-Rodríguez, Andrés Mauricio; Chase, John; Cope, Emily K; Da Silva, Ricardo; Diener, Christian; Dorrestein, Pieter C; Douglas, Gavin M; Durall, Daniel M; Duvallet, Claire; Edwardson, Christian F; Ernst, Madeleine; Estaki, Mehrbod; Fouquier, Jennifer; Gauglitz, Julia M; Gibbons, Sean M; Gibson, Deanna L; Gonzalez, Antonio; Gorlick, Kestrel; Guo, Jiarong; Hillmann, Benjamin; Holmes, Susan; Holste, Hannes; Huttenhower, Curtis; Huttley, Gavin A; Janssen, Stefan; Jarmusch, Alan K; Jiang, Lingjing; Kaehler, Benjamin D; Kang, Kyo Bin; Keefe, Christopher R; Keim, Paul; Kelley, Scott T; Knights, Dan.

Nat Biotechnol ; 37(9): 1091, 2019 Sep.

Article in English | MEDLINE | ID: mdl-31399723

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

7.

Pathological rate matrices: from primates to pathogens.

Schranz, Harold W; Yap, Von Bing; Easteal, Simon; Knight, Rob; Huttley, Gavin A.

BMC Bioinformatics ; 9: 550, 2008 Dec 19.

Article in English | MEDLINE | ID: mdl-19099591

ABSTRACT

BACKGROUND: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and consider the suitability of different algorithms to their computation. RESULTS: We used concatenated protein coding gene alignments from microbial genomes, primate genomes and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both Taylor and eigendecomposition algorithms exhibited substantial error rates: ~100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm while ~10% of codon positions 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ~30% of codon matrices were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices that were attributable to machine precision. Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3x faster than eigendecomposition on the same matrices. CONCLUSION: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm.
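The error criterion mentioned in this abstract (occurrence of an invalid probability) can be sketched in a few lines. Note the substitution: for brevity this sketch pairs scaling and squaring with a truncated Taylor series rather than a Padé approximant, so it is not the algorithm the authors recommend. The function names and the 4x4 rate matrix are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_scaling_squaring(q, t=1.0, terms=12):
    """Approximate exp(Q t): scale down, sum a short Taylor series, square back up."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    # Choose the number of halvings so the scaled matrix has a small norm.
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    scale = 2.0 ** squarings
    a = [[x / scale for x in row] for row in a]
    # Truncated Taylor series of the scaled matrix: I + A + A^2/2! + ...
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    # Undo the scaling by repeated squaring.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def is_valid_probability_matrix(p, tol=1e-9):
    """True if every row sums to 1 and every entry lies in [0, 1] (within tol)."""
    return all(
        abs(sum(row) - 1.0) <= tol and all(-tol <= x <= 1.0 + tol for x in row)
        for row in p
    )


# Hypothetical nucleotide rate matrix: off-diagonals >= 0, rows sum to zero.
Q = [
    [-0.9, 0.2, 0.6, 0.1],
    [0.3, -1.1, 0.1, 0.7],
    [0.5, 0.1, -0.8, 0.2],
    [0.1, 0.4, 0.2, -0.7],
]

P = expm_scaling_squaring(Q, t=1.0)
print(is_valid_probability_matrix(P))
```

Applying `is_valid_probability_matrix` to the output of any exponentiation routine gives a conservative pass/fail check of the kind the abstract describes; a small, well-conditioned matrix like this one is not expected to expose the pathological cases reported for larger state spaces.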

Subjects

Computational Biology/methods; Evolution, Molecular; Algorithms; Animals; Codon; Humans; Markov Chains; Primates/genetics; Software

8.

Comparison of methods for estimating the nucleotide substitution matrix.

Oscamou, Maribeth; McDonald, Daniel; Yap, Von Bing; Huttley, Gavin A; Lladser, Manuel E; Knight, Rob.

BMC Bioinformatics ; 9: 511, 2008 Dec 01.

Article in English | MEDLINE | ID: mdl-19046431

ABSTRACT

BACKGROUND: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. RESULTS: Different methods differ in performance by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. CONCLUSION: Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genomewide scale across the tree of life.

Subjects

Computational Biology/methods; DNA Mutational Analysis/methods; Evolution, Molecular; Nucleotides/genetics; Algorithms; Computer Simulation; DNA/genetics; Data Interpretation, Statistical; Logistic Models; Markov Chains; Models, Genetic; Phylogeny; Reproducibility of Results; Sensitivity and Specificity

9.

Detecting coevolution without phylogenetic trees? Tree-ignorant metrics of coevolution perform as well as tree-aware metrics.

Caporaso, J Gregory; Smit, Sandra; Easton, Brett C; Hunter, Lawrence; Huttley, Gavin A; Knight, Rob.

BMC Evol Biol ; 8: 327, 2008 Dec 03.

Article in English | MEDLINE | ID: mdl-19055758

ABSTRACT

BACKGROUND: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance. RESULTS: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding however, indicating that the choice of alphabet is critical. CONCLUSION: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.
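Mutual Information, the tree-ignorant metric singled out in this abstract, can be computed directly from two alignment columns. The sketch below shows only the raw metric; it does not include the t-test transformation or the null distribution the authors describe, and the function name and toy columns are hypothetical.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equal-length alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must have the same number of sequences")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (x, y), count in count_xy.items():
        p_joint = count / n
        # p_joint / (p_x * p_y), written with raw counts to limit rounding.
        ratio = p_joint * n * n / (count_x[x] * count_y[y])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy columns, one character per sequence in the alignment.
col_a = "AAKK"
col_b = "DDEE"  # changes in step with col_a
col_c = "DEDE"  # varies independently of col_a

print(mutual_information(col_a, col_b))
print(mutual_information(col_a, col_c))
```

Because the calculation needs only column counts and no phylogeny, its cost grows with the number of sequences and column pairs, which is consistent with the speed advantage the abstract attributes to tree-ignorant methods.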

Subjects

Algorithms; Computational Biology/methods; Evolution, Molecular; Models, Statistical; Phylogeny; Models, Genetic; Myoglobin/genetics; Myosins/genetics; Protein Structure, Secondary; Sequence Alignment; Sequence Analysis, Protein

10.

QIIME allows analysis of high-throughput community sequencing data.

Caporaso, J Gregory; Kuczynski, Justin; Stombaugh, Jesse; Bittinger, Kyle; Bushman, Frederic D; Costello, Elizabeth K; Fierer, Noah; Peña, Antonio Gonzalez; Goodrich, Julia K; Gordon, Jeffrey I; Huttley, Gavin A; Kelley, Scott T; Knights, Dan; Koenig, Jeremy E; Ley, Ruth E; Lozupone, Catherine A; McDonald, Daniel; Muegge, Brian D; Pirrung, Meg; Reeder, Jens; Sevinsky, Joel R; Turnbaugh, Peter J; Walters, William A; Widmann, Jeremy; Yatsunenko, Tanya; Zaneveld, Jesse; Knight, Rob.

Nat Methods ; 7(5): 335-6, 2010 May.

Article in English | MEDLINE | ID: mdl-20383131

Subjects

RNA, Ribosomal, 16S/genetics; Sequence Analysis, RNA/methods; Software; Animals; Feces/microbiology; Humans; Mice; Twins, Dizygotic/genetics; Twins, Monozygotic/genetics

11.

Did aculeate silk evolve as an antifouling material?

Sutherland, Tara D; Sriskantha, Alagacone; Rapson, Trevor D; Kaehler, Benjamin D; Huttley, Gavin A.

PLoS One ; 13(9): e0203948, 2018.

Article in English | MEDLINE | ID: mdl-30240428

ABSTRACT

Many of the challenges we currently face as an advanced society have been solved in unique ways by biological systems. One such challenge is developing strategies to avoid microbial infection. Social aculeates (wasps, bees and ants) mitigate the risk of infection to their colonies using a wide range of adaptations and mechanisms. These adaptations and mechanisms are reliant on intricate social structures and are energetically costly for the colony. It seems likely that these species must have had alternative and simpler mechanisms in place to ensure the maintenance of hygienic domicile conditions prior to the evolution of these complex behaviours. Features of the aculeate coiled-coil silk proteins are reminiscent of those of naturally occurring α-helical antimicrobial peptides (AMPs). In this study, we demonstrate that peptides derived from the aculeate silk proteins have antimicrobial activity. We reconstruct the predicted ancestral silk sequences of an aculeate ancestor that pre-dates the evolution of sociality and demonstrate that these ancestral sequences also contained peptides with antimicrobial properties. It is possible that the silks evolved as an antifouling material and facilitated the evolution of sociality. These materials serve as model materials for consideration in future biomaterial development.

Subjects

Antimicrobial Cationic Peptides/genetics; Antimicrobial Cationic Peptides/physiology; Insect Proteins/genetics; Insect Proteins/physiology; Silk/genetics; Silk/physiology; Amino Acid Sequence; Animals; Antimicrobial Cationic Peptides/chemistry; Ants/genetics; Ants/physiology; Bees/genetics; Bees/physiology; Evolution, Molecular; Insect Proteins/chemistry; Phylogeny; Silk/chemistry; Social Behavior; Wasps/genetics; Wasps/physiology

12.

q2-sample-classifier: machine-learning tools for microbiome classification and regression.

Bokulich, Nicholas A; Dillon, Matthew R; Bolyen, Evan; Kaehler, Benjamin D; Huttley, Gavin A; Caporaso, J Gregory.

J Open Res Softw ; 3(30)2018.

Article in English | MEDLINE | ID: mdl-31552137

ABSTRACT

q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.

13.

Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J.

Microbiome ; 6(1): 90, 2018 05 17.

Article in English | MEDLINE | ID: mdl-29773078

ABSTRACT

BACKGROUND: Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. RESULTS: We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). CONCLUSIONS: Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

Subjects

Bacteria/genetics; Computer Simulation; DNA, Intergenic/genetics; Fungi/genetics; Microbiota/genetics; RNA, Ribosomal, 16S/genetics; Sequence Alignment/methods; Algorithms; Base Sequence/genetics; Machine Learning; Software

14.

Standard Codon Substitution Models Overestimate Purifying Selection for Nonstationary Data.

Kaehler, Benjamin D; Yap, Von Bing; Huttley, Gavin A.

Genome Biol Evol ; 9(1): 134-149, 2017 01 01.

Article in English | MEDLINE | ID: mdl-28175284

ABSTRACT

Estimation of natural selection on protein-coding sequences is a key comparative genomics approach for de novo prediction of lineage-specific adaptations. Selective pressure is measured on a per-gene basis by comparing the rate of nonsynonymous substitutions to the rate of synonymous substitutions. All published codon substitution models have been time-reversible and thus assume that sequence composition does not change over time. We previously demonstrated that if time-reversible DNA substitution models are applied in the presence of changing sequence composition, the number of substitutions is systematically biased towards overestimation. We extend these findings to the case of codon substitution models and further demonstrate that the ratio of nonsynonymous to synonymous rates of substitution tends to be underestimated over three data sets of mammals, vertebrates, and insects. Our basis for comparison is a nonstationary codon substitution model that allows sequence composition to change. Goodness-of-fit results demonstrate that our new model tends to fit the data better. Direct measurement of nonstationarity shows that bias in estimates of natural selection and genetic distance increases with the degree of violation of the stationarity assumption. Additionally, inferences drawn under time-reversible models are systematically affected by compositional divergence. As genomic sequences accumulate at an accelerating rate, the importance of accurate de novo estimation of natural selection increases. Our results establish that our new model provides a more robust perspective on this fundamental quantity.

Subjects

Codon; Models, Genetic; Proteins/genetics; Selection, Genetic; Animals; Humans; Markov Chains

15.

Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations.

Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.

Genetics ; 205(2): 843-856, 2017 02.

Article in English | MEDLINE | ID: mdl-27974498

ABSTRACT

Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within [Formula: see text] of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif.

Subjects

Nucleotide Motifs; Point Mutation; Sequence Analysis, DNA/methods; Software; Animals; CpG Islands; Data Interpretation, Statistical; Humans

16.

Vestige: maximum likelihood phylogenetic footprinting.

Wakefield, Matthew J; Maxwell, Peter; Huttley, Gavin A.

BMC Bioinformatics ; 6: 130, 2005 May 29.

Article in English | MEDLINE | ID: mdl-15921531

ABSTRACT

BACKGROUND: Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. This is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit that uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring the divergence, Vestige allows the expansion of the definition of a phylogenetic footprint to include variation in the distribution of any molecular evolutionary processes. This is achieved by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examination of the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process. RESULTS: Vestige was applied to a reference dataset of the SCL locus from four species and provided clear identification of the known conserved regions in this dataset. To demonstrate the flexibility to use diverse models of molecular evolution and dissect the nature of the evolutionary process Vestige was used to footprint the Ka/Ks ratio in primate BRCA1 with a codon model of evolution. Two regions of putative adaptive evolution were identified illustrating the ability of Vestige to represent the spatial distribution of distinct molecular evolutionary processes. CONCLUSION: Vestige provides a flexible, open platform for phylogenetic footprinting. Underpinned by the PyEvolve toolkit, Vestige provides a framework for visualising the signatures of evolutionary processes across the genome of numerous organisms simultaneously. By exploiting the maximum-likelihood statistical framework, the complex interplay between mutational processes, DNA repair and selection can be evaluated both spatially (along a sequence alignment) and temporally (for each branch of the tree) providing visual indicators to the attributes and functions of DNA sequences.

Subjects

Computational Biology/methods; Data Interpretation, Statistical; Algorithms; Animals; BRCA1 Protein/genetics; Base Sequence; Codon; Computer Simulation; DNA/chemistry; DNA Repair; Evolution, Molecular; Genome; Humans; Likelihood Functions; Models, Biological; Models, Statistical; Phylogeny; Programming Languages; Regulatory Sequences, Nucleic Acid; Sequence Alignment; Sequence Analysis, DNA; Sequence Analysis, Protein; Software; Species Specificity; Time Factors

17.

Modelling and bioinformatics studies of the human Kappa-class glutathione transferase predict a novel third glutathione transferase family with similarity to prokaryotic 2-hydroxychromene-2-carboxylate isomerases.

Robinson, Anna; Huttley, Gavin A; Booth, Hilary S; Board, Philip G.

Biochem J ; 379(Pt 3): 541-52, 2004 May 01.

Article in English | MEDLINE | ID: mdl-14709161

ABSTRACT

The Kappa class of GSTs (glutathione transferases) comprises soluble enzymes originally isolated from the mitochondrial matrix of rats. We have characterized a Kappa class cDNA from human breast. The cDNA is derived from a single gene comprising eight exons and seven introns located on chromosome 7q34-35. Recombinant hGSTK1-1 was expressed in Escherichia coli as a homodimer (subunit molecular mass approximately 25.5 kDa). Significant glutathione-conjugating activity was found only with the model substrate CDNB (1-chloro-2,4-dinitrobenzene). Hyperbolic kinetics were obtained for GSH (parameters: K(m)app, 3.3 ± 0.95 mM; V(max)app, 21.4 ± 1.8 µmol/min per mg of enzyme), while sigmoidal kinetics were obtained for CDNB (parameters: S0.5app, 1.5 ± 1.0 mM; V(max)app, 40.3 ± 0.3 µmol/min per mg of enzyme; Hill coefficient, 1.3), reflecting low affinities for both substrates. Sequence analyses, homology modelling and secondary structure predictions show that hGSTK1 has (a) most similarity to bacterial HCCA (2-hydroxychromene-2-carboxylate) isomerases and (b) a predicted C-terminal domain structure that is almost identical to that of bacterial disulphide-bond-forming DsbA oxidoreductase (root mean square deviation 0.5-0.6 Å). The structures of hGSTK1 and HCCA isomerase are predicted to possess a thioredoxin fold with a polyhelical domain (α(x)) embedded between the β-strands (βαβα(x)ββα, where βαβ and ββα represent the N and C motifs of the thioredoxin fold), as occurs in the bacterial disulphide-bond-forming oxidoreductases. This is in contrast with the cytosolic GSTs, where the helical domain occurs exclusively at the C-terminus (βαβαββαα(x)). Although hGSTK1-1 catalyses some typical GST reactions, we propose that it is structurally distinct from other classes of cytosolic GSTs. The present study suggests that the Kappa class may have arisen in prokaryotes well before the divergence of the cytosolic GSTs.

Subjects

Computational Biology , Glutathione Transferase/chemistry , Glutathione Transferase/classification , Intramolecular Oxidoreductases/chemistry , Models, Molecular , Prokaryotic Cells/enzymology , Amino Acid Sequence , Animals , Base Sequence , Breast/enzymology , Cloning, Molecular , Conserved Sequence/genetics , Exons/genetics , Glutathione Transferase/isolation & purification , Glutathione Transferase/metabolism , Humans , Hydrogen-Ion Concentration , Introns/genetics , Kinetics , Mice , Molecular Sequence Data , Protein Structure, Tertiary , Rats , Recombinant Proteins/chemistry , Recombinant Proteins/genetics , Recombinant Proteins/isolation & purification , Recombinant Proteins/metabolism , Structural Homology, Protein , Substrate Specificity

18.

Folding behavior of four silks of giant honey bee reflects the evolutionary conservation of aculeate silk proteins.

Maitip, Jakkrawut; Trueman, Holly E; Kaehler, Benjamin D; Huttley, Gavin A; Chantawannakul, Panuwan; Sutherland, Tara D.

Insect Biochem Mol Biol ; 59: 72-9, 2015 Apr.

Article in English | MEDLINE | ID: mdl-25712559

ABSTRACT

Multiple gene duplication events in the precursor of the Aculeata (bees, ants, hornets) gave rise to four silk genes. Whilst these homologs encode proteins with similar amino acid composition and coiled coil structure, the retention of all four homologs implies they each are important. In this study we identified, produced and characterized the four silk proteins from Apis dorsata, the giant Asian honeybee. The proteins were readily purified, allowing us to investigate the folding behavior of solutions of individual proteins in comparison to mixtures of all four proteins at concentrations where they assemble into their native coiled coil structure. In contrast to solutions of any one protein type, solutions of a mixture of the four proteins formed coiled coils that were stable against dilution and detergent denaturation. The results are consistent with the formation of a heteromeric coiled coil protein complex. The mechanism of silk protein coiled coil formation and evolution is discussed in light of these results.

Subjects

Bees/genetics , Insect Proteins/genetics , Silk/genetics , Amino Acid Sequence , Animals , Bees/metabolism , Evolution, Molecular , Insect Proteins/chemistry , Molecular Sequence Data , Protein Folding , Protein Structure, Secondary , Sequence Homology , Silk/chemistry

19.

Transcriptome sequencing of two phenotypic mosaic Eucalyptus trees reveals large scale transcriptome re-modelling.

Padovan, Amanda; Patel, Hardip R; Chuah, Aaron; Huttley, Gavin A; Krause, Sandra T; Degenhardt, Jörg; Foley, William J; Külheim, Carsten.

PLoS One ; 10(5): e0123226, 2015.

Article in English | MEDLINE | ID: mdl-25978451

ABSTRACT

Phenotypic mosaic trees offer an ideal system for studying differential gene expression. We have investigated two mosaic eucalypt trees from two closely related species (Eucalyptus melliodora and E. sideroxylon), which each support two types of leaves: one part of the canopy is resistant to insect herbivory and the remaining leaves are susceptible. Driving this ecological distinction are differences in plant secondary metabolites. We used these phenotypic mosaics to investigate genome-wide patterns of foliar gene expression with the aim of identifying patterns of differential gene expression and the somatic mutation(s) that lead to this phenotypic mosaicism. We sequenced the mRNA pool from leaves of the resistant and susceptible ecotypes from both mosaic eucalypts using the Illumina HiSeq 2000 platform. We found large differences in pathway regulation and gene expression between the ecotypes of each mosaic. The expression of the genes in the MVA and MEP pathways is reflected by variation in leaf chemistry; however, this is not the case for the terpene synthases. Apart from the terpene biosynthetic pathway, there are several other metabolic pathways that are differentially regulated between the two ecotypes, suggesting there is much more phenotypic diversity than has been described. Despite the close relationship between the two species, they show large differences in the global patterns of gene and pathway regulation.

Subjects

Eucalyptus/genetics , Transcriptome/genetics , Gene Expression Regulation, Plant/genetics , Models, Theoretical , Plant Leaves/genetics

20.

Draft Genome of Australian Environmental Strain WM 09.24 of the Opportunistic Human Pathogen Scedosporium aurantiacum.

Pérez-Bercoff, Åsa; Papanicolaou, Alexie; Ramsperger, Marc; Kaur, Jashanpreet; Patel, Hardip R; Harun, Azian; Duan, Shu Yao; Elbourne, Liam; Bouchara, Jean-Philippe; Paulsen, Ian T; Nevalainen, Helena; Meyer, Wieland; Huttley, Gavin A.

Genome Announc ; 3(1)2015 Feb 12.

Article in English | MEDLINE | ID: mdl-25676755

ABSTRACT

We report here the first genome assembly and annotation of the human-pathogenic fungus Scedosporium aurantiacum, with a predicted 10,525 genes and 11,661 transcripts. The strain WM 09.24 was isolated from the environment at Circular Quay, Sydney, New South Wales, Australia.
