Results 1 - 20 of 50
1.
Mol Biol Evol ; 39(2), 2022 Feb 03.
Article in English | MEDLINE | ID: mdl-35084499

ABSTRACT

Considerable attention has recently been focused on the potential involvement of DNA methylation in regulating gene expression in cnidarians. Much of this work has been centered on corals, in the context of changes in methylation perhaps facilitating adaptation to higher seawater temperatures and other stressful conditions. Although first proposed more than 30 years ago, the possibility that DNA methylation systems function in protecting animal genomes against the harmful effects of transposon activity has largely been ignored since that time. Here, we show that transposons are specifically targeted by the DNA methylation system in cnidarians, and that the youngest transposons (i.e., those most likely to be active) are most highly methylated. Transposons in longer and highly active genes were preferentially methylated and, as transposons aged, methylation levels declined, reducing the potentially harmful side effects of CpG methylation. In Cnidaria and a range of other invertebrates, correlation between the overall extent of methylation and transposon content was strongly supported. Present transposon burden is the dominant factor in determining overall level of genomic methylation in a range of animals that diverged in or before the early Cambrian, suggesting that genome defense represents the ancestral role of CpG methylation.


Subjects
Cnidaria, DNA Methylation, Animals, Cnidaria/genetics, CpG Islands, Genome, Invertebrates/genetics
2.
Syst Biol ; 64(2): 281-93, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25503772

ABSTRACT

The genetic distance between biological sequences is a fundamental quantity in molecular evolution. It pertains to questions of rates of evolution, existence of a molecular clock, and phylogenetic inference. Under the class of continuous-time substitution models, the distance is commonly defined as the expected number of substitutions at any site in the sequence. We eschew the almost ubiquitous assumptions of evolution under stationarity and time-reversible conditions and extend the concept of the expected number of substitutions to nonstationary Markov models where the only remaining constraint is of time homogeneity between nodes in the tree. Our measure of genetic distance reduces to the standard formulation if the data in question are consistent with the stationarity assumption. We apply this general model to samples from across the tree of life to compare distances so obtained with those from the general time-reversible model, with and without rate heterogeneity across sites, and the paralinear distance, an empirical pairwise method explicitly designed to address nonstationarity. We discover that estimates from both variants of the general time-reversible model and the paralinear distance systematically overestimate genetic distance and departure from the molecular clock. The magnitude of the distance bias is proportional to departure from stationarity, which we demonstrate to be associated with longer edge lengths. The marked improvement in consistency between the general nonstationary Markov model and sequence alignments leads us to conclude that analyses of evolutionary rates and phylogenies will be substantively improved by application of this model.


Subjects
Molecular Evolution, Genetic Models, Animals, Humans, Mammals/classification, Mammals/genetics, Markov Chains, Phylogeny
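
Note on entry 2: the abstract defines genetic distance as the expected number of substitutions per site and extends it to nonstationary, time-homogeneous Markov models. One natural way to write such a quantity is the integral over the edge of the state-weighted exit rate, -sum_i pi_i(s) * q_ii. The sketch below approximates that integral numerically. The function name, the two-state rate matrix, and the Euler integration are illustrative assumptions, not the authors' implementation.

```python
def expected_substitutions(pi0, Q, t, steps=10_000):
    """Approximate -integral_0^t sum_i pi_i(s) * Q[i][i] ds by Euler steps.

    pi0 : state distribution at the start of the edge (need not be stationary)
    Q   : time-homogeneous rate matrix; each row sums to zero
    t   : elapsed time along the edge
    """
    n = len(pi0)
    dt = t / steps
    pi = list(pi0)
    total = 0.0
    for _ in range(steps):
        # Expected substitutions accrued in [s, s + dt), given pi(s).
        total += -sum(pi[i] * Q[i][i] for i in range(n)) * dt
        # Euler update of the state distribution: pi(s + dt) ~ pi(s) + pi(s) Q dt.
        pi = [pi[j] + dt * sum(pi[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return total


# Hypothetical two-state process, for illustration only.
Q = [[-1.0, 1.0], [3.0, -3.0]]
stationary = [0.75, 0.25]      # satisfies stationary * Q = 0
nonstationary = [1.0, 0.0]

d_stationary = expected_substitutions(stationary, Q, t=1.0)
d_nonstationary = expected_substitutions(nonstationary, Q, t=1.0)
```

When the starting distribution is stationary, the state distribution never changes and the result collapses to the familiar -sum_i pi_i * q_ii * t, consistent with the abstract's statement that the measure reduces to the standard formulation under stationarity.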
3.
J Transl Med ; 13: 173, 2015 Jun 02.
Article in English | MEDLINE | ID: mdl-26031516

ABSTRACT

BACKGROUND: Multiple autoimmune syndrome (MAS), an extreme phenotype of autoimmune disorders, is a very well-suited trait to tackle genomic variants of these conditions. Whole exome sequencing (WES) is a widely used strategy for detection of protein coding and splicing variants associated with inherited diseases. METHODS: The DNA of eight patients affected by MAS [all of whom presented with Sjögren's syndrome (SS)], four patients affected by SS alone and 38 unaffected individuals was subjected to WES. Filters to identify novel and rare functional (pathogenic-deleterious) homozygous and/or compound heterozygous variants in these patients and controls were applied. Bioinformatics tools such as the Human gene connectome as well as pathway and network analysis were applied to test overrepresentation of genes harbouring these variants in critical pathways and networks involved in autoimmunity. RESULTS: Eleven novel and rare functional variants were identified in cases but not in controls, harboured in: MACF1, KIAA0754, DUSP12, ICA1, CELA1, LRP1/STAT6, GRIN3B, ANKLE1, TMEM161A, and FKRP. These were subsequently subjected to network analysis and their functional relatedness to genes already associated with autoimmunity was evaluated. Notably, the LRP1/STAT6 novel mutation was homozygous in one MAS affected patient and heterozygous in another. LRP1/STAT6 disclosed the strongest plausibility for autoimmunity. LRP1/STAT6 are involved in extracellular and intracellular anti-inflammatory pathways that play key roles in maintaining the homeostasis of the immune system. Further, network, pathway, and interaction analyses showed that LRP1 is functionally related to the HLA-B and IL10 genes and it has a substantial impact within immunological pathways and/or reaction to bacterial and other foreign proteins (phagocytosis, regulation of phospholipase A2 activity, negative regulation of apoptosis and response to lipopolysaccharides). Further, ICA1 and STAT6 were also closely related to AIRE and IRF5, two very well-known autoimmunity genes. CONCLUSIONS: Novel and rare exonic mutations that may account for autoimmunity were identified. Among those, the LRP1/STAT6 novel mutation has the strongest case for being categorised as potentially causative of MAS given the presence of intriguing patterns of functional interaction with other major genes shaping autoimmunity.


Subjects
Genetic Predisposition to Disease, Human Genome, Mutation/genetics, Sjogren's Syndrome/genetics, Adult, Aged, Autoimmunity/genetics, Base Sequence, Case-Control Studies, Connectome, Female, Gene Regulatory Networks, Humans, Middle Aged, Phenotype
4.
Nature ; 453(7192): 175-83, 2008 May 08.
Article in English | MEDLINE | ID: mdl-18464734

ABSTRACT

We present a draft genome sequence of the platypus, Ornithorhynchus anatinus. This monotreme exhibits a fascinating combination of reptilian and mammalian characters. For example, platypuses have a coat of fur adapted to an aquatic lifestyle; platypus females lactate, yet lay eggs; and males are equipped with venom similar to that of reptiles. Analysis of the first monotreme genome aligned these features with genetic innovations. We find that reptile and platypus venom proteins have been co-opted independently from the same gene families; milk protein genes are conserved despite platypuses laying eggs; and immune gene family expansions are directly related to platypus biology. Expansions of protein, non-protein-coding RNA and microRNA families, as well as repeat elements, are identified. Sequencing of this genome now provides a valuable resource for deep mammalian comparative analyses, as well as for monotreme biology and conservation.


Subjects
Molecular Evolution, Genome/genetics, Platypus/genetics, Animals, Base Composition, Dentition, Female, Genomic Imprinting/genetics, Humans, Immunity/genetics, Male, Mammals/genetics, MicroRNAs/genetics, Milk Proteins/genetics, Phylogeny, Platypus/immunology, Platypus/physiology, Odorant Receptors/genetics, Nucleic Acid Repetitive Sequences/genetics, Reptiles/genetics, DNA Sequence Analysis, Spermatozoa/metabolism, Venoms/genetics, Zona Pellucida/metabolism
5.
Nature ; 447(7141): 167-77, 2007 May 10.
Article in English | MEDLINE | ID: mdl-17495919

ABSTRACT

We report a high-quality draft of the genome sequence of the grey, short-tailed opossum (Monodelphis domestica). As the first metatherian ('marsupial') species to be sequenced, the opossum provides a unique perspective on the organization and evolution of mammalian genomes. Distinctive features of the opossum chromosomes provide support for recent theories about genome evolution and function, including a strong influence of biased gene conversion on nucleotide sequence composition, and a relationship between chromosomal characteristics and X chromosome inactivation. Comparison of opossum and eutherian genomes also reveals a sharp difference in evolutionary innovation between protein-coding and non-coding functional elements. True innovation in protein-coding genes seems to be relatively rare, with lineage-specific differences being largely due to diversification and rapid turnover in gene families involved in environmental interactions. In contrast, about 20% of eutherian conserved non-coding elements (CNEs) are recent inventions that postdate the divergence of Eutheria and Metatheria. A substantial proportion of these eutherian-specific CNEs arose from sequence inserted by transposable elements, pointing to transposons as a major creative force in the evolution of mammalian gene regulation.


Subjects
Molecular Evolution, Genome/genetics, Genomics, Opossums/genetics, Animals, Base Composition, Conserved Sequence/genetics, DNA Transposable Elements/genetics, Humans, Single Nucleotide Polymorphism/genetics, Protein Biosynthesis, Synteny/genetics, X Chromosome Inactivation/genetics
6.
Mol Biol Evol ; 27(3): 637-49, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19843619

ABSTRACT

Understanding the origins of localized substitution rate heterogeneity has important implications for identifying functional genomic sequences. Outside of gene regions, the origins of rate heterogeneity remain unclear. Experimental studies establish that chromatin compaction affects rates of both DNA lesion formation and repair. A functional association between chromatin status and 5-methyl-cytosine also exists. These suggest that both the total rate and the type of substitution will be affected by chromatin status. Regular positioning of nucleosomes, the building block of chromatin, further predicts that substitution rate and type should vary spatially in an oscillating manner. We addressed chromatin's influence on substitution rate and type in primates. Matched numbers of sites were sampled from DNase I hypersensitive (DHS) and closed chromatin control flank (Flank). Likelihood ratio tests revealed significant excesses of total and of transition substitutions in Flank compared with matched DHS for both intergenic and intronic samples. An additional excess of CpG transitions was evident for the intergenic, but not intronic, regions. Fluctuation in substitution rate along approximately 1,800 primate promoters was measured using phylogenetic footprinting. Significant positive correlations were evident between the substitution rate and a nucleosome score from resting human T-cells, with up to approximately 50% of the variance in substitution rate accounted for. Using signal processing techniques, a dominant oscillation at approximately 200 bp was evident in both the substitution rate and the nucleosome score. Our results support a role for differential DNA repair rates between open and closed chromatin in the spatial distribution of rate heterogeneity.


Subjects
Computational Biology/methods, DNA Mutational Analysis/methods, DNA Repair/genetics, Nucleosomes/genetics, Chi-Square Distribution, Chromatin/genetics, CpG Islands, DNA Damage, DNA Methylation, Deoxyribonuclease I, Molecular Evolution, Fourier Analysis, Humans, Phylogeny, Genetic Promoter Regions, Computer-Assisted Signal Processing
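
Note on entry 6: the abstract reports a dominant oscillation at roughly 200 bp found with unspecified signal processing techniques (the subject terms list Fourier analysis). As a hedged illustration of how a dominant period can be read off a positional signal, the sketch below computes a plain discrete Fourier power spectrum. The function name and the synthetic signal are assumptions made for this example and do not reproduce the study's data or pipeline.

```python
import cmath
import math


def dominant_period(signal):
    """Return the period (in positions) of the strongest Fourier component."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]   # remove the zero-frequency offset
    best_k, best_power = 1, -1.0
    for k in range(1, n // 2 + 1):
        # k-th discrete Fourier coefficient of the centered signal.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_k, best_power = k, power
    return n / best_k


# Synthetic "rate along a promoter": a strong 200 bp wave plus a weaker 50 bp wave.
positions = range(1000)
rate = [math.sin(2 * math.pi * p / 200) + 0.3 * math.sin(2 * math.pi * p / 50)
        for p in positions]
period = dominant_period(rate)
```

The direct sum costs O(n^2) operations, which is acceptable for a signal of this length; a fast Fourier transform would be the usual choice for longer tracks.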
7.
Mol Biol Evol ; 27(3): 726-34, 2010 Mar.
Article in English | MEDLINE | ID: mdl-19815689

ABSTRACT

Analysis of natural selection is key to understanding many core biological processes, including the emergence of competition, cooperation, and complexity, and has important applications in the targeted development of vaccines. Selection is hard to observe directly but can be inferred from molecular sequence variation. For protein-coding nucleotide sequences, the ratio of nonsynonymous to synonymous substitutions (omega) distinguishes neutrally evolving sequences (omega = 1) from those subjected to purifying (omega < 1) or positive Darwinian (omega > 1) selection. We show that current models used to estimate omega are substantially biased by naturally occurring sequence compositions. We present a novel model that weights substitutions by conditional nucleotide frequencies and which escapes these artifacts. Applying it to the genomes of pathogens causing malaria, leprosy, tuberculosis, and Lyme disease gave significant discrepancies in estimates with approximately 10-30% of genes affected. Our work has substantial implications for how vaccine targets are chosen and for studying the molecular basis of adaptive evolution.


Subjects
Codon, Molecular Evolution, Genetic Models, Statistical Models, Genetic Selection, Base Composition, Chi-Square Distribution, Computer Simulation, Protozoan Genes, Mutation, Plasmodium/genetics, Sequence Alignment
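
Note on entry 7: the thresholds on omega quoted in the abstract can be stated compactly in code. The sketch below only encodes that interpretation rule for a point estimate. The function name and the tolerance band around 1 are hypothetical, and a real analysis would rest on a statistical test rather than a fixed cutoff.

```python
def selection_regime(dn, ds, tolerance=0.05):
    """Interpret omega = dN/dS using the thresholds quoted in the abstract.

    dn, ds    : nonsynonymous and synonymous substitution rates
    tolerance : hypothetical band around 1 treated as neutral
    """
    if ds <= 0:
        raise ValueError("synonymous rate must be positive to form the ratio")
    omega = dn / ds
    if abs(omega - 1.0) <= tolerance:
        return "neutral"
    return "purifying" if omega < 1.0 else "positive"
```

Because the abstract argues that omega estimates can be biased by sequence composition, any label produced this way inherits the bias of the model that produced dn and ds.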
8.
J Mol Evol ; 72(2): 147-59, 2011 Feb.
Article in English | MEDLINE | ID: mdl-21107551

ABSTRACT

Sequence divergence derives from either point substitution or indel (insertion or deletion) processes. We investigated the rates of these two processes both in protein and non-protein coding DNA. We aligned sequence pairs using two pair-hidden Markov models (PHMMs) conjoined by one silent state. The two PHMMs had their own set of parameters to model rates in their respective regions. The aim was to test the hypothesis that the indel mutation rate mimics the point mutation rate. That is, indels are found less often in conserved regions (slow point substitution rate) and more often in non-conserved regions (fast point substitution rate). Both polypeptides and rRNA molecules in our data exhibited a clear distinction between slow and fast rates of the two processes. These two rates served as surrogates to conserved and non-conserved secondary structure components, respectively. With polypeptides we found both the fast indel rate and the fast replacement rate were co-located with hydrophilic residues. We also found that the average concordance, of our alignments with corresponding curated alignments, improves markedly when the model allows either of the two fast rates to colocate with hydrophilic residues. With rRNA molecules, our model did not detect colocation between the fast indel rate and the fast substitution rate. Nevertheless, coupling the indel rates with the point substitution rates across the two regions markedly increased model fit. This result suggests that rRNA pairwise alignments should be modeled after allowing for the two processes to vary simultaneously and independently in the two regions.


Subjects
Amino Acid Sequence, Base Sequence, Sequence Alignment/methods, Algorithms, Amino Acid Substitution, Protein Databases, Hydrophobic and Hydrophilic Interactions, INDEL Mutation, Likelihood Functions, Markov Chains, Genetic Models, Open Reading Frames, Point Mutation, Ribosomal RNA/genetics
9.
G3 (Bethesda) ; 10(8): 2641-2652, 2020 Aug 05.
Article in English | MEDLINE | ID: mdl-32527747

ABSTRACT

We report work to quantify the impact on the probability of human genome polymorphism both of recombination and of sequence context at different scales. We use population-based analyses of data on human genetic variants obtained from the public Ensembl database. For recombination, we calculate the variance due to recombination and the probability that a recombination event causes a mutation. We employ novel statistical procedures to take account of the spatial auto-correlation of recombination and mutation rates along the genome. Our results support the view that genomic diversity in recombination hotspots arises largely from a direct effect of recombination on mutation rather than predominantly from the effect of selective sweeps. We also use the statistic of variance due to context to compare the effect on the probability of polymorphism of contexts of various sizes. We find that when the 12 point mutations are considered separately, variance due to context increases significantly as we move from 3-mer to 5-mer and from 5-mer to 7-mer contexts. However, when all mutations are considered in aggregate, these differences are outweighed by the effect of interaction between the central base and its immediate neighbors. This interaction is itself dominated by the transition mutations, including, but not limited to, the CpG effect. We also demonstrate strand-asymmetry of contextual influence in intronic regions, which is hypothesized to be a result of transcription coupled DNA repair. We consider the extent to which the measures we have used can be used to meaningfully compare the relative magnitudes of the impact of recombination and context on mutation.


Subjects
Mutation Rate, Genetic Recombination, Human Genome, Humans, Mutation, Genetic Polymorphism
10.
Genetics ; 215(1): 25-40, 2020 May.
Article in English | MEDLINE | ID: mdl-32193188

ABSTRACT

There is increasing interest in developing diagnostics that discriminate individual mutagenic mechanisms in a range of applications that include identifying population-specific mutagenesis and resolving distinct mutation signatures in cancer samples. Analyses for these applications assume that mutagenic mechanisms have a distinct relationship with neighboring bases that allows them to be distinguished. Direct support for this assumption is limited to a small number of simple cases, e.g., CpG hypermutability. We have evaluated whether the mechanistic origin of a point mutation can be resolved using only sequence context for a more complicated case. We contrasted single nucleotide variants originating from the multitude of mutagenic processes that normally operate in the mouse germline with those induced by the potent mutagen N-ethyl-N-nitrosourea (ENU). The considerable overlap in the mutation spectra of these two samples makes this a challenging problem. Employing a new, robust log-linear modeling method, we demonstrate that neighboring bases contain information regarding point mutation direction that differs between the ENU-induced and spontaneous mutation variant classes. A logistic regression classifier exhibited strong performance at discriminating between the different mutation classes. Concordance between the feature set of the best classifier and information content analyses suggests our results can be generalized to other mutation classification problems. We conclude that machine learning can be used to build a practical classification tool to identify the mutation mechanism for individual genetic variants. Software implementing our approach is freely available under an open-source license.


Subjects
Machine Learning, Point Mutation, DNA Sequence Analysis/methods, Animals, Ethylnitrosourea/toxicity, Mice, Mutagens/toxicity, Nucleotide Motifs
11.
mSystems ; 5(2), 2020 Mar 17.
Article in English | MEDLINE | ID: mdl-32184368

ABSTRACT

Microbiome-based disease classification depends on well-validated disease-specific models or a priori organismal markers. These are lacking for many diseases. Here, we present an alternative, search-based strategy for disease detection and classification, which detects diseased samples via their outlier novelty versus a database of samples from healthy subjects and then compares these to databases of samples from patients. Our strategy's precision, sensitivity, and speed outperform model-based approaches. In addition, it is more robust to platform heterogeneity and to contamination in 16S rRNA gene amplicon data sets. This search-based strategy shows promise as an important first step in microbiome big-data-based diagnosis. IMPORTANCE: Here, we present a search-based strategy for disease detection and classification, which detects diseased samples via their outlier novelty versus a database of samples from healthy subjects and then compares them to databases of samples from patients. This approach enables the identification of microbiome states associated with disease even in the presence of different cohorts, multiple sequencing platforms, or significant contamination.

12.
Nat Commun ; 10(1): 4643, 2019 Oct 11.
Article in English | MEDLINE | ID: mdl-31604942

ABSTRACT

Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.


Subjects
Microbiota/genetics, Phylogeny, Bacteria/genetics, Classification/methods, Computational Biology, Metagenomics/methods, Population Density, Software
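
Note on entry 12: the abstract's central point is that a naive Bayes classifier with uniform class weights implicitly treats every reference species as equally likely. The sketch below shows, in a toy setting, how replacing uniform priors with abundance-based priors can change the assignment of an ambiguous read. The function, taxon names, likelihoods, and prior values are all invented for illustration and are not the q2-clawback interface.

```python
import math


def classify(log_likelihoods, priors=None):
    """Pick the taxon maximizing log P(read | taxon) + log P(taxon).

    log_likelihoods : dict mapping taxon -> log-likelihood of the read
    priors          : dict mapping taxon -> prior probability; uniform if None
    """
    taxa = list(log_likelihoods)
    if priors is None:
        # Uniform assumption: every reference species equally likely.
        priors = {taxon: 1.0 / len(taxa) for taxon in taxa}
    scores = {taxon: log_likelihoods[taxon] + math.log(priors[taxon])
              for taxon in taxa}
    return max(scores, key=scores.get)


# An ambiguous read: the sequence evidence barely favors Taxon A.
read_ll = {"Taxon A": -10.0, "Taxon B": -10.5}

uniform_call = classify(read_ll)
# Hypothetical environment in which Taxon B is far more abundant.
weighted_call = classify(read_ll, priors={"Taxon A": 0.1, "Taxon B": 0.9})
```

With uniform priors the small likelihood advantage decides the call; with the abundance-weighted priors the posterior favors the taxon that is common in that environment.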
13.
mSystems ; 4(4), 2019 Jun 25.
Article in English | MEDLINE | ID: mdl-31239397

ABSTRACT

Meta-analyses at the whole-community level have been important in microbiome studies, revealing profound features that structure Earth's microbial communities, such as the unique differentiation of microbes from the mammalian gut relative to free-living microbial communities, the separation of microbiomes in saline and nonsaline environments, and the role of pH in driving soil microbial compositions. However, our ability to identify the specific features of a microbiome that differentiate these community-level patterns has lagged behind, especially as ever-cheaper DNA sequencing has yielded increasingly large data sets. One critical gap is the ability to search for samples that contain specific features (for example, sub-operational taxonomic units [sOTUs] identified by high-resolution statistical methods for removing amplicon sequencing errors). Here we introduce redbiom, a microbiome caching layer, which allows users to rapidly query samples that contain a given feature, retrieve sample data and metadata, and search for samples that match specified metadata values or ranges (e.g., all samples with a pH of >7), implemented using an in-memory NoSQL database called Redis. By default, redbiom allows public anonymous sample access for over 100,000 publicly available samples in the Qiita database. At over 100,000 samples, the caching server requires only 35 GB of resident memory. We highlight how redbiom enables a new type of characterization of microbiome samples and provide tutorials for using redbiom with QIIME 2. redbiom is open source under the BSD license, hosted on GitHub, and can be deployed independently of Qiita to enable search of proprietary or clinically restricted microbiome databases. IMPORTANCE: Although analyses that combine many microbiomes at the whole-community level have become routine, searching rapidly for microbiomes that contain a particular sequence has remained difficult. The software we present here, redbiom, dramatically accelerates this process, allowing samples that contain microbiome features to be rapidly identified. This is especially useful when taxonomic annotation is limited, allowing users to identify environments in which unannotated microbes of interest were previously observed. This approach also allows environmental or clinical factors that correlate with specific features, or vice versa, to be identified rapidly, even at a scale of billions of sequences in hundreds of thousands of samples. The software is integrated with existing analysis tools to enable fast, large-scale microbiome searches and discovery of new microbiome relationships.

15.
BMC Bioinformatics ; 9: 550, 2008 Dec 19.
Article in English | MEDLINE | ID: mdl-19099591

ABSTRACT

BACKGROUND: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and considered the suitability of different algorithms to their computation. RESULTS: We used concatenated protein coding gene alignments from microbial genomes, primate genomes and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both Taylor and eigendecomposition algorithms exhibited substantial error rates: ~100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm while ~10% of codon positions 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ~30% of codon matrices were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices that were attributable to machine precision. Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3x faster than eigendecomposition on the same matrices. CONCLUSION: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm.


Subjects
Computational Biology/methods, Molecular Evolution, Algorithms, Animals, Codon, Humans, Markov Chains, Primates/genetics, Software
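
Note on entry 15: the abstract turns on converting a rate matrix Q into a substitution probability matrix P(t) = exp(Qt) and on flagging results that contain an invalid probability. The sketch below illustrates the scaling and squaring idea together with such a validity check. For brevity it substitutes a short Taylor series on the scaled matrix where the paper's recommended algorithm uses a Padé approximant, so it is a simplified stand-in rather than the algorithm evaluated in the study. All function names and the two-state matrix are assumptions for this example.

```python
import math


def mat_mul(A, B):
    """Multiply two square matrices stored as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_scaling_squaring(Q, t, squarings=10, terms=12):
    """Approximate exp(Q * t): scale down, short Taylor series, square back up."""
    n = len(Q)
    scale = t / (2 ** squarings)
    A = [[Q[i][j] * scale for j in range(n)] for i in range(n)]
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        # After this step, term holds A^k / k!.
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    for _ in range(squarings):
        # Undo the scaling: exp(A) raised to 2^squarings equals exp(Q * t).
        result = mat_mul(result, result)
    return result


def is_valid_probability_matrix(P, tol=1e-9):
    """Check that every entry lies in [0, 1] and every row sums to 1."""
    entries_ok = all(-tol <= p <= 1.0 + tol for row in P for p in row)
    rows_ok = all(abs(sum(row) - 1.0) <= tol for row in P)
    return entries_ok and rows_ok


# Hypothetical two-state rate matrix with a known closed-form exponential.
Q = [[-1.0, 1.0], [3.0, -3.0]]
P = expm_scaling_squaring(Q, t=1.0)
expected_p00 = 0.75 + 0.25 * math.exp(-4.0)
```

Scaling first keeps the series argument small, which is what makes the truncated expansion accurate; in practice a library routine based on Padé approximants, such as `scipy.linalg.expm`, would normally be preferred over hand-written code.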
16.
BMC Bioinformatics ; 9: 511, 2008 Dec 01.
Article in English | MEDLINE | ID: mdl-19046431

ABSTRACT

BACKGROUND: The nucleotide substitution rate matrix is a key parameter of molecular evolution. Several methods for inferring this parameter have been proposed, with different mathematical bases. These methods include counting sequence differences and taking the log of the resulting probability matrices, methods based on Markov triples, and maximum likelihood methods that infer the substitution probabilities that lead to the most likely model of evolution. However, the speed and accuracy of these methods have not been compared. RESULTS: Different methods differ in performance by orders of magnitude (ranging from 1 ms to 10 s per matrix), but differences in accuracy of rate matrix reconstruction appear to be relatively small. Encouragingly, relatively simple and fast methods can provide results at least as accurate as far more complex and computationally intensive methods, especially when the sequences to be compared are relatively short. CONCLUSION: Based on the conditions tested, we recommend the use of the method of Gojobori et al. (1982) for long sequences (> 600 nucleotides), and the method of Goldman et al. (1996) for shorter sequences (< 600 nucleotides). The method of Barry and Hartigan (1987) can provide somewhat more accuracy, measured as the Euclidean distance between the true and inferred matrices, on long sequences (> 2000 nucleotides) at the expense of substantially longer computation time. The availability of methods that are both fast and accurate will allow us to gain a global picture of change in the nucleotide substitution rate matrix on a genomewide scale across the tree of life.


Subjects
Computational Biology/methods, DNA Mutational Analysis/methods, Molecular Evolution, Nucleotides/genetics, Algorithms, Computer Simulation, DNA/genetics, Statistical Data Interpretation, Logistic Models, Markov Chains, Genetic Models, Phylogeny, Reproducibility of Results, Sensitivity and Specificity
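
Note on entry 16: one family of methods named in the abstract counts sequence differences and takes the log of the resulting probability matrix. A minimal sketch of that general idea follows: aligned pairs are tallied into a row-normalised matrix P, and log(P) is approximated with a power series that converges only when P is close to the identity (low divergence). The function names, the identity-row fallback for unobserved states, and the toy two-letter alphabet are assumptions for illustration. This is not an implementation of any of the specific published methods cited in the abstract.

```python
def substitution_probabilities(seq_a, seq_b, alphabet="ACGT"):
    """Row-normalised counts of aligned pairs: P[i][j] ~ Pr(j in seq_b | i in seq_a)."""
    index = {base: k for k, base in enumerate(alphabet)}
    n = len(alphabet)
    counts = [[0] * n for _ in range(n)]
    for a, b in zip(seq_a, seq_b):
        if a in index and b in index:       # skip gaps and ambiguity codes
            counts[index[a]][index[b]] += 1
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            # Unobserved state: fall back to an identity row (an assumption).
            P.append([1.0 if i == j else 0.0 for j in range(n)])
        else:
            P.append([c / total for c in row])
    return P


def matrix_log_series(P, terms=200):
    """log(P) = sum over k >= 1 of (-1)^(k+1) (P - I)^k / k, for P near I."""
    n = len(P)
    D = [[P[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    power = [row[:] for row in D]           # holds (P - I)^k
    result = [[0.0] * n for _ in range(n)]
    for k in range(1, terms + 1):
        sign = 1.0 if k % 2 == 1 else -1.0
        for i in range(n):
            for j in range(n):
                result[i][j] += sign * power[i][j] / k
        power = [[sum(power[i][m] * D[m][j] for m in range(n))
                  for j in range(n)] for i in range(n)]
    return result


# Toy alignment over a two-letter alphabet: 2 differences in 20 sites.
seq_a = "AAAAAAAAAACCCCCCCCCC"
seq_b = "AAAAAAAAACCCCCCCCCCA"
P = substitution_probabilities(seq_a, seq_b, alphabet="AC")
Qt = matrix_log_series(P)                   # estimate of Q * t
```

The result estimates the product of the rate matrix and elapsed time; separating the two requires an extra convention, such as scaling Q to a fixed expected number of substitutions.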
17.
BMC Evol Biol ; 8: 327, 2008 Dec 03.
Article in English | MEDLINE | ID: mdl-19055758

ABSTRACT

BACKGROUND: Identifying coevolving positions in protein sequences has myriad applications, ranging from understanding and predicting the structure of single molecules to generating proteome-wide predictions of interactions. Algorithms for detecting coevolving positions can be classified into two categories: tree-aware, which incorporate knowledge of phylogeny, and tree-ignorant, which do not. Tree-ignorant methods are frequently orders of magnitude faster, but are widely held to be insufficiently accurate because of a confounding of shared ancestry with coevolution. We conjectured that by using a null distribution that appropriately controls for the shared-ancestry signal, tree-ignorant methods would exhibit equivalent statistical power to tree-aware methods. Using a novel t-test transformation of coevolution metrics, we systematically compared four tree-aware and five tree-ignorant coevolution algorithms, applying them to myoglobin and myosin. We further considered the influence of sequence recoding using reduced-state amino acid alphabets, a common tactic employed in coevolutionary analyses to improve both statistical and computational performance. RESULTS: Consistent with our conjecture, the transformed tree-ignorant metrics (particularly Mutual Information) often outperformed the tree-aware metrics. Our examination of the effect of recoding suggested that charge-based alphabets were generally superior for identifying the stabilizing interactions in alpha helices. Performance was not always improved by recoding however, indicating that the choice of alphabet is critical. CONCLUSION: The results suggest that t-test transformation of tree-ignorant metrics can be sufficient to control for patterns arising from shared ancestry.


Subjects
Algorithms, Computational Biology/methods, Molecular Evolution, Statistical Models, Phylogeny, Genetic Models, Myoglobin/genetics, Myosins/genetics, Protein Secondary Structure, Sequence Alignment, Protein Sequence Analysis
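
Note on entry 17: Mutual Information is singled out in the abstract as a tree-ignorant coevolution metric. The sketch below computes plain mutual information between two alignment columns. The function name and the toy columns are assumptions, and the t-test transformation that the study applies on top of such metrics is not reproduced here because the abstract does not specify it.

```python
import math
from collections import Counter


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two equally long alignment columns."""
    if len(col_x) != len(col_y):
        raise ValueError("columns must come from the same set of sequences")
    n = len(col_x)
    count_x = Counter(col_x)
    count_y = Counter(col_y)
    count_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        # Compare the joint frequency with what independence would predict.
        mi += p_ab * math.log2(p_ab / ((count_x[a] / n) * (count_y[b] / n)))
    return mi


# Toy columns: residues read down one alignment position across four sequences.
covarying = mutual_information("AAVV", "LLII")      # states change together
independent = mutual_information("AAVV", "LILI")    # no association
```

Raw mutual information is inflated when sequences share recent ancestry, which is the confounding that the study's null distribution is designed to control.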
19.
PLoS One ; 13(9): e0203948, 2018.
Article in English | MEDLINE | ID: mdl-30240428

ABSTRACT

Many of the challenges we currently face as an advanced society have been solved in unique ways by biological systems. One such challenge is developing strategies to avoid microbial infection. Social aculeates (wasps, bees and ants) mitigate the risk of infection to their colonies using a wide range of adaptations and mechanisms. These adaptations and mechanisms are reliant on intricate social structures and are energetically costly for the colony. It seems likely that these species must have had alternative and simpler mechanisms in place to ensure the maintenance of hygienic domicile conditions prior to the evolution of these complex behaviours. Features of the aculeate coiled-coil silk proteins are reminiscent of those of naturally occurring α-helical antimicrobial peptides (AMPs). In this study, we demonstrate that peptides derived from the aculeate silk proteins have antimicrobial activity. We reconstruct the predicted ancestral silk sequences of an aculeate ancestor that pre-dates the evolution of sociality and demonstrate that these ancestral sequences also contained peptides with antimicrobial properties. It is possible that the silks evolved as an antifouling material and facilitated the evolution of sociality. These materials serve as model materials for consideration in future biomaterial development.


Subjects
Antimicrobial Cationic Peptides/genetics, Antimicrobial Cationic Peptides/physiology, Insect Proteins/genetics, Insect Proteins/physiology, Silk/genetics, Silk/physiology, Amino Acid Sequence, Animals, Antimicrobial Cationic Peptides/chemistry, Ants/genetics, Ants/physiology, Bees/genetics, Bees/physiology, Molecular Evolution, Insect Proteins/chemistry, Phylogeny, Silk/chemistry, Social Behavior, Wasps/genetics, Wasps/physiology
20.
J Open Res Softw ; 3(30), 2018.
Article in English | MEDLINE | ID: mdl-31552137

ABSTRACT

q2-sample-classifier is a plugin for the QIIME 2 microbiome bioinformatics platform that facilitates access, reproducibility, and interpretation of supervised learning (SL) methods for a broad audience of non-bioinformatics specialists.
