Search | BVS CLAP/SMR-OPAS/OMS

1.

MAST: Phylogenetic Inference with Mixtures Across Sites and Trees.

Wong, Thomas K F; Cherryh, Caitlin; Rodrigo, Allen G; Hahn, Matthew W; Minh, Bui Quang; Lanfear, Robert.

Syst Biol ; 2024 Feb 29.

Article in English | MEDLINE | ID: mdl-38421146

ABSTRACT

Hundreds or thousands of loci are now routinely used in modern phylogenomic studies. Concatenation approaches to tree inference assume that there is a single topology for the entire dataset, but different loci may have different evolutionary histories due to incomplete lineage sorting, introgression, and/or horizontal gene transfer; even single loci may not be treelike due to recombination. To overcome this shortcoming, we introduce an implementation of a multi-tree mixture model that we call MAST. This model extends a prior implementation by Boussau et al. (2009) by allowing users to estimate the weight of each of a set of pre-specified bifurcating trees in a single alignment. The MAST model allows each tree to have its own weight, topology, branch lengths, substitution model, nucleotide or amino acid frequencies, and model of rate heterogeneity across sites. We implemented the MAST model in a maximum-likelihood framework in the popular phylogenetic software, IQ-TREE. Simulations show that we can accurately recover the true model parameters, including branch lengths and tree weights for a given set of tree topologies, under a wide range of biologically realistic scenarios. We also show that we can use standard statistical inference approaches to reject a single-tree model when data are simulated under multiple trees (and vice versa). We applied the MAST model to multiple primate datasets and found that it can recover the signal of incomplete lineage sorting in the Great Apes, as well as the asymmetry in minor trees caused by introgression among several macaque species. When applied to a dataset of four Platyrrhine species for which standard concatenated maximum likelihood and gene tree approaches disagree, we observe that MAST gives the highest weight (i.e. the largest proportion of sites) to the tree also supported by gene tree approaches. 
These results suggest that the MAST model is able to analyse a concatenated alignment using maximum likelihood, while avoiding some of the biases that come with assuming there is only a single tree. We discuss how the MAST model can be extended in the future.
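The abstract describes a mixture in which each tree carries a weight interpreted as a proportion of sites. A minimal sketch of how such a mixture likelihood is usually assembled is given below. It assumes the per-site likelihood under each tree has already been computed elsewhere; the function names and the toy numbers are hypothetical and are not taken from IQ-TREE or from the paper.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a mixture of trees.

    site_likelihoods[i][j] is the (assumed precomputed) likelihood of site i
    under tree j; weights[j] is the weight of tree j. Weights must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("tree weights must sum to 1")
    total = 0.0
    for per_tree in site_likelihoods:
        # Each site's likelihood is the weighted sum over the candidate trees.
        total += math.log(sum(w * l for w, l in zip(weights, per_tree)))
    return total

def tree_posteriors(per_tree, weights):
    """Posterior probability that one site belongs to each tree."""
    joint = [w * l for w, l in zip(weights, per_tree)]
    norm = sum(joint)
    return [x / norm for x in joint]

# Toy input: two sites, two trees (illustrative values only).
toy_sites = [[0.2, 0.1], [0.05, 0.4]]
toy_weights = [0.5, 0.5]
toy_loglik = mixture_log_likelihood(toy_sites, toy_weights)
toy_post = tree_posteriors(toy_sites[1], toy_weights)
```

In a maximum-likelihood fit the weights would be optimised jointly with the other parameters; this sketch only evaluates the objective for fixed values.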

2.

An assembly-free method of phylogeny reconstruction using short-read sequences from pooled samples without barcodes.

Wong, Thomas K F; Li, Teng; Ranjard, Louis; Wu, Steven H; Sukumaran, Jeet; Rodrigo, Allen G.

PLoS Comput Biol ; 17(9): e1008949, 2021 09.

Article in English | MEDLINE | ID: mdl-34516547

ABSTRACT

A current strategy for obtaining haplotype information from several individuals involves short-read sequencing of pooled amplicons, where fragments from each individual are identified by a unique DNA barcode. In this paper, we report a new method to recover the phylogeny of haplotypes from short-read sequences obtained using pooled amplicons from a mixture of individuals, without barcoding. The method, AFPhyloMix, accepts an alignment of the mixture of reads against a reference sequence, obtains the single-nucleotide polymorphism (SNP) patterns along the alignment, and constructs the phylogenetic tree according to the SNP patterns. AFPhyloMix adopts a Bayesian inference model to estimate the phylogeny of the haplotypes and their relative abundances, given that the number of haplotypes is known. In our simulations, AFPhyloMix achieved at least 80% accuracy at recovering the phylogenies and relative abundances of the constituent haplotypes, for mixtures with up to 15 haplotypes. AFPhyloMix also worked well on a real data set of kangaroo mitochondrial DNA sequences.

Subjects

Taxonomic DNA Barcoding , Phylogeny , Algorithms , Bayes Theorem , Mitochondrial DNA/genetics , Humans , Markov Chains , Monte Carlo Method , Single Nucleotide Polymorphism

3.

Correction to: Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

Ranjard, Louis; Wong, Thomas K F; Rodrigo, Allen G.

BMC Bioinformatics ; 21(1): 24, 2020 01 22.

Article in English | MEDLINE | ID: mdl-31969110

ABSTRACT

Following publication of the original article [1], the author reported that there are several errors in the original article.

4.

Effective machine-learning assembly for next-generation amplicon sequencing with very low coverage.

Ranjard, Louis; Wong, Thomas K F; Rodrigo, Allen G.

BMC Bioinformatics ; 20(1): 654, 2019 Dec 11.

Article in English | MEDLINE | ID: mdl-31829137

ABSTRACT

BACKGROUND: In short-read DNA sequencing experiments, the read coverage is a key parameter to successfully assemble the reads and reconstruct the sequence of the input DNA. When coverage is very low, the original sequence reconstruction from the reads can be difficult because of the occurrence of uncovered gaps. Reference-guided assembly can then improve these assemblies. However, when the available reference is phylogenetically distant from the sequencing reads, the mapping rate of the reads can be extremely low. Some recent improvements in read mapping approaches aim at modifying the reference according to the reads dynamically. Such approaches can significantly improve the alignment rate of the reads onto distant references but the processing of insertions and deletions remains challenging. RESULTS: Here, we introduce a new algorithm to update the reference sequence according to previously aligned reads. Substitutions, insertions and deletions are performed in the reference sequence dynamically. We evaluate this approach to assemble a western-grey kangaroo mitochondrial amplicon. Our results show that more reads can be aligned and that this method produces assemblies of length comparable to the truth while limiting the error rate when classic approaches fail to recover the correct length. Finally, we discuss how the core algorithm of this method could be improved and combined with other approaches to analyse larger genomic sequences. CONCLUSIONS: We introduced an algorithm to perform dynamic alignment of reads on a distant reference. We showed that such an approach can improve the reconstruction of an amplicon compared to classically used bioinformatic pipelines. Although not portable to genomic scale in the current form, we suggested several improvements to be investigated to make this method more flexible and allow dynamic alignment to be used for large genome assemblies.

Subjects

High-Throughput Nucleotide Sequencing/methods , Machine Learning , Algorithms , Animals , Base Sequence , Mitochondrial Genome , Macropodidae/genetics , Nucleotides/genetics

5.

HaploJuice: accurate haplotype assembly from a pool of sequences with known relative concentrations.

Wong, Thomas K F; Ranjard, Louis; Lin, Yu; Rodrigo, Allen G.

BMC Bioinformatics ; 19(1): 389, 2018 Oct 22.

Article in English | MEDLINE | ID: mdl-30348075

ABSTRACT

BACKGROUND: Pooling techniques, where multiple sub-samples are mixed in a single sample, are widely used to take full advantage of high-throughput DNA sequencing. Recently, Ranjard et al. (PLoS ONE 13:0195090, 2018) proposed a pooling strategy without the use of barcodes. Three sub-samples were mixed in different known proportions (i.e. 62.5%, 25% and 12.5%), and a method was developed to use these proportions to reconstruct the three haplotypes effectively. RESULTS: HaploJuice provides an alternative haplotype reconstruction algorithm for Ranjard et al.'s pooling strategy. HaploJuice significantly increases the accuracy by first identifying the empirical proportions of the three mixed sub-samples and then assembling the haplotypes using a dynamic programming approach. HaploJuice was evaluated against five different assembly algorithms, Hmmfreq (Ranjard et al., PLoS ONE 13:0195090, 2018), ShoRAH (Zagordi et al., BMC Bioinformatics 12:119, 2011), SAVAGE (Baaijens et al., Genome Res 27:835-848, 2017), PredictHaplo (Prabhakaran et al., IEEE/ACM Trans Comput Biol Bioinform 11:182-91, 2014) and QuRe (Prosperi and Salemi, Bioinformatics 28:132-3, 2012). Using simulated and real data sets, HaploJuice reconstructed the true sequences with the highest coverage and the lowest error rate. CONCLUSION: HaploJuice provides high accuracy in haplotype reconstruction, making Ranjard et al.'s pooling strategy more efficient, feasible, and applicable, with the benefit of reducing the sequencing cost.
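To illustrate why known mixing proportions can stand in for barcodes, the sketch below maps an observed variant frequency at one site to the subset of sub-samples most likely to carry that variant. It is not the HaploJuice or Hmmfreq algorithm: it is a simple nearest-expected-frequency rule, and the function names are hypothetical. It relies only on the proportions quoted in the abstract (62.5%, 25% and 12.5%), for which every non-empty subset has a distinct total.

```python
from itertools import combinations

def expected_frequencies(proportions):
    """Map each non-empty subset of sub-samples to its expected variant frequency."""
    table = {}
    indices = range(len(proportions))
    for size in range(1, len(proportions) + 1):
        for subset in combinations(indices, size):
            table[subset] = sum(proportions[i] for i in subset)
    return table

def assign_carriers(observed_freq, proportions):
    """Return the subset whose expected frequency is closest to the observed one."""
    table = expected_frequencies(proportions)
    return min(table, key=lambda subset: abs(table[subset] - observed_freq))

# Proportions as stated in the abstract; indices 0, 1, 2 label the sub-samples.
pool = [0.625, 0.25, 0.125]
carriers_a = assign_carriers(0.36, pool)  # returns (1, 2): expected 0.375
carriers_b = assign_carriers(0.64, pool)  # returns (0,): expected 0.625
```

Real read data are noisy and coverage varies along the amplicon, which is why the published methods estimate empirical proportions and use a probabilistic or dynamic-programming model rather than a per-site rule like this one.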

Subjects

Algorithms , Haplotypes/genetics , Base Sequence , Computer Simulation , Genetic Databases , Humans

6.

Estimation of evolutionary parameters using short, random and partial sequences from mixed samples of anonymous individuals.

Wu, Steven H; Rodrigo, Allen G.

BMC Bioinformatics ; 16: 357, 2015 Nov 04.

Article in English | MEDLINE | ID: mdl-26536860

ABSTRACT

BACKGROUND: Over the last decade, next generation sequencing (NGS) has become widely available, and is now the sequencing technology of choice for most researchers. Nonetheless, NGS presents a challenge for evolutionary biologists who wish to estimate evolutionary genetic parameters from a mixed sample of unlabelled or untagged individuals, especially when the reconstruction of full-length haplotypes can be unreliable. We propose two novel approaches, least squares estimation (LS) and Approximate Bayesian Computation Markov chain Monte Carlo estimation (ABC-MCMC), to infer evolutionary genetic parameters from a collection of short-read sequences obtained from a mixed sample of anonymous DNA, using only the frequencies of nucleotides at each site and without reconstructing either the full-length alignment or the phylogeny. RESULTS: We used simulations to evaluate the performance of these algorithms, and our results demonstrate that LS performs poorly because bootstrap 95% Confidence Intervals (CIs) tend to under- or over-estimate the true values of the parameters. In contrast, the 95% Highest Posterior Density (HPD) intervals recovered from ABC-MCMC enclosed the true parameter values with a rate approximately equivalent to that obtained using BEAST, a program that implements a Bayesian MCMC estimation of evolutionary parameters using full-length sequences. Because there is a loss of information with the use of sitewise nucleotide frequencies alone, the ABC-MCMC 95% HPDs are larger than those obtained by BEAST. CONCLUSION: We propose two novel algorithms to estimate evolutionary genetic parameters based on the proportion of each nucleotide. The LS method cannot be recommended as a standalone method for evolutionary parameter estimation. On the other hand, parameters recovered by ABC-MCMC are comparable to those obtained using BEAST, but with larger 95% HPDs.
One major advantage of ABC-MCMC is that computational time scales linearly with the number of short-read sequences, and is independent of the number of full-length sequences in the original data. This allows us to perform the analysis on NGS datasets with large numbers of short read fragments. The source code for ABC-MCMC is available at https://github.com/stevenhwu/SF-ABC.

Subjects

Molecular Evolution , High-Throughput Nucleotide Sequencing/methods , Algorithms , Base Sequence , Bayes Theorem , Computer Simulation , Confidence Intervals , Humans , Least-Squares Analysis , Markov Chains , Monte Carlo Method , Population Density

7.

Transient compartmentalization of simian immunodeficiency virus variants in the breast milk of African green monkeys.

Ho, Carrie; Wu, Steven; Amos, Joshua D; Colvin, Lisa; Smith, Shannon D; Wilks, Andrew B; Demarco, C Todd; Brinkley, Christie; Denny, Thomas N; Schmitz, Joern E; Rodrigo, Allen G; Permar, Sallie R.

J Virol ; 87(20): 11292-9, 2013 Oct.

Article in English | MEDLINE | ID: mdl-23926337

ABSTRACT

Natural hosts of simian immunodeficiency virus (SIV), African green monkeys (AGMs), rarely transmit SIV via breast-feeding. In order to examine the genetic diversity of breast milk SIV variants in this limited-transmission setting, we performed phylogenetic analysis on envelope sequences of milk and plasma SIV variants of AGMs. Low-diversity milk virus populations were compartmentalized from those in plasma. However, this compartmentalization was transient, as the milk virus lineages did not persist longitudinally.

Subjects

Genetic Variation , Human Milk/virology , Simian Acquired Immunodeficiency Syndrome/virology , Simian Immunodeficiency Virus/classification , Simian Immunodeficiency Virus/isolation & purification , Animals , Chlorocebus aethiops , Cluster Analysis , Female , env Gene Products/genetics , Phylogeny , Plasma/virology , DNA Sequence Analysis , Simian Immunodeficiency Virus/genetics

8.

A Bayesian model for classifying all differentially expressed proteins simultaneously in 2D PAGE gels.

Wu, Steven H; Black, Michael A; North, Robyn A; Rodrigo, Allen G.

BMC Bioinformatics ; 13: 137, 2012 Jun 19.

Article in English | MEDLINE | ID: mdl-22712439

ABSTRACT

BACKGROUND: Two-dimensional polyacrylamide gel electrophoresis (2D PAGE) is commonly used to identify differentially expressed proteins under two or more experimental or observational conditions. Wu et al. (2009) developed a univariate probabilistic model which was used to identify differential expression between Case and Control groups, by applying a Likelihood Ratio Test (LRT) to each protein on a 2D PAGE. In contrast to commonly used statistical approaches, this model takes into account the two possible causes of missing values in 2D PAGE: either (1) the non-expression of a protein; or (2) a level of expression that falls below the limit of detection. RESULTS: We develop a global Bayesian model which extends the previously described model. Unlike the univariate approach, the model reported here is able to treat all differentially expressed proteins simultaneously. Whereas each protein is modelled by the univariate likelihood function previously described, several global distributions are used to model the underlying relationship between the parameters associated with individual proteins. These global distributions are able to combine information from each protein to give more accurate estimates of the true parameters. In our implementation of the procedure, all parameters are recovered by Markov chain Monte Carlo (MCMC) integration. The 95% highest posterior density (HPD) intervals for the marginal posterior distributions are used to determine whether differences in protein expression are due to differences in mean expression intensities, and/or differences in the probabilities of expression. CONCLUSIONS: Simulation analyses showed that the global model is able to accurately recover the underlying global distributions, and identify more differentially expressed proteins than the simple application of an LRT.
Additionally, simulations also indicate that the probability of incorrectly identifying a protein as differentially expressed (i.e., the False Discovery Rate) is very low. The source code is available at https://github.com/stevenhwu/BIDE-2D.

Subjects

Computer Simulation , Two-Dimensional Gel Electrophoresis/statistics & numerical data , Biological Models , Protein Biosynthesis , Proteomics/statistics & numerical data , Bayes Theorem , Likelihood Functions , Markov Chains , Statistical Models , Monte Carlo Method , Probability

9.

pgHMA: Application of the heteroduplex mobility assay analysis in phylogenetics and population genetics.

Li, Teng; Wong, Thomas K F; Ranjard, Louis; Rodrigo, Allen G.

Mol Ecol Resour ; 22(2): 653-663, 2022 Feb.

Article in English | MEDLINE | ID: mdl-34551204

ABSTRACT

The heteroduplex mobility assay (HMA) has proven to be a robust tool for the detection of genetic variation. Here, we describe a simple and rapid application of the HMA by microfluidic capillary electrophoresis, for phylogenetics and population genetic analyses (pgHMA). We show how commonly applied techniques in phylogenetics and population genetics have equivalents with pgHMA: phylogenetic reconstruction with bootstrapping, skyline plots, and mismatch distribution analysis. We assess the performance and accuracy of pgHMA by comparing the results obtained against those obtained using standard methods of analyses applied to sequencing data. The resulting comparisons demonstrate that: (a) there is a significant linear relationship (R² = .992) between heteroduplex mobility and genetic distance, (b) phylogenetic trees obtained by HMA and nucleotide sequences present nearly identical topologies, (c) clades with high pgHMA parametric bootstrap support also have high bootstrap support on nucleotide phylogenies, (d) skyline plots estimated from the UPGMA trees of HMA and Bayesian trees of nucleotide data reveal similar trends, especially for the median trend estimate of effective population size, and (e) optimized mismatch distributions of HMA are closely fitted to the mismatch distributions of nucleotide sequences. In summary, pgHMA is an easily-applied method for approximating phylogenetic diversity and population trends.

Subjects

Population Genetics , Heteroduplex Analysis , Base Sequence , Bayes Theorem , Phylogeny

10.

Evidence for reduced selection pressure on the hepatitis B virus core gene in hepatitis B e antigen-negative chronic hepatitis B.

Warner, Brook G; Tsai, Peter; Rodrigo, Allen G; 'Ofanoa, Malakai; Gane, Edward J; Munn, Stephen R; Abbott, William G H.

J Gen Virol ; 92(Pt 8): 1800-1808, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21508187

ABSTRACT

The mechanisms underlying the high levels of hepatitis B virus (HBV) replication that cause hepatitis B e antigen (HBeAg)-negative chronic hepatitis B (e-CHB) are unknown. Impaired anti-HBV immunity, which may be measurable as a relaxation of selection pressure on the virus, is possible. A group of Tongans (n = 345) with a chronic HBV infection, including seven with e-CHB, were genotyped at HLA class I. The repertoire of HBV core-gene codons under positive selection pressure was defined by phylogenetic analysis (by using the paml program) of 708 cloned sequences extracted from the 67 of these 345 subjects with the same repertoire of HLA class I alleles as the seven e-CHB individuals and matched controls (see below). The frequency of non-synonymous mutations at these codons was measured in longitudinal data from 15 subjects. Finally, the number of non-synonymous mutations at these codons was compared in seven groups comprised of one subject with e-CHB and 1-3 HLA class I-matched controls with an inactive, HBeAg-negative chronic HBV infection (e-InD). Nineteen codons in the core gene were under positive selection pressure. There was a high frequency of new non-synonymous mutations at these codons (P<0.0001) in longitudinal data. The mean number of these 19 codons with non-synonymous mutations was lower (P = 0.02) in HBV from subjects with e-CHB (4.4±0.5 codons per subject) versus those with e-InD (6.4±0.4 codons per subject). There is a subtle relaxation in selection pressure on the HBV core gene in e-CHB. This may be due to impaired antiviral immunity, and could contribute to the high levels of viral replication that cause liver inflammation in this disease.

Subjects

Hepatitis B Core Antigens/genetics , Hepatitis B e Antigens/genetics , Hepatitis B virus/genetics , Chronic Hepatitis B/virology , Genetic Selection , Adult , Amino Acid Sequence , Female , Hepatitis B Core Antigens/metabolism , Hepatitis B e Antigens/metabolism , Hepatitis B virus/classification , Hepatitis B virus/isolation & purification , Hepatitis B virus/physiology , Humans , Male , Middle Aged , Molecular Sequence Data , Mutation , Phylogeny

11.

Associations between HLA class I alleles and escape mutations in the hepatitis B virus core gene in New Zealand-resident Tongans.

Abbott, William G H; Tsai, Peter; Leung, Euphemia; Trevarton, Alex; Ofanoa, Malakai; Hornell, John; Gane, Edward J; Munn, Stephen R; Rodrigo, Allen G.

J Virol ; 84(1): 621-9, 2010 Jan.

Article in English | MEDLINE | ID: mdl-19846510

ABSTRACT

The full repertoire of hepatitis B virus (HBV) peptides that bind to the common HLA class I molecules found in areas with a high prevalence of chronic HBV infection has not been determined. This information may be useful for designing immunotherapies for chronic hepatitis B. We identified amino acid residues under positive selection pressure in the HBV core gene by phylogenetic analysis of cloned DNA sequences obtained from HBV DNA extracted from the sera of Tongan subjects with inactive, HBeAg-negative chronic HBV infections. The repertoires of positively selected sites in groups of subjects who were homozygous for either HLA-B*4001 (n = 10) or HLA-B*5602 (n = 7) were compared. We identified 13 amino acid sites under positive selection pressure. A significant association between an HLA class I allele and the presence of nonsynonymous mutations was found at five of these sites. HLA-B*4001 was associated with mutations at E77 (P = 0.05) and E113 (P = 0.002), and HLA-B*5602 was associated with mutations at S21 (P = 0.02). In addition, amino acid mutations at V13 (P = 0.03) and E14 (P = 0.01) were more common in the seven subjects with an HLA-A*02 allele. In summary, we have developed an assay that can identify associations between HLA class I alleles and HBV core gene amino acids that mutate in response to selection pressure. This is consistent with published evidence that CD8(+) T cells have a role in suppressing viral replication in inactive, HBeAg-negative chronic HBV infection. This assay may be useful for identifying the clinically significant HBV peptides that bind to common HLA class I molecules.

Subjects

Hepatitis B virus/genetics , Histocompatibility Antigens Class I/genetics , Immune Evasion/genetics , Mutation , Alleles , HLA-A Antigens/genetics , HLA-B Antigens/genetics , Hepatitis B/epidemiology , Hepatitis B/genetics , Hepatitis B/immunology , Hepatitis B virus/immunology , Histocompatibility Antigens Class I/metabolism , Humans , New Zealand/epidemiology , Peptide Fragments/immunology , Peptide Fragments/metabolism , Genetic Selection , Tonga/epidemiology , Viral Core Proteins/genetics , Viral Core Proteins/immunology

12.

Time-dependent rates of molecular evolution.

Ho, Simon Y W; Lanfear, Robert; Bromham, Lindell; Phillips, Matthew J; Soubrier, Julien; Rodrigo, Allen G; Cooper, Alan.

Mol Ecol ; 20(15): 3087-101, 2011 Aug.

Article in English | MEDLINE | ID: mdl-21740474

ABSTRACT

For over half a century, it has been known that the rate of morphological evolution appears to vary with the time frame of measurement. Rates of microevolutionary change, measured between successive generations, were found to be far higher than rates of macroevolutionary change inferred from the fossil record. More recently, it has been suggested that rates of molecular evolution are also time dependent, with the estimated rate depending on the timescale of measurement. This followed surprising observations that estimates of mutation rates, obtained in studies of pedigrees and laboratory mutation-accumulation lines, exceeded long-term substitution rates by an order of magnitude or more. Although a range of studies have provided evidence for such a pattern, the hypothesis remains relatively contentious. Furthermore, there is ongoing discussion about the factors that can cause molecular rate estimates to be dependent on time. Here we present an overview of our current understanding of time-dependent rates. We provide a summary of the evidence for time-dependent rates in animals, bacteria and viruses. We review the various biological and methodological factors that can cause rates to be time dependent, including the effects of natural selection, calibration errors, model misspecification and other artefacts. We also describe the challenges in calibrating estimates of molecular rates, particularly on the intermediate timescales that are critical for an accurate characterization of time-dependent rates. This has important consequences for the use of molecular-clock methods to estimate timescales of recent evolutionary events.

Subjects

Biological Evolution , Mutation Rate , Animals , Bacteria/genetics , Calibration , Mitochondrial DNA/genetics , Fossils , Humans , Genetic Models , Phylogeny , Genetic Selection , Time , Viruses/genetics

13.

A statistical model to identify differentially expressed proteins in 2D PAGE gels.

Wu, Steven H; Black, Michael A; North, Robyn A; Atkinson, Kelly R; Rodrigo, Allen G.

PLoS Comput Biol ; 5(9): e1000509, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19763172

ABSTRACT

Two dimensional polyacrylamide gel electrophoresis (2D PAGE) is used to identify differentially expressed proteins and may be applied to biomarker discovery. A limitation of this approach is the inability to detect a protein when its concentration falls below the limit of detection. Consequently, differential expression of proteins may be missed when the level of a protein in the cases or controls is below the limit of detection for 2D PAGE. Standard statistical techniques have difficulty dealing with undetected proteins. To address this issue, we propose a mixture model that takes into account both detected and non-detected proteins. Non-detected proteins are classified either as (a) proteins that are not expressed in at least one replicate, or (b) proteins that are expressed but are below the limit of detection. We obtain maximum likelihood estimates of the parameters of the mixture model, including the group-specific probability of expression and mean expression intensities. Differentially expressed proteins can be detected by using a Likelihood Ratio Test (LRT). Our simulation results, using data generated from biological experiments, show that the likelihood model has higher statistical power than standard statistical approaches to detect differentially expressed proteins. An R package, Slider (Statistical Likelihood model for Identifying Differential Expression in R), is freely available at http://www.cebl.auckland.ac.nz/slider.php.
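The two explanations for a non-detected protein described above can be written as a two-component likelihood. The sketch below is one way to do that, under the assumption (not stated in the abstract) that expressed intensities are normally distributed; the function names and parameters are hypothetical and this is not the code of the Slider package.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def normal_cdf(x, mu, sigma):
    """Cumulative probability of a normal distribution up to x."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def log_likelihood(observations, p, mu, sigma, limit):
    """Log-likelihood for one protein in one group.

    observations: intensities per replicate, with None for a non-detected spot.
    p: probability that the protein is expressed.
    mu, sigma: mean and spread of the intensity when expressed (normal assumed).
    limit: limit of detection.
    """
    total = 0.0
    for x in observations:
        if x is None:
            # Not expressed, or expressed but below the limit of detection.
            total += math.log((1.0 - p) + p * normal_cdf(limit, mu, sigma))
        else:
            # Expressed and detected at intensity x.
            total += math.log(p * normal_pdf(x, mu, sigma))
    return total

# Illustrative call: one missing replicate, mean equal to the detection limit.
example_ll = log_likelihood([None], p=0.5, mu=10.0, sigma=1.0, limit=10.0)
```

A likelihood ratio test would then compare the maximised value of this function with group-specific parameters against the value obtained with parameters shared between cases and controls.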

Subjects

Two-Dimensional Gel Electrophoresis/methods , Biological Models , Statistical Models , Proteins/metabolism , Proteomics/methods , Algorithms , Analysis of Variance , Computer Simulation , Female , Humans , Likelihood Functions , Pre-Eclampsia , Pregnancy , Sensitivity and Specificity

14.

Selecting taxa to save or sequence: desirable criteria and a greedy solution.

Bordewich, Magnus; Rodrigo, Allen G; Semple, Charles.

Syst Biol ; 57(6): 825-34, 2008 Dec.

Article in English | MEDLINE | ID: mdl-19085326

ABSTRACT

Three desirable properties for any method of selecting a subset of evolutionary units (EUs) for conservation or for genomic sequencing are discussed. These properties are spread, stability, and applicability. We are motivated by a practical case in which the maximization of phylogenetic diversity (PD), which has been suggested as a suitable method, appears to lead to counterintuitive collections of EUs and does not meet these three criteria. We define a simple greedy algorithm (GREEDYMMD) as a close approximation to choosing the subset that maximizes the minimum pairwise distance (MMD) between EUs. GREEDYMMD satisfies our three criteria and may be a useful alternative to PD in real-world situations. In particular, we show that this method of selection is suitable under a model of biodiversity in which features arise and/or disappear during evolution. We also show that if distances between EUs satisfy the ultrametric condition, then GREEDYMMD delivers an optimal subset of EUs that maximizes both the minimum pairwise distance and the PD. Finally, because GREEDYMMD works with distances and does not require a tree, it is readily applicable to many data sets.
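A greedy approximation to the max-min distance subset can be sketched as follows. The abstract does not spell out the steps of GREEDYMMD, so this uses a common farthest-point heuristic as a stand-in: seed with the most distant pair, then repeatedly add the unit whose minimum distance to the chosen set is largest. The function names and the toy distance matrix are hypothetical.

```python
from itertools import combinations

def greedy_max_min_subset(dist, k):
    """Pick k units from a symmetric distance matrix by farthest-point selection."""
    n = len(dist)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of units")
    # Seed with the pair of units that are farthest apart.
    first, second = max(combinations(range(n), 2), key=lambda p: dist[p[0]][p[1]])
    chosen = [first, second]
    while len(chosen) < k:
        remaining = [u for u in range(n) if u not in chosen]
        # Add the unit whose closest already-chosen neighbour is farthest away.
        best = max(remaining, key=lambda u: min(dist[u][c] for c in chosen))
        chosen.append(best)
    return chosen

def min_pairwise_distance(dist, subset):
    """Smallest distance between any two units in the subset."""
    return min(dist[a][b] for a, b in combinations(subset, 2))

# Toy example: four units at positions 0, 1, 5 and 10 on a line.
positions = [0, 1, 5, 10]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
picked = greedy_max_min_subset(toy_dist, 3)  # returns [0, 3, 2]
```

Because the routine needs only a distance matrix and no tree, it reflects the applicability point made in the abstract; whether a greedy choice is optimal in general depends on the distances, as the ultrametric result above indicates.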

Subjects

Classification/methods , Genomics/methods , Phylogeny , Algorithms , Animals , Genomics/standards

15.

SQUINT: a multiple alignment program and editor.

Goode, Matthew G; Rodrigo, Allen G.

Bioinformatics ; 23(12): 1553-5, 2007 Jun 15.

Article in English | MEDLINE | ID: mdl-17485434

ABSTRACT

SUMMARY: SQUINT is a sequence alignment tool that combines automated progressive sequence alignment with facilities for manual editing. The program imports nucleotide or amino acid sequence multiple alignment files in standard formats, and permits users to view two translations of the same multiple alignment simultaneously. Edits in one view are instantaneously reflected in the other, and the scoring cost of the changes is shown in real time. Progressive multiple alignments, using a variety of alignment parameters, can be performed on any block of sequences, including blocks embedded in the existing alignment. AVAILABILITY: The software is freely available for download at http://www.cebl.auckland.ac.nz

Subjects

Computational Biology/methods , Sequence Alignment , Software , Amino Acid Sequence , Base Sequence , Protein Sequence Analysis , Amino Acid Sequence Homology , Nucleic Acid Sequence Homology

16.

Recombination in feline immunodeficiency virus from feral and companion domestic cats.

Hayward, Jessica J; Rodrigo, Allen G.

Virol J ; 5: 76, 2008 Jun 17.

Article in English | MEDLINE | ID: mdl-18559113

ABSTRACT

BACKGROUND: Recombination is a relatively common phenomenon in retroviruses. We investigated recombination in Feline Immunodeficiency Virus from naturally-infected New Zealand domestic cats (Felis catus) by sequencing regions of the gag, pol and env genes. RESULTS: The occurrence of intragenic recombination was highest in env, with evidence of recombination in 6.4% (n = 156) of all cats. A further recombinant was identified in each of the gag (n = 48) and pol (n = 91) genes. Comparisons of phylogenetic trees across genes identified cases of incongruence, indicating intergenic recombination. Three (7.7%, n = 39) of these incongruencies were found to be significantly different using the Shimodaira-Hasegawa test. Surprisingly, our phylogenies from the gag and pol genes showed that no New Zealand sequences group with reference subtype C sequences within intrasubtype pairwise distances. Indeed, we find one and two distinct unknown subtype groups in gag and pol, respectively. These observations cause us to speculate that these New Zealand FIV strains have undergone several recombination events between subtype A parent strains and undefined unknown subtype strains, similar to the evolutionary history hypothesised for HIV-1 "subtype E". Endpoint dilution sequencing was used to confirm the consensus sequences of the putative recombinants and unknown subtype groups, providing evidence for the authenticity of these sequences. Endpoint dilution sequencing also resulted in the identification of a dual infection event in the env gene. In addition, an intrahost recombination event between variants of the same subtype in the pol gene was established. This is the first known example of naturally-occurring recombination in a cat with infection of the parent strains. CONCLUSION: Evidence of intragenic recombination in the gag, pol and env regions, and complex intergenic recombination, of FIV from naturally-infected domestic cats in New Zealand was found.
Strains of unknown subtype were identified in all three gene regions. These results have implications for the use of the current FIV vaccine in New Zealand.

Subjects

Wild Animals/virology , Cats/virology , Feline Acquired Immunodeficiency Syndrome/virology , Feline Immunodeficiency Virus/genetics , Animals , Carrier State , env Genes/genetics , gag Genes/genetics , pol Genes/genetics , New Zealand , Phylogeny , Genetic Recombination

17.

Reassembling haplotypes in a mixture of pooled amplicons when the relative concentrations are known: A proof-of-concept study on the efficient design of next-generation sequencing strategies.

Ranjard, Louis; Wong, Thomas K F; Rodrigo, Allen G.

PLoS One ; 13(4): e0195090, 2018.

Article in English | MEDLINE | ID: mdl-29621260

ABSTRACT

Next-generation sequencing can be costly and labour intensive. Usually, the sequencing cost per sample is reduced by pooling amplified DNA (amplicons) derived from different individuals on the same sequencing lane. Barcodes unique to each amplicon permit short-read sequences to be assigned appropriately. However, the cost of the library preparation increases with the number of barcodes used. We propose an alternative to barcoding: by using different known proportions of individually-derived amplicons in a pooled sample, each is characterised a priori by an expected depth of coverage. We have developed a Hidden Markov Model that uses these expected proportions to reconstruct the input sequences. We apply this method to pools of mitochondrial DNA amplicons extracted from kangaroo meat, genus Macropus. Our experiments indicate that the sequence coverage can be efficiently used to index the short-reads and that we can reassemble the input haplotypes when secondary factors impacting the coverage are controlled. We therefore demonstrate that, by combining our approach with standard barcoding, the cost of the library preparation is reduced to a third.

Subjects

Haplotypes; High-Throughput Nucleotide Sequencing; Animals; Chromosome Mapping; Computational Biology/methods; DNA, Mitochondrial; Genome, Mitochondrial; High-Throughput Nucleotide Sequencing/economics; High-Throughput Nucleotide Sequencing/methods; High-Throughput Nucleotide Sequencing/standards; Macropodidae/genetics; Markov Chains; Sequence Analysis, DNA
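The core idea in the abstract above, that a known mixing proportion implies an expected depth of coverage for each amplicon, can be illustrated with a minimal sketch. This is not the authors' Hidden Markov Model: it is a deliberately simplified nearest-depth assignment, and the function names `expected_depths` and `nearest_source` are hypothetical, chosen only for illustration. It also assumes that an observed allele is carried by exactly one amplicon in the pool.

```python
def expected_depths(proportions, total_depth):
    """Expected depth of coverage per amplicon, given known mixing proportions.

    proportions: relative concentrations (they need not sum to 1).
    total_depth: total read depth observed at a site.
    """
    total = sum(proportions)
    return [total_depth * p / total for p in proportions]


def nearest_source(observed_depth, depths):
    """Index of the amplicon whose expected depth is closest to the observed one.

    Simplifying assumption (not from the paper): the allele is carried by a
    single amplicon, so its depth should match one expected depth.
    """
    return min(range(len(depths)), key=lambda i: abs(depths[i] - observed_depth))


# Three amplicons pooled at a 1:2:4 ratio, 700 reads covering the site.
depths = expected_depths([1, 2, 4], 700)
source = nearest_source(190, depths)  # an allele seen in 190 reads
```

Proportions that are well separated, as in the 1:2:4 ratio here, keep the expected depths distinguishable; the abstract notes that reassembly depends on controlling secondary factors that affect coverage.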

18.

Complete mitochondrial genome of the green-lipped mussel, Perna canaliculus (Mollusca: Mytiloidea), from long nanopore sequencing reads.

Ranjard, Louis; Wong, Thomas K F; Külheim, Carsten; Rodrigo, Allen G; Ragg, Norman L C; Patel, Selina; Dunphy, Brendon J.

Mitochondrial DNA B Resour ; 3(1): 175-176, 2018 Feb 09.

Article in English | MEDLINE | ID: mdl-33490494

ABSTRACT

We describe here the first complete assembly of the mitochondrial genome of the New Zealand green-lipped mussel, Perna canaliculus. The assembly was performed de novo from a mix of long nanopore sequencing reads and short sequencing reads. The genome is 16,005 bp long. Comparison to other Mytiloidea mitochondrial genomes indicates important gene rearrangements in this family.

19.

Computational Evaluation of the Strict Master and Random Template Models of Endogenous Retrovirus Evolution.

Nascimento, Fabrícia F; Rodrigo, Allen G.

PLoS One ; 11(9): e0162454, 2016.

Article in English | MEDLINE | ID: mdl-27649303

ABSTRACT

Transposable elements (TEs) are DNA sequences that are able to replicate and move within and between host genomes. Their mechanism of replication is also shared with endogenous retroviruses (ERVs), which are also a type of TE that represent an ancient retroviral infection within animal genomes. Two models have been proposed to explain TE proliferation in host genomes: the strict master model (SMM), and the random template (or transposon) model (TM). In SMM only a single copy of a given TE lineage is able to replicate, and all other genomic copies of TEs are derived from that master copy. In TM, any element of a given family is able to replicate in the host genome. In this paper, we simulated ERV phylogenetic trees under variations of SMM and TM. To test whether current phylogenetic programs can recover the simulated ERV phylogenies, DNA sequence alignments were simulated and maximum likelihood trees were reconstructed and compared to the simulated phylogenies. Results indicate that visual inspection of phylogenetic trees alone can be misleading. However, if a set of statistical summaries is calculated, we are able to distinguish between models with high accuracy by using a data mining algorithm that we introduce here. We also demonstrate the use of our data mining algorithm with empirical data for the porcine endogenous retrovirus (PERV), an ERV that is able to replicate in human and pig cells in vitro.

Subjects

Computer Simulation; DNA Transposable Elements; Endogenous Retroviruses/genetics; Models, Genetic; Phylogeny; Animals; Data Mining; Evolution, Molecular; Humans; Swine
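The difference between the two proliferation models described in the abstract above can be sketched as a choice of parent for each new genomic copy. The following is a minimal illustration only, not the authors' simulation code: the function name `simulate_parents` is hypothetical, the sketch records only who copied whom, and it ignores branch lengths, sequence evolution and element loss.

```python
import random


def simulate_parents(n_copies, model, rng=None):
    """Return a list where entry i is the parent copy of element i.

    model "SMM": every new copy derives from the single master copy (index 0).
    model "TM":  every new copy derives from any existing copy, chosen uniformly.
    """
    rng = rng or random.Random()
    parents = [None]  # element 0 is the founding copy and has no parent
    for new_copy in range(1, n_copies):
        if model == "SMM":
            parents.append(0)
        elif model == "TM":
            parents.append(rng.randrange(new_copy))
        else:
            raise ValueError("model must be 'SMM' or 'TM'")
    return parents


smm = simulate_parents(6, "SMM")
tm = simulate_parents(6, "TM", random.Random(1))
```

Under the strict master model all copies share one parent, whereas under the random template model the parent varies from copy to copy. The abstract cautions that visual inspection of the resulting trees alone can be misleading, which is why the authors rely on statistical summaries.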

20.

Identifying predictors of time-inhomogeneous viral evolutionary processes.

Bielejec, Filip; Baele, Guy; Rodrigo, Allen G; Suchard, Marc A; Lemey, Philippe.

Virus Evol ; 2(2): vew023, 2016 Jul.

Article in English | MEDLINE | ID: mdl-27774306

ABSTRACT

Various factors determine the rate at which mutations are generated and fixed in viral genomes. Viral evolutionary rates may vary over the course of a single persistent infection and can reflect changes in replication rates and selective dynamics. Dedicated statistical inference approaches are required to understand how the complex interplay of these processes shapes the genetic diversity and divergence in viral populations. Although evolutionary models accommodating a high degree of complexity can now be formalized, adequately informing these models by potentially sparse data, and assessing the association of the resulting estimates with external predictors, remains a major challenge. In this article, we present a novel Bayesian evolutionary inference method, which integrates multiple potential predictors and tests their association with variation in the absolute rates of synonymous and non-synonymous substitutions along the evolutionary history. We consider clinical and virological measures as predictors, but also changes in population size trajectories that are simultaneously inferred using coalescent modelling. We demonstrate the potential of our method in an application to within-host HIV-1 sequence data sampled throughout the infection of multiple patients. While analyses of individual patient populations lack statistical power, we detect significant evidence for an abrupt drop in non-synonymous rates in late stage infection and a more gradual increase in synonymous rates over the course of infection in a joint analysis across all patients. The former is predicted by the immune relaxation hypothesis while the latter may be in line with increasing replicative fitness during the asymptomatic stage.
