1.
PLoS Comput Biol ; 20(2): e1011871, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38330139

ABSTRACT

Massive sequencing of SARS-CoV-2 genomes has created an urgent need for novel methods that use existing phylogenies to add new samples efficiently instead of inferring trees de novo. 'TIPars' was developed for this challenge, integrating parsimony analysis with pre-computed ancestral sequences. It took about 21 seconds to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree using 1.4 gigabytes of memory. In benchmarks on four datasets, TIPars achieved the highest accuracy for phylogenies of moderately similar sequences. For highly similar and divergent scenarios, fully parsimony-based and likelihood-based phylogenetic placement methods performed best, respectively, while TIPars was second best. TIPars accomplished efficient and accurate expansion of phylogenies of both similar and divergent sequences, which would have broad biological applications beyond SARS-CoV-2. TIPars is accessible from https://tipars.hku.hk/ and source code is available at https://github.com/id-bioinfo/TIPars.
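
As a rough illustration of parsimony-based placement against pre-computed ancestral sequences, the Python sketch below scores each branch of a reference tree by the number of aligned sites where the query matches neither the parent nor the child sequence, then picks the lowest-scoring branch. This is a simplified toy and not TIPars' actual scoring rule; the function names, the branch dictionary layout and the example sequences are all assumptions made for illustration.

```python
def placement_cost(query, parent, child):
    """Count aligned sites where the query matches neither end of a branch.

    Hypothetical toy score: each such site would need an extra substitution
    if the query were attached somewhere along this branch.
    """
    if not (len(query) == len(parent) == len(child)):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for q, p, c in zip(query, parent, child) if q != p and q != c)


def best_branch(query, branches):
    """Return the name of the branch with the lowest placement cost.

    branches: dict mapping a branch name to (parent_seq, child_seq), where
    parent_seq would be a pre-computed ancestral sequence.
    """
    return min(branches, key=lambda name: placement_cost(query, *branches[name]))


# Made-up example data, not taken from any real tree.
example_branches = {
    "b1": ("ACGT", "ACGA"),
    "b2": ("TTTT", "TTGT"),
}
chosen = best_branch("ACGA", example_branches)
```

Scanning every branch is linear in the size of the tree, which is the kind of cost that makes insertion into an existing phylogeny far cheaper than rebuilding it.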


Subject(s)
Genome , Software , Phylogeny , Likelihood Functions , SARS-CoV-2/genetics
2.
Plant J ; 85(4): 532-47, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26764122

ABSTRACT

The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.


Subject(s)
Embryophyta/genetics , Models, Structural , Plant Proteins/chemistry , RNA Editing/genetics , Amino Acid Motifs , Amino Acid Sequence , Embryophyta/metabolism , Mitochondria/metabolism , Models, Molecular , Molecular Sequence Annotation , Plant Proteins/genetics , Plant Proteins/metabolism , Plastids/metabolism , Protein Transport , RNA Recognition Motif Proteins/chemistry , RNA Recognition Motif Proteins/genetics , RNA Recognition Motif Proteins/metabolism , RNA, Plant/genetics , Sequence Alignment
3.
BMC Bioinformatics ; 17 Suppl 8: 285, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27585754

ABSTRACT

BACKGROUND: This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments by handling guide trees more carefully. It classifies sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the input's discrepancy by computing the standard deviation of the sequences' percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and instead uses a non-progressive alignment method to generate the alignment. RESULTS: To evaluate PnpProbs, we have compared it with thirteen other popular MSA tools, and PnpProbs has the best alignment scores in all but one test. We have also used it for phylogenetic analysis, and found that the phylogenetic trees constructed from PnpProbs' alignments are closest to the model trees. CONCLUSIONS: By combining the strengths of the progressive and non-progressive alignment methods, we have developed an MSA tool called PnpProbs. We have compared PnpProbs with thirteen other popular MSA tools and our results showed that our tool usually constructed the best alignments.
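
The discrepancy estimate described in this abstract (standard deviation of pairwise percent identities) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not PnpProbs' implementation: it assumes the sequences are already aligned to equal length, and the threshold value and the two method labels are hypothetical placeholders, since the abstract gives neither.

```python
from itertools import combinations
from statistics import pstdev


def percent_identity(a, b):
    """Percent of matching residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def identity_spread(seqs):
    """Population standard deviation of all pairwise percent identities.

    Needs at least two sequences so that at least one pair exists.
    """
    identities = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return pstdev(identities)


def choose_guide_tree_method(seqs, threshold=10.0):
    """Pick a guide-tree builder from the spread.

    The threshold and both labels are hypothetical placeholders.
    """
    if identity_spread(seqs) < threshold:
        return "builder_for_low_spread"
    return "builder_for_high_spread"


example = ["ACGT", "ACGT", "ACGA"]  # pairwise identities: 100, 75, 75
spread = identity_spread(example)
method = choose_guide_tree_method(example)
```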


Subject(s)
Algorithms , Phylogeny , Sequence Alignment/methods , Amino Acid Sequence , Computer Simulation , Databases, Protein , Software , Time Factors
4.
BMC Bioinformatics ; 16 Suppl 5: S4, 2015.
Article in English | MEDLINE | ID: mdl-25859903

ABSTRACT

Progressive sequence alignment is one of the most commonly used methods for multiple sequence alignment. Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we have proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by this method. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence showing that the guide trees constructed by the adaptive method are among the best.
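
To make the role of the guide tree concrete, the Python sketch below builds one by average-linkage (UPGMA-style) clustering of a pairwise distance table. UPGMA is used here only as a familiar, generic way to build guide trees; it is not the adaptive method studied in the paper, and the function name, input layout and distances are assumptions for illustration.

```python
from itertools import combinations


def upgma_guide_tree(names, dist):
    """Build a guide tree as nested tuples by average-linkage clustering.

    names: list of unique sequence names.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    """
    sizes = {name: 1 for name in names}  # active cluster -> number of leaves
    d = {frozenset(p): dist[frozenset(p)] for p in combinations(names, 2)}
    while len(sizes) > 1:
        # Merge the closest pair of active clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for other in sizes:
            if other in (a, b):
                continue
            # Size-weighted average distance from the new cluster to the rest.
            d[frozenset((merged, other))] = (
                sizes[a] * d[frozenset((a, other))]
                + sizes[b] * d[frozenset((b, other))]
            ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes.pop(a) + sizes.pop(b)
    return next(iter(sizes))


# Made-up distances: A and B are closest, so they are joined first.
example_dist = {
    frozenset(("A", "B")): 1.0,
    frozenset(("A", "C")): 4.0,
    frozenset(("B", "C")): 5.0,
}
tree = upgma_guide_tree(["A", "B", "C"], example_dist)
```

A progressive aligner would then walk this tree from the leaves upward, aligning A with B first and adding C to that alignment afterwards.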


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, DNA , Computer Simulation , Databases, Genetic , Evolution, Molecular , Humans , Phylogeny , Software
5.
ISME J ; 18(1)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38747389

ABSTRACT

Spillovers of viruses, particularly arboviruses, from animals to humans occur more frequently under warmer conditions. The invasive tick species Haemaphysalis longicornis, the Asian longhorned tick, poses a significant public health threat due to its global expansion and its potential to carry a wide range of pathogens. We analyzed meta-transcriptomic data from 3595 adult H. longicornis ticks collected between 2016 and 2019 in 22 provinces across China encompassing diverse ecological conditions. Generalized additive modeling revealed that climate factors exerted a stronger influence on the virome of H. longicornis than other ecological factors, such as ecotypes, distance to coastline, animal host, tick gender, and antiviral immunity. To understand how climate changes drive the tick virome, we performed a mechanistic investigation using causality inference with emphasis on the significance of this process for public health. Our findings demonstrated that higher temperatures and lower relative humidity/precipitation contribute to variations in animal host diversity, leading to increased diversity of the tick virome, particularly the evenness of vertebrate-associated viruses. These findings may explain the evolution of tick-borne viruses into generalists across multiple hosts, thereby increasing the probability of spillover events involving tick-borne pathogens. Deep learning projections have indicated that the diversity of the H. longicornis virome is expected to increase in 81.9% of regions under the SSP8.5 scenario from 2019 to 2030. Extension of surveillance should be implemented to avert the spread of tick-borne diseases.


Subject(s)
Introduced Species , Virome , Animals , China , Ixodidae/virology , Female , Climate Change , Male , Climate
6.
Article in English | MEDLINE | ID: mdl-26357079

ABSTRACT

This paper introduces a simple and effective approach to improve the accuracy of multiple sequence alignment. We use a natural measure to estimate the similarity of the input sequences, and based on this measure, we align the input sequences differently. For example, for inputs with high similarity, we consider the whole sequences and align them globally, while for those with moderately low similarity, we may ignore the flank regions and align them locally. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with about a dozen leading alignment tools on three benchmark alignment databases, and GLProbs's alignments have the best scores in almost all tests. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic tree construction, protein secondary structure prediction and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging.
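
The similarity-driven choice between global and local alignment can be sketched in Python as below. This is only a sketch under assumptions: the abstract does not name its similarity measure, so the standard-library `difflib.SequenceMatcher` ratio is used as a crude stand-in, and the cutoff value and function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_similarity(seqs):
    """Average pairwise similarity in [0, 1].

    SequenceMatcher.ratio() is a stand-in measure, not the one GLProbs uses.
    Needs at least two sequences.
    """
    ratios = [
        SequenceMatcher(None, a, b, autojunk=False).ratio()
        for a, b in combinations(seqs, 2)
    ]
    return sum(ratios) / len(ratios)


def choose_alignment_mode(seqs, cutoff=0.6):
    """Return "global" for highly similar inputs, otherwise "local".

    The cutoff of 0.6 is a hypothetical placeholder.
    """
    return "global" if mean_similarity(seqs) >= cutoff else "local"


similar_mode = choose_alignment_mode(["ACGTACGT", "ACGTACGT"])
dissimilar_mode = choose_alignment_mode(["AAAA", "CCCC"])
```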


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Sequence Analysis, Protein/methods , Software , Algorithms , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Phylogeny , Protein Structure, Secondary , Proteins/chemistry , Proteins/classification