1.
PLoS Comput Biol ; 20(2): e1011871, 2024 Feb.
Article in English | MEDLINE | ID: mdl-38330139

ABSTRACT

Massive sequencing of SARS-CoV-2 genomes has created an urgent need for methods that add new samples to existing phylogenies efficiently instead of inferring trees de novo. 'TIPars' was developed to meet this challenge by integrating parsimony analysis with pre-computed ancestral sequences. It took about 21 seconds to insert 100 SARS-CoV-2 genomes into a 100k-taxa reference tree using 1.4 gigabytes of memory. In benchmarks on four datasets, TIPars achieved the highest accuracy for phylogenies of moderately similar sequences. For highly similar and divergent scenarios, fully parsimony-based and likelihood-based phylogenetic placement methods, respectively, performed best, with TIPars second best. TIPars accomplished efficient and accurate expansion of phylogenies of both similar and divergent sequences, which would have broad biological applications beyond SARS-CoV-2. TIPars is accessible from https://tipars.hku.hk/ and the source code is available at https://github.com/id-bioinfo/TIPars.


Subject(s)
Genome , Software , Phylogeny , Likelihood Functions , SARS-CoV-2/genetics
2.
Plant J ; 85(4): 532-47, 2016 Feb.
Article in English | MEDLINE | ID: mdl-26764122

ABSTRACT

The pentatricopeptide repeat (PPR) proteins form one of the largest protein families in land plants. They are characterised by tandem 30-40 amino acid motifs that form an extended binding surface capable of sequence-specific recognition of RNA strands. Almost all of them are post-translationally targeted to plastids and mitochondria, where they play important roles in post-transcriptional processes including splicing, RNA editing and the initiation of translation. A code describing how PPR proteins recognise their RNA targets promises to accelerate research on these proteins, but making use of this code requires accurate definition and annotation of all of the various nucleotide-binding motifs in each protein. We have used a structural modelling approach to define 10 different variants of the PPR motif found in plant proteins, in addition to the putative deaminase motif that is found at the C-terminus of many RNA-editing factors. We show that the super-helical RNA-binding surface of RNA-editing factors is potentially longer than previously recognised. We used the redefined motifs to develop accurate and consistent annotations of PPR sequences from 109 genomes. We report a high error rate in PPR gene models in many public plant proteomes, due to gene fusions and insertions of spurious introns. These consistently annotated datasets across a wide range of species are valuable resources for future comparative genomics studies, and an essential pre-requisite for accurate large-scale computational predictions of PPR targets. We have created a web portal (http://www.plantppr.com) that provides open access to these resources for the community.


Subject(s)
Embryophyta/genetics , Structural Models , Plant Proteins/chemistry , RNA Editing/genetics , Amino Acid Motifs , Amino Acid Sequence , Embryophyta/metabolism , Mitochondria/metabolism , Molecular Models , Molecular Sequence Annotation , Plant Proteins/genetics , Plant Proteins/metabolism , Plastids/metabolism , Protein Transport , RNA Recognition Motif Proteins/chemistry , RNA Recognition Motif Proteins/genetics , RNA Recognition Motif Proteins/metabolism , Plant RNA/genetics , Sequence Alignment
3.
BMC Bioinformatics ; 17 Suppl 8: 285, 2016 Aug 31.
Article in English | MEDLINE | ID: mdl-27585754

ABSTRACT

BACKGROUND: This paper describes a new MSA tool called PnpProbs, which constructs better multiple sequence alignments by better handling of guide trees. It classifies sequences into two types: normally related and distantly related. For normally related sequences, it uses an adaptive approach to construct the guide tree needed for progressive alignment; it first estimates the input's discrepancy by computing the standard deviation of their percent identities, and based on this estimate, it chooses the better method to construct the guide tree. For distantly related sequences, PnpProbs abandons the guide tree and uses instead some non-progressive alignment method to generate the alignment. RESULTS: To evaluate PnpProbs, we have compared it with thirteen other popular MSA tools, and PnpProbs has the best alignment scores in all but one test. We have also used it for phylogenetic analysis, and found that the phylogenetic trees constructed from PnpProbs' alignments are closest to the model trees. CONCLUSIONS: By combining the strength of the progressive and non-progressive alignment methods, we have developed an MSA tool called PnpProbs. We have compared PnpProbs with thirteen other popular MSA tools and our results showed that our tool usually constructed the best alignments.


Subject(s)
Algorithms , Phylogeny , Sequence Alignment/methods , Amino Acid Sequence , Computer Simulation , Protein Databases , Software , Time Factors
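
The adaptive step described in the PnpProbs abstract (estimate the input's discrepancy from the standard deviation of pairwise percent identities, then pick a guide-tree method) can be sketched roughly as follows. This is a minimal illustration, not the PnpProbs implementation: it assumes the sequences are already aligned to equal length, and the threshold of 10.0 and the two method labels are hypothetical placeholders, since the abstract does not name them.

```python
from itertools import combinations
from statistics import pstdev


def percent_identity(a: str, b: str) -> float:
    """Percent of identical columns between two equal-length aligned sequences.

    Columns where both sequences have a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def identity_spread(seqs: list[str]) -> float:
    """Population standard deviation of all pairwise percent identities.

    Requires at least two sequences.
    """
    identities = [percent_identity(a, b) for a, b in combinations(seqs, 2)]
    return pstdev(identities)


def choose_guide_tree_method(seqs: list[str], threshold: float = 10.0) -> str:
    """Pick a guide-tree method label from the spread of percent identities.

    Both the threshold and the returned labels are assumptions made for
    illustration only.
    """
    if identity_spread(seqs) < threshold:
        return "low_spread_method"
    return "high_spread_method"
```

For example, three identical sequences give a spread of 0 and select the first label, while a set mixing identical and completely different sequences gives a large spread and selects the second.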
4.
BMC Bioinformatics ; 16 Suppl 5: S4, 2015.
Article in English | MEDLINE | ID: mdl-25859903

ABSTRACT

Progressive sequence alignment is one of the most commonly used methods for multiple sequence alignment. Roughly speaking, the method first builds a guide tree, and then aligns the sequences progressively according to the topology of the tree. It is believed that guide trees are very important to progressive alignment; a better guide tree will give an alignment with higher accuracy. Recently, we have proposed an adaptive method for constructing guide trees. This paper studies the quality of the guide trees constructed by this method. Our study showed that our adaptive method can be used to improve the accuracy of many different progressive MSA tools. In fact, we give evidence showing that the guide trees constructed by the adaptive method are among the best.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , DNA Sequence Analysis , Computer Simulation , Genetic Databases , Molecular Evolution , Humans , Phylogeny , Software
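
To make the role of a guide tree concrete, the sketch below builds one with UPGMA, a standard clustering method that is commonly used for this purpose. It is not the adaptive method studied in the paper above, which the abstract does not describe in detail. The function name and the nested-dictionary distance format are assumptions for illustration. The returned nested tuples give the order in which a progressive aligner would merge sequences, innermost pairs first.

```python
from itertools import combinations


def upgma_guide_tree(names, dist):
    """Build a guide tree by UPGMA (average-linkage clustering).

    names: list of distinct sequence names.
    dist:  nested dict with dist[a][b] defined for every pair of names.
    Returns a nested tuple; leaves are names, internal nodes are pairs.
    """
    sizes = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = {frozenset((a, b)): dist[a][b] for a, b in combinations(names, 2)}
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        for c in sizes:
            if c != a and c != b:
                # Distance to the new cluster is the size-weighted average.
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))]
                    + sizes[b] * d[frozenset((b, c))]
                ) / (sizes[a] + sizes[b])
        sizes[merged] = sizes[a] + sizes[b]
        del sizes[a], sizes[b]
    return next(iter(sizes))


# Toy distances: A and B are close, C is far from both.
example_dist = {
    "A": {"A": 0, "B": 1, "C": 4},
    "B": {"A": 1, "B": 0, "C": 4},
    "C": {"A": 4, "B": 4, "C": 0},
}
example_tree = upgma_guide_tree(["A", "B", "C"], example_dist)
```

With these toy distances, A and B are joined first, so a progressive aligner would align A with B and then align C to that result.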
5.
Virus Evol ; 10(1): veae056, 2024.
Article in English | MEDLINE | ID: mdl-39247558

ABSTRACT

The unprecedentedly large size of the global SARS-CoV-2 phylogeny makes any computation on the tree difficult. Lineage identification (e.g. the PANGO nomenclature for SARS-CoV-2) and assignment are key to tracking the evolution of the virus. This requires annotating unlabeled ancestral nodes in a phylogenetic tree as the clade roots of lineages. The lineage labels of descendant samples under these clade roots can then be inferred to be the corresponding lineages. This is the ancestral lineage annotation problem, and matUtils (a package in pUShER) and PastML are commonly used methods. However, their computational tractability is a challenge and their accuracy needs further exploration in huge SARS-CoV-2 phylogenies. We have developed an efficient and accurate method, called "F1ALA", that utilizes the F1-score to evaluate the confidence with which a specific ancestral node can be annotated as the clade root of a lineage, given the lineage labels of a set of taxa in a rooted tree. Compared to these methods, F1ALA was roughly an order of magnitude faster while using ∼12% of their memory when annotating 2277 PANGO lineages in a phylogeny of 5.26 million taxa. F1ALA allows real-time lineage tracking to be performed on a laptop computer. F1ALA outperformed matUtils (pUShER) with statistical significance, and had comparable accuracy to PastML in tests on empirical and simulated data. F1ALA enables tree refinement by pruning taxa with inconsistent labels relative to their closest annotation nodes and re-inserting them into the pruned tree, improving a SARS-CoV-2 phylogeny with both higher log-likelihood and lower parsimony score. Given its ultrafast speed and high accuracy, we anticipate that F1ALA will also be useful for large phylogenies of other viruses. Code and benchmark datasets are publicly available at https://github.com/id-bioinfo/F1ALA.
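
One plausible reading of the F1-score criterion in the abstract above is sketched here; it is an illustration under stated assumptions, not the F1ALA code. It assumes precision is the fraction of a node's descendant taxa that carry the lineage label, and recall is the fraction of the lineage's labeled taxa that fall under the node. The tree representation (a dictionary mapping each node to its children) and all function names are hypothetical.

```python
def leaf_sets(children, root):
    """Map every node to the set of leaf names below it (iterative post-order)."""
    leaves = {}
    stack = [(root, False)]
    while stack:
        node, expanded = stack.pop()
        kids = children.get(node, [])
        if not kids:
            leaves[node] = {node}
        elif expanded:
            leaves[node] = set().union(*(leaves[k] for k in kids))
        else:
            stack.append((node, True))
            stack.extend((k, False) for k in kids)
    return leaves


def f1_score(descendants, members):
    """F1 of a node for a lineage: harmonic mean of precision and recall."""
    true_pos = len(descendants & members)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(descendants)
    recall = true_pos / len(members)
    return 2 * precision * recall / (precision + recall)


def best_clade_root(children, root, labels, lineage):
    """Return (node, score) for the internal node with the highest F1.

    labels maps taxon name -> lineage label for the labeled taxa.
    """
    members = {taxon for taxon, lab in labels.items() if lab == lineage}
    leaves = leaf_sets(children, root)
    scored = [
        (f1_score(leaves[n], members), n) for n in leaves if children.get(n)
    ]
    score, node = max(scored, key=lambda pair: pair[0])
    return node, score


# Toy rooted tree: root -> n1 (a, b) and n2 (c, d).
toy_children = {"root": ["n1", "n2"], "n1": ["a", "b"], "n2": ["c", "d"]}
toy_labels = {"a": "X", "b": "X", "c": "Y", "d": "Y"}
```

In the toy tree, `best_clade_root(toy_children, "root", toy_labels, "X")` selects `n1`, because every taxon under `n1` is labeled X and no X-labeled taxon lies outside it.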

6.
ISME J ; 18(1)2024 Jan 08.
Article in English | MEDLINE | ID: mdl-38747389

ABSTRACT

Spillovers of viruses, particularly arboviruses, from animals to humans occur more frequently under warmer conditions. The invasive tick species Haemaphysalis longicornis, the Asian longhorned tick, poses a significant public health threat due to its global expansion and its potential to carry a wide range of pathogens. We analyzed meta-transcriptomic data from 3595 adult H. longicornis ticks collected between 2016 and 2019 in 22 provinces across China encompassing diverse ecological conditions. Generalized additive modeling revealed that climate factors exerted a stronger influence on the virome of H. longicornis than other ecological factors, such as ecotypes, distance to coastline, animal host, tick gender, and antiviral immunity. To understand how climate changes drive the tick virome, we performed a mechanistic investigation using causality inference with emphasis on the significance of this process for public health. Our findings demonstrated that higher temperatures and lower relative humidity/precipitation contribute to variations in animal host diversity, leading to increased diversity of the tick virome, particularly the evenness of vertebrate-associated viruses. These findings may explain the evolution of tick-borne viruses into generalists across multiple hosts, thereby increasing the probability of spillover events involving tick-borne pathogens. Deep learning projections indicated that the diversity of the H. longicornis virome is expected to increase in 81.9% of regions under the SSP8.5 scenario from 2019 to 2030. Surveillance should be extended to avert the spread of tick-borne diseases.


Subject(s)
Introduced Species , Virome , Animals , China , Ixodidae/virology , Female , Climate Change , Male , Climate
7.
Article in English | MEDLINE | ID: mdl-26357079

ABSTRACT

This paper introduces a simple and effective approach to improve the accuracy of multiple sequence alignment. We use a natural measure to estimate the similarity of the input sequences, and based on this measure, we align the input sequences differently. For example, for inputs with high similarity, we consider the whole sequences and align them globally, while for those with moderately low similarity, we may ignore the flank regions and align them locally. To test the effectiveness of this approach, we have implemented a multiple sequence alignment tool called GLProbs and compared its performance with about a dozen leading alignment tools on three benchmark alignment databases; GLProbs's alignments have the best scores in almost all tests. We have also evaluated the practicability of the alignments of GLProbs by applying the tool to three biological applications, namely phylogenetic tree construction, protein secondary structure prediction and the detection of high-risk members for cervical cancer in the HPV-E6 family, and the results are very encouraging.


Subject(s)
Computational Biology/methods , Sequence Alignment/methods , Protein Sequence Analysis/methods , Software , Algorithms , Amino Acid Sequence , Markov Chains , Molecular Sequence Data , Phylogeny , Protein Secondary Structure , Proteins/chemistry , Proteins/classification