1.
J Comput Biol ; 2024 Jun 11.
Article in English | MEDLINE | ID: mdl-38860371

ABSTRACT

Single-matrix amino acid (AA) substitution models are widely used in phylogenetic analyses; however, they are unable to properly model the heterogeneity of AA substitution rates among sites. Multi-matrix mixture models can handle site rate heterogeneity and outperform single-matrix models. Estimating multi-matrix mixture models is a complex process, and no computer program is available for this task. In this study, we implemented a computer program called QMix, based on the algorithm of LG4X and LG4M with several enhancements, to automatically estimate multi-matrix mixture models from large datasets. QMix employs the QMaker algorithm instead of the XRATE algorithm to estimate model parameters accurately and rapidly. It can estimate mixture models with different numbers of matrices and supports multi-threaded computing to efficiently estimate models from thousands of genes. We re-estimated the mixture models LG4X and LG4M from 1471 HSSP alignments. The re-estimated models (HP4X and HP4M) are slightly better than LG4X and LG4M in building maximum likelihood trees from HSSP and TreeBASE datasets. The QMix program required about 10 hours on a computer with 18 cores to estimate a mixture model with four matrices from 200 HSSP alignments. It is easy to use and freely available to researchers.
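To make the idea of a multi-matrix mixture concrete, the following is a minimal sketch of how a mixture combines per-site likelihoods from several components. It is not QMix's implementation: the function names, the input layout, and the assumption that per-component site likelihoods have already been computed (for example, by a tree likelihood routine) are all hypothetical and chosen for illustration only.

```python
import math


def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a K-component mixture.

    site_likelihoods[i][k] is the (assumed precomputed) likelihood of
    site i under component k, e.g. one substitution matrix with its own
    rate; weights[k] is the mixture weight of component k.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    total = 0.0
    for per_component in site_likelihoods:
        # A site's likelihood is the weighted sum over all components.
        site_l = sum(w * l for w, l in zip(weights, per_component))
        total += math.log(site_l)
    return total


def component_posteriors(per_component, weights):
    """Posterior probability that a single site belongs to each component."""
    joint = [w * l for w, l in zip(weights, per_component)]
    norm = sum(joint)
    return [j / norm for j in joint]
```

With four components, this corresponds to the general shape of a four-matrix mixture; the posteriors indicate which matrix best explains each site.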

2.
J Evol Biol ; 37(2): 256-265, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38366253

ABSTRACT

Estimating the parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed to estimate amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing them with existing models in building ML trees. Two important questions remain: how well the estimated models correlate with the true models, and how large the training datasets must be to estimate reliable models. In this article, we performed a simulation study to answer these two questions. We simulated genome datasets with different numbers of genes/alignments based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution models using ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences.
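One way to quantify how closely an estimated model matches a true model is to correlate their exchangeability coefficients. The sketch below is an assumption for illustration: the abstract does not state which correlation measure was used, so Pearson correlation over the off-diagonal upper triangle of two symmetric matrices is shown, and all function names are hypothetical.

```python
import math


def upper_triangle(matrix):
    """Flatten the off-diagonal upper triangle of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def model_correlation(true_q, estimated_q):
    """Correlation between the exchangeabilities of two symmetric matrices."""
    return pearson(upper_triangle(true_q), upper_triangle(estimated_q))
```

For real amino acid models the matrices would be 20 x 20, giving 190 exchangeability coefficients per model. Pearson correlation is insensitive to overall scaling, which matters because substitution matrices are usually normalized differently across programs.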


Subject(s)
Algorithms , Genome , Amino Acid Substitution , Phylogeny , Computer Simulation , Genetic Models