Search | BVS IEC

1.

AMRomics: a scalable workflow to analyze large microbial genome collections.

Le, Duc Quang; Nguyen, Tam Thi; Nguyen, Canh Hao; Ho, Tho Huu; Vo, Nam S; Nguyen, Trang; Nguyen, Hoang Anh; Vinh, Le Sy; Dang, Thanh Hai; Cao, Minh Duc; Nguyen, Son Hoang.

BMC Genomics ; 25(1): 709, 2024 Jul 22.

Article in English | MEDLINE | ID: mdl-39039439

ABSTRACT

Whole genome analysis for microbial genomics is critical to studying and monitoring antimicrobial resistance strains. The exponential growth of microbial sequencing data necessitates a fast and scalable computational pipeline to generate the desired outputs in a timely and cost-effective manner. Recent methods have been implemented to integrate individual genomes into large collections of specific bacterial populations and are widely employed for systematic genomic surveillance. However, they do not scale well when the population expands and turnaround time remains the main issue for this type of analysis. Here, we introduce AMRomics, an optimized microbial genomics pipeline that can work efficiently with big datasets. We use different bacterial data collections to compare AMRomics against competitive tools and show that our pipeline can generate similar results of interest but with better performance. The software is open source and is publicly available at https://github.com/amromics/amromics under an MIT license.

Subjects

Bacterial Genome , Genomics , Software , Workflow , Genomics/methods , Computational Biology/methods , Bacteria/genetics , Microbial Genome , Bacterial Drug Resistance/genetics

2.

Estimating amino acid substitution models from genome datasets: a simulation study on the performance of estimated models.

Tinh, Nguyen Huy; Dang, Cuong Cao; Vinh, Le Sy.

J Evol Biol ; 37(2): 256-265, 2024 Feb 14.

Article in English | MEDLINE | ID: mdl-38366253

ABSTRACT

Estimating parameters of amino acid substitution models is a crucial task in bioinformatics. The maximum likelihood (ML) approach has been proposed to estimate amino acid substitution models from large datasets. The quality of newly estimated models is normally assessed by comparing them with existing models in building ML trees. Two important questions that remain are how well the estimated models correlate with the true models and how large the training datasets must be to estimate reliable models. In this article, we performed a simulation study to answer these two questions. We simulated genome datasets with different numbers of genes/alignments based on predefined models (called true models) and predefined trees (called true trees). The simulated datasets were used to estimate amino acid substitution models using the ML estimation methods. Our experiments showed that models estimated by the ML methods from simulated datasets with more than 100 genes have high correlations with the true models. The estimated models performed well in building ML trees in comparison with the true models. The results suggest that amino acid substitution models estimated by the ML methods from large genome datasets are a reliable tool for analyzing amino acid sequences.

Subjects

Algorithms , Genome , Amino Acid Substitution , Phylogeny , Computer Simulation , Genetic Models

3.

Estimating amino acid substitution models for metazoan evolutionary studies.

Dang, Cuong Cao; Vinh, Le Sy.

J Evol Biol ; 36(3): 499-506, 2023 Mar.

Article in English | MEDLINE | ID: mdl-36598184

ABSTRACT

Amino acid substitution models represent the substitution rates among amino acids during the evolution of protein sequences. The models are a prerequisite for maximum likelihood or Bayesian methods to analyse the phylogenetic relationships among species based on their protein sequences. Estimating amino acid substitution models requires large protein datasets and intensive computation. In this paper, we presented the estimation of both a time-reversible model (Q.met) and a time non-reversible model (NQ.met) for multicellular animals (Metazoa). Analyses showed that the Q.met and NQ.met models were significantly better than existing models in analysing metazoan protein sequences. Moreover, the time non-reversible model NQ.met enables us to reconstruct the rooted phylogenetic tree for Metazoa. We recommend that researchers employ the Q.met and NQ.met models in analysing metazoan protein sequences.

Subjects

Molecular Evolution , Proteins , Animals , Phylogeny , Amino Acid Substitution , Bayes Theorem , Genetic Models

4.

nQMaker: Estimating Time Nonreversible Amino Acid Substitution Models.

Dang, Cuong Cao; Minh, Bui Quang; McShea, Hanon; Masel, Joanna; James, Jennifer Eleanor; Vinh, Le Sy; Lanfear, Robert.

Syst Biol ; 71(5): 1110-1123, 2022 Aug 10.

Article in English | MEDLINE | ID: mdl-35139203

ABSTRACT

Amino acid substitution models are a key component in phylogenetic analyses of protein sequences. All commonly used amino acid models available to date are time-reversible, an assumption designed for computational convenience but not for biological reality. Another significant downside to time-reversible models is that they do not allow inference of rooted trees without outgroups. In this article, we introduce a maximum likelihood approach nQMaker, an extension of the recently published QMaker method, that allows the estimation of time nonreversible amino acid substitution models and rooted phylogenetic trees from a set of protein sequence alignments. We show that the nonreversible models estimated with nQMaker are a much better fit to empirical alignments than pre-existing reversible models, across a wide range of data sets including mammals, birds, plants, fungi, and other taxa, and that the improvements in model fit scale with the size of the data set. Notably, for the recently published plant and bird trees, these nonreversible models correctly recovered the commonly estimated root placements with very high statistical support without the need to use an outgroup. We provide nQMaker as an easy-to-use feature in the IQ-TREE software (http://www.iqtree.org), allowing users to estimate nonreversible models and rooted phylogenies from their own protein data sets. The data sets and scripts used in this article are available at https://doi.org/10.5061/dryad.3tx95x6hx. [Amino acid sequence analyses; amino acid substitution models; maximum likelihood model estimation; nonreversible models; phylogenetic inference; reversible models.]

Subjects

Genetic Models , Software , Amino Acid Substitution , Animals , Molecular Evolution , Likelihood Functions , Mammals , Phylogeny , Proteins

5.

QMaker: Fast and Accurate Method to Estimate Empirical Models of Protein Evolution.

Minh, Bui Quang; Dang, Cuong Cao; Vinh, Le Sy; Lanfear, Robert.

Syst Biol ; 70(5): 1046-1060, 2021 Aug 11.

Article in English | MEDLINE | ID: mdl-33616668

ABSTRACT

Amino acid substitution models play a crucial role in phylogenetic analyses. Maximum likelihood (ML) methods have been proposed to estimate amino acid substitution models; however, they are typically complicated and slow. In this article, we propose QMaker, a new ML method to estimate a general time-reversible Q matrix from a large protein data set consisting of multiple sequence alignments. QMaker combines an efficient ML tree search algorithm, a model selection for handling the model heterogeneity among alignments, and the consideration of rate mixture models among sites. We provide QMaker as a user-friendly function in the IQ-TREE software package (http://www.iqtree.org) supporting the use of multiple CPU cores so that biologists can easily estimate amino acid substitution models from their own protein alignments. We used QMaker to estimate new empirical general amino acid substitution models from the current Pfam database as well as five clade-specific models for mammals, birds, insects, yeasts, and plants. Our results show that the new models considerably improve the fit between model and data and in some cases influence the inference of phylogenetic tree topologies. [Amino acid replacement matrices; amino acid substitution models; maximum likelihood estimation; phylogenetic inferences.]

Subjects

Molecular Evolution , Genetic Models , Animals , Likelihood Functions , Phylogeny , Proteins/genetics , Sequence Alignment

6.

Exploring the Kinh Vietnamese genomic database for the polymorphisms of the P450 genes towards precision public health.

Hoang, Diep Thi; Hiep, Tran Van; Thi Phuong Nguyen, Thao; Nhung, Hoang Thi My; Tran, Kien Trung; Vinh, Le Sy.

Ann Hum Biol ; 49(2): 152-155, 2022 Mar.

Article in English | MEDLINE | ID: mdl-35289678

ABSTRACT

BACKGROUND: Human cytochrome P450 (CYP) genes are essential in metabolising drugs. Due to their high polymorphism, population-specific studies are of great interest. AIM: This research examined six CYP genes (CYP2B6, CYP2C9, CYP2C19, CYP2D6, CYP3A5, and CYP4F2) in the Kinh Vietnamese (KHV) for population-scale precision medicine. SUBJECTS AND METHODS: We processed data from a genomics database of 206 healthy and unrelated KHV individuals to calculate CYP allele frequencies. First, we compared the CYP genes of the KHV to six other populations retrieved from the 1000 Genomes Project. Second, we searched the PharmGKB database for drug-CYP interaction data to compile a drug dosage recommendation for the KHV. RESULTS: We observed diverging trends in genetic variations of CYP2B6, CYP2D6, and CYP3A5 in the KHV. Regarding phenotypic drug responses in the KHV, CYP2C19 exhibited all metabolic phenotypes at a non-trivial frequency. In addition, CYP3A5 metabolised drugs at a lower rate compared to the other five CYPs. CONCLUSION: This is the first large-scale study to investigate multiple CYP genes in the KHV for precision medicine from a public health perspective. Differences found in the distributions of metabolisers for the KHV suggest careful prescriptions for CYP2C19 and CYP3A5-metabolised drugs.

Subjects

Cytochrome P-450 CYP2D6 , Cytochrome P-450 CYP3A , Asian People/genetics , Cytochrome P-450 CYP2B6 , Cytochrome P-450 CYP2C19/genetics , Cytochrome P-450 CYP2C9 , Cytochrome P-450 CYP2D6/genetics , Cytochrome P-450 CYP3A/genetics , Cytochrome P-450 Enzyme System/genetics , Genomics , Humans , Public Health

7.

De novo homozygous variant of the SCN1A gene in a patient with severe Dravet syndrome complicated by acute encephalopathy.

Van, Le Thi Khanh; Hien, Huynh Thi Dieu; Kieu, Huynh Thi Thuy; Hieu, Nguyen Le Trung; Vinh, Le Sy; Hoa, Giang; Hang, Do Thi Thu.

Neurogenetics ; 22(2): 133-136, 2021 May.

Article in English | MEDLINE | ID: mdl-33674996

ABSTRACT

Variants in the SCN1A gene have been identified in epilepsy patients with widely variable phenotypes and they are generally heterozygous. Here, we report a homozygous missense variant, NM_001165963.4: c.4319C>T (p.Ala1440Val), in the SCN1A gene which seemed to occur de novo together with a gene conversion event. It is highly possible that this variant, although located in a critical functional domain of the Nav1.1 protein, does not cause complete loss of protein function, depending on the nature of the amino acid substitution. The cumulative effect of carrying this variant on both alleles results in a Dravet syndrome phenotype that is more severe than average. This first report of a de novo homozygous variant in the SCN1A gene, therefore, provides a clear illustration of a complex genotype-phenotype relationship.

Subjects

Brain Diseases/etiology , Myoclonic Epilepsies/genetics , Missense Mutation , NAV1.1 Voltage-Gated Sodium Channel/genetics , Point Mutation , Amino Acid Substitution , Autism Spectrum Disorder/genetics , Child Behavior Disorders/genetics , Drug Resistant Epilepsy/genetics , Myoclonic Epilepsies/complications , Genetic Association Studies , Homozygote , Humans , Infant , Male , Protein Domains/genetics , Sleep Wake Disorders/genetics

8.

FLAVI: An Amino Acid Substitution Model for Flaviviruses.

Le, Thu Kim; Vinh, Le Sy.

J Mol Evol ; 88(5): 445-452, 2020 Jul.

Article in English | MEDLINE | ID: mdl-32356020

ABSTRACT

Amino acid substitution models represent substitution rates among amino acids during evolution. The models play an important role in analyzing protein sequences, especially inferring phylogenies. The rapid evolution of flaviviruses is expanding the threat to public health. A number of models have been estimated for some viruses; however, they are unable to properly represent amino acid substitution patterns of flaviviruses. In this study, we collected protein sequences from the flavivirus genus to specifically estimate an amino acid substitution model, called FLAVI, for flaviviruses. Experiments showed that the collected dataset was sufficient to estimate a stable model. More importantly, the FLAVI model was remarkably better than other existing models in analyzing flavivirus protein sequences. We recommend that researchers use the FLAVI model when studying protein sequences of flaviviruses or closely related viruses.

Subjects

Amino Acid Substitution , Flavivirus , Genetic Models , Amino Acid Sequence , Flavivirus/genetics

9.

UFBoot2: Improving the Ultrafast Bootstrap Approximation.

Hoang, Diep Thi; Chernomor, Olga; von Haeseler, Arndt; Minh, Bui Quang; Vinh, Le Sy.

Mol Biol Evol ; 35(2): 518-522, 2018 Feb 01.

Article in English | MEDLINE | ID: mdl-29077904

ABSTRACT

The standard bootstrap (SBS), despite being computationally intensive, is widely used in maximum likelihood phylogenetic analyses. We recently proposed the ultrafast bootstrap approximation (UFBoot) to reduce computing time while achieving more unbiased branch supports than SBS under mild model violations. UFBoot has been steadily adopted as an efficient alternative to SBS and other bootstrap approaches. Here, we present UFBoot2, which substantially accelerates UFBoot and reduces the risk of overestimating branch supports due to polytomies or severe model violations. Additionally, UFBoot2 provides suitable bootstrap resampling strategies for phylogenomic data. UFBoot2 is 778 times (median) faster than SBS and 8.4 times (median) faster than RAxML rapid bootstrap on tested data sets. UFBoot2 is implemented in the IQ-TREE software package version 1.6 and freely available at http://www.iqtree.org.

Subjects

Likelihood Functions , Phylogeny , Software , Genetic Models

10.

MPBoot: fast phylogenetic maximum parsimony tree inference and bootstrap approximation.

Hoang, Diep Thi; Vinh, Le Sy; Flouri, Tomás; Stamatakis, Alexandros; von Haeseler, Arndt; Minh, Bui Quang.

BMC Evol Biol ; 18(1): 11, 2018 Feb 02.

Article in English | MEDLINE | ID: mdl-29390973

ABSTRACT

BACKGROUND: The nonparametric bootstrap is widely used to measure the branch support of phylogenetic trees. However, bootstrapping is computationally expensive and remains a bottleneck in phylogenetic analyses. Recently, an ultrafast bootstrap approximation (UFBoot) approach was proposed for maximum likelihood analyses. However, such an approach is still missing for maximum parsimony. RESULTS: To close this gap we present MPBoot, an adaptation and extension of UFBoot to compute branch supports under the maximum parsimony principle. MPBoot works for both uniform and non-uniform cost matrices. Our analyses on biological DNA and protein data showed that under uniform cost matrices, MPBoot runs on average 4.7 (DNA) to 7 times (protein data) (range: 1.2-20.7) faster than the standard parsimony bootstrap implemented in PAUP*, but 1.6 (DNA) to 4.1 times (protein data) slower than the standard bootstrap with a fast search routine in TNT (fast-TNT). However, for non-uniform cost matrices MPBoot is 5 (DNA) to 13 times (protein data) (range: 0.3-63.9) faster than fast-TNT. We note that MPBoot achieves better scores more frequently than PAUP* and fast-TNT. However, this effect is less pronounced if an intensive but slower search in TNT is invoked. Moreover, experiments on large-scale simulated data show that while both PAUP* and TNT bootstrap estimates are too conservative, MPBoot bootstrap estimates appear more unbiased. CONCLUSIONS: MPBoot provides an efficient alternative to the standard maximum parsimony bootstrap procedure. It shows favorable performance in terms of run time, the capability of finding a maximum parsimony tree, and high bootstrap accuracy on simulated as well as empirical data sets. MPBoot is easy-to-use, open-source and available at http://www.cibiv.at/software/mpboot.

Subjects

Phylogeny , Software , DNA/genetics , Likelihood Functions , Genetic Models , Sequence Alignment , Time Factors

11.

QMix: An Efficient Program to Automatically Estimate Multi-Matrix Mixture Models for Amino Acid Substitution Process.

Tinh, Nguyen Huy; Dang, Cuong Cao; Vinh, Le Sy.

J Comput Biol ; 31(8): 703-707, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38860371

ABSTRACT

The single-matrix amino acid (AA) substitution models are widely used in phylogenetic analyses; however, they are unable to properly model the heterogeneity of AA substitution rates among sites. The multi-matrix mixture models can handle the site rate heterogeneity and outperform the single-matrix models. Estimating multi-matrix mixture models is a complex process and no computer program is available for this task. In this study, we implemented a computer program, called QMix, based on the algorithm of LG4X and LG4M with several enhancements to automatically estimate multi-matrix mixture models from large datasets. QMix employs the QMaker algorithm instead of the XRATE algorithm to accurately and rapidly estimate the parameters of models. It is able to estimate mixture models with different numbers of matrices and supports multi-threaded computing to efficiently estimate models from thousands of genes. We re-estimated the mixture models LG4X and LG4M from 1471 HSSP alignments. The re-estimated models (HP4X and HP4M) are slightly better than LG4X and LG4M in building maximum likelihood trees from HSSP and TreeBASE datasets. The QMix program required about 10 hours on a computer with 18 cores to estimate a mixture model with four matrices from 200 HSSP alignments. It is easy to use and freely available for researchers.

Subjects

Algorithms , Amino Acid Substitution , Phylogeny , Software , Genetic Models , Computational Biology/methods , Likelihood Functions

12.

BM-BronchoLC - A rich bronchoscopy dataset for anatomical landmarks and lung cancer lesion recognition.

Vu, Van Giap; Hoang, Anh Duc; Phan, Thu Phuong; Nguyen, Ngoc Du; Nguyen, Thanh Thuy; Nguyen, Duc Nghia; Dao, Ngoc Phu; Doan, Thi Phuong Lan; Nguyen, Thi Thanh Huyen; Trinh, Thi Huong; Pham, Thi Le Quyen; Le, Thi Thu Trang; Thi Hanh, Phan; Pham, Van Tuyen; Tran, Van Chuong; Vu, Dang Luu; Tran, Van Luong; Nguyen, Thi Thu Thao; Pham, Cam Phuong; Pham, Gia Linh; Luong, Son Ba; Pham, Trung-Dung; Nguyen, Duy-Phuc; Truong, Thi Kieu Anh; Nguyen, Quang Minh; Tran, Truong-Thuy; Dang, Tran Binh; Ta, Viet-Cuong; Tran, Quoc Long; Le, Duc-Trong; Vinh, Le Sy.

Sci Data ; 11(1): 321, 2024 Mar 28.

Article in English | MEDLINE | ID: mdl-38548727

ABSTRACT

Flexible bronchoscopy has revolutionized respiratory disease diagnosis. It offers direct visualization and detection of airway abnormalities, including lung cancer lesions. Accurate identification of airway lesions during flexible bronchoscopy plays an important role in the lung cancer diagnosis. The application of artificial intelligence (AI) aims to support physicians in recognizing anatomical landmarks and lung cancer lesions within bronchoscopic imagery. This work described the development of BM-BronchoLC, a rich bronchoscopy dataset encompassing 106 lung cancer and 102 non-lung cancer patients. The dataset incorporates detailed localization and categorical annotations for both anatomical landmarks and lesions, meticulously conducted by senior doctors at Bach Mai Hospital, Vietnam. To assess the dataset's quality, we evaluate two prevalent AI backbone models, namely UNet++ and ESFPNet, on the image segmentation and classification tasks with single-task and multi-task learning paradigms. We present BM-BronchoLC as a reference dataset in developing AI models to assist diagnostic accuracy for anatomical landmarks and lung cancer lesions in bronchoscopy data.

Subjects

Bronchoscopy , Lung Neoplasms , Humans , Artificial Intelligence , Lung Neoplasms/diagnostic imaging , Thorax/diagnostic imaging , Anatomic Landmarks/diagnostic imaging

13.

Random Tree-Puzzle leads to the Yule-Harding distribution.

Vinh, Le Sy; Fuehrer, Andrea; von Haeseler, Arndt.

Mol Biol Evol ; 28(2): 873-7, 2011 Feb.

Article in English | MEDLINE | ID: mdl-20705907

ABSTRACT

Approaches to reconstruct phylogenies abound and are widely used in the study of molecular evolution. Partially through extensive simulations, we are beginning to understand the potential pitfalls as well as the advantages of different methods. However, little work has been done on possible biases introduced by the methods if the input data are random and do not carry any phylogenetic signal. Although Tree-Puzzle (Strimmer K, von Haeseler A. 1996. Quartet puzzling: a quartet maximum-likelihood method for reconstructing tree topologies. Mol Biol Evol. 13:964-969; Schmidt HA, Strimmer K, Vingron M, von Haeseler A. 2002. Tree-Puzzle: maximum likelihood phylogenetic analysis using quartets and parallel computing. Bioinformatics 18:502-504) has become common in phylogenetics, the resulting distribution of labeled unrooted bifurcating trees when data do not carry any phylogenetic signal has not been investigated. Our note shows that the distribution converges to the well-known Yule-Harding distribution. However, the bias of the Yule-Harding distribution will be diminished by a tiny amount of phylogenetic information. Key words: maximum likelihood, phylogenetic reconstruction, Tree-Puzzle, tree distribution, Yule-Harding distribution.

Subjects

Genetic Models , Phylogeny , Algorithms , Bayes Theorem

14.

POY version 4: phylogenetic analysis using dynamic homologies.

Varón, Andrés; Vinh, Le Sy; Wheeler, Ward C.

Cladistics ; 26(1): 72-85, 2010 Feb.

Article in English | MEDLINE | ID: mdl-34875752

ABSTRACT

We present POY version 4, an open source program for the phylogenetic analysis of morphological, prealigned sequence, unaligned sequence, and genomic data. POY allows phylogenetic inference when not only substitutions, but insertions, deletions, and rearrangement events are allowed (computed using the breakpoint or inversion distance). Compared with previous versions, POY 4 provides greater flexibility, a larger number of supported parameter sets, numerous execution time improvements, a vastly improved user interface, greater quality control, and extensive documentation. We introduce POY's basic features, and present a simple example illustrating the performance improvements over previous versions of the application. © The Willi Hennig Society 2009.

15.

Pairwise alignment with rearrangements.

Vinh, Le Sy; Varón, Andrés; Wheeler, Ward C.

Genome Inform ; 17(2): 141-51, 2006.

Article in English | MEDLINE | ID: mdl-17503387

ABSTRACT

The increase of available genomes poses new optimization problems in genome comparisons. A genome can be considered as a sequence of characters (loci) which are genes or segments of nucleotides. Genomes are subject to both nucleotide transformation and character order rearrangement processes. In this context, we define a problem of so-called pairwise alignment with rearrangements (PAR) between two genomes. The PAR generalizes the ordinary pairwise alignment by allowing the rearrangement of character order. The objective is to find the optimal PAR that minimizes the total cost which is composed of three factors: the edit cost between characters, the deletion/insertion cost of characters, and the rearrangement cost between character orders. To this end, we propose simple and effective heuristic methods: character moving and simultaneous character swapping. The efficiency of the methods is tested on Metazoa mitochondrial genomes. Experiments show that pairwise alignments with rearrangements give better performance than ordinary pairwise alignments without rearrangements. The best proposed method, simultaneous character swapping, is implemented as an essential subroutine in our software POY version 4.0 to reconstruct genome-based phylogenies.
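
The objective stated in the abstract can be written compactly. The notation below is an assumption introduced here for illustration and is not taken from the paper: $A$ ranges over candidate alignments with rearrangements of two genomes $X$ and $Y$, the three terms stand for the three cost factors the abstract lists, and the total is assumed to be a plain unweighted sum.

```latex
% Hypothetical notation: C_edit, C_indel and C_rearr are illustrative names
% for the edit, deletion/insertion and rearrangement costs of an alignment A.
\[
  A^{*}(X, Y) \;=\; \arg\min_{A}
  \Bigl[\, C_{\mathrm{edit}}(A) + C_{\mathrm{indel}}(A) + C_{\mathrm{rearr}}(A) \,\Bigr]
\]
```

Setting $C_{\mathrm{rearr}}$ aside and fixing the character order recovers the ordinary pairwise alignment that the abstract says PAR generalizes.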

Subjects

Gene Rearrangement , Genome , Phylogeny , Genetic Recombination , Sequence Alignment , Algorithms , Animals , Base Sequence , Mitochondrial DNA/genetics , Genetic Databases , Molecular Evolution , Gene Order , Mitochondria/genetics , Genetic Models , Insertional Mutagenesis , Software

16.

Shortest triplet clustering: reconstructing large phylogenies using representative sets.

Vinh, Le Sy; von Haeseler, Arndt.

BMC Bioinformatics ; 6: 92, 2005 Apr 08.

Article in English | MEDLINE | ID: mdl-15819989

ABSTRACT

BACKGROUND: Understanding the evolutionary relationships among species based on their genetic information is one of the primary objectives in phylogenetic analysis. Reconstructing phylogenies for large data sets is still a challenging task in Bioinformatics. RESULTS: We propose a new distance-based clustering method, the shortest triplet clustering algorithm (STC), to reconstruct phylogenies. The main idea is the introduction of a natural definition of so-called k-representative sets. Based on k-representative sets, shortest triplets are reconstructed and serve as building blocks for the STC algorithm to agglomerate sequences for tree reconstruction in O(n^2) time for n sequences. Simulations show that STC gives better topological accuracy than other tested methods that also build a first starting tree. STC appears as a very good method to start the tree reconstruction. However, all tested methods give similar results if balanced nearest neighbor interchange (BNNI) is applied as a post-processing step. BNNI leads to an improvement in all instances. The program is available at http://www.bi.uni-duesseldorf.de/software/stc/. CONCLUSION: The results demonstrate that the new approach efficiently reconstructs phylogenies for large data sets. We found that BNNI boosts the topological accuracy of all methods including STC; therefore, one should use BNNI as a post-processing step to get better topological accuracy.

Subjects

Computational Biology/methods , Statistical Data Interpretation , Algorithms , Base Sequence , Cluster Analysis , Computer Simulation , Computers , Molecular Evolution , Internet , Likelihood Functions , Genetic Models , Statistical Models , Automated Pattern Recognition , Phylogeny , Sequence Alignment , DNA Sequence Analysis , Software

17.

Whole genome analysis of a Vietnamese trio.

Hai, Dang Thanh; Thanh, Nguyen Dai; Trang, Pham Thi Minh; Quang, Le Si; Hang, Phan Thi Thu; Cuong, Dang Cao; Phuc, Hoang Kim; Duc, Nguyen Huu; Dong, Do Duc; Minh, Bui Quang; Son, Pham Bao; Vinh, Le Sy.

J Biosci ; 40(1): 113-24, 2015 Mar.

Article in English | MEDLINE | ID: mdl-25740146

ABSTRACT

We here present the first whole genome analysis of an anonymous Kinh Vietnamese (KHV) trio whose genomes were deeply sequenced to 30-fold average coverage. The resulting short reads covered 99.91 percent of the human reference genome (GRCh37d5). We identified 4,719,412 SNPs and 827,385 short indels that satisfied the Mendelian inheritance law. Among them, 109,914 (2.3 percent) SNPs and 59,119 (7.1 percent) short indels were novel. We also detected 30,171 structural variants of which 27,604 (91.5 percent) were large indels. There were 6,681 large indels in the range 0.1-100 kbp occurring in the child genome that were also confirmed in either the father or mother genome. We compared these large indels against the DGV database and found that 1,499 (22.44 percent) were KHV specific. De novo assembly of high-quality unmapped reads yielded 789 contigs with the length greater than or equal to 300 bp. There were 235 contigs from the child genome of which 199 (84.7 percent) were significantly matched with at least one contig from the father or mother genome. Blasting these 199 contigs against other alternative human genomes revealed 4 novel contigs. The novel variants identified from our study demonstrated the necessity of conducting more genome-wide studies not only for Kinh but also for other ethnic groups in Vietnam.
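
The percentages quoted in this abstract can be re-derived from its raw counts. The short Python sketch below does only that arithmetic; the dictionary keys are illustrative labels rather than names from the paper, and the pairing of each count with its denominator follows the wording of the abstract.

```python
# Counts copied from the abstract; the labels are illustrative, not from the paper.
COUNTS = {
    "novel_snps": (109_914, 4_719_412),
    "novel_short_indels": (59_119, 827_385),
    "large_indels_among_svs": (27_604, 30_171),
    "khv_specific_large_indels": (1_499, 6_681),
    "child_contigs_matched": (199, 235),
}


def percent(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


if __name__ == "__main__":
    # Print each ratio to two decimals for comparison with the abstract.
    for label, (part, whole) in COUNTS.items():
        print(f"{label}: {percent(part, whole):.2f}%")
```

Each computed ratio can then be compared against the rounded figure given in the abstract.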

Subjects

Ethnicity/genetics , Human Genome/genetics , Asian People/genetics , Base Sequence , DNA/analysis , DNA/genetics , Family , Humans , INDEL Mutation/genetics , Single Nucleotide Polymorphism/genetics , DNA Sequence Analysis , Vietnam

18.

pIQPNNI: parallel reconstruction of large maximum likelihood phylogenies.

Minh, Bui Quang; Vinh, Le Sy; von Haeseler, Arndt; Schmidt, Heiko A.

Bioinformatics ; 21(19): 3794-6, 2005 Oct 01.

Article in English | MEDLINE | ID: mdl-16046495

ABSTRACT

SUMMARY: IQPNNI is a program to infer maximum-likelihood phylogenetic trees from DNA or protein data with a large number of sequences. We present an improved and MPI-parallel implementation showing very good scaling and speed-up behavior.

Subjects

Algorithms , Computing Methodologies , Molecular Evolution , Genetic Models , Phylogeny , Sequence Alignment/methods , DNA Sequence Analysis/methods , Protein Sequence Analysis/methods , Amino Acid Sequence , Animals , Base Sequence , Humans , Likelihood Functions , Molecular Sequence Data

19.

IQPNNI: moving fast through tree space and stopping in time.

Vinh, Le Sy; Von Haeseler, Arndt.

Mol Biol Evol ; 21(8): 1565-71, 2004 Aug.

Article in English | MEDLINE | ID: mdl-15163768

ABSTRACT

An efficient tree reconstruction method (IQPNNI) is introduced to reconstruct a phylogenetic tree based on DNA or amino acid sequence data. Our approach combines various fast algorithms to generate a list of potential candidate trees. The key ingredient is the definition of so-called important quartets (IQs), which allow the computation of an intermediate tree in O(n^2) time for n sequences. The resulting tree is then further optimized by applying the nearest neighbor interchange (NNI) operation. Subsequently a random fraction of the sequences is deleted from the best tree found so far. The deleted sequences are then re-inserted in the smaller tree using the important quartet puzzling (IQP) algorithm. These steps are repeated several times and the best tree, with respect to the likelihood criterion, is considered as the inferred phylogenetic tree. Moreover, we suggest a rule which indicates when to stop the search. Simulations show that IQPNNI gives a slightly better accuracy than other programs tested. Moreover, we applied the approach to 218 small subunit rRNA sequences and 500 rbcL sequences. We found trees with higher likelihood compared to the results by others. A program to reconstruct DNA or amino acid based phylogenetic trees is available online (http://www.bi.uni-duesseldorf.de/software/iqpnni).

Subjects

Algorithms , Molecular Evolution , Genetic Models , Phylogeny , Amino Acid Sequence , Animals , Base Sequence , Humans , Molecular Sequence Data , DNA Sequence Analysis/methods
