Results 1 - 20 of 48
1.
Brief Bioinform ; 25(5), 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39120645

ABSTRACT

Predicting the strength of promoters and guiding their directed evolution is a crucial task in synthetic biology. This approach significantly reduces the experimental costs in conventional promoter engineering. Previous studies employing machine learning or deep learning methods have shown some success in this task, but their outcomes were not satisfactory enough, primarily due to the neglect of evolutionary information. In this paper, we introduce the Chaos-Attention net for Promoter Evolution (CAPE) to address the limitations of existing methods. We comprehensively extract evolutionary information within promoters using merged chaos game representation and process the overall information with modified DenseNet and Transformer structures. Our model achieves state-of-the-art results on two kinds of distinct tasks related to prokaryotic promoter strength prediction. The incorporation of evolutionary information enhances the model's accuracy, with transfer learning further extending its adaptability. Furthermore, experimental results confirm CAPE's efficacy in simulating in silico directed evolution of promoters, marking a significant advancement in predictive modeling for prokaryotic promoter strength. Our paper also presents a user-friendly website for the practical implementation of in silico directed evolution on promoters. The source code implemented in this study and the instructions on accessing the website can be found in our GitHub repository https://github.com/BobYHY/CAPE.


Subjects
Deep Learning , Promoter Regions, Genetic , Algorithms , Evolution, Molecular , Computer Simulation , Nonlinear Dynamics , Computational Biology/methods
2.
J Mol Graph Model ; 132: 108835, 2024 Nov.
Article in English | MEDLINE | ID: mdl-39106629

ABSTRACT

MicroRNAs (miRNAs) are small, non-coding RNA molecules that regulate gene expression. Despite their relatively short length (about 21 nucleotides), they can regulate thousands of transcripts within a cell. Due to their low complementarity to targets, studying their activity and binding region preferences (3'UTR, 5'UTR, or CDS) is challenging. In this paper, we analyzed a set of human miRNAs to uncover their general patterns. We began with a sequence logo to verify conservation at specific positions. To discover long-range correlations, we employed chaos game representation (CGR) and genomatrix, methods that enable both graphical and analytical analysis of sequence sets and are well-established in bioinformatics. Our results showed that miRNAs exhibit strongly non-random and characteristic patterns. To incorporate physicochemical properties into the analysis, we applied the electron-ion interaction potential (EIIP) parameter. An important part of our study was to validate the division of miRNAs into two parts: seed and puzzle. The seed region is responsible for target binding, while the puzzle region likely interacts with the RISC complex. We estimated duplex binding energy within the 3'UTR, 5'UTR, and CDS regions using the miRanda tool. Based on the median energy distribution, we divided the miRNAs into two subsets, reflecting different patterns in chaos game representation. Interestingly, one subset displayed significant similarity to conserved, high-confidence miRNAs. Our results confirm the low complementarity of miRNA/mRNA interactions and support the functional division of miRNA structure. Additionally, we present findings related to the localization of transcript target sites, which form the basis for further analyses.


Subjects
3' Untranslated Regions , MicroRNAs , MicroRNAs/genetics , MicroRNAs/chemistry , Humans , 5' Untranslated Regions , Computational Biology/methods , Thermodynamics , Base Sequence , Binding Sites
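
The EIIP step mentioned in the abstract of entry 2 can be illustrated with a short sketch. The four EIIP values are the ones commonly tabulated for DNA nucleotides; reusing the thymine value for uracil, taking the seed as positions 2 to 8, and the function names `eiip_profile` and `split_seed` are assumptions made here for illustration, not details taken from the paper.

```python
# Commonly tabulated EIIP values for nucleotides.
# Assumption: U is given the value usually listed for T.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335, "U": 0.1335}


def eiip_profile(seq):
    """Map a nucleotide sequence to its numeric EIIP series."""
    return [EIIP[base] for base in seq.upper()]


def split_seed(seq, start=2, end=8):
    """Split a miRNA into a seed (1-based positions start..end) and the rest.

    The 2..8 window is a widely used convention and only a default here.
    """
    seed = seq[start - 1:end]
    rest = seq[:start - 1] + seq[end:]
    return seed, rest


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"  # illustrative 22-nt input
    seed, rest = split_seed(mirna)
    print(seed, rest)
    print(eiip_profile(seed))
```

The numeric series can then be fed to whatever downstream analysis is of interest, for example a spectral or correlation analysis.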
3.
Theory Biosci ; 143(3): 183-193, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38807013

ABSTRACT

Cervical cancer is one of the most severe threats to women worldwide and ranks fourth in lethality. An estimated 604,127 cervical cancer cases were reported globally in 2020. With advancements in high-throughput technologies and bioinformatics, several candidate genes for cervical cancer have been proposed for better therapeutic strategies. In this paper, we prioritize the candidate genes that are involved in cervical cancer progression through a fractal time series-based cross-correlation approach. We apply chaos game representation theory combined with a two-dimensional multifractal detrended cross-correlation approach to the known and candidate genes involved in cervical cancer progression in order to prioritize the candidate genes. We obtained 16 candidate genes that showed cross-correlation with known cancer genes. Functional enrichment analysis of the candidate genes shows that they are involved in the following GO biological processes: cell-cell junction assembly, cell-cell junction organization, regulation of cell shape, cortical actin cytoskeleton organization, and actomyosin structure organization. KEGG pathway analysis revealed the genes' roles in the Rap1 signaling pathway, ErbB signaling pathway, MAPK signaling pathway, PI3K-Akt signaling pathway, mTOR signaling pathway, acute myeloid leukemia, chronic myeloid leukemia, breast cancer, thyroid cancer, bladder cancer, and gastric cancer. Further, we performed survival analysis and prioritized six genes, CDH2, PAIP1, BRAF, EPB41L3, OSMR, and RUNX1, as potential candidate genes for cervical cancer that have a crucial role in tumor progression. We found this integrative approach to be an efficient tool that offers a new way to prioritize candidate genes, which could then be evaluated experimentally for validation. We suggest that it may also be useful in analyzing nucleotide and protein sequences for clustering, classification, class affiliation, etc.


Subjects
Computational Biology , Fractals , Uterine Cervical Neoplasms , Female , Humans , Uterine Cervical Neoplasms/genetics , Computational Biology/methods , Signal Transduction/genetics , Game Theory , Algorithms , Gene Expression Regulation, Neoplastic , Nonlinear Dynamics , Disease Progression , Gene Regulatory Networks
4.
Front Microbiol ; 15: 1339156, 2024.
Article in English | MEDLINE | ID: mdl-38572227

ABSTRACT

Traditional alignment-based methods face serious challenges in genome sequence comparison and phylogeny reconstruction due to their high computational complexity. Here, we propose a new alignment-free method to analyze the phylogenetic relationships (classification) among species. In our method, the dynamical language (DL) model and the chaos game representation (CGR) method are used to characterize the frequency information and the context information of k-mers in a sequence, respectively. Then, for each DNA sequence or protein sequence in a dataset, our method converts the sequence into a feature vector that represents the sequence information based on CGR weighted by the DL model to infer phylogenetic relationships. We name our method CGRWDL. Its performance was tested on both DNA and protein sequences of 8 virus datasets to construct the phylogenetic trees. For each dataset, we compared the Robinson-Foulds (RF) distance between the phylogenetic tree constructed by CGRWDL and the reference tree with the RF distances obtained by other advanced methods. The results show that the phylogenetic trees constructed by CGRWDL can accurately classify the viruses, and the RF scores between these trees and the reference trees are smaller than those obtained with other methods.
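
The Robinson-Foulds comparison used in entry 4 can be sketched as follows. This is a minimal illustration that assumes each tree is already given as a list of its non-trivial bipartitions (one side of each internal edge); the helper names `canonical_split` and `rf_distance` and the toy trees are hypothetical and not taken from CGRWDL.

```python
def canonical_split(side, taxa):
    """Represent a bipartition by the side that excludes a fixed reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def rf_distance(splits_a, splits_b, taxa):
    """Count the bipartitions that occur in exactly one of the two trees."""
    a = {canonical_split(s, taxa) for s in splits_a}
    b = {canonical_split(s, taxa) for s in splits_b}
    return len(a ^ b)


if __name__ == "__main__":
    taxa = {"A", "B", "C", "D", "E"}
    tree_1 = [{"A", "B"}, {"D", "E"}]  # ((A,B),C,(D,E))
    tree_2 = [{"A", "C"}, {"D", "E"}]  # ((A,C),B,(D,E))
    print(rf_distance(tree_1, tree_2, taxa))  # 2
```

A smaller value means the two trees share more internal edges, which is the sense in which the abstract reports smaller RF scores as better agreement with the reference tree.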

5.
Curr Issues Mol Biol ; 45(12): 10056-10078, 2023 Dec 13.
Article in English | MEDLINE | ID: mdl-38132474

ABSTRACT

Two approaches to the synthesis of 2D binary identifiers ("fingerprints") of DNA-associated symbol sequences are considered in this paper. One of these approaches is based on the simulation of polarization-dependent diffraction patterns formed by reading the modeled DNA-associated 2D phase-modulating structures with a coherent light beam. In this case, 2D binarized distributions of close-to-circular extreme polarization states are applied as fingerprints of the analyzed nucleotide sequences. The second approach is based on the transformation of the DNA-associated chaos game representation (CGR) maps into finite-dimensional binary matrices. In both cases, the differences between the structures of the analyzed and reference symbol sequences are quantified by calculating the correlation coefficient of the synthesized binary matrices. The two approaches are compared using symbol sequences corresponding to nucleotide sequences of the hly gene from the vaccine and wild-type strains of Listeria monocytogenes. These strains differ in the number of substituted nucleotides relative to the vaccine strain selected as a reference. The results of the analysis allow us to conclude that the identification of structural differences in the DNA-associated symbolic sequences is significantly more efficient when using the binary distributions of close-to-circular extreme polarization states. The approach may be applicable to the genetic differentiation of infected from vaccinated animals (DIVA).
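
The second approach in entry 5 (CGR maps turned into binary matrices and compared by a correlation coefficient) can be sketched in a few lines. The abstract does not give the binarization rule or the exact coefficient, so the fixed `threshold` and the use of the Pearson coefficient are assumptions, and the function names are hypothetical.

```python
from math import sqrt


def binarize(matrix, threshold):
    """Turn a count matrix into a 0/1 matrix (assumed rule: value > threshold)."""
    return [[1 if value > threshold else 0 for value in row] for row in matrix]


def correlation(a, b):
    """Pearson correlation of two equally sized matrices, read row by row.

    Undefined (division by zero) if either matrix is constant.
    """
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    reference = binarize([[3, 0], [1, 5]], threshold=0)
    analyzed = binarize([[2, 0], [0, 4]], threshold=0)
    print(correlation(reference, analyzed))
```

A value close to 1 indicates nearly identical fingerprints, and lower values indicate more structural differences between the analyzed and reference sequences.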

6.
Molecules ; 28(5), 2023 Mar 01.
Article in English | MEDLINE | ID: mdl-36903531

ABSTRACT

The subcellular localization of messenger RNA (mRNA) precisely controls where protein products are synthesized and where they function. However, obtaining an mRNA's subcellular localization through wet-lab experiments is time-consuming and expensive, and many existing mRNA subcellular localization prediction algorithms need to be improved. In this study, a deep neural network-based eukaryotic mRNA subcellular location prediction method, DeepmRNALoc, was proposed, utilizing a two-stage feature extraction strategy that featured bimodal information splitting and fusing for the first stage and a VGGNet-like CNN module for the second stage. The five-fold cross-validation accuracies of DeepmRNALoc in the cytoplasm, endoplasmic reticulum, extracellular region, mitochondria, and nucleus were 0.895, 0.594, 0.308, 0.944, and 0.865, respectively, demonstrating that it outperforms existing models and techniques.


Subjects
Deep Learning , Eukaryota , Eukaryota/metabolism , Proteins/metabolism , Endoplasmic Reticulum/metabolism , RNA, Messenger , Computational Biology/methods
7.
Biology (Basel) ; 12(2), 2023 Feb 16.
Article in English | MEDLINE | ID: mdl-36829597

ABSTRACT

Organisms are unique physical entities in which information is stored and continuously processed. The digital nature of DNA sequences enables the construction of a dynamic information reservoir. However, the distinction between the hardware and software components in the information flow is crucial to identify the mechanisms generating specific genomic signatures. In this work, we perform a bibliometric analysis to identify the different purposes of looking for particular patterns in DNA sequences associated with a given phenotype. This study has enabled us to make a conceptual breakdown of the genomic signature and differentiate the leading applications. On the one hand, it refers to gene expression profiling associated with a biological function, which may be shared across taxa. This signature is the focus of study in precision medicine. On the other hand, it also refers to characteristic patterns in species-specific DNA sequences. This interpretation plays a key role in comparative genomics, identifying evolutionary relationships. Looking at the relevant studies in our bibliographic database, we highlight the main factors causing heterogeneities in genome composition and how they can be quantified. All these findings lead us to reformulate some questions relevant to evolutionary biology.

8.
Front Cell Infect Microbiol ; 13: 1117421, 2023.
Article in English | MEDLINE | ID: mdl-36779183

ABSTRACT

Introduction: The species diversity of microbiomes is a cutting-edge concept in metagenomic research. In this study, we propose a multifractal analysis for metagenomic research. Method and Results: First, we visualized the chaos game representation (CGR) of simulated and real metagenomes and found that the resulting plots are self-similar. We then defined and calculated the multifractal dimension of the CGR plots of the simulated and real metagenomes. By analyzing the Pearson correlation coefficients between the multifractal dimension and the traditional species diversity indices, we find that the correlations with the species richness index and the Shannon diversity index reach their maximum at q = 0 and q = 1, respectively, and that the correlation with the Simpson diversity index reaches its maximum at q = 5. Finally, we apply our method to real metagenomes of the gut microbiota of 100 infants sampled as newborns and at 4 and 12 months of age. The results show that the multifractal dimensions of an infant's gut microbiomes can distinguish age differences. Conclusion and Discussion: There is self-similarity among the CGRs of WGS metagenomes, and the multifractal spectrum is an important characteristic of metagenomes. The traditional diversity indicators can be unified under the framework of multifractal analysis. These results coincide with similar results in macrobial ecology. The multifractal spectrum of infants' gut microbiomes is related to the development of the infants.


Subjects
Gastrointestinal Microbiome , Microbiota , Humans , Infant , Infant, Newborn , Metagenome , Microbiota/genetics , Gastrointestinal Microbiome/genetics , Metagenomics/methods , Ecology
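
The abstract of entry 8 does not state how the multifractal dimension is defined. As a hedged reminder, the standard generalized (Rényi) dimension of box-counting multifractal analysis is given below; the symbols \varepsilon (box size), N(\varepsilon) (number of occupied boxes) and p_i(\varepsilon) (fraction of CGR points in box i) are notation assumed here, not taken from the paper.

```latex
D_q = \frac{1}{q-1}\,\lim_{\varepsilon \to 0}
      \frac{\log \sum_{i=1}^{N(\varepsilon)} p_i(\varepsilon)^{q}}{\log \varepsilon}
      \quad (q \neq 1),
\qquad
D_1 = \lim_{\varepsilon \to 0}
      \frac{\sum_{i=1}^{N(\varepsilon)} p_i(\varepsilon)\,\log p_i(\varepsilon)}{\log \varepsilon}.
```

Under this definition, D_0 depends only on the number of occupied boxes, which parallels a richness count, and D_1 is built from the Shannon entropy of the box masses, which fits the reported maxima at q = 0 and q = 1. Higher q puts more weight on the most heavily populated boxes.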
9.
Comput Geosci ; 27(1): 143-157, 2023.
Article in English | MEDLINE | ID: mdl-36590642

ABSTRACT

This paper proposes a new method to approach earthquake hypocenter studies based on Chaos Game Representation (CGR), a method initially used for making fractal structures and applied for studying DNA sequences. Applying the CGR method, this study aims at checking whether any relation exists between earthquakes occurring in different depth ranges in a seismically active area. For this purpose, the seismically active areas around the Indian tectonic plate were used. The CGR images gave characteristic patterns, implying that the occurrence of earthquakes in some specific depth range combinations showed higher preference. Statistical data on the frequency of different depth range combinations were derived from these plots. We put forward a mathematical value which we call proximity index, to compare the similarity between two different CGR plots. Proximity index values were used to compare the similarity in seismic activity in two different zones by comparing their respective CGR plots. Supplementary Information: The online version contains supplementary material available at 10.1007/s10596-022-10187-x.

10.
Comput Econ ; 61(1): 57-68, 2023.
Article in English | MEDLINE | ID: mdl-34629755

ABSTRACT

We propose a novel approach to visualize and compare financial markets across the globe using chaos game representation (CGR) of iterated function systems (IFS). We modified a fractal method, widely used in life sciences, and applied it to study the effect of COVID-19 on global financial markets. This modified driven IFS approach is used to generate compact fractal portraits of the financial markets in the form of percentage CGR (PC) plots and subtraction percentage (SP) plots. The markets over different periods are compared, and the difference is quantified through a parameter called the proximity (Pr) index. The reaction and volatility of financial markets across the globe in response to the COVID-19 pandemic are studied and modeled successfully. The fractal method reveals the imminent bearish pattern and a surprise bullish pattern of financial markets across the world, and provides a new tool to study financial markets.

11.
Gigascience ; 12, 2022 Dec 28.
Article in English | MEDLINE | ID: mdl-36576129

ABSTRACT

BACKGROUND: Since the beginning of the coronavirus disease 2019 pandemic, there has been an explosion of sequencing of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) virus, making it the most widely sequenced virus in history. Several databases and tools have been created to keep track of genome sequences and variants of the virus; most notably, the GISAID platform hosts millions of complete genome sequences and is continuously expanding. A challenging task is the development of fast and accurate tools that are able to distinguish between the different SARS-CoV-2 variants and assign them to a clade. RESULTS: In this article, we leverage the frequency chaos game representation (FCGR) and convolutional neural networks (CNNs) to develop an original method that learns how to classify genome sequences, which we implement in CouGaR-g, a tool for the clade assignment problem on SARS-CoV-2 sequences. On a testing subset of GISAID, CouGaR-g achieved a 96.29% overall accuracy, while a similar tool, Covidex, obtained a 77.12% overall accuracy. As far as we know, our method is the first to use deep learning and FCGR for intraspecies classification. Furthermore, by using some feature importance methods, CouGaR-g makes it possible to identify k-mers that match SARS-CoV-2 marker variants. CONCLUSIONS: By combining FCGR and CNNs, we develop a method that achieves better accuracy than Covidex (which is based on random forest) for clade assignment of SARS-CoV-2 genome sequences, also thanks to our training on a much larger dataset, with comparable running times. Our method implemented in CouGaR-g is able to detect k-mers that capture relevant biological information that distinguishes the clades, known as marker variants. AVAILABILITY: The trained models can be tested online by providing a FASTA file (with 1 or multiple sequences) at https://huggingface.co/spaces/BIASLab/sars-cov-2-classification-fcgr. CouGaR-g is also available at https://github.com/AlgoLab/CouGaR-g under the GPL.


Subjects
COVID-19 , Deep Learning , Puma , Animals , SARS-CoV-2/genetics , Puma/genetics , Genome, Viral
12.
Comput Biol Med ; 151(Pt A): 106243, 2022 Dec.
Article in English | MEDLINE | ID: mdl-36335814

ABSTRACT

Chaos game representation (CGR) has been successfully applied in bioinformatics for over 30 years, and many extensions have been proposed since its introduction. Numerical encoding of biological sequences is especially convenient for visualisation, alignment-free methods and input preparation for machine learning techniques. The development and applications of CGR have mainly covered linear nucleotide sequences. However, there have also been some attempts to create a representation of proteins. The latter needs to be more sophisticated, as arbitrary coordinates for amino acids do not reflect their properties, which are crucial during the encoding process. In this paper, we summarise different variations of CGR and their limitations. We began by studying the PROSITE motifs and showed the immense number of amino acid properties employed by different proteins. To this aim, we used Principal Component Analysis (PCA) and studied the relation between explained variance and the number of features that describe them. It appeared that even after many reductions, about 50 features are non-redundant. For this reason, we introduced an embedding concept from natural language processing, which enables adjusting features for a given list of sequences. We presented a simple neural network architecture with one hidden layer containing a single neuron and showed that it provides satisfactory results in phylogenetic tree construction for the ND5 and SPARC proteins. To this aim, we transformed the CGR representations of all considered sequences using the Discrete Fourier Transform (DFT) and applied the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) algorithm. Moreover, we indicated some similarities between CGR and Recurrent Neural Networks (RNN). Finally, we attempted to include information about RNA secondary structure and defined some measures to validate biological significance. We studied their properties and demonstrated their usefulness on the ALMV-3 example.


Subjects
Algorithms , Computational Biology , Phylogeny , Sequence Analysis, DNA/methods , Proteins/chemistry , Amino Acids
13.
Int J Mol Sci ; 23(3), 2022 Feb 06.
Article in English | MEDLINE | ID: mdl-35163771

ABSTRACT

The fractal characteristics of DNA sequences are studied using the frequency chaos game representation (FCGR) and small-angle scattering (SAS) technique. The FCGR allows representation of the frequencies of occurrence of k-mers (oligonucleotides of length k) in the form of images. The numerically encoded data are then used in a SAS analysis to enhance hidden features in DNA sequences. It is shown that the simulated SAS intensity allows us to obtain the fractal dimensions and scaling factors at various scales. These structural parameters can be used to distinguish unambiguously between the scaling properties of complex hierarchical DNA sequences. The validity of this approach is illustrated on several sequences from: Escherichia coli, Mouse mitochondrion, Homo sapiens mitochondrion and Human cosmid.


Subjects
Cosmids/genetics , Escherichia coli/genetics , Mitochondria/genetics , Sequence Analysis, DNA/methods , Algorithms , Animals , DNA, Bacterial/genetics , DNA, Mitochondrial/genetics , Fractals , Humans , Mice , Nonlinear Dynamics , Scattering, Small Angle , Time Factors
14.
Expert Syst Appl ; 194: 116559, 2022 May 15.
Article in English | MEDLINE | ID: mdl-35095217

ABSTRACT

In this study, chaos game representation (CGR) is introduced for investigating the pattern of genome sequences. It is an image representation of the genome that gives an overall visualization of the sequence. CGR is a mapping technique that assigns each base of the sequence to a position in the two-dimensional plane in order to portray the DNA sequence. Importantly, CGR provides a one-to-one mapping for individual nucleotides as well as for the whole sequence. A coordinate in the CGR plane identifies the corresponding base and its location in the original genome. Therefore, the whole nucleotide sequence (up to the current nucleotide) can be restored from a single point of the CGR. In this study, CGR coupled with an artificial neural network (ANN) is introduced as a new way to represent the genome and to classify intra-coronavirus sequences. A hierarchical clustering study was done to validate the approach, which was found to be more than 90% accurate when compared with the phylogenetic tree of the corresponding genomes. Interestingly, the method makes the genome sequence significantly shorter (more than 99% compressed), saving data space while preserving the genome features.
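
The one-to-one property described in entry 14 can be checked with a small sketch: a sequence is encoded into a single CGR point and then recovered from that point alone. The corner assignment (A, C, G, T on the unit square), the starting point (1/2, 1/2) and the function names are assumptions chosen for illustration; exact rational arithmetic is used because floating-point coordinates lose the early bases after a few dozen steps.

```python
from fractions import Fraction

# Assumed corner layout on the unit square; other layouts are in use.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}
BY_CORNER = {corner: base for base, corner in CORNERS.items()}
HALF = Fraction(1, 2)


def encode(seq):
    """Return the final CGR point of seq, starting from the centre of the square."""
    x = y = HALF
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2  # move halfway to the base's corner
    return x, y


def decode(point, length):
    """Recover the last `length` bases from a single CGR point."""
    x, y = point
    bases = []
    for _ in range(length):
        cx = 1 if x > HALF else 0  # the quadrant reveals the most recent base
        cy = 1 if y > HALF else 0
        bases.append(BY_CORNER[(cx, cy)])
        x, y = 2 * x - cx, 2 * y - cy  # undo one halving step
    return "".join(reversed(bases))


if __name__ == "__main__":
    point = encode("GATTACA")
    print(point, decode(point, 7))
```

The quadrant test works because every point produced after at least one step lies strictly inside one quadrant, so the most recent base is always unambiguous.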

15.
Comput Struct Biotechnol J ; 19: 6263-6271, 2021.
Article in English | MEDLINE | ID: mdl-34900136

ABSTRACT

Chaos game representation (CGR), a milestone in graphical bioinformatics, has become a powerful tool for alignment-free sequence comparison and feature encoding for machine learning. The algorithm maps a sequence to 2-dimensional space, while an extension of the CGR, the so-called frequency matrix representation (FCGR), transforms sequences of different lengths into equal-sized images or matrices. The CGR is a generalized Markov chain and includes various properties, which allow a unique representation of a sequence. Therefore, it has a broad spectrum of applications in bioinformatics, such as sequence comparison and phylogenetic analysis, and as an encoding of sequences for machine learning. This review introduces the construction of CGRs and FCGRs, their applications on DNA and proteins, and gives an overview of recent applications and progress in bioinformatics.
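
The statement in entry 15 that FCGR turns sequences of different lengths into equal-sized matrices can be illustrated with a minimal sketch. It counts k-mers on a 2^k by 2^k grid, placing each k-mer in the cell that its CGR point would fall into. The corner layout, the row/column orientation and the function name `fcgr` are assumptions for illustration, and published tools may orient the matrix differently.

```python
# Assumed corner layout on the unit square (x, y); other layouts are in use.
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def fcgr(seq, k):
    """Return a 2**k x 2**k matrix of k-mer counts for seq.

    counts[row][col] uses row for the y index and col for the x index,
    with row 0 at y = 0. k-mers containing other symbols (e.g. N) are skipped.
    """
    size = 2 ** k
    counts = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(base not in CORNERS for base in kmer):
            continue
        row = col = 0
        for j, base in enumerate(kmer):
            cx, cy = CORNERS[base]
            # The last base of the k-mer selects the coarsest quadrant,
            # so later positions contribute the more significant bits.
            col += cx << j
            row += cy << j
        counts[row][col] += 1
    return counts


if __name__ == "__main__":
    short = fcgr("ACGTTGCA", 2)
    longer = fcgr("ACGTTGCAACGTAGCTAGCTAGGATC", 2)
    print(len(short), len(longer))  # both matrices are 4 x 4
```

Because the matrix size depends only on k, matrices from genomes of very different lengths can be compared directly or passed to an image-based classifier, usually after normalising the counts.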

16.
J Theor Biol ; 531: 110917, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34563550

ABSTRACT

Proteins encoded by genes are engaged in most of the processes within a cell. Identifying a minimal set of genes required for survival is still a challenging task. Essential genes seem to be more conserved and are usually responsible for basic functions, for instance, genetic information flow or energy production. Despite persistent advances in experimental methods, computer predictions may constitute an important part of this investigation. Firstly, they may embrace a huge amount of data and provide some characteristic patterns. Furthermore, they enable scientists to build models for predicting essential genes which are not yet verified experimentally. Some papers indicate interesting dependencies within essential gene sequences using different computer models. In this paper, the author performs a three-step analysis for a deeper understanding of the fundamentals of essential and non-essential genes. Beginning with simple nucleotide composition and finishing with long-range correlations, the analysis presents some characteristic patterns that are expected to be developed in future studies.


Subjects
Genes, Essential , Information Theory , Bacteria/genetics , Computer Simulation , Game Theory , Genes, Essential/genetics , Proteins
17.
Mitochondrion ; 60: 121-128, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34375735

ABSTRACT

We characterized the multifractality and power-law cross-correlation of mitochondrial genomes of various species through the recently developed method which combines the chaos game representation theory and 2D-multifractal detrended cross-correlation analysis. In the present paper, we analyzed 32 mitochondrial genomes of different species and the obtained results show that all the analyzed data exhibit multifractal nature and power-law cross-correlation behaviour. Further, we performed a cluster analysis from the calculated scaling exponents to identify the class affiliation and its outcome is represented as a dendrogram. We suggest that this integrative approach may help the researchers to understand the phylogeny of any kingdom with their varying genome lengths and also this approach may find applications in characterizing the protein sequences, mRNA sequences, next-generation sequencing, and drug development, etc.


Subjects
Computer Simulation , Game Theory , Genome, Mitochondrial , Mitochondria/genetics , Nonlinear Dynamics , Animals , Saccharomyces cerevisiae/genetics
18.
BMC Bioinformatics ; 22(Suppl 6): 129, 2021 Jun 02.
Article in English | MEDLINE | ID: mdl-34078256

ABSTRACT

BACKGROUND: The nucleosome plays an important role in genome expression, DNA replication, DNA repair and transcription. Therefore, research on nucleosome positioning has consistently received extensive attention. Considering the diversity of DNA sequence representation methods, we tried to integrate multiple features and analyze their effect on nucleosome positioning analysis. This process can also deepen our understanding of the theoretical analysis of nucleosome positioning. RESULTS: Here, we not only used frequency chaos game representation (FCGR) to construct DNA sequence features, but also integrated it with other features and adopted the principal component analysis (PCA) algorithm. Support vector machine (SVM), extreme learning machine (ELM), extreme gradient boosting (XGBoost), multilayer perceptron (MLP) and convolutional neural networks (CNN) were each used as predictors for nucleosome positioning. The prediction quality of the integrated feature vector is significantly superior to that of a single feature. After using PCA to reduce the feature dimension, the prediction quality on the H. sapiens dataset improved significantly. CONCLUSIONS: Comparative analysis and prediction on the H. sapiens, C. elegans, D. melanogaster and S. cerevisiae datasets demonstrate that the application of FCGR to nucleosome positioning is feasible, and we also found that an integrative feature representation performs better.


Subjects
Caenorhabditis elegans , Nucleosomes , Algorithms , Animals , Caenorhabditis elegans/genetics , Drosophila melanogaster/genetics , Machine Learning , Nucleosomes/genetics , Saccharomyces cerevisiae/genetics , Support Vector Machine
19.
J Mol Graph Model ; 107: 107942, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34058640

ABSTRACT

As a very important research direction in the field of bioinformatics, sequence alignment plays a vital role in the research and development of biology. Converting genome sequence to graph by using frequency chaos game representation (FCGR) is an excellent gene sequence mapping technology, which can store rich genetic information into FCGR graphics. To each FCGR image, we construct its perceptual image hashing (PIH) matrix using the bicubic interpolation zooming. The difference of the perceptual hash matrix of each two images is calculated, and the clustering distance of the corresponding two gene sequences is represented by the differentials of the perceptual hash matrix. In this paper, we aligned and analyzed several typical genome sequence datasets including mammalian mitochondrial genes, human immunodeficiency virus 1 (HIV-1) and hepatitis E virus (HEV) to build their evolutionary trees. Experimental results showed that our PIH combining FCGR method (FCGR-PIH) has similar classification accuracy to the classical Clustal W sequence alignment method. Furthermore, 25 complete mitochondrial DNA sequences of cichlid fishes and 27 Escherichia coli/Shigella full genome sequences were selected from the AFproject test platform for tests. The performance benchmark rankings demonstrate the effectiveness of the FCGR-PIH algorithm and its potential for large-scale genome sequence analysis.


Subjects
Algorithms , Computational Biology , Animals , Cluster Analysis , Humans , Phylogeny , Sequence Alignment
20.
Genomics ; 113(3): 1428-1437, 2021 May.
Article in English | MEDLINE | ID: mdl-33713823

ABSTRACT

Numerical representation of biological sequences plays an important role in bioinformatics and has many practical applications. One of the most popular approaches is the chaos game representation. In this paper, the authors propose a novel look at chaos game construction: an analytical description of the procedure. This description makes it possible to build more general number sequences using different weight functions. The authors suggest three conditions that these functions should satisfy. Additionally, they present some criteria to compare the functions and to check whether they provide a unique representation. One of the most important advantages of the approach is the possibility of constructing a description that is less sensitive to mutations and, as a result, gives more reliable values for alignment-free phylogenetic tree construction. Finally, the authors applied the DFT method using four types of functions and compared the obtained results using the BLAST tool.


Subjects
Algorithms , Computational Biology , Mutation , Phylogeny
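
The abstract of entry 20 does not reproduce the analytical description it refers to. As a hedged illustration, unrolling the usual CGR recursion x_n = (x_{n-1} + c_{s_n})/2, where c_{s_i} is the corner assigned to the i-th symbol and x_0 is the starting point, gives the first formula below. The second formula shows one plausible reading of "different weight functions", in which the geometric weights are replaced by a general function w; this generalized form and the symbol w are assumptions, not the paper's definition.

```latex
x_n = \left(\tfrac{1}{2}\right)^{n} x_0
      + \sum_{i=1}^{n} \left(\tfrac{1}{2}\right)^{n-i+1} c_{s_i},
\qquad
\tilde{x}_n = \sum_{i=1}^{n} w(n-i+1)\, c_{s_i}.
```

With w(m) = (1/2)^m the second expression reduces to the classical CGR apart from the vanishing x_0 term. A weight function that decays more slowly spreads the influence of each symbol over more positions, which is one way a representation could become less sensitive to single mutations.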