Pesquisa | Secretaria de Estado da Saúde

1.

TEMPROT: protein function annotation using transformers embeddings and homology search.

Oliveira, Gabriel B; Pedrini, Helio; Dias, Zanoni.

BMC Bioinformatics ; 24(1): 242, 2023 Jun 08.

Artigo em Inglês | MEDLINE | ID: mdl-37291492

RESUMO

BACKGROUND: Although the development of sequencing technologies has provided a large number of protein sequences, the analysis of functions that each one plays is still difficult due to the efforts of laboratorial methods, making necessary the usage of computational methods to decrease this gap. As the main source of information available about proteins is their sequences, approaches that can use this information, such as classification based on the patterns of the amino acids and the inference based on sequence similarity using alignment tools, are able to predict a large collection of proteins. The methods available in the literature that use this type of feature can achieve good results, however, they present restrictions of protein length as input to their models. In this work, we present a new method, called TEMPROT, based on the fine-tuning and extraction of embeddings from an available architecture pre-trained on protein sequences. We also describe TEMPROT+, an ensemble between TEMPROT and BLASTp, a local alignment tool that analyzes sequence similarity, which improves the results of our former approach. RESULTS: The evaluation of our proposed classifiers with the literature approaches has been conducted on our dataset, which was derived from CAFA3 challenge database. Both TEMPROT and TEMPROT+ achieved competitive results on [Formula: see text], [Formula: see text], AuPRC and IAuPRC metrics on Biological Process (BP), Cellular Component (CC) and Molecular Function (MF) ontologies compared to state-of-the-art models, with the main results equal to 0.581, 0.692 and 0.662 of [Formula: see text] on BP, CC and MF, respectively. CONCLUSIONS: The comparison with the literature showed that our model presented competitive results compared the state-of-the-art approaches considering the amino acid sequence pattern recognition and homology analysis. Our model also presented improvements related to the input size that the model can use to train compared to the literature methods.

Assuntos

Aminoácidos , Proteínas , Proteínas/química , Anotação de Sequência Molecular , Sequência de Aminoácidos , Aminas

2.

Ensemble of Template-Free and Template-Based Classifiers for Protein Secondary Structure Prediction.

de Oliveira, Gabriel Bianchin; Pedrini, Helio; Dias, Zanoni.

Int J Mol Sci ; 22(21)2021 Oct 23.

Artigo em Inglês | MEDLINE | ID: mdl-34768880

RESUMO

Protein secondary structures are important in many biological processes and applications. Due to advances in sequencing methods, there are many proteins sequenced, but fewer proteins with secondary structures defined by laboratory methods. With the development of computer technology, computational methods have (started to) become the most important methodologies for predicting secondary structures. We evaluated two different approaches to this problem-driven by the recent results obtained by computational methods in this task-(i) template-free classifiers, based on machine learning techniques; and (ii) template-based classifiers, based on searching tools. Both approaches are formed by different sub-classifiers-six for template-free and two for template-based, each with a specific view of the protein. Our results show that these ensembles improve the results of each approach individually.

Assuntos

Biologia Computacional/métodos , Estrutura Secundária de Proteína , Proteínas/química , Algoritmos , Bases de Dados de Proteínas , Aprendizado de Máquina , Redes Neurais de Computação , Conformação Proteica , Software

3.

Sorting by weighted inversions considering length and symmetry.

Baudet, Christian; Dias, Ulisses; Dias, Zanoni.

BMC Bioinformatics ; 16 Suppl 19: S3, 2015.

Artigo em Inglês | MEDLINE | ID: mdl-26695591

RESUMO

Large-scale mutational events that occur when stretches of DNA sequence move throughout genomes are called genome rearrangements. In bacteria, inversions are one of the most frequently observed rearrangements. In some bacterial families, inversions are biased in favor of symmetry as shown by recent research. In addition, several results suggest that short segment inversions are more frequent in the evolution of microbial genomes. Despite the fact that symmetry and length of the reversed segments seem very important, they have not been considered together in any problem in the genome rearrangement field. Here, we define the problem of sorting genomes (or permutations) using inversions whose costs are assigned based on their lengths and asymmetries. We consider two formulations of the same problem depending on whether we know the orientation of the genes. Several procedures are presented and we assess these procedure performances on a large set of more than 4.4 × 109 permutations. The ideas presented in this paper provide insights to solve the problem and set the stage for a proper theoretical analysis.

Assuntos

Inversão Cromossômica/genética , Biologia Computacional/métodos , Algoritmos , Bactérias/genética , Quebra Cromossômica , Genoma , Fatores de Tempo

4.

Genome Rearrangement Distance With a Flexible Intergenic Regions Aspect.

Brito, Klairton Lima; Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1641-1653, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-35385387

RESUMO

Most mathematical models for genome rearrangement problems have considered only gene order. In this way, the rearrangement distance considering some set of events, such as reversal and transposition events, is commonly defined as the minimum number of rearrangement events that transform the gene order from a genome G1 into the gene order from a genome G2. Recent works initiate incorporating more information such as the sizes of the intergenic regions (i.e., number of nucleotides between pairs of consecutive genes), which yields good results for estimated distances on real data. In these models, besides transforming the gene order, the sequence of rearrangement events must transform the list of intergenic regions sizes from G1 into the list of intergenic regions sizes from G2 (target list). We study a new variation where the target list is flexible, in the sense that each target intergenic region size is in a range of acceptable values. This allows us to model scenarios where the main objective is still to transform the order of genes from the source genome into the target genome, allowing flexibility in the sizes of the intergenic regions, since the nucleotides in these regions tend to undergo more changes when compared to genes. We investigate the rearrangement distance considering three sets of events, two with the exclusive use of reversals or transpositions, and the other allowing both rearrangement events. We present approximation algorithms for the problems and an NP-hardness proof. Our results rely on the Flexible Weighted Cycle Graph, adapted from the breakpoint graph to deal with flexible intergenic regions sizes.

Assuntos

Rearranjo Gênico , Genômica , Genômica/métodos , Rearranjo Gênico/genética , Genoma , Algoritmos , Nucleotídeos , Modelos Genéticos

5.

Reversal and Indel Distance With Intergenic Region Information.

Alexandrino, Alexsandro Oliveira; Brito, Klairton Lima; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 1628-1640, 2023.

Artigo em Inglês | MEDLINE | ID: mdl-36260590

RESUMO

Recent works on genome rearrangements have shown that incorporating intergenic region information along with gene order in models provides better estimations for the rearrangement distance than using gene order alone. The reversal distance is one of the main problems in genome rearrangements. It has a polynomial time algorithm when only gene order is used to model genomes, assuming that repeated genes do not exist and that gene orientation is known, even when the genomes have distinct gene sets. The reversal distance is NP-hard and has a 2-approximation algorithm when incorporating intergenic regions. However, the problem has only been studied assuming genomes with the same set of genes. In this work, we consider the variation that incorporates intergenic regions and that allows genomes to have distinct sets of genes, a scenario that leads us to include indels operations (insertions and deletions). We present a 2.5-approximation algorithm using the labeled intergenic breakpoint graph, which is based on the well-known breakpoint graph structure. We also present an experimental analysis of the proposed algorithm using simulated data, which showed that the practical approximation factor is considerably less than 2.5. Furthermore, we used the algorithm in real genomes to construct a phylogenetic tree.

Assuntos

Genoma , Modelos Genéticos , Filogenia , Mutação INDEL/genética , Rearranjo Gênico , Algoritmos

6.

Reversal and Transposition Distance on Unbalanced Genomes Using Intergenic Information.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Jean, Géraldine; Fertin, Guillaume; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 30(8): 861-876, 2023 08.

Artigo em Inglês | MEDLINE | ID: mdl-37222724

RESUMO

The most common way to calculate the rearrangement distance between two genomes is to use the size of a minimum length sequence of rearrangements that transforms one of the two given genomes into the other, where the genomes are represented as permutations using only their gene order, based on the assumption that genomes have the same gene content. With the advance of research in genome rearrangements, new works extended the classical models by either considering genomes with different gene content (unbalanced genomes) or including more genomic characteristics to the mathematical representation of the genomes, such as the distribution of intergenic regions sizes. In this study, we study the Reversal, Transposition, and Indel (Insertion and Deletion) Distance using intergenic information, which allows comparing unbalanced genomes, because indels are included in the rearrangement model (i.e., the set of possible rearrangements allowed when we compute the distance). For the particular case of transpositions and indels on unbalanced genomes, we present a 4-approximation algorithm, improving a previous 4.5 approximation. This algorithm is extended so as to deal with gene orientation and to maintain the 4-approximation factor for the Reversal, Transposition, and Indel Distance on unbalanced genomes. Furthermore, we evaluate the proposed algorithms using experiments on simulated data.

Assuntos

Rearranjo Gênico , Modelos Genéticos , Genoma/genética , Genômica , Mutação INDEL , Algoritmos

7.

Rearrangement distance with reversals, indels, and moves in intergenic regions on signed and unsigned permutations.

Brito, Klairton Lima; Oliveira, Andre Rodrigues; Alexandrino, Alexsandro Oliveira; Dias, Ulisses; Dias, Zanoni.

J Bioinform Comput Biol ; 21(2): 2350009, 2023 04.

Artigo em Inglês | MEDLINE | ID: mdl-37104034

RESUMO

Genome rearrangement events are widely used to estimate a minimum-size sequence of mutations capable of transforming a genome into another. The length of this sequence is called distance, and determining it is the main goal in genome rearrangement distance problems. Problems in the genome rearrangement field differ regarding the set of rearrangement events allowed and the genome representation. In this work, we consider the scenario where the genomes share the same set of genes, gene orientation is known or unknown, and intergenic regions (structures between a pair of genes and at the extremities of the genome) are taken into account. We use two models, the first model allows only conservative events (reversals and moves), and the second model includes non-conservative events (insertions and deletions) in the intergenic regions. We show that both models result in NP-hard problems no matter if gene orientation is known or unknown. When the information regarding the orientation of genes is available, we present for both models an approximation algorithm with a factor of 2. For the scenario where this information is unavailable, we propose a 4-approximation algorithm for both models.

Assuntos

Rearranjo Gênico , Modelos Genéticos , DNA Intergênico/genética , Genoma , Mutação , Algoritmos

8.

SIS: a program to generate draft genome sequence scaffolds for prokaryotes.

Dias, Zanoni; Dias, Ulisses; Setubal, João C.

BMC Bioinformatics ; 13: 96, 2012 May 14.

Artigo em Inglês | MEDLINE | ID: mdl-22583530

RESUMO

BACKGROUND: Decreasing costs of DNA sequencing have made prokaryotic draft genome sequences increasingly common. A contig scaffold is an ordering of contigs in the correct orientation. A scaffold can help genome comparisons and guide gap closure efforts. One popular technique for obtaining contig scaffolds is to map contigs onto a reference genome. However, rearrangements that may exist between the query and reference genomes may result in incorrect scaffolds, if these rearrangements are not taken into account. Large-scale inversions are common rearrangement events in prokaryotic genomes. Even in draft genomes it is possible to detect the presence of inversions given sufficient sequencing coverage and a sufficiently close reference genome. RESULTS: We present a linear-time algorithm that can generate a set of contig scaffolds for a draft genome sequence represented in contigs given a reference genome. The algorithm is aimed at prokaryotic genomes and relies on the presence of matching sequence patterns between the query and reference genomes that can be interpreted as the result of large-scale inversions; we call these patterns inversion signatures. Our algorithm is capable of correctly generating a scaffold if at least one member of every inversion signature pair is present in contigs and no inversion signatures have been overwritten in evolution. The algorithm is also capable of generating scaffolds in the presence of any kind of inversion, even though in this general case there is no guarantee that all scaffolds in the scaffold set will be correct. We compare the performance of sis, the program that implements the algorithm, to seven other scaffold-generating programs. The results of our tests show that sis has overall better performance. CONCLUSIONS: sis is a new easy-to-use tool to generate contig scaffolds, available both as stand-alone and as a web server. The good performance of sis in our tests adds evidence that large-scale inversions are widespread in prokaryotic genomes.

Assuntos

Algoritmos , Mapeamento Cromossômico/métodos , Genômica/métodos , Software , Biologia Computacional/métodos , Genoma , Células Procarióticas , Análise de Sequência de DNA/métodos

9.

Labeled Cycle Graph for Transposition and Indel Distance.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 29(3): 243-256, 2022 03.

Artigo em Inglês | MEDLINE | ID: mdl-34724796

RESUMO

In the comparative genomics field, one way to infer the evolutionary distance between two organisms of related species is by finding the minimum number of large-scale mutations, called genome rearrangements, that transform one genome into the other. This number is referred to as the rearrangement distance. Since problems in this area emerged in the mid-1990s, several genome rearrangements have been proposed. Rearrangements that do not alter the genome content are called conservative, and in this group we have the following: the reversal, which inverts a segment of the genome; the transposition, which exchanges two consecutive segments; and the double cut and join, which cuts two different pairs of adjacent blocks and joins them differently. Seminal works compared genomes sharing the same set of conserved blocks, but nowadays, researchers started looking at genomes with unequal gene content, by allowing the use of nonconservative rearrangements such as insertion and deletion (jointly called indel). The transposition distance and the transposition and indel distance are both NP-hard. We investigate the transposition and indel distance and present a structure called labeled cycle graph, representing an instance of rearrangement distance problems for genomes with unequal gene content. This structure is used to devise a lower bound and a 2-approximation algorithm for the transposition and indel distance.

Assuntos

Genoma , Mutação INDEL , Algoritmos , Rearranjo Gênico , Genômica , Modelos Genéticos

10.

Cassis: detection of genomic rearrangement breakpoints.

Baudet, Christian; Lemaitre, Claire; Dias, Zanoni; Gautier, Christian; Tannier, Eric; Sagot, Marie-France.

Bioinformatics ; 26(15): 1897-8, 2010 Aug 01.

Artigo em Inglês | MEDLINE | ID: mdl-20576622

RESUMO

SUMMARY: Genomes undergo large structural changes that alter their organization. The chromosomal regions affected by these rearrangements are called breakpoints, while those which have not been rearranged are called synteny blocks. Lemaitre et al. presented a new method to precisely delimit rearrangement breakpoints in a genome by comparison with the genome of a related species. Receiving as input a list of one2one orthologous genes found in the genomes of two species, the method builds a set of reliable and non-overlapping synteny blocks and refines the regions that are not contained into them. Through the alignment of each breakpoint sequence against its specific orthologous sequences in the other species, we can look for weak similarities inside the breakpoint, thus extending the synteny blocks and narrowing the breakpoints. The identification of the narrowed breakpoints relies on a segmentation algorithm and is statistically assessed. Here, we present the package Cassis that implements this method of precise detection of genomic rearrangement breakpoints. AVAILABILITY: Perl and R scripts are freely available for download at http://pbil.univ-lyon1.fr/software/Cassis/. Documentation with methodological background, technical aspects, download and setup instructions, as well as examples of applications are available together with the package. The package was tested on Linux and Mac OS environments and is distributed under the GNU GPL License.

Assuntos

Pontos de Quebra do Cromossomo , Biologia Computacional/métodos , Genoma/genética , Recombinação Genética , Software , Algoritmos , Cromossomos/genética , Sintenia

11.

Approximation algorithm for rearrangement distances considering repeated genes and intergenic regions.

Siqueira, Gabriel; Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Zanoni.

Algorithms Mol Biol ; 16(1): 21, 2021 Oct 13.

Artigo em Inglês | MEDLINE | ID: mdl-34645469

RESUMO

The rearrangement distance is a method to compare genomes of different species. Such distance is the number of rearrangement events necessary to transform one genome into another. Two commonly studied events are the transposition, which exchanges two consecutive blocks of the genome, and the reversal, which reverts a block of the genome. When dealing with such problems, seminal works represented genomes as sequences of genes without repetition. More realistic models started to consider gene repetition or the presence of intergenic regions, sequences of nucleotides between genes and in the extremities of the genome. This work explores the transposition and reversal events applied in a genome representation considering both gene repetition and intergenic regions. We define two problems called Minimum Common Intergenic String Partition and Reverse Minimum Common Intergenic String Partition. Using a relation with these two problems, we show a [Formula: see text]-approximation for the Intergenic Transposition Distance, the Intergenic Reversal Distance, and the Intergenic Reversal and Transposition Distance problems, where k is the maximum number of copies of a gene in the genomes. Our practical experiments on simulated genomes show that the use of partitions improves the estimates for the distances.

12.

Heuristics for Genome Rearrangement Distance With Replicated Genes.

Siqueira, Gabriel; Brito, Klairton Lima; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2094-2108, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-34232886

RESUMO

In comparative genomics, one goal is to find similarities between genomes of different organisms. Comparisons using genome features like genes, gene order, and regulatory sequences are carried out with this purpose in mind. Genome rearrangements are mutational events that affect large extensions of the genome. They are responsible for creating extant species with conserved genes in different positions across genomes. Close species - from an evolutionary point of view - tend to have the same set of genes or share most of them. When we consider gene order to compare two genomes, it is possible to use a parsimony criterion to estimate how close the species are. We are interested in the shortest sequence of genome rearrangements capable of transforming one genome into the other, which is named rearrangement distance. Reversal is one of the most studied genome rearrangements events. This event acts in a segment of the genome, inverting the position and the orientation of genes in it. Transposition is another widely studied event. This event swaps the position of two consecutive segments of the genome. When the genome has no gene repetition, a common approach is to map it as a permutation such that each element represents a conserved block. When genomes have replicated genes, this mapping is usually performed using strings. The number of replicas depends on the organisms being compared, but in many scenarios, it tends to be small. In this work, we study the rearrangement distance between genomes with replicated genes considering that the orientation of genes is unknown. We present four heuristics for the problem of genome rearrangement distance with replicated genes. We carry out experiments considering the exclusive use of the reversals or transpositions events, as well as the version in which both events are allowed. We developed a database of simulated genomes and compared our results with other algorithms from the literature. The experiments showed that our heuristics with more sophisticated rules presented a better performance than the known algorithms to estimate the evolutionary distance between genomes with replicated genes. In order to validate the application of our algorithms in real data, we construct a phylogenetic tree based on the distance provided by our algorithm and compare it with a know tree from the literature.

Assuntos

Algoritmos , Rearranjo Gênico/genética , Genômica/métodos , Dosagem de Genes/genética , Heurística

13.

Incorporating intergenic regions into reversal and transposition distances with indels.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Bioinform Comput Biol ; 19(6): 2140011, 2021 12.

Artigo em Inglês | MEDLINE | ID: mdl-34775923

RESUMO

Problems in the genome rearrangement field are often formulated in terms of pairwise genome comparison: given two genomes [Formula: see text] and [Formula: see text], find the minimum number of genome rearrangements that may have occurred during the evolutionary process. This broad definition lacks at least two important considerations: the first being which features are extracted from genomes to create a useful mathematical model, and the second being which types of genome rearrangement events should be represented. Regarding the first consideration, seminal works in the genome rearrangement field solely used gene order to represent genomes as permutations of integer numbers, neglecting many important aspects like gene duplication, intergenic regions, and complex interactions between genes. Regarding the second consideration, some rearrangement events are widely studied such as reversals and transpositions. In this paper, we shed light on the first consideration and created a model that takes into account gene order and the number of nucleotides in intergenic regions. In addition, we consider events of reversals, transpositions, and indels (insertions and deletions) of genomic material. We present a 4-approximation algorithm for reversals and indels, a [Formula: see text]-approximation algorithm for transpositions and indels, and a 6-approximation for reversals, transpositions, and indels.

Assuntos

Genoma , Modelos Genéticos , Algoritmos , DNA Intergênico/genética , Rearranjo Gênico , Genômica

14.

Genome Rearrangement Distance with Reversals, Transpositions, and Indels.

Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Comput Biol ; 28(3): 235-247, 2021 03.

Artigo em Inglês | MEDLINE | ID: mdl-33085536

RESUMO

The rearrangement distance is a well-known problem in the field of comparative genomics. Given two genomes, the rearrangement distance is the minimum number of rearrangements in a set of allowed rearrangements (rearrangement model), which transforms one genome into the other. In rearrangement distance problems, a genome is modeled as a string, where each element represents a conserved region within the two genomes. When the orientation of the genes is known, it is represented by (plus or minus) signs assigned to the elements of the string. Two of the most studied rearrangements are reversals, which invert a segment of the genome, and transpositions, which exchange the relative positions of two adjacent segments of the genome. The first works in genome rearrangements considered that the genomes being compared had the same genetic material and that rearrangement events were restricted to reversals, transpositions, or both. El-Mabrouk extended the reversal model on signed strings to include the operations of insertion and deletion of segments in the genome, which allowed the comparison of genomes with different genetic material. Other studies also addressed this problem and, recently, this problem was proved to be solvable in polynomial time by Willing et al. For unsigned strings, we still observe a lack of results. That said, in this study we prove that computing the rearrangement distance for the following models is NP-Hard: reversals and indels on unsigned strings; transpositions and indels on unsigned strings; and reversals, transpositions, and indels on signed and unsigned strings. Along with the NP-hardness proofs, we present a 2-approximation algorithm for reversals on unsigned strings and 3-approximation algorithms for the other models.

Assuntos

Rearranjo Gênico/genética , Genoma/genética , Mutação INDEL/genética , Algoritmos , Genômica/métodos , Modelos Genéticos

15.

Reversals and transpositions distance with proportion restriction.

Brito, Klairton Lima; Alexandrino, Alexsandro Oliveira; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

J Bioinform Comput Biol ; 19(4): 2150013, 2021 08.

Artigo em Inglês | MEDLINE | ID: mdl-34162319

RESUMO

In the field of comparative genomics, one way of comparing two genomes is through the analysis of how they distinguish themselves based on a set of mutations called rearrangement events. When considering that genomes undergo different types of rearrangements, it can be assumed that some events are more common than others. To model this assumption, one can assign different weights to different events, where common events tend to cost less than others. However, this approach, called weighted, does not guarantee that the rearrangement assumed to be the most frequent will be also the most frequently returned by proposed algorithms. To overcome this issue, we investigate a new problem where we seek the shortest sequence of rearrangement events able to transform one genome into the other, with a restriction regarding the proportion between the events returned. Here, we consider two rearrangement events: reversal, that inverts the order and the orientation of the genes inside a segment of the genome, and transposition, that moves a segment of the genome to another position. We show the complexity of this problem for any desired proportion considering scenarios where the orientation of the genes is known or unknown. We also develop an approximation algorithm with a constant approximation factor for each scenario and, in particular, we describe an improved (asymptotic) approximation algorithm for the case where the gene orientation is known. At last, we present the experimental tests comparing the proposed algorithms with others from the literature for the version of the problem without the proportion restriction.

Assuntos

Genoma , Genômica , Algoritmos , Rearranjo Gênico , Modelos Genéticos , Mutação

16.

An improved approximation algorithm for the reversal and transposition distance considering gene order and intergenic sizes.

Brito, Klairton L; Oliveira, Andre R; Alexandrino, Alexsandro O; Dias, Ulisses; Dias, Zanoni.

Algorithms Mol Biol ; 16(1): 24, 2021 Dec 29.

Artigo em Inglês | MEDLINE | ID: mdl-34965857

RESUMO

BACKGROUND: In the comparative genomics field, one of the goals is to estimate a sequence of genetic changes capable of transforming a genome into another. Genome rearrangement events are mutations that can alter the genetic content or the arrangement of elements from the genome. Reversal and transposition are two of the most studied genome rearrangement events. A reversal inverts a segment of a genome while a transposition swaps two consecutive segments. Initial studies in the area considered only the order of the genes. Recent works have incorporated other genetic information in the model. In particular, the information regarding the size of intergenic regions, which are structures between each pair of genes and in the extremities of a linear genome. RESULTS AND CONCLUSIONS: In this work, we investigate the SORTING BY INTERGENIC REVERSALS AND TRANSPOSITIONS problem on genomes sharing the same set of genes, considering the cases where the orientation of genes is known and unknown. Besides, we explored a variant of the problem, which generalizes the transposition event. As a result, we present an approximation algorithm that guarantees an approximation factor of 4 for both cases considering the reversal and transposition (classic definition) events, an improvement from the 4.5-approximation previously known for the scenario where the orientation of the genes is unknown. We also present a 3-approximation algorithm by incorporating the generalized transposition event, and we propose a greedy strategy to improve the performance of the algorithms. We performed practical tests adopting simulated data which indicated that the algorithms, in both cases, tend to perform better when compared with the best-known algorithms for the problem. Lastly, we conducted experiments using real genomes to demonstrate the applicability of the algorithms.

17.

Sorting Permutations by Intergenic Operations.

Oliveira, Andre Rodrigues; Jean, Geraldine; Fertin, Guillaume; Brito, Klairton Lima; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2080-2093, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-33945484

RESUMO

Genome Rearrangements are events that affect large stretches of genomes during evolution. Many mathematical models have been used to estimate the evolutionary distance between two genomes based on genome rearrangements. However, most of them focused on the (order of the) genes of a genome, disregarding other important elements in it. Recently, researchers have shown that considering regions between each pair of genes, called intergenic regions, can enhance distance estimation in realistic data. Two of the most studied genome rearrangements are the reversal, which inverts a sequence of genes, and the transposition, which occurs when two adjacent gene sequences swap their positions inside the genome. In this work, we study the transposition distance between two genomes, but we also consider intergenic regions, a problem we name Sorting by Intergenic Transpositions. We show that this problem is NP-hard and propose two approximation algorithms, with factors 3.5 and 2.5, considering two distinct definitions for the problem. We also investigate the signed reversal and transposition distance between two genomes considering their intergenic regions. This second problem is called Sorting by Signed Intergenic Reversals and Intergenic Transpositions. We show that this problem is NP-hard and develop two approximation algorithms, with factors 3 and 2.5. We check how these algorithms behave when assigning weights for genome rearrangements. Finally, we implemented all these algorithms and tested them on real and simulated data.

Assuntos

Algoritmos , Rearranjo Gênico/genética , Genoma/genética , Genômica/métodos , Elementos de DNA Transponíveis/genética , DNA Intergênico/genética , Análise de Sequência de DNA

18.

Sorting Signed Permutations by Intergenic Reversals.

Oliveira, Andre Rodrigues; Jean, Geraldine; Fertin, Guillaume; Brito, Klairton Lima; Bulteau, Laurent; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 18(6): 2870-2876, 2021.

Artigo em Inglês | MEDLINE | ID: mdl-32396097

RESUMO

Genome rearrangements are mutations affecting large portions of a genome, and a reversal is one of the most studied genome rearrangements in the literature through the Sorting by Reversals (SbR) problem. SbR is solvable in polynomial time on signed permutations (i.e., the gene orientation is known), and it is NP-hard on unsigned permutations. This problem (and many others considering genome rearrangements) models genome as a list of its genes in the order they appear, ignoring all other information present in the genome. Recent works claimed that the incorporation of the size of intergenic regions, i.e., sequences of nucleotides between genes, may result in better estimators for the real distance between genomes. Here we introduce the Sorting Signed Permutations by Intergenic Reversals problem, that sorts a signed permutation using reversals both on gene order and intergenic sizes. We show that this problem is NP-hard by a reduction from the 3-partition problem. Then, we propose a 2-approximation algorithm for it. Finally, we also incorporate intergenic indels (i.e., insertions or deletions of intergenic regions) to overcome a limitation of sorting by conservative events (such as reversals) and propose two approximation algorithms.

Assuntos

DNA Intergênico/genética , Rearranjo Gênico/genética , Genômica/legislação & jurisprudência , Algoritmos , Mutação INDEL/genética , Modelos Genéticos , Mutação/genética

19.

Sorting permutations by fragmentation-weighted operations.

Alexandrino, Alexsandro Oliveira; Lintzmayer, Carla Negri; Dias, Zanoni.

J Bioinform Comput Biol ; 18(2): 2050006, 2020 04.

Artigo em Inglês | MEDLINE | ID: mdl-32326802

RESUMO

One of the main problems in Computational Biology is to find the evolutionary distance among species. In most approaches, such distance only involves rearrangements, which are mutations that alter large pieces of the species' genome. When we represent genomes as permutations, the problem of transforming one genome into another is equivalent to the problem of Sorting Permutations by Rearrangement Operations. The traditional approach is to consider that any rearrangement has the same probability to happen, and so, the goal is to find a minimum sequence of operations which sorts the permutation. However, studies have shown that some rearrangements are more likely to happen than others, and so a weighted approach is more realistic. In a weighted approach, the goal is to find a sequence which sorts the permutations, such that the cost of that sequence is minimum. This work introduces a new type of cost function, which is related to the amount of fragmentation caused by a rearrangement. We present some results about the lower and upper bounds for the fragmentation-weighted problems and the relation between the unweighted and the fragmentation-weighted approach. Our main results are 2-approximation algorithms for five versions of this problem involving reversals and transpositions. We also give bounds for the diameters concerning these problems and provide an improved approximation factor for simple permutations considering transpositions.

Assuntos

Algoritmos , Biologia Computacional/métodos , Genoma , Genômica/métodos , Rearranjo Gênico , Mutação , Probabilidade

20.

Heuristics for the Reversal and Transposition Distance Problem.

Brito, Klairton Lima; Oliveira, Andre Rodrigues; Dias, Ulisses; Dias, Zanoni.

IEEE/ACM Trans Comput Biol Bioinform ; 17(1): 2-13, 2020.

Artigo em Inglês | MEDLINE | ID: mdl-31603793

RESUMO

We present three heuristics-Sliding Window, Look Ahead, and Iterative Sliding Window-to improve solutions for the Sorting Signed Permutations by Reversals and Transpositions Problem. We investigate the classical version of the problem as well as versions restricted to prefix and prefix or suffix operations. To assess the heuristics based on its improvement, we implemented algorithms described in the literature to provide initial solutions. Although we have a limited number of problems, these heuristics can be applied to many others within the area of genome rearrangement. When time is a crucial factor, Sliding Window is a better choice because it runs in linear time. If the quality of the solution is a priority, Look Ahead should be preferred. Iterative Sliding Window is the most flexible heuristic and allows us to find a trade-off for specific scenarios where running time and solution quality are relevant.

Assuntos

Biologia Computacional/métodos , Rearranjo Gênico/genética , Heurística , Algoritmos , Modelos Genéticos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

RESUMO

Assuntos

ENVIAR RESULTADO:

SELEÇÃO DE REFERÊNCIAS

Detalhe da pesquisa