Your browser doesn't support javascript.
loading
Show: 20 | 50 | 100
Resultados 1 - 17 de 17
Filtrar
1.
Bull Math Biol ; 80(12): 3227-3246, 2018 12.
Artículo en Inglés | MEDLINE | ID: mdl-30288640

RESUMEN

Modellers of large-scale genome rearrangement events, in which segments of DNA are inverted, moved, swapped, or even inserted or deleted, have found a natural syntax in the language of permutations. Despite this, there has been a wide range of modelling choices, assumptions and interpretations that make navigating the literature a significant challenge. Indeed, even authors of papers that use permutations to model genome rearrangement can struggle to interpret each others' work, because of subtle differences in basic assumptions that are often deeply ingrained (and consequently sometimes not even mentioned). In this paper, we describe the different ways in which permutations have been used to model genomes and genome rearrangement events, presenting some features and limitations of each approach, and show how the various models are related. This paper will help researchers navigate the landscape of permutation-based genome rearrangement models and make it easier for authors to present clear and consistent models.


Asunto(s)
Reordenamiento Génico , Genómica , Modelos Genéticos , Algoritmos , Animales , Inversión Cromosómica , Mapeo Cromosómico , Evolución Molecular , Humanos , Conceptos Matemáticos , Filogenia
2.
BMC Bioinformatics ; 17(Suppl 14): 413, 2016 Nov 11.
Artículo en Inglés | MEDLINE | ID: mdl-28185578

RESUMEN

BACKGROUND: During evolution, genomes are modified by large scale structural events, such as rearrangements, deletions or insertions of large blocks of DNA. Of particular interest, in order to better understand how this type of genomic evolution happens, is the reconstruction of ancestral genomes, given a phylogenetic tree with extant genomes at its leaves. One way of solving this problem is to assume a rearrangement model, such as Double Cut and Join (DCJ), and find a set of ancestral genomes that minimizes the number of events on the input tree. Since this problem is NP-hard for most rearrangement models, exact solutions are practical only for small instances, and heuristics have to be used for larger datasets. This type of approach can be called event-based. Another common approach is based on finding conserved structures between the input genomes, such as adjacencies between genes, possibly also assigning weights that indicate a measure of confidence or probability that this particular structure is present on each ancestral genome, and then finding a set of non conflicting adjacencies that optimize some given function, usually trying to maximize total weight and minimizing character changes in the tree. We call this type of methods homology-based. RESULTS: In previous work, we proposed an ancestral reconstruction method that combines homology- and event-based ideas, using the concept of intermediate genomes, that arise in DCJ rearrangement scenarios. This method showed better rate of correctly reconstructed adjacencies than other methods, while also being faster, since the use of intermediate genomes greatly reduces the search space. Here, we generalize the intermediate genome concept to genomes with unequal gene content, extending our method to account for gene insertions and deletions of any length. In many of the simulated datasets, our proposed method had better results than MLGO and MGRA, two state-of-the-art algorithms for ancestral reconstruction with unequal gene content, while running much faster, making it more scalable to larger datasets. CONCLUSION: Studing ancestral reconstruction problems under a new light, using the concept of intermediate genomes, allows the design of very fast algorithms by greatly reducing the solution search space, while also giving very good results. The algorithms introduced in this paper were implemented in an open-source software called RINGO (ancestral Reconstruction with INtermediate GenOmes), available at https://github.com/pedrofeijao/RINGO .


Asunto(s)
Algoritmos , Genoma , Modelos Genéticos , Evolución Molecular , Orden Génico , Internet , Interfaz Usuario-Computador
3.
BMC Bioinformatics ; 16 Suppl 14: S3, 2015.
Artículo en Inglés | MEDLINE | ID: mdl-26451811

RESUMEN

BACKGROUND: The problem of reconstructing ancestral genomes in a given phylogenetic tree arises in many different comparative genomics fields. Here, we focus on reconstructing the gene order of ancestral genomes, a problem that has been largely studied in the past 20 years, especially with the increasing availability of whole genome DNA sequences. There are two main approaches to this problem: event-based methods, that try to find the ancestral genomes that minimize the number of rearrangement events in the tree; and homology-based, that look for conserved structures, such as adjacent genes in the extant genomes, to build the ancestral genomes. RESULTS: We propose algorithms that use the concept of intermediate genomes, arising in optimal pairwise rearrangement scenarios. We show that intermediate genomes have combinatorial properties that make them easy to reconstruct, and develop fast algorithms with better reconstructed ancestral genomes than current event-based methods. The proposed framework is also designed to accept extra information, such as results from homology-based approaches, giving rise to combined algorithms with better results than the original methods.


Asunto(s)
Algoritmos , Biología Computacional/métodos , Evolución Molecular , Orden Génico , Genoma , Modelos Genéticos , Filogenia , Animales , Genómica/métodos , Humanos
4.
BMC Bioinformatics ; 16 Suppl 19: S1, 2015.
Artículo en Inglés | MEDLINE | ID: mdl-26695008

RESUMEN

Finding the smallest sequence of operations to transform one genome into another is an important problem in comparative genomics. The breakpoint graph is a discrete structure that has proven to be effective in solving distance problems, and the number of cycles in a cycle decomposition of this graph is one of the remarkable parameters to help in the solution of related problems. For a fixed k, the number of linear unichromosomal genomes (signed or unsigned) with n elements such that the induced breakpoint graphs have k disjoint cycles, known as the Hultman number, has been already determined. In this work we extend these results to multichromosomal genomes, providing formulas to compute the number of multichromosal genomes having a fixed number of cycles and/or paths. We obtain an explicit formula for circular multichromosomal genomes and recurrences for general multichromosomal genomes, and discuss how these series can be used to calculate the distribution and expected value of the rearrangement distance between random genomes.


Asunto(s)
Rotura Cromosómica , Cromosomas/genética , Reordenamiento Génico , Algoritmos , ADN Circular/genética , Genoma , Tamaño del Genoma/genética
5.
J Hered ; 103(3): 342-8, 2012.
Artículo en Inglés | MEDLINE | ID: mdl-22315242

RESUMEN

Cattle are divided into 2 groups referred to as taurine and indicine, both of which have been under strong artificial selection due to their importance for human nutrition. A side effect of this domestication includes a loss of genetic diversity within each specialized breed. Recently, the first taurine genome was sequenced and assembled, allowing for a better understanding of this ruminant species. However, genetic information from indicine breeds has been limited. Here, we present the first genome sequence of an indicine breed (Nellore) generated with 52X coverage by SOLiD sequencing platform. As expected, both genomes share high similarity at the nucleotide level for all autosomes and the X chromosome. Regarding the Y chromosome, the homology was considerably lower, most likely due to uncompleted assembly of the taurine Y chromosome. We were also able to cover 97% of the annotated taurine protein-coding genes.


Asunto(s)
Bovinos/genética , Genoma , Animales , Cromosomas de los Mamíferos/genética , Codón/genética , Mapeo Contig , Masculino , Análisis de Secuencia de ADN , Homología de Secuencia de Ácido Nucleico
6.
Curr Oncol ; 29(4): 2630-2643, 2022 04 11.
Artículo en Inglés | MEDLINE | ID: mdl-35448189

RESUMEN

Background: Despite meticulous surgery for non-small cell lung cancer (NSCLC), relapse is as high as 70% at 5 years. Many institutions do not conduct reflexive molecular testing on early stage specimens, although targeted gene therapy may extend life by years in the event of recurrence. This ultimately delays definitive treatment with additional biopsy risking suboptimal tissue acquisition and quality for molecular testing. Objective: To compare molecular profiles of genetic alterations in early and late NSCLC to provide evidence that reflexive molecular testing provides clinically valuable information. Methods: A single-center propensity matched retrospective analysis was conducted using prospectively collected data. Adults with early and late-stage NSCLC had tissue subject to targeted panel-based NGS. Frequencies of putative drivers were compared, with 1:3 matching on the propensity score; p < 0.05 deemed statistically significant. Results: In total, 635 NSCLC patients underwent NGS (59 early, 576 late); 276 (43.5%) females; age 70.9 (±10.2) years; never smokers 140 (22.0%); 527 (83.0%) adenocarcinomas. Unadjusted frequencies of EGFR mutations were higher in the early cohort (30% vs. 18%). Following adjustment for sex and smoking status, similar frequencies for both early and late NSCLC were observed for variants in EGFR, KRAS, ALK, MET, and ROS1. Conclusion: The frequency of clinically actionable variants in early and late-stage NSCLC was found to be similar, providing evidence that molecular profiling should be performed on surgical specimens. This pre-determined profile is essential to avoid treatment delay for patients who will derive clinical benefit from targeted systemic therapy, in the high likelihood of subsequent relapse.


Asunto(s)
Carcinoma de Pulmón de Células no Pequeñas , Neoplasias Pulmonares , Adulto , Anciano , Carcinoma de Pulmón de Células no Pequeñas/tratamiento farmacológico , Carcinoma de Pulmón de Células no Pequeñas/terapia , Receptores ErbB/genética , Femenino , Secuenciación de Nucleótidos de Alto Rendimiento , Humanos , Neoplasias Pulmonares/tratamiento farmacológico , Neoplasias Pulmonares/terapia , Masculino , Análisis por Apareamiento , Recurrencia Local de Neoplasia , Proteínas Tirosina Quinasas/genética , Proteínas Proto-Oncogénicas/genética , Estudios Retrospectivos
7.
Algorithms Mol Biol ; 15: 8, 2020.
Artículo en Inglés | MEDLINE | ID: mdl-32391071

RESUMEN

BACKGROUND: In the field of genome rearrangement algorithms, models accounting for gene duplication lead often to hard problems. For example, while computing the pairwise distance is tractable in most duplication-free models, the problem is NP-complete for most extensions of these models accounting for duplicated genes. Moreover, problems involving more than two genomes, such as the genome median and the Small Parsimony problem, are intractable for most duplication-free models, with some exceptions, for example the Single-Cut-or-Join (SCJ) model. RESULTS: We introduce a variant of the SCJ distance that accounts for duplicated genes, in the context of directed evolution from an ancestral genome to a descendant genome where orthology relations between ancestral genes and their descendant are known. Our model includes two duplication mechanisms: single-gene tandem duplication and the creation of single-gene circular chromosomes. We prove that in this model, computing the directed distance and a parsimonious evolutionary scenario in terms of SCJ and single-gene duplication events can be done in linear time. We also show that the directed median problem is tractable for this distance, while the rooted median problem, where we assume that one of the given genomes is ancestral to the median, is NP-complete. We also describe an Integer Linear Program for solving this problem. We evaluate the directed distance and rooted median algorithms on simulated data. CONCLUSION: Our results provide a simple genome rearrangement model, extending the SCJ model to account for single-gene duplications, for which we prove a mix of tractability and hardness results. For the NP-complete rooted median problem, we design a simple Integer Linear Program. Our publicly available implementation of these algorithms for the directed distance and median problems allow to solve efficiently these problems on large instances.

8.
Mol Phylogenet Evol ; 48(3): 850-7, 2008 Sep.
Artículo en Inglés | MEDLINE | ID: mdl-18621550

RESUMEN

We present the first two mitochondrial genomes of Muscidae dipterans for the species Haematobia irritans (the horn fly) and Stomoxys calcitrans (the stable fly). Typical insect mtDNA features are described, such as a high A+T content (79.1% and 78.9%, respectively), the preference for A+T-rich codons, and the evidence of a non-optimal codon usage. The strong A+T enrichment partially masks another nucleotide content bias maintained by A+C mutation pressure in these Muscidae mtDNAs. The analysis of this data provides a model of metazoans tRNA anticodon evolution, based on the selection hypothesis of anticodon versatility. H. irritans mitochondrial genome (16078 bp) is structurally similar to the hypothetical ancestral mitochondrial genome of arthropods and its control region (A+ T-rich region in insects) organization is consistent with the structure described for Brachycera dipterans. On the other hand, the mitochondrial genome of S. calcitrans is approximately 2kb longer (18 kb), characterized by the presence of approximately 550 bp tandem repeats in the control region, and an extra copy of trnI remarkably similar to a duplicated element of blowflies mtDNA. Putative sequence elements, involved in the regulation of transcription and replication of the mtDNA, were reliably identified in S. calcitrans control region despite the 0.8-1.5 kb gap uncovered from this genome. The use of amino acid and nucleotide sequences of concatenated mitochondrial protein-coding genes (PCGs) in phylogenetic reconstructions of Diptera does not support the monophyly of Muscomorpha, as well as the monophyly of Acalyptratae. Within the Calyptratae group, the inclusion of Muscidae (Muscoidea) as a sister group of Calliphoridae (Oestroidea) implies in a potential conflict concerning the monophyly of the superfamily Oestroidea.


Asunto(s)
ADN Mitocondrial/genética , Genoma Mitocondrial , Muscidae/genética , Animales , Evolución Biológica , Codón , Duplicación de Gen , Genes de Insecto , Genes Mitocondriales , Variación Genética , Genoma , Modelos Genéticos , Filogenia , ARN de Transferencia/metabolismo , Transcripción Genética
9.
Methods Mol Biol ; 1704: 331-342, 2018.
Artículo en Inglés | MEDLINE | ID: mdl-29277872

RESUMEN

The comparison of genome structures across distinct species offers valuable insights into the species' phylogeny, genome organization, and gene associations. In this chapter, we review the family-free genome comparison tool FFGC which provides several methods for gene order analyses that do not require prior knowledge of evolutionary relationships between the genes across the studied genomes. Moreover, the tool features a complete workflow for genome comparison, requiring nothing but annotated genome sequences as input.


Asunto(s)
Evolución Molecular , Orden Génico , Genoma , Programas Informáticos , Biología Computacional , Modelos Genéticos , Anotación de Secuencia Molecular , Filogenia
10.
Microb Genom ; 4(2)2018 02.
Artículo en Inglés | MEDLINE | ID: mdl-29319471

RESUMEN

MLST (multi-locus sequence typing) is a classic technique for genotyping bacteria, widely applied for pathogen outbreak surveillance. Traditionally, MLST is based on identifying sequence types from a small number of housekeeping genes. With the increasing availability of whole-genome sequencing data, MLST methods have evolved towards larger typing schemes, based on a few hundred genes [core genome MLST (cgMLST)] to a few thousand genes [whole genome MLST (wgMLST)]. Such large-scale MLST schemes have been shown to provide a finer resolution and are increasingly used in various contexts such as hospital outbreaks or foodborne pathogen outbreaks. This methodological shift raises new computational challenges, especially given the large size of the schemes involved. Very few available MLST callers are currently capable of dealing with large MLST schemes. We introduce MentaLiST, a new MLST caller, based on a k-mer voting algorithm and written in the Julia language, specifically designed and implemented to handle large typing schemes. We test it on real and simulated data to show that MentaLiST is faster than any other available MLST caller while providing the same or better accuracy, and is capable of dealing with MLST schemes with up to thousands of genes while requiring limited computational resources. MentaLiST source code and easy installation instructions using a Conda package are available at https://github.com/WGS-TB/MentaLiST.


Asunto(s)
Bacterias/genética , Técnicas de Tipificación Bacteriana/métodos , Tipificación de Secuencias Multilocus/instrumentación , Tipificación de Secuencias Multilocus/métodos , Bacterias/clasificación , Bacterias/aislamiento & purificación , Brotes de Enfermedades , Enterococcus faecium/genética , Monitoreo Epidemiológico , Enfermedades Transmitidas por los Alimentos/microbiología , Genes Esenciales , Genoma Bacteriano , Humanos , Epidemiología Molecular/métodos , Mycobacterium tuberculosis/genética , Salmonella/genética , Programas Informáticos , Factores de Tiempo , Secuenciación Completa del Genoma
11.
Algorithms Mol Biol ; 12: 14, 2017.
Artículo en Inglés | MEDLINE | ID: mdl-28559921

RESUMEN

BACKGROUND: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. METHODS: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. RESULTS: The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs.

12.
Algorithms Mol Biol ; 12: 3, 2017.
Artículo en Inglés | MEDLINE | ID: mdl-28293275

RESUMEN

BACKGROUND: Rearrangements are large-scale mutations in genomes, responsible for complex changes and structural variations. Most rearrangements that modify the organization of a genome can be represented by the double cut and join (DCJ) operation. Given two balanced genomes, i.e., two genomes that have exactly the same number of occurrences of each gene in each genome, we are interested in the problem of computing the rearrangement distance between them, i.e., finding the minimum number of DCJ operations that transform one genome into the other. This problem is known to be NP-hard. RESULTS: We propose a linear time approximation algorithm with approximation factor O(k) for the DCJ distance problem, where k is the maximum number of occurrences of any gene in the input genomes. Our algorithm works for linear and circular unichromosomal balanced genomes and uses as an intermediate step an O(k)-approximation for the minimum common string partition problem, which is closely related to the DCJ distance problem. CONCLUSIONS: Experiments on simulated data sets show that our approximation algorithm is very competitive both in efficiency and in quality of the solutions.

13.
Algorithms Mol Biol ; 10: 13, 2015.
Artículo en Inglés | MEDLINE | ID: mdl-25859276

RESUMEN

Structural variation in genomes can be revealed by many (dis)similarity measures. Rearrangement operations, such as the so called double-cut-and-join (DCJ), are large-scale mutations that can create complex changes and produce such variations in genomes. A basic task in comparative genomics is to find the rearrangement distance between two given genomes, i.e., the minimum number of rearragement operations that transform one given genome into another one. In a family-based setting, genes are grouped into gene families and efficient algorithms have already been presented to compute the DCJ distance between two given genomes. In this work we propose the problem of computing the DCJ distance of two given genomes without prior gene family assignment, directly using the pairwise similarities between genes. We prove that this new family-free DCJ distance problem is APX-hard and provide an integer linear program to its solution. We also study a family-free DCJ similarity and prove that its computation is NP-hard.

14.
Artículo en Inglés | MEDLINE | ID: mdl-24334378

RESUMEN

Algebraic rearrangement theory, as introduced by Meidanis and Dias, focuses on representing the order in which genes appear in chromosomes, and applies to circular chromosomes only. By shifting our attention to genome adjacencies, we introduce the adjacency algebraic theory, extending the original algebraic theory to linear chromosomes in a very natural way, also allowing the original algebraic distance formula to be used to the general multichromosomal case, with both linear and circular chromosomes. The resulting distance, which we call algebraic distance here, is very similar to, but not quite the same as, double-cut-and-join distance. We present linear time algorithms to compute it and to sort genomes. We show how to compute the rearrangement distance from the adjacency graph, for an easier comparison with other rearrangement distances. A thorough discussion on the relationship between the chromosomal and adjacency representation is also given, and we show how all classic rearrangement operations can be modeled using the algebraic theory.


Asunto(s)
Algoritmos , Reordenamiento Génico/genética , Genómica/métodos , Modelos Genéticos , Genoma , Modelos Lineales , Telómero
15.
Artículo en Inglés | MEDLINE | ID: mdl-23702549

RESUMEN

Recently, the Single-Cut-or-Join (SCJ) operation was proposed as a basis for a new rearrangement distance between multichromosomal genomes, leading to very fast algorithms, both in theory and in practice. However, it was not clear how well this new distance fares when it comes to using it to solve relevant problems, such as the reconstruction of evolutionary history. In this paper, we advance current knowledge, by testing SCJ's ability regarding evolutionary reconstruction in two aspects: 1) How well does SCJ reconstruct evolutionary topologies? and 2) How well does SCJ reconstruct ancestral genomes? In the process of answering these questions, we implemented SCJ-based methods, and made them available to the community. We ran experiments using as many as 200 genomes, with as many as 3,000 genes. For the first question, we found out that SCJ can recover typically between 60 percent and more than 95 percent of the topology, as measured through the Robinson-Foulds distance (a.k.a. split distance) between trees. In other words, 60 percent to more than 95 percent of the original splits are also present in the reconstructed tree. For the second question, given a topology, SCJ's ability to reconstruct ancestral genomes depends on how far from the leaves the ancestral is. For nodes close to the leaves, about 85 percent of the gene adjacencies can be recovered. This percentage decreases as we move up the tree, but, even at the root, about 50 percent of the adjacencies are recovered, for as many as 64 leaves. Our findings corroborate the fact that SCJ leads to very conservative genome reconstructions, yielding very few false-positive gene adjacencies in the ancestrals, at the expense of a relatively larger amount of false negatives. In addition, experiments with real data from the Campanulaceae and Protostomes groups show that SCJ reconstructs topologies of quality comparable to the accepted trees of the species involved. As far as time is concerned, the methods we implemented can find a topology for 64 genomes with 2,000 genes each in about 10.7 minutes, and reconstruct the ancestral genomes in a 64-leaf tree in about 3 seconds, both on a typical desktop computer. It should be noted that our code is written in Java and we made no significant effort to optimize it.


Asunto(s)
Reordenamiento Génico , Genómica/métodos , Modelos Genéticos , Filogenia , Animales , Campanulaceae , Simulación por Computador , Evolución Molecular , Genoma , Programas Informáticos
16.
Artículo en Inglés | MEDLINE | ID: mdl-21339538

RESUMEN

The breakpoint distance is one of the most straightforward genome comparison measures. Surprisingly, when it comes to defining it precisely for multichromosomal genomes with both linear and circular chromosomes, there is more than one way to go about it. Pevzner and Tesler gave a definition in a 2003 paper, Tannier et al. defined it differently in 2008, and in this paper we provide yet another alternative, calling it SCJ for single-cut-or-join, in analogy to the popular double cut and join (DCJ) measure. We show that several genome rearrangement problems, such as median and halving, become easy for SCJ, and provide linear and higher polynomial time algorithms for them. For the multichromosomal linear genome median problem, this is the first polynomial time algorithm described, since for other distances this problem is NP-hard. In addition, we show that small parsimony under SCJ is also easy, and can be solved by a variant of Fitch's algorithm. In contrast, big parsimony is NP-hard under SCJ. This new distance measure may be of value as a speedily computable, first approximation to distances based on more realistic rearrangement models.


Asunto(s)
Algoritmos , Reordenamiento Génico/genética , Genómica/métodos , Modelos Genéticos , Filogenia
17.
Bioinformatics ; 22(7): 902-3, 2006 Apr 01.
Artículo en Inglés | MEDLINE | ID: mdl-16446277

RESUMEN

UNLABELLED: The Arthropodan Mitochondrial Genomes Accessible database (AMiGA) is a relational database developed to help in managing access to the increasing amount of data arising from developments in arthropodan mitochondrial genomics (136 mitochondrial genomes as of September 2005). The strengths of AMiGA include (1) a more accessible and up-to-date database containing a more comprehensive set of mitochondrial genomes for this phylum, (2) the provision of flexible search options for retrieving detailed information such as bibliographical data, genomic graphics, FASTA sequences and taxonomical status, (3) the possibility of enhanced comparative analyses by multiple alignment of single or concatenated sets of genes, (4) more accurate and updated information resulting from a specific curation process called AMiGA Notes and (5) the possibility of including unpublished sequences in a password-restricted area for comparative analysis with the other sequences stored in the database. AVAILABILITY: http://amiga.cbmeg.unicamp.br CONTACT: lessinger@amiga.cbmeg.unicamp.br SUPPLEMENTARY INFORMATION: Detailed information, including an illustrated tutorial, is available from the above URL.


Asunto(s)
Artrópodos/genética , ADN Mitocondrial , Sistemas de Administración de Bases de Datos , Bases de Datos Genéticas , Genómica/métodos , Animales , Artrópodos/metabolismo , Biología Computacional
SELECCIÓN DE REFERENCIAS
Detalles de la búsqueda