ABSTRACT
BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of the genes, the breakpoints can be computed with little effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design that makes use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While (some) multiple sequence alignment tools can also be used for this task, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular are detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with breakpoint locations. Despite the large size of these structures, the algorithm requires only a small number of graph traversal steps.
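The abstract notes that breakpoints are easy to compute once exact gene orders are known; DeBBI targets the harder case where only nucleotide sequences are available. For contrast, here is a minimal sketch of the easy case only, assuming both genomes carry the same genes and are encoded (hypothetically) as circular lists of signed integer gene IDs, where a negative sign marks the opposite strand. It is not part of DeBBI.

```python
def adjacencies(order):
    """Return the adjacencies (x, y) of a circular, signed gene order."""
    n = len(order)
    return [(order[i], order[(i + 1) % n]) for i in range(n)]


def breakpoints(a, b):
    """Adjacencies of genome a that are not conserved in genome b.

    An adjacency (x, y) counts as conserved if b contains (x, y) or the same
    pair read on the opposite strand, i.e. (-y, -x).
    """
    conserved = set()
    for x, y in adjacencies(b):
        conserved.add((x, y))
        conserved.add((-y, -x))
    return [adj for adj in adjacencies(a) if adj not in conserved]


# Genes 2 and 3 inverted: the two breakpoints flank the inverted block.
print(breakpoints([1, 2, 3, 4], [1, -3, -2, 4]))  # [(1, 2), (3, 4)]
```

Reading the second genome on the other strand (for example `[-4, -3, -2, -1]`) yields no breakpoints, because each adjacency is also checked in its reverse-complemented form.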
Subjects
Mitochondrial Genome , Software , DNA Sequence Analysis/methods , Algorithms , Molecular Sequence Annotation , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
The spectrum of viruses in insects is important for subjects as diverse as public health, veterinary medicine, food production, and biodiversity conservation. The traditional interest in vector-borne diseases of humans and livestock has drawn the attention of virus studies to hematophagous insect species. However, these represent only a tiny fraction of the broad diversity of Hexapoda, the most speciose group of animals. Here, we systematically probed the diversity of negative strand RNA viruses in the largest and most representative collection of insect transcriptomes from samples representing all 34 extant orders of Hexapoda and 3 orders of Entognatha, as well as outgroups, altogether representing 1243 species. Based on profile hidden Markov models we detected 488 viral RNA-directed RNA polymerase (RdRp) sequences with similarity to negative strand RNA viruses. These were identified in members of 324 arthropod species. Selection for length, quality, and uniqueness left 234 sequences for analyses, showing similarity to genomes of viruses classified in Bunyavirales (n = 86), Articulavirales (n = 54), and several orders within Haploviricotina (n = 94). Coding-complete genomes or nearly-complete subgenomic assemblies were obtained in 61 cases. Based on phylogenetic topology and the availability of coding-complete genomes we estimate that at least 20 novel viral genera in seven families need to be defined, only two of them monospecific. Seven additional viral clades emerge when adding sequences from the present study to formerly monospecific lineages, potentially requiring up to seven additional genera. One long sequence may indicate a novel family. For segmented viruses, cophylogenies between genome segments were generally improved by the inclusion of viruses from the present study, suggesting that in silico misassembly of segmented genomes is rare or absent. 
Contrary to previous assessments, significant virus-host codivergence was identified in major phylogenetic lineages based on two different approaches of codivergence analysis in a hypotheses testing framework. In spite of these additions to the known spectrum of viruses in insects, we caution that basing taxonomic decisions on genome information alone is challenging due to technical uncertainties, such as the inability to prove integrity of complete genome assemblies of segmented viruses.
Subjects
Insects/virology , RNA Virus Infections/virology , RNA Viruses , Animals
ABSTRACT
With the rapid increase in sequenced metazoan mitochondrial genomes, detailed manual annotation is becoming increasingly infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail and use the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of the genetic code. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.
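The boundary problem can be illustrated by enumerating in-frame start codons around an approximate gene start and ranking them by codon weights. This is a minimal sketch, assuming the start codon set of NCBI translation table 5 as listed in the code (verify against the NCBI genetic code tables) and a caller-supplied, hypothetical `prior` of codon weights standing in for the empirically observed prevalences; gene-length priors and stop codon handling are omitted, and this is not the MITOS implementation.

```python
# Start codons assumed here for NCBI translation table 5 (invertebrate
# mitochondrial code); check against the NCBI genetic code tables before use.
TABLE5_STARTS = {"TTG", "ATT", "ATC", "ATA", "ATG", "GTG"}


def candidate_starts(seq, approx_start, window, starts=TABLE5_STARTS, prior=None):
    """In-frame start codons within +/- window nt of approx_start, ranked.

    prior is a hypothetical dict codon -> weight (e.g. an empirical prevalence).
    Without a prior all codons weigh the same; ties are broken by the distance
    to approx_start.
    """
    def weight(pos):
        return 1.0 if prior is None else prior.get(seq[pos:pos + 3], 0.0)

    lo = max(0, approx_start - window)
    hi = min(len(seq) - 3, approx_start + window)
    hits = [
        pos
        for pos in range(lo, hi + 1)
        if (pos - approx_start) % 3 == 0 and seq[pos:pos + 3] in starts
    ]
    return sorted(hits, key=lambda p: (-weight(p), abs(p - approx_start)))


seq = "ATGCCCATTAAAGTG"
print(candidate_starts(seq, approx_start=6, window=6))  # [6, 0, 12]
print(candidate_starts(seq, 6, 6, prior={"ATG": 0.9, "ATT": 0.5, "GTG": 0.1}))  # [0, 6, 12]
```

With a uniform prior the codon at the approximate start wins; a prior that strongly favours ATG shifts the prediction upstream, which is the kind of trade-off an empirically informed boundary predictor has to make.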
Subjects
Mitochondrial Proteins/genetics , Molecular Sequence Annotation/methods , Animals , Genetic Code , Mitochondrial Genome , Mitochondrial Proteins/metabolism , Molecular Sequence Annotation/standards , Messenger RNA/genetics , Messenger RNA/metabolism
ABSTRACT
Social networks mediate the spread of information and disease. The dynamics of spreading depends, among other factors, on the distribution of times between successive contacts in the network. Heavy-tailed (bursty) time distributions are characteristic of human communication networks, including face-to-face contacts and electronic communication via mobile phone calls, email, and internet communities. Burstiness has been cited as a possible cause for slow spreading in these networks relative to a randomized reference network. However, it is not known whether burstiness is an epiphenomenon of human-specific patterns of communication. Moreover, theory predicts that fast, bursty communication networks should also exist. Here, we present a high-throughput technology for automated monitoring of social interactions of individual honeybees and the analysis of a rich and detailed dataset consisting of more than 1.2 million interactions in five honeybee colonies. We find that bees, like humans, also interact in bursts but that spreading is significantly faster than in a randomized reference network and remains so even after an experimental demographic perturbation. Thus, while burstiness may be an intrinsic property of social interactions, it does not always inhibit spreading in real-world communication networks. We anticipate that these results will inform future models of large-scale social organization and information and disease transmission, and may impact health management of threatened honeybee populations.
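The abstract does not say how burstiness was quantified. One commonly used summary of heavy-tailed inter-contact times is the coefficient B in the sketch below; this is a minimal illustration under that assumption, not necessarily the statistic used in the study, and the helper names are hypothetical.

```python
from statistics import mean, pstdev


def inter_event_times(timestamps):
    """Gaps between successive contacts, e.g. of one bee or one pair of bees."""
    ts = sorted(timestamps)
    return [b - a for a, b in zip(ts, ts[1:])]


def burstiness(gaps):
    """B = (sigma - mu) / (sigma + mu): -1 periodic, about 0 Poisson, towards 1 bursty.

    Assumes a positive mean gap.
    """
    mu, sigma = mean(gaps), pstdev(gaps)
    return (sigma - mu) / (sigma + mu)


print(burstiness(inter_event_times([0, 5, 10, 15, 20])))  # -1.0
print(burstiness([1, 1, 1, 1, 100]) > 0)  # True: a few long gaps make the series bursty
```

Comparing B (or spreading speed) against a randomized reference, for example with shuffled contact times, is what separates an intrinsic effect of burstiness from other network properties.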
Subjects
Animal Communication , Bees/physiology , Social Behavior , Animals , Biological Models
ABSTRACT
The two-machine permutation flow shop scheduling problem with buffer is studied for the special case that all processing times on one of the two machines are equal to a constant c. This case is interesting because it occurs in various applications, for example, when one machine is a packing machine or when materials have to be transported. Different types of buffers and buffer usage are considered. It is shown that all considered buffer flow shop problems remain NP-hard for the makespan criterion even when restricted to equal processing times on one machine. However, the special case where the constant c is larger or smaller than all processing times on the other machine is shown to be polynomially solvable by presenting an algorithm (2BF-OPT) that calculates optimal schedules in O(n log n) steps. Two heuristics for solving the NP-hard flow shop problems are proposed: (i) a modification of the commonly used NEH heuristic (mNEH) and (ii) an Iterated Local Search heuristic (2BF-ILS) that uses the mNEH heuristic to compute its initial solution. It is shown experimentally that the proposed 2BF-ILS heuristic obtains better results than two state-of-the-art algorithms for buffered flow shop problems from the literature and an Ant Colony Optimization algorithm. In addition, it is shown experimentally that 2BF-ILS obtains the same solution quality as the standard NEH heuristic, but with fewer function evaluations.
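The setting above can be made concrete with a makespan routine and the classic NEH insertion heuristic. This is a minimal sketch, assuming one common buffer semantics (an intermediate buffer of capacity b between the machines, with b = 0 meaning blocking) and the standard NEH, not the paper's mNEH, 2BF-OPT or 2BF-ILS; the constant-time case corresponds to `p2 = [c] * n`.

```python
def makespan(order, p1, p2, b):
    """Makespan of a two-machine permutation flow shop with an intermediate buffer.

    Assumed buffer semantics: up to b jobs can wait between the machines, so the
    job at sequence position k can leave machine 1 only after the job at position
    k - b - 1 has finished on machine 2 (b = 0 means blocking).
    """
    c2 = []  # completion times on machine 2, by sequence position
    left_m1 = 0  # time at which the previous job left machine 1
    for k, job in enumerate(order):
        done_m1 = left_m1 + p1[job]
        blocker = c2[k - b - 1] if k - b - 1 >= 0 else 0
        left_m1 = max(done_m1, blocker)  # wait for a free buffer slot
        start_m2 = max(left_m1, c2[-1] if c2 else 0)
        c2.append(start_m2 + p2[job])
    return c2[-1] if c2 else 0


def neh(p1, p2, b):
    """Standard NEH: sort by total time, insert each job at its best position."""
    jobs = sorted(range(len(p1)), key=lambda j: p1[j] + p2[j], reverse=True)
    seq = []
    for job in jobs:
        candidates = [seq[:i] + [job] + seq[i:] for i in range(len(seq) + 1)]
        seq = min(candidates, key=lambda s: makespan(s, p1, p2, b))
    return seq, makespan(seq, p1, p2, b)


# Constant processing time c = 4 on machine 2, buffer space for one job.
print(neh([3, 1, 5], [4, 4, 4], b=1))  # ([1, 0, 2], 13)
```

Each NEH step evaluates the makespan of every insertion position, so the number of function evaluations grows quadratically with the number of jobs; reducing such evaluations is exactly the cost the abstract compares.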
Subjects
Algorithms , Heuristics
ABSTRACT
BACKGROUND: To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement, it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that for two given circular gene orders there can exist different TDRL rearrangements that transform one of the gene orders into the other. Hence, a TDRL event cannot always be reconstructed from knowledge of only the circular gene order before a TDRL event and the circular gene order after it. RESULTS: We present the program EqualTDRL that computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples of the latter are sequences of genes that have to be conserved during a TDRL, or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL can determine the set of TDRLs that are minimum with respect to the number of duplicated genes. CONCLUSION: EqualTDRL supports scientists in studying the complete set of TDRLs that could have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++, uses the ggplot2 package of the open-source programming language R, and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl .
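A TDRL duplicates a contiguous segment in tandem and then deletes one copy of each duplicated gene, so the segment is replaced by a subsequence of its genes followed by the complementary subsequence. The brute-force sketch below makes "the complete set of TDRLs" concrete: it tries every linearization of the circular source order, every segment starting at the cut, and every split of that segment, keeping those whose result equals the target up to rotation (same reading direction assumed). `tdrls` and its `(start, length, kept_first)` encoding are hypothetical; EqualTDRL's restrictions (conserved gene blocks, intergenic remnants) and its handling of equivalent events are not reproduced, and the search is exponential in segment length.

```python
def is_rotation(a, b):
    """True if circular orders a and b are equal up to rotation."""
    a, b = tuple(a), tuple(b)
    return len(a) == len(b) and any(a[i:] + a[:i] == b for i in range(len(a)))


def tdrls(source, target):
    """All (start, length, kept_first) TDRLs turning circular source into target.

    start: rotation offset of source; length: number of duplicated genes;
    kept_first: genes whose first copy survives (in original order).
    """
    source = tuple(source)
    n = len(source)
    events = []
    for start in range(n):
        linear = source[start:] + source[:start]
        for length in range(2, n + 1):
            segment, rest = linear[:length], linear[length:]
            for mask in range(1, (1 << length) - 1):  # skip trivial all/none splits
                first = tuple(g for i, g in enumerate(segment) if (mask >> i) & 1)
                second = tuple(g for i, g in enumerate(segment) if not (mask >> i) & 1)
                if is_rotation(first + second + rest, target):
                    events.append((start, length, first))
    return events


events = tdrls((1, 2, 3, 4, 5), (1, 3, 2, 4, 5))
print(min(events, key=lambda e: e[1]))  # (1, 2, (3,)): duplicate genes 2 and 3 only
```

Taking the events with the smallest `length` mirrors the idea of TDRLs that are minimum with respect to the number of duplicated genes.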
Subjects
Molecular Evolution , Mitochondrial Genome , Software , Intergenic DNA , Gene Duplication , Gene Order , Duplicate Genes
ABSTRACT
Phylogenomics relies heavily on well-curated sequence data sets that comprise, for each gene, exclusively 1:1 orthologs. Paralogs are treated as a dangerous nuisance that has to be detected and removed. We show here that this severe restriction of the data sets is not necessary. Building upon recent advances in mathematical phylogenetics, we demonstrate that gene duplications convey meaningful phylogenetic information and allow the inference of plausible phylogenetic trees, provided orthologs and paralogs can be distinguished with a degree of certainty. Starting from tree-free estimates of orthology, cograph editing can sufficiently reduce the noise to find correct event-annotated gene trees. The information in gene trees can then be translated directly into constraints on the species trees. Although the resolution is very poor for individual gene families, we show that genome-wide data sets are sufficient to generate fully resolved phylogenetic trees, even in the presence of horizontal gene transfer.
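Cograph editing relies on a structural fact from mathematical phylogenetics: an error-free orthology relation, read as a graph whose edges join orthologous genes, is a cograph, i.e. it has no induced path on four vertices. Editing adds or removes as few edges as possible to reach that form. As a minimal sketch, here is only the recognition step (a simple recursive test, not an efficient linear-time algorithm and not an editing heuristic); the function names are illustrative.

```python
def _components(vertices, neighbours):
    """Connected components of the subgraph induced by `vertices`."""
    remaining = set(vertices)
    components = []
    while remaining:
        start = remaining.pop()
        component, stack = {start}, [start]
        while stack:
            v = stack.pop()
            for w in neighbours(v):
                if w in remaining:
                    remaining.remove(w)
                    component.add(w)
                    stack.append(w)
        components.append(component)
    return components


def is_cograph(vertices, edges):
    """True if the graph has no induced P4 (path on four vertices).

    Uses the characterisation that every induced subgraph with at least two
    vertices is disconnected or has a disconnected complement.
    """
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    def check(vs):
        if len(vs) <= 1:
            return True
        parts = _components(vs, lambda v: adj[v] & vs)
        if len(parts) == 1:  # connected: try to split the complement instead
            parts = _components(vs, lambda v: vs - adj[v] - {v})
        if len(parts) == 1:  # graph and complement both connected
            return False
        return all(check(part) for part in parts)

    return check(set(vertices))


print(is_cograph("abcd", [("a", "b"), ("b", "c"), ("c", "d")]))  # False (induced P4)
```

The recursive splits into components and co-components correspond to the inner nodes of a cotree, which is how an orthology cograph maps onto an event-annotated gene tree (speciation versus duplication nodes).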
Subjects
Genomics , Phylogeny
ABSTRACT
Remolding of tRNAs is a well-documented process in mitochondrial genomes that changes the identity of a tRNA. It involves a duplication of a tRNA gene, a mutation that changes the anticodon and the loss of the ancestral tRNA gene. The net effect is a functional tRNA that is more closely related to tRNAs of a different alloacceptor family than to tRNAs with the same anticodon in related species. Beyond being of interest for understanding mitochondrial tRNA function and evolution, tRNA remolding events can lead to artifacts in the annotation of mitogenomes and thus in studies of mitogenomic evolution. Therefore, it is important to identify and catalog these events. Here we describe novel methods to detect tRNA remolding in large-scale data sets and apply them to survey tRNA remolding throughout animal evolution. We identify several novel remolding events in addition to the ones previously mentioned in the literature. A detailed analysis of these remoldings showed that many of them are derived from ancestral events.
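The definition above suggests a simple screening criterion: a tRNA is a remolding candidate if it is more similar to tRNAs of another alloacceptor family than to those carrying its own anticodon. The following is a toy sketch of that criterion only, assuming pre-aligned sequences of equal length and plain per-position identity; it is not the authors' detection method, and the function names and family labels are hypothetical.

```python
def identity(a, b):
    """Fraction of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def remolding_candidate(query, anticodon_family, references):
    """Flag query if it is closer to another family than to its own.

    references maps family name -> list of aligned reference tRNA sequences.
    Returns (flagged, closest_family).
    """
    best_score = {
        family: max(identity(query, ref) for ref in refs)
        for family, refs in references.items()
    }
    closest = max(best_score, key=best_score.get)
    own = best_score.get(anticodon_family, 0.0)
    return closest != anticodon_family and best_score[closest] > own, closest


refs = {"LeuUUR": ["AAAACCCC"], "LeuCUN": ["GGGGTTTT"]}  # toy sequences
print(remolding_candidate("AAAACCCT", "LeuCUN", refs))  # (True, 'LeuUUR')
```

In practice such a flag would only be a starting point; distinguishing a genuine remolding from an annotation artifact requires phylogenetic context across related species.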
Subjects
Molecular Evolution , Mitochondrial Genome , Transfer RNA/genetics , Animals , Anticodon , Codon , Crustacea/genetics , Mutation , Porifera/genetics , Leucine Transfer RNA/genetics , Sequence Alignment
ABSTRACT
MOTIVATION: Computer-assisted studies of the structure, function and evolution of viruses remain a neglected area of research. The attention of bioinformaticians to this interesting and challenging field is far from commensurate with its medical and biotechnological importance. It is telling that out of >200 talks held at ISMB 2013, the largest international bioinformatics conference, only one presentation explicitly dealt with viruses. In contrast to many broad, established and well-organized bioinformatics communities (e.g. structural genomics, ontologies, next-generation sequencing, expression analysis), research groups focusing on viruses can probably be counted on the fingers of two hands. RESULTS: The purpose of this review is to increase awareness among bioinformatics researchers about the pressing needs and unsolved problems of computational virology. We focus primarily on RNA viruses, which pose problems to many standard bioinformatics analyses owing to their compact genome organization, fast mutation rate and low evolutionary conservation. We provide an overview of tools and algorithms for handling viral sequencing data, detecting functionally important RNA structures, classifying viral proteins into families and investigating the origin and evolution of viruses.
Subjects
Computational Biology , RNA Viruses/genetics , Animals , Computational Biology/methods , Molecular Evolution , High-Throughput Nucleotide Sequencing , Humans , Phylogeny , Viral RNA/chemistry , Viral RNA/genetics
ABSTRACT
We present an empirically based group model of foraging interactions in Messor pergandei, the Sonoran desert harvesting ant. M. pergandei colonies send out daily foraging columns consisting of tens of thousands of individual ants. Each day, the directions of the columns may change depending on the resource availability and the neighbor interactions. If neighboring columns meet, ants fight, and subsequent foraging is suppressed. M. pergandei colonies face a general problem which is present in many systems: dynamic spatial partitioning in a constantly changing environment, while simultaneously minimizing negative competitive interactions with multiple neighbors. Our simulation model of a population of column foragers is spatially explicit and includes neighbor interactions. We study how different behavioral strategies influence resource exploitation and space use for different nest distributions and densities. Column foraging in M. pergandei is adapted to the spatial and temporal properties of their natural habitat. Resource and space use is maximized both at the colony and the population level by a model with a behavioral strategy including learning and fast forgetting rates.
Subjects
Aggression , Ants/physiology , Appetitive Behavior , Competitive Behavior/physiology , Social Behavior , Algorithms , Animals , Animal Behavior , Computer Simulation , Ecosystem , Movement , Territoriality , Time Factors
ABSTRACT
Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit 'bizarre' secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, makes it possible to distinguish between remaining functional genes and degrading 'pseudogenes', even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading directions. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders.
Subjects
Molecular Evolution , Mitochondrial Genome , Molecular Sequence Annotation/methods , Transfer RNA/chemistry , Transfer RNA/genetics , RNA/chemistry , RNA/genetics , Animals , Gene Order , Mitochondrial Genes , Pseudogenes , Mitochondrial RNA
ABSTRACT
Summary: DeGeCI is a command line tool that generates fully automated de novo gene predictions from mitochondrial nucleotide sequences by using a reference database of annotated mitogenomes which is represented as a de Bruijn graph. The input genome is mapped to this graph, creating a subgraph, which is then post-processed by a clustering routine. Version 1.1 of DeGeCI offers a web front-end for GUI-based input. It also introduces a new taxonomic filter pipeline that allows the species in the reference database to be restricted to a user-specified taxonomic classification and allows for gene boundary optimization when providing the translation table of the input genome. Availability and implementation: The web platform is accessible at https://degeci.informatik.uni-leipzig.de. Source code is freely available at https://git.informatik.uni-leipzig.de/lfiedler/degeci.
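The graph-based mapping described above can be illustrated with a toy gene-annotated de Bruijn graph. This is a minimal sketch under simplifying assumptions: linear reference sequences, a single strand (no reverse complements), exact k-mer matches, and a majority vote over gene labels per node. `build_graph` and `map_query` are hypothetical names; DeGeCI's actual data structures, clustering routine and taxonomic filter are not reproduced.

```python
from collections import Counter, defaultdict


def build_graph(references, k):
    """Build a gene-annotated de Bruijn graph from annotated reference genomes.

    references: list of (sequence, labels) pairs, where labels[i] names the gene
    covering position i. Nodes are k-mers; each node counts the gene labels of
    the positions where it starts. Edges join k-mers that overlap by k - 1.
    """
    nodes = defaultdict(Counter)
    edges = defaultdict(set)
    for seq, labels in references:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            nodes[kmer][labels[i]] += 1
            if i > 0:
                edges[seq[i - 1:i - 1 + k]].add(kmer)
    return nodes, edges


def map_query(query, nodes, edges, k):
    """Map query k-mers onto the graph; return hits and the induced subgraph."""
    hits = []
    for i in range(len(query) - k + 1):
        kmer = query[i:i + k]
        if kmer in nodes:
            gene, _ = nodes[kmer].most_common(1)[0]  # majority gene label
            hits.append((i, gene))
    matched = {query[i:i + k] for i, _ in hits}
    sub_edges = {(a, b) for a in matched for b in edges.get(a, ()) if b in matched}
    return hits, sub_edges


ref = ("AACCGGTT", ["nad1"] * 4 + ["cox1"] * 4)
nodes, edges = build_graph([ref], k=3)
hits, sub = map_query("ACCGG", nodes, edges, k=3)
print(hits)  # [(0, 'nad1'), (1, 'nad1'), (2, 'nad1')]
```

Adding a further annotated genome only inserts new k-mers and increments label counts, which hints at why a graph-backed reference database can be updated without rebuilding it from scratch.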
ABSTRACT
Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
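The cut-and-join models mentioned here include the double-cut-and-join (DCJ) model, one of the algorithmically "easy" cases: for genomes with the same genes, the DCJ distance follows directly from the adjacency graph. A minimal sketch, restricted to two unichromosomal circular genomes with signed genes (no telomeres), where the distance reduces to N - C with N genes and C cycles in the adjacency graph; the function names are illustrative.

```python
def adjacency_map(genome):
    """Map each gene extremity to its neighbour in a circular signed genome.

    Gene g is read tail->head when positive and head->tail when negative.
    """
    ext = []
    for g in genome:
        ext += [(g, "t"), (g, "h")] if g > 0 else [(-g, "h"), (-g, "t")]
    n = len(genome)
    partner = {}
    for i in range(n):
        left, right = ext[2 * i + 1], ext[(2 * i + 2) % (2 * n)]
        partner[left], partner[right] = right, left
    return partner


def dcj_distance(a, b):
    """DCJ distance N - C between two circular genomes with the same genes."""
    pa, pb = adjacency_map(a), adjacency_map(b)
    seen, cycles = set(), 0
    for start in pa:
        if start in seen:
            continue
        cycles += 1
        x = start
        while x not in seen:  # alternate a-adjacencies and b-adjacencies
            y = pa[x]
            seen.update((x, y))
            x = pb[y]
    return len(a) - cycles


print(dcj_distance([1, 2, 3], [1, -2, 3]))  # 1 (one inversion)
print(dcj_distance([1, 2, 3], [1, 3, 2]))   # 2 (a transposition costs two DCJs)
```

Problems such as sorting by reversals with gene clusters that must be preserved do not have such a closed formula, which is where the "hard" side of the border lies.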
Subjects
Algorithms , Gene Rearrangement , Genomics , Multigene Family , Software , Humans , Computational Biology/methods , Genome/genetics , Genomics/methods , Genetic Models , Animals
ABSTRACT
In this review we provide an overview of various bioinformatics methods and tools for the analysis of metazoan mitochondrial genomes. We compare available dedicated databases and present current tools for accurate genome annotation, identification of protein-coding genes, and determination of tRNA and rRNA models. We also evaluate various tools and models for phylogenetic tree inference using gene order or sequence-based data. For gene-order-based methods, we compare rearrangement-based and gene-cluster-based methods for gene order rearrangement analysis. For sequence-based methods, we place special emphasis on substitution models or data treatments that reduce certain systematic biases typical of metazoan mitogenomes, such as within-genome and/or among-lineage compositional heterogeneity.
Subjects
Computational Biology/methods , Mitochondrial Genome , DNA Sequence Analysis/methods , Animals , Genetic Databases , Molecular Evolution , Gene Order , Gene Rearrangement , Genetic Models , Molecular Sequence Annotation , Phylogeny , Ribosomal RNA/genetics , Transfer RNA/genetics
ABSTRACT
About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts at manual curation, it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false-positive annotations, in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.
Subjects
Computational Biology , Mitochondrial Genome , Molecular Sequence Annotation , Software , Animals , Molecular Evolution , Internet , Phylogeny , DNA Sequence Analysis
ABSTRACT
About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high substitution rates on nucleotides, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising as it allows for a larger taxon sampling, while presenting a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes that have been chosen to represent a profound sample of the phylogenetic as well as sequence diversity. The results are based on high quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from NCBI RefSeq database. However, the results failed to give support for many otherwise undisputed high-ranking taxa, like Mollusca, Hexapoda, Arthropoda, and suffer from extreme long branches of Nematoda, Platyhelminthes, and some other taxa. In order to identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.
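The nucleotide-frequency biases mentioned here are often summarised per strand or per gene with AT and GC skews. A minimal sketch of these two standard statistics follows; the abstract does not state which compositional measures the study itself used, and the function name is illustrative.

```python
def skews(seq):
    """AT skew (A-T)/(A+T) and GC skew (G-C)/(G+C) of a nucleotide sequence."""
    s = seq.upper()
    a, t, g, c = (s.count(base) for base in "ATGC")
    at_skew = (a - t) / (a + t) if a + t else 0.0
    gc_skew = (g - c) / (g + c) if g + c else 0.0
    return at_skew, gc_skew


print(skews("AAATGGGC"))  # (0.5, 0.5)
```

Skews that differ strongly between lineages are one symptom of the compositional heterogeneity that violates the stationarity assumptions of standard substitution models.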
Subjects
Gene Order , Mitochondrial Genome , Phylogeny , Amino Acid Substitution , Amino Acids/genetics , Animals , Bayes Theorem , Gene Rearrangement , Likelihood Functions , Genetic Models , Nucleotides/genetics , Sequence Alignment
ABSTRACT
A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are therefore needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of these three shortcomings. The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method uses a clustering routine that relies on the mapping of the provided sequence onto this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database.
Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
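The abstract describes a clustering routine that turns the mapping of the query onto the graph into gene predictions, without giving details. As a deliberately simple, hypothetical stand-in, the sketch below merges nearby k-mer hits that carry the same gene label into intervals; `cluster_hits` and its `max_gap` parameter are illustrative names, not DeGeCI's.

```python
from collections import defaultdict


def cluster_hits(hits, k, max_gap):
    """Group (position, gene) k-mer hits into gene intervals.

    Hits of the same gene whose start positions are at most max_gap apart are
    merged; each interval is reported as (gene, start, end) with end exclusive
    and extended by k to cover the last k-mer.
    """
    by_gene = defaultdict(list)
    for pos, gene in hits:
        by_gene[gene].append(pos)
    intervals = []
    for gene, positions in by_gene.items():
        positions.sort()
        start = prev = positions[0]
        for pos in positions[1:]:
            if pos - prev > max_gap:  # gap too large: close the current interval
                intervals.append((gene, start, prev + k))
                start = pos
            prev = pos
        intervals.append((gene, start, prev + k))
    return sorted(intervals, key=lambda iv: iv[1])


hits = [(0, "nad1"), (1, "nad1"), (2, "nad1"), (10, "cox1"), (11, "cox1"), (30, "cox1")]
print(cluster_hits(hits, k=3, max_gap=3))
# [('nad1', 0, 5), ('cox1', 10, 14), ('cox1', 30, 33)]
```

Because each gene's hits are clustered independently, the work splits naturally by gene or by genome, one plausible reason such a routine lends itself to running on several processors.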
ABSTRACT
Barcode-based tracking of individuals is revolutionizing animal behavior studies, but further progress hinges on whether in addition to determining an individual's location, specific behaviors can be identified and monitored. We achieve this goal using information from the barcodes to identify tightly bounded image regions that potentially show the behavior of interest. These image regions are then analyzed with convolutional neural networks to verify that the behavior occurred. When applied to a challenging test case, detecting social liquid transfer (trophallaxis) in the honey bee hive, this approach yielded a 67% higher sensitivity and an 11% lower error rate than the best detector for honey bee trophallaxis so far. We were furthermore able to automatically detect whether a bee donates or receives liquid, which previously required manual observations. By applying our trophallaxis detector to recordings from three honey bee colonies and performing simulations, we discovered that liquid exchanges among bees generate two distinct social networks with different transmission capabilities. Finally, we demonstrate that our approach generalizes to detecting other specific behaviors. We envision that its broad application will enable automatic, high-resolution behavioral studies that address a broad range of previously intractable questions in evolutionary biology, ethology, neuroscience, and molecular biology.
Subjects
Artificial Intelligence , Animal Behavior , Bees , Animals , Social Behavior
ABSTRACT
Ants live in dynamically changing environments, where food sources become depleted and alternative sources appear. Yet most mathematical models of ant foraging assume that the ants' foraging environment is static. Here we describe a mathematical model of ant foraging in a dynamic environment. Our model attempts to explain recent empirical data on dynamic foraging in the Argentine ant Linepithema humile (Mayr). The ants are able to find the shortest path in a Towers of Hanoi maze, a complex network containing 32,768 alternative paths, even when the maze is altered dynamically. We modify existing models developed to explain ant foraging in static environments to elucidate which mechanisms might allow the ants to adapt quickly to changes in their foraging environment. Our results suggest that navigation of individual ants based on a combination of a single pheromone deposited during foraging and directional information enables the ants to adapt their foraging trails and recreates the experimental results.
Subjects
Ants/physiology , Feeding Behavior/physiology , Biological Models , Physiological Adaptation/physiology , Algorithms , Animals , Maze Learning/physiology , Pheromones/physiology
ABSTRACT
Learning from previous actions is a key feature of decision-making. Diverse biological systems, from neuronal assemblies to insect societies, use a combination of positive feedback and forgetting of stored memories to process and respond to input signals. Here we look at how these systems deal with a dynamic two-armed bandit problem of detecting a very weak signal in the presence of a high degree of noise. We show that by tuning the form of positive feedback and the decay rate to appropriate values, a single tracking variable can effectively detect dynamic inputs even in the presence of a large degree of noise. In particular, we show that, when tuned appropriately, a simple positive feedback algorithm is Fisher efficient, in that it can track changes in a signal on a time scale of order $L(h) = (|h|/\sigma)^{-2}$, where $|h|$ is the magnitude of the signal and $\sigma$ the magnitude of the noise.
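The time scale in the last sentence has a simple reading: averaging N noisy samples shrinks the noise to about $\sigma/\sqrt{N}$, which falls below $|h|$ once N exceeds $(\sigma/|h|)^{2} = L(h)$. A minimal sketch of that reasoning, using a plain leaky integrator (exponential forgetting with a hypothetical `decay` parameter) as a deliberately simpler stand-in for the tuned positive-feedback rule studied in the paper:

```python
import random


def fisher_timescale(h, sigma):
    """Time scale L(h) = (|h| / sigma) ** -2 from the abstract."""
    return (abs(h) / sigma) ** -2


def leaky_tracker(h, sigma, decay, steps, seed=0):
    """Track the sign of a weak signal h in Gaussian noise of size sigma.

    Each step observes h + noise (e.g. the reward difference between the two
    arms) and forgets old evidence at rate `decay`; the memory window is
    roughly 1 / decay steps.
    """
    rng = random.Random(seed)
    x = 0.0
    for _ in range(steps):
        observation = h + rng.gauss(0.0, sigma)
        x += decay * (observation - x)  # exponential forgetting of older input
    return x


print(fisher_timescale(0.5, 1.0))  # 4.0
print(leaky_tracker(0.5, 1.0, decay=0.01, steps=2000))
```

Here the memory window (about 1/decay = 100 steps) is long compared with L(h) = 4, so the tracked value settles near h with fluctuations well below |h|. Shortening the window towards L(h) lets the tracker follow changes faster at the price of more noise, which is the trade-off that tuning the decay rate addresses.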