Results 1 - 20 of 39
1.
Genome Res ; 33(2): 261-268, 2023 02.
Article in English | MEDLINE | ID: mdl-36828587

ABSTRACT

There are thousands of well-maintained high-quality open-source software utilities for all aspects of scientific data analysis. For more than a decade, the Galaxy Project has been providing computational infrastructure and a unified user interface for these tools to make them accessible to a wide range of researchers. To streamline the process of integrating tools and constructing workflows as much as possible, we have developed Planemo, a software development kit for tool and workflow developers and Galaxy power users. Here we outline Planemo's implementation and describe its broad range of functionality for designing, testing, and executing Galaxy tools, workflows, and training material. In addition, we discuss the philosophy underlying Galaxy tool and workflow development, and how Planemo encourages the use of development best practices, such as test-driven development, by its users, including those who are not professional software developers.


Subjects
Computational Biology , Software , Workflow , Data Analysis
2.
Brief Bioinform ; 23(5)2022 09 20.
Article in English | MEDLINE | ID: mdl-35849097

ABSTRACT

Many chemicals are present in our environment, and all living species are exposed to them. However, numerous chemicals pose risks, such as developing severe diseases, if they occur at the wrong time in the wrong place. For the majority of the chemicals, these risks are not known. Chemical risk assessment and subsequent regulation of use require efficient and systematic strategies. Lab-based methods, even if high throughput, are too slow to keep up with the pace of chemical innovation. Existing computational approaches are designed for specific chemical classes or sub-problems but are not usable on a large scale. Further, the application range of these approaches is limited by the small amount of available labeled training data. We present the ready-to-use and stand-alone program deepFPlearn that predicts the association between chemical structures and effects on the gene/pathway level using a combined deep learning approach. deepFPlearn uses a deep autoencoder for feature reduction before training a deep feed-forward neural network to predict the target association. We achieved good prediction quality and showed that our feature compression preserves relevant chemical structural information. Using a vast chemical inventory (unlabeled data) as input for the autoencoder did not reduce our prediction quality but allowed capturing a much more comprehensive range of chemical structures. We predict meaningful, experimentally verified associations of chemicals and effects on unseen data. deepFPlearn classifies hundreds of thousands of chemicals in seconds. We provide deepFPlearn as an open-source and flexible tool that can be easily retrained and customized to different application settings at https://github.com/yigbt/deepFPlearn.


Subjects
Data Compression , Neural Networks, Computer , Risk Assessment
3.
Bioinformatics ; 39(12)2023 12 01.
Article in English | MEDLINE | ID: mdl-38011648

ABSTRACT

SUMMARY: Sophisticated approaches for the in silico prediction of toxicity are required to support the risk assessment of chemicals. The number of chemicals on the global chemical market and the speed of chemical innovation stand in massive contrast to the capacity for regulating chemical use. We recently showed that our ready-to-use application deepFPlearn is a suitable approach for this task. Here, we present its extension deepFPlearn+, incorporating (i) a graph neural network (GNN) to feed our AI with a more sophisticated molecular structure representation and (ii) alternative train-test splitting strategies that involve scaffold structures and the molecular weights of chemicals. We show that the GNNs substantially outperform the previous model and that our models can generalize to unseen data even with a more robust and challenging test set. Therefore, we highly recommend applying deepFPlearn+ to the chemical inventory to prioritize chemicals for experimental testing, or to any chemical subset of interest in monitoring studies. AVAILABILITY AND IMPLEMENTATION: The software is compatible with Python 3.6 or higher, and the source code can be found on our GitHub repository: https://github.com/yigbt/deepFPlearn. The data underlying this article are available in Zenodo at https://zenodo.org/record/8146252. Detailed installation guides via Docker, Singularity, and Conda are provided within the repository for operability across all operating systems.
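The two alternative splitting strategies named in this abstract can be illustrated with a minimal Python sketch. It assumes each chemical already carries a precomputed scaffold key and molecular weight (scaffold extraction itself would normally need a cheminformatics toolkit); the record layout, the 20% test fraction, the greedy assignment rule, and the toy data are illustrative assumptions, not deepFPlearn+'s actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (compound_id, scaffold_key, molecular_weight).
# Scaffold keys are assumed to be precomputed elsewhere.
Record = Tuple[str, str, float]


def scaffold_split(records: List[Record], test_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Assign whole scaffold groups to train or test so no scaffold is shared."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cid, scaffold, _mw in records:
        groups[scaffold].append(cid)
    n_train_target = len(records) - int(round(test_fraction * len(records)))
    train: List[str] = []
    test: List[str] = []
    # Largest scaffold groups fill the training set first; groups that no
    # longer fit (typically rare scaffolds) end up in the test set.
    for scaffold in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        members = groups[scaffold]
        if len(train) + len(members) <= n_train_target:
            train.extend(members)
        else:
            test.extend(members)
    return train, test


def weight_split(records: List[Record], test_fraction: float = 0.2) -> Tuple[List[str], List[str]]:
    """Hold out the heaviest compounds to test extrapolation in molecular weight."""
    ordered = sorted(records, key=lambda r: (r[2], r[0]))
    cut = len(ordered) - int(round(test_fraction * len(ordered)))
    return [r[0] for r in ordered[:cut]], [r[0] for r in ordered[cut:]]


# Toy data: scaffold strings and weights are illustrative placeholders.
EXAMPLE = [
    ("a", "c1ccccc1", 78.1), ("b", "c1ccccc1", 94.1), ("c", "c1ccccc1", 108.1),
    ("d", "C1CCNCC1", 85.1), ("e", "C1CCNCC1", 99.2), ("f", "c1ccncc1", 79.1),
    ("g", "", 46.1), ("h", "", 60.1), ("i", "", 32.0), ("j", "c1ccoc1", 68.1),
]
```

Because whole scaffold groups move together, the test set contains only scaffolds never seen during training, which is what makes such a split harder than a random one.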


Subjects
Neural Networks, Computer , Software
4.
BMC Bioinformatics ; 24(1): 235, 2023 Jun 05.
Article in English | MEDLINE | ID: mdl-37277700

ABSTRACT

BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI can analyze transposition- and inversion-based breakpoints independently and uses a parallel program design to make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While (some) multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps.
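A position-annotated de Bruijn graph can be sketched in a few lines of Python. In this simplified, hypothetical version, nodes are (k-1)-mers, every k-mer occurrence is stored with its sequence id and start position, and nodes where paths split or rejoin mark the ends of candidate bulges. DeBBI's actual heuristic for finding bulges and relating them to breakpoints is not reproduced here; in the toy input the bulge is caused by substitutions, which a real method must distinguish from rearrangements.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def build_graph(seqs: Dict[str, str], k: int) -> Dict[Edge, List[Tuple[str, int]]]:
    """De Bruijn graph: nodes are (k-1)-mers, each k-mer is an edge.

    Every edge keeps the (sequence id, start position) of each occurrence,
    i.e. the graph is position-annotated.
    """
    edges: Dict[Edge, List[Tuple[str, int]]] = defaultdict(list)
    for sid, s in seqs.items():
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            edges[(kmer[:-1], kmer[1:])].append((sid, i))
    return edges


def branch_nodes(edges: Dict[Edge, List[Tuple[str, int]]]) -> Tuple[Set[str], Set[str]]:
    """Nodes where paths diverge (out-degree > 1) or reconverge (in-degree > 1)."""
    out_nb: Dict[str, Set[str]] = defaultdict(set)
    in_nb: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        out_nb[u].add(v)
        in_nb[v].add(u)
    diverge = {u for u, nb in out_nb.items() if len(nb) > 1}
    converge = {v for v, nb in in_nb.items() if len(nb) > 1}
    return diverge, converge


def divergence_positions(edges: Dict[Edge, List[Tuple[str, int]]],
                         diverge: Set[str], seq_id: str) -> List[int]:
    """Positions in one sequence where its path leaves a divergence node."""
    return sorted(
        pos
        for (u, _v), occ in edges.items()
        if u in diverge
        for sid, pos in occ
        if sid == seq_id
    )
```

For the toy pair "ACGTTGCA" and "ACGAAGCA" with k = 3, the paths split at node "CG" and rejoin at "GC", and both sequences leave the divergence node at position 1.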


Subjects
Genome, Mitochondrial , Software , Sequence Analysis, DNA/methods , Algorithms , Molecular Sequence Annotation , High-Throughput Nucleotide Sequencing/methods
5.
Expert Rev Proteomics ; 20(11): 251-266, 2023.
Article in English | MEDLINE | ID: mdl-37787106

ABSTRACT

INTRODUCTION: Continuous advances in mass spectrometry (MS) technologies have enabled deeper and more reproducible proteome characterization and a better understanding of biological systems when integrated with other 'omics data. Bioinformatic resources meeting the analysis requirements of increasingly complex MS-based proteomic data and associated multi-omic data are critically needed. These requirements include the availability of software spanning diverse types of analyses, scalability for large-scale, compute-intensive applications, and mechanisms to ease adoption of the software. AREAS COVERED: The Galaxy ecosystem meets these requirements by offering a multitude of open-source tools for MS-based proteomics analyses and applications, all in an adaptable, scalable, and accessible computing environment. A thriving global community maintains this software and associated training resources to empower researcher-driven analyses. EXPERT OPINION: The community-supported Galaxy ecosystem remains a crucial contributor to basic biological and clinical studies using MS-based proteomics. In addition to the current status of Galaxy-based resources, we describe ongoing developments for meeting emerging challenges in MS-based proteomic informatics. We hope this review will catalyze increased use of Galaxy by researchers employing MS-based proteomics and inspire software developers to join the community and implement new tools, workflows, and associated training content that will add further value to this already rich ecosystem.


Subjects
Proteomics , Humans , Computational Biology/methods , Mass Spectrometry/methods , Proteomics/methods , Software
7.
Nucleic Acids Res ; 47(20): 10543-10552, 2019 11 18.
Article in English | MEDLINE | ID: mdl-31584075

ABSTRACT

With the rapid increase in sequenced metazoan mitochondrial genomes, detailed manual annotation is becoming increasingly infeasible. While it is easy to identify the approximate location of protein-coding genes within mitogenomes, the peculiar processing of mitochondrial transcripts makes the determination of precise gene boundaries a surprisingly difficult problem. We have analyzed the properties of annotated start and stop codon positions in detail and used the inferred patterns to devise a new method for predicting gene boundaries in de novo annotations. Our method benefits from empirically observed prevalences of start/stop codons and gene lengths, and considers the dependence of these features on variations of genetic codes. Although not perfect, our new approach yields a drastic improvement in the accuracy of gene boundaries and upgrades the mitochondrial genome annotation server MITOS to an even more sophisticated tool for fully automatic annotation of metazoan mitochondrial genomes.
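The idea of combining codon prevalence with gene-length information can be sketched as a simple scoring rule for choosing a start codon. This is a hypothetical illustration: the prevalence values, the Gaussian length term, and the function name are assumptions, not MITOS's actual model. The dependence on the genetic code is represented only by selecting a prevalence table by its NCBI translation-table number (2 and 5 are the vertebrate and invertebrate mitochondrial codes).

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical start-codon prevalences keyed by NCBI translation table number.
# These numbers are placeholders for illustration, NOT values used by MITOS.
START_PREVALENCE: Dict[int, Dict[str, float]] = {
    2: {"ATG": 0.55, "ATA": 0.25, "ATT": 0.15, "GTG": 0.05},
    5: {"ATG": 0.35, "ATA": 0.25, "ATT": 0.25, "TTG": 0.10, "GTG": 0.05},
}


def best_start(seq: str, approx_start: int, window: int, table: int,
               stop: int, expected_len: float, len_sd: float) -> Optional[Tuple[int, float]]:
    """Choose an in-frame start codon within `window` bases of approx_start.

    stop is the index of the first base of the stop codon. Each candidate is
    scored by log(codon prevalence) plus a Gaussian log-likelihood term for
    the resulting gene length (up to a constant). Returns (position, score).
    """
    prevalence = START_PREVALENCE[table]
    best: Optional[Tuple[int, float]] = None
    lo = max(0, approx_start - window)
    hi = min(len(seq) - 2, approx_start + window + 1)
    for pos in range(lo, hi):
        codon = seq[pos:pos + 3]
        length = stop - pos
        if codon not in prevalence or length <= 0 or length % 3 != 0:
            continue
        score = math.log(prevalence[codon]) - 0.5 * ((length - expected_len) / len_sd) ** 2
        if best is None or score > best[1]:
            best = (pos, score)
    return best
```

In the toy sequence "ATAGCCATGAAATAA" (stop codon at index 12), the rarer ATA at position 0 wins when a 12-nt gene is expected, while ATG at position 6 wins when a 6-nt gene is expected, showing how the length prior can override codon preference.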


Subjects
Mitochondrial Proteins/genetics , Molecular Sequence Annotation/methods , Animals , Genetic Code , Genome, Mitochondrial , Mitochondrial Proteins/metabolism , Molecular Sequence Annotation/standards , RNA, Messenger/genetics , RNA, Messenger/metabolism
8.
BMC Bioinformatics ; 19(1): 192, 2018 05 30.
Article in English | MEDLINE | ID: mdl-29843612

ABSTRACT

BACKGROUND: To study the differences between two unichromosomal circular genomes, e.g., mitochondrial genomes, under the tandem duplication random loss (TDRL) rearrangement, it is important to consider the whole set of potential TDRL rearrangement events that could have taken place. The reason is that for two given circular gene orders there can exist different TDRL rearrangements that transform one of the gene orders into the other. Hence, a TDRL event cannot always be reconstructed solely from knowledge of the circular gene order before the TDRL event and the circular gene order after it. RESULTS: We present the program EqualTDRL, which computes and illustrates the complete set of TDRLs for pairs of circular gene orders that differ by only one TDRL. EqualTDRL considers the circularity of the given genomes and certain restrictions on the TDRL rearrangements. Examples of the latter are sequences of genes that have to be conserved during a TDRL, or pairs of genes that frame intergenic regions which might represent remnants of duplicated genes. Additionally, EqualTDRL can determine the set of TDRLs that are minimal with respect to the number of duplicated genes. CONCLUSION: EqualTDRL supports scientists in studying the complete set of TDRLs that could possibly have taken place in the evolution of mitochondrial genomes. EqualTDRL is implemented in C++ using the ggplot2 package of the open source programming language R and is freely available from http://pacosy.informatik.uni-leipzig.de/equaltdrl .
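The ambiguity described in this abstract is easy to reproduce by brute force. The Python sketch treats gene orders as linear (ignoring the circularity and the additional restrictions that EqualTDRL handles) and models a TDRL as duplicating a contiguous segment in tandem, then keeping each gene either in the first copy (mask 0) or in the second copy (mask 1). It enumerates every segment and loss pattern, so it is only practical for very short gene orders; the function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple

# A TDRL event: duplicated segment [i, j) plus a 0/1 loss pattern.
Event = Tuple[int, int, Tuple[int, ...]]


def apply_tdrl(order: Sequence, i: int, j: int, mask: Tuple[int, ...]) -> tuple:
    """Duplicate order[i:j] in tandem, then keep each gene in exactly one copy.

    mask[k] == 0 keeps the k-th segment gene in the first copy, 1 in the second.
    """
    segment = order[i:j]
    first = [g for g, m in zip(segment, mask) if m == 0]
    second = [g for g, m in zip(segment, mask) if m == 1]
    return tuple(order[:i]) + tuple(first) + tuple(second) + tuple(order[j:])


def all_tdrls(source: Sequence, target: Sequence) -> List[Event]:
    """Brute-force every single TDRL turning source into target (linear orders)."""
    events: List[Event] = []
    n = len(source)
    for i in range(n):
        for j in range(i + 1, n + 1):
            for mask in product((0, 1), repeat=j - i):
                if apply_tdrl(source, i, j, mask) == tuple(target):
                    events.append((i, j, mask))
    return events


def minimal_tdrls(events: List[Event]) -> List[Event]:
    """Keep the events that duplicate the fewest genes."""
    if not events:
        return []
    fewest = min(j - i for i, j, _ in events)
    return [e for e in events if e[1] - e[0] == fewest]
```

For the toy orders (1, 2, 3) and (1, 3, 2), two different TDRLs give the same result: one duplicating all three genes and one duplicating only genes 2 and 3. The second is minimal with respect to the number of duplicated genes.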


Subjects
Evolution, Molecular , Genome, Mitochondrial , Software , DNA, Intergenic , Gene Duplication , Gene Order , Genes, Duplicate
9.
Mol Phylogenet Evol ; 106: 209-216, 2017 01.
Article in English | MEDLINE | ID: mdl-27693569

ABSTRACT

Mitochondrial genome sequences are available in large numbers, and new sequences are being published at an increasing pace. Fast, automatic, consistent, and high-quality annotations are a prerequisite for downstream analyses. Therefore, we present an automated pipeline for fast de novo annotation of mitochondrial protein-coding genes. The annotation is based on enhanced phylogeny-aware hidden Markov models (HMMs). The pipeline builds taxon-specific enhanced multiple sequence alignments (MSAs) of already annotated sequences and corresponding HMMs using an approximation of the phylogeny. The MSAs are enhanced by fixing unannotated frameshifts, purging erroneous sequences, and removing non-conserved columns from both ends. A comparison with reference annotations highlights the high quality of the results. The frameshift correction method predicts a large number of frameshifts, many of which were previously unknown. A detailed analysis of the frameshifts in nad3 of the Archosauria-Testudines group has been conducted.
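One of the MSA enhancement steps, removing non-conserved columns from both ends, can be sketched as follows. The conservation measure (share of sequences carrying the most common residue, with gaps counting against conservation) and the 0.6 threshold are illustrative assumptions, not the pipeline's published criteria.

```python
from collections import Counter
from typing import List


def column_conservation(column: str) -> float:
    """Share of sequences carrying the most common residue; gaps count as non-conserved."""
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    return Counter(residues).most_common(1)[0][1] / len(column)


def trim_ends(alignment: List[str], threshold: float = 0.6) -> List[str]:
    """Drop leading and trailing columns whose conservation is below threshold.

    Interior columns are kept even if poorly conserved.
    """
    ncol = len(alignment[0])
    columns = ["".join(row[c] for row in alignment) for c in range(ncol)]
    left = 0
    while left < ncol and column_conservation(columns[left]) < threshold:
        left += 1
    right = ncol
    while right > left and column_conservation(columns[right - 1]) < threshold:
        right -= 1
    return [row[left:right] for row in alignment]
```

For example, in the four-sequence alignment "-AMKLV-", "TAMKLVQ", "GSMKIVR", "-AMRLV-", only the first and last columns fall below the threshold, so the trimmed rows are "AMKLV", "AMKLV", "SMKIV", and "AMRLV".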


Subjects
Genome, Mitochondrial , Animals , Base Sequence , Birds/classification , DNA, Mitochondrial/chemistry , DNA, Mitochondrial/classification , DNA, Mitochondrial/metabolism , Databases, Genetic , Frameshift Mutation , Markov Chains , Mitochondrial Proteins/chemistry , Mitochondrial Proteins/classification , Mitochondrial Proteins/genetics , Mitochondrial Proteins/metabolism , Molecular Sequence Data , Phylogeny , Reptiles/classification , Sequence Alignment
10.
Nucleic Acids Res ; 43(16): 8044-56, 2015 Sep 18.
Article in English | MEDLINE | ID: mdl-26227972

ABSTRACT

Remolding of tRNAs is a well-documented process in mitochondrial genomes that changes the identity of a tRNA. It involves a duplication of a tRNA gene, a mutation that changes the anticodon and the loss of the ancestral tRNA gene. The net effect is a functional tRNA that is more closely related to tRNAs of a different alloacceptor family than to tRNAs with the same anticodon in related species. Beyond being of interest for understanding mitochondrial tRNA function and evolution, tRNA remolding events can lead to artifacts in the annotation of mitogenomes and thus in studies of mitogenomic evolution. Therefore, it is important to identify and catalog these events. Here we describe novel methods to detect tRNA remolding in large-scale data sets and apply them to survey tRNA remolding throughout animal evolution. We identify several novel remolding events in addition to the ones previously mentioned in the literature. A detailed analysis of these remoldings showed that many of them are derived from ancestral events.
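The defining signature of a remolded tRNA, being more similar to tRNAs of another amino-acid family than to those with the same anticodon, suggests a simple nearest-neighbour check. This Python sketch assumes pre-aligned sequences and uses plain percent identity; the record layout, function names, and toy sequences are hypothetical, and it is not the detection method described in the paper.

```python
from typing import List, Tuple

# Hypothetical reference record: (name, amino-acid family, aligned sequence)
Reference = Tuple[str, str, str]


def identity(a: str, b: str) -> float:
    """Fraction of identical columns in two aligned sequences (gap-gap columns skipped)."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def flag_remolding(query: str, family: str, references: List[Reference]) -> Tuple[bool, str]:
    """Flag a tRNA whose closest reference belongs to a different family.

    family is the amino-acid family implied by the query's anticodon.
    Returns (is_candidate, name_of_closest_reference).
    """
    closest = max(references, key=lambda r: identity(query, r[2]))
    return closest[1] != family, closest[0]
```

A real survey would compare against many species and use full-length structural alignments; the point here is only the decision rule: a mismatch between the anticodon-implied family and the family of the closest relative marks a remolding candidate.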


Subjects
Evolution, Molecular , Genome, Mitochondrial , RNA, Transfer/genetics , Animals , Anticodon , Codon , Crustacea/genetics , Mutation , Porifera/genetics , RNA, Transfer, Leu/genetics , Sequence Alignment
11.
Nucleic Acids Res ; 40(7): 2833-45, 2012 Apr.
Article in English | MEDLINE | ID: mdl-22139921

ABSTRACT

Transfer RNAs (tRNAs) are present in all types of cells as well as in organelles. tRNAs of animal mitochondria show a low level of primary sequence conservation and exhibit 'bizarre' secondary structures, lacking complete domains of the common cloverleaf. Such sequences are hard to detect and hence frequently missed in computational analyses and mitochondrial genome annotation. Here, we introduce an automatic annotation procedure for mitochondrial tRNA genes in Metazoa based on sequence and structural information in manually curated covariance models. The method, applied to re-annotate 1876 available metazoan mitochondrial RefSeq genomes, distinguishes between remaining functional genes and degrading 'pseudogenes', even at early stages of divergence. The subsequent analysis of a comprehensive set of mitochondrial tRNA genes gives new insights into the evolution of structures of mitochondrial tRNA sequences as well as into the mechanisms of genome rearrangements. We find frequent losses of tRNA genes concentrated in basal Metazoa, frequent independent losses of individual parts of tRNA genes, particularly in Arthropoda, and widespread conserved overlaps of tRNAs in opposite reading directions. Direct evidence for several recent Tandem Duplication-Random Loss events is gained, demonstrating that this mechanism has an impact on the appearance of new mitochondrial gene orders.
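Overlaps of tRNA genes encoded on opposite strands can be found with a straightforward pairwise interval check over annotated features. The Feature record, tRNA names, and coordinates below are hypothetical; the sketch uses 0-based half-open intervals and ignores features that wrap around the origin of the circular genome.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"


def opposite_strand_overlaps(features: List[Feature]) -> List[Tuple[str, str, int]]:
    """Return (name_a, name_b, overlap_length) for overlapping features on opposite strands."""
    hits: List[Tuple[str, str, int]] = []
    for idx, a in enumerate(features):
        for b in features[idx + 1:]:
            overlap = min(a.end, b.end) - max(a.start, b.start)
            if overlap > 0 and a.strand != b.strand:
                hits.append((a.name, b.name, overlap))
    return hits
```

The pairwise loop is quadratic, which is unproblematic for the few dozen features of a mitogenome.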


Subjects
Evolution, Molecular , Genome, Mitochondrial , Molecular Sequence Annotation/methods , RNA, Transfer/chemistry , RNA, Transfer/genetics , RNA/chemistry , RNA/genetics , Animals , Gene Order , Genes, Mitochondrial , Pseudogenes , RNA, Mitochondrial
12.
Bioinform Adv ; 4(1): vbae072, 2024.
Article in English | MEDLINE | ID: mdl-38799704

ABSTRACT

Summary: DeGeCI is a command-line tool that generates fully automated de novo gene predictions from mitochondrial nucleotide sequences by using a reference database of annotated mitogenomes, which is represented as a de Bruijn graph. The input genome is mapped to this graph, creating a subgraph, which is then post-processed by a clustering routine. Version 1.1 of DeGeCI offers a web front-end for GUI-based input. It also introduces a new taxonomic filter pipeline that allows the species in the reference database to be restricted to a user-specified taxonomic classification, and it enables gene boundary optimization when the translation table of the input genome is provided. Availability and implementation: The web platform is accessible at https://degeci.informatik.uni-leipzig.de. Source code is freely available at https://git.informatik.uni-leipzig.de/lfiedler/degeci.
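The taxonomic filter can be pictured as pruning a species-annotated k-mer index before the input genome is mapped. In the Python sketch below, a plain dictionary from k-mers to (species, gene) pairs stands in for the node set of DeGeCI's de Bruijn graph (edges are omitted), and each input k-mer receives the majority gene label. All names, the lineage format, and the data layout are assumptions made for illustration, not DeGeCI's actual code.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Set, Tuple

Hit = Tuple[str, str]  # (species, gene)


def build_index(references: Dict[str, List[Tuple[str, str]]], k: int) -> Dict[str, Set[Hit]]:
    """Map each reference k-mer to the (species, gene) pairs it occurs in."""
    index: Dict[str, Set[Hit]] = defaultdict(set)
    for species, genes in references.items():
        for gene, seq in genes:
            for i in range(len(seq) - k + 1):
                index[seq[i:i + k]].add((species, gene))
    return index


def taxonomic_filter(index: Dict[str, Set[Hit]], lineages: Dict[str, List[str]],
                     taxon: str) -> Dict[str, Set[Hit]]:
    """Keep only hits from species whose lineage contains `taxon`."""
    allowed = {sp for sp, lineage in lineages.items() if taxon in lineage}
    filtered: Dict[str, Set[Hit]] = {}
    for kmer, hits in index.items():
        kept = {(sp, gene) for sp, gene in hits if sp in allowed}
        if kept:
            filtered[kmer] = kept
    return filtered


def label_positions(query: str, index: Dict[str, Set[Hit]], k: int) -> List[Optional[str]]:
    """Majority gene label per query k-mer (None if the k-mer is unseen)."""
    labels: List[Optional[str]] = []
    for i in range(len(query) - k + 1):
        # sorted() makes tie-breaking deterministic
        votes = Counter(gene for _sp, gene in sorted(index.get(query[i:i + k], ())))
        labels.append(votes.most_common(1)[0][0] if votes else None)
    return labels
```

Restricting the index to one taxon removes hits contributed only by species outside it, so query regions that matched those species alone become unlabeled.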

13.
Methods Mol Biol ; 2802: 215-245, 2024.
Article in English | MEDLINE | ID: mdl-38819562

ABSTRACT

Genome rearrangements are mutations that change the gene content of a genome or the arrangement of the genes on a genome. Several years of research on genome rearrangements have established different algorithmic approaches for solving some fundamental problems in comparative genomics based on gene order information. This review summarizes the literature on genome rearrangement analysis along two lines of research. The first line considers rearrangement models that are particularly well suited for a theoretical analysis. These models use rearrangement operations that cut chromosomes into fragments and then join the fragments into new chromosomes. The second line works with rearrangement models that reflect several biologically motivated constraints, e.g., the constraint that gene clusters have to be preserved. In this chapter, the border between algorithmically "easy" and "hard" rearrangement problems is sketched and a brief review is given on the available software tools for genome rearrangement analysis.
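The simplest gene-order comparison is the breakpoint distance, which counts neighbouring gene pairs in one genome that are not neighbours in the other. The Python sketch computes the unsigned version for two linear gene orders containing the same genes, framed by artificial end markers; it is a classic textbook measure given for orientation, not one of the specific models surveyed in the chapter.

```python
from typing import Hashable, Sequence


def breakpoints(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Unsigned breakpoint count of linear gene order a relative to b.

    Genes are relabelled by their position in b and framed by 0 and n+1;
    every neighbouring pair in a that is not consecutive in b is a breakpoint.
    """
    rank = {gene: i + 1 for i, gene in enumerate(b)}
    framed = [0] + [rank[g] for g in a] + [len(b) + 1]
    return sum(1 for x, y in zip(framed, framed[1:]) if abs(x - y) != 1)
```

For example, swapping two adjacent genes ([1, 3, 2, 4] versus [1, 2, 3, 4]) creates two breakpoints, and so does reversing the whole order, because orientation is ignored in this unsigned variant.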


Subjects
Algorithms , Gene Rearrangement , Genomics , Multigene Family , Software , Humans , Computational Biology/methods , Genome/genetics , Genomics/methods , Models, Genetic , Animals
14.
Genome Biol Evol ; 2024 Oct 22.
Article in English | MEDLINE | ID: mdl-39437314

ABSTRACT

Mitochondrial tRNAs have acquired a diverse portfolio of aberrant structures throughout metazoan evolution. With the availability of more than 12,500 mitogenome sequences, it is essential to compile a comprehensive overview of the patterns of change in the mt-tRNA repertoire and its structural variations. This, of course, requires reanalysis of the sequence data of more than 250,000 mt-tRNAs with a uniform workflow. Here, we report our results on the complete reannotation of all mitogenomes available in the RefSeq database by September 2022 using mitos2. Based on the individual cases of mt-tRNA variants reported throughout the literature, our data pinpoint the respective hotspots of change, i.e. Acanthocephala (Lophotrochozoa), Nematoda, Acariformes and Araneae (Arthropoda). Less dramatic deviations of mt-tRNAs from the norm are observed throughout many other clades. Loss of arms in animal mt-tRNAs is clearly a phenomenon that occurred independently many times, not limited to a small number of specific clades. The summary data here provide a starting point for systematic investigations into the detailed evolutionary processes of structural reduction and loss of mt-tRNAs, as well as a resource for further improvement of mt-tRNA annotation workflows.

15.
Mol Phylogenet Evol ; 69(2): 328-38, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23142697

ABSTRACT

Many years of extensive studies of metazoan mitochondrial genomes have established differences in gene arrangements and genetic codes as valuable phylogenetic markers. Understanding the underlying mechanisms of replication and transcription, and the role of the control regions, which cause, e.g., different gene orders, is important to assess the phylogenetic signal of such events. This review summarises and discusses, for the Metazoa, the general aspects of mitochondrial transcription and replication with respect to control regions, as well as several proposed models of gene rearrangements. As whole-genome sequencing projects accumulate, more and more observations of mitochondrial gene transfer to the nucleus are reported. Thus, the occurrence and phylogenetic aspects of nuclear mitochondrial-like sequences (NUMTs) are another focus of this review.


Subjects
DNA Replication , Evolution, Molecular , Genome, Mitochondrial , Animals , Cell Nucleus/genetics , DNA Repair , DNA, Mitochondrial/genetics , Gene Rearrangement , Genetic Code , Mitochondria/genetics , Models, Genetic , Phylogeny , Sequence Analysis, DNA , Transcriptome
16.
Mol Phylogenet Evol ; 69(2): 339-51, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23891951

ABSTRACT

Unraveling the base of metazoan evolution is of crucial importance for rooting the metazoan Tree of Life. This subject has attracted substantial attention for more than a century and recently fueled a burst of modern phylogenetic studies. Conflicting scenarios from different studies and incongruent results from nuclear versus mitochondrial markers challenge current molecular phylogenetic approaches. Here we analyze the presently most comprehensive data sets of mitochondrial genomes from non-bilaterian animals to illuminate the phylogenetic relationships among early branching metazoan phyla. The results of our analyses illustrate the value of mitogenomics and support previously known topologies between animal phyla but also identify several problematic taxa, which are sensitive to long branch artifacts or missing data.


Subjects
Evolution, Molecular , Genome, Mitochondrial , Phylogeny , Animals , Cnidaria/classification , Ctenophora/classification , Models, Genetic , Placozoa/classification , Porifera/classification , Sequence Analysis, DNA
17.
Mol Phylogenet Evol ; 69(2): 320-7, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23023207

ABSTRACT

In this review we provide an overview of various bioinformatics methods and tools for the analysis of metazoan mitochondrial genomes. We compare available dedicated databases and present current tools for accurate genome annotation, identification of protein-coding genes, and determination of tRNA and rRNA models. We also evaluate various tools and models for phylogenetic tree inference using gene-order-based or sequence-based data. As for gene-order-based methods, we compare rearrangement-based and gene-cluster-based methods for gene order rearrangement analysis. As for sequence-based methods, we give special emphasis to substitution models or data treatment that reduces certain systematic biases that are typical for metazoan mitogenomes, such as within-genome and/or among-lineage compositional heterogeneity.
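Gene-cluster-based methods typically look for common intervals: sets of genes that are contiguous in both genomes, regardless of their internal order. A naive quadratic Python sketch for two permutations of the same genes follows. It relies on the fact that a window of p is a common interval exactly when the positions of its genes in q span a range of the same length. It is an illustration of the concept, not one of the tools evaluated in the review.

```python
from typing import Hashable, List, Sequence, Tuple


def common_intervals(p: Sequence[Hashable], q: Sequence[Hashable]) -> List[Tuple]:
    """All windows of p (length >= 2) whose genes are also contiguous in q.

    p and q must be permutations of the same gene set (each gene once).
    """
    pos_q = {gene: i for i, gene in enumerate(q)}
    found: List[Tuple] = []
    for i in range(len(p)):
        lo = hi = pos_q[p[i]]
        for j in range(i + 1, len(p)):
            lo = min(lo, pos_q[p[j]])
            hi = max(hi, pos_q[p[j]])
            # j - i + 1 distinct positions are contiguous iff they span j - i
            if hi - lo == j - i:
                found.append(tuple(p[i:j + 1]))
    return found
```

For p = [1, 2, 3, 4, 5] and q = [3, 1, 2, 5, 4], the common intervals are (1, 2), (1, 2, 3), (4, 5), and the trivial whole-genome interval.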


Subjects
Computational Biology/methods , Genome, Mitochondrial , Sequence Analysis, DNA/methods , Animals , Databases, Genetic , Evolution, Molecular , Gene Order , Gene Rearrangement , Models, Genetic , Molecular Sequence Annotation , Phylogeny , RNA, Ribosomal/genetics , RNA, Transfer/genetics
18.
Mol Phylogenet Evol ; 69(2): 313-9, 2013 Nov.
Article in English | MEDLINE | ID: mdl-22982435

ABSTRACT

About 2000 completely sequenced mitochondrial genomes are available from the NCBI RefSeq database together with manually curated annotations of their protein-coding genes, rRNAs, and tRNAs. This annotation information, which has accumulated over two decades, has been obtained with a diverse set of computational tools and annotation strategies. Despite all efforts of manual curation, it is still plagued by misassignments of reading directions, erroneous gene names, and missing as well as false-positive annotations, in particular for the RNA genes. Taken together, this causes substantial problems for fully automatic pipelines that aim to use these data comprehensively for studies of animal phylogenetics and the molecular evolution of mitogenomes. The MITOS pipeline is designed to compute a consistent de novo annotation of the mitogenomic sequences. We show that the results of MITOS match RefSeq and MitoZoa in terms of annotation coverage and quality. At the same time we avoid biases, inconsistencies of nomenclature, and typos originating from manual curation strategies. The MITOS pipeline is accessible online at http://mitos.bioinf.uni-leipzig.de.


Subjects
Computational Biology , Genome, Mitochondrial , Molecular Sequence Annotation , Software , Animals , Evolution, Molecular , Internet , Phylogeny , Sequence Analysis, DNA
19.
Mol Phylogenet Evol ; 69(2): 352-64, 2013 Nov.
Article in English | MEDLINE | ID: mdl-23684911

ABSTRACT

About 2800 mitochondrial genomes of Metazoa are present in NCBI RefSeq today, two thirds belonging to vertebrates. Metazoan phylogeny was recently challenged by large-scale EST approaches (phylogenomics), stabilizing classical nodes while simultaneously supporting new sister group hypotheses. The use of mitochondrial data in deep phylogeny analyses was often criticized because of high substitution rates on nucleotides, large differences in amino acid substitution rate between taxa, and biases in nucleotide frequencies. Nevertheless, mitochondrial genome data might still be promising as it allows for a larger taxon sampling, while presenting a smaller amount of sequence information. We present the most comprehensive analysis of bilaterian relationships based on mitochondrial genome data. The analyzed data set comprises more than 650 mitochondrial genomes that have been chosen to represent a broad sample of the phylogenetic as well as sequence diversity. The results are based on high-quality amino acid alignments obtained from a complete reannotation of the mitogenomic sequences from the NCBI RefSeq database. However, the results failed to support many otherwise undisputed high-ranking taxa, such as Mollusca, Hexapoda, and Arthropoda, and suffer from extremely long branches of Nematoda, Platyhelminthes, and some other taxa. In order to identify the sources of misleading phylogenetic signals, we discuss several problems associated with mitochondrial genome data sets, e.g. the nucleotide and amino acid landscapes and a strong correlation of gene rearrangements with long branches.


Subjects
Gene Order , Genome, Mitochondrial , Phylogeny , Amino Acid Substitution , Amino Acids/genetics , Animals , Bayes Theorem , Gene Rearrangement , Likelihood Functions , Models, Genetic , Nucleotides/genetics , Sequence Alignment
20.
Front Genet ; 14: 1250907, 2023.
Article in English | MEDLINE | ID: mdl-37636259

ABSTRACT

A wide range of scientific fields, such as forensics, anthropology, medicine, and molecular evolution, benefits from the analysis of mitogenomic data. With the development of new sequencing technologies, the amount of mitochondrial sequence data to be analyzed has increased exponentially over the last few years. The accurate annotation of mitochondrial DNA is a prerequisite for any mitogenomic comparative analysis. To keep pace with the growth of the available mitochondrial sequence data, highly efficient automatic computational methods are therefore needed. Automatic annotation methods are typically based on databases that contain information about already annotated (and often pre-curated) mitogenomes of different species. However, the existing approaches have several shortcomings: 1) they do not scale well with the size of the database; 2) they do not allow for a fast (and easy) update of the database; and 3) they can only be applied to a relatively small taxonomic subset of all species. Here, we present a novel approach that has none of the shortcomings (1), (2), and (3). The reference database of mitogenomes is represented as a richly annotated de Bruijn graph. To generate gene predictions for a new user-supplied mitogenome, the method uses a clustering routine based on the mapping information of the provided sequence to this graph. The method is implemented in a software package called DeGeCI (De Bruijn graph Gene Cluster Identification). For a large set of mitogenomes for which expert-curated annotations are available, DeGeCI generates gene predictions of high conformity. In a comparative evaluation with MITOS2, a state-of-the-art annotation tool for mitochondrial genomes, DeGeCI shows better database scalability while still matching MITOS2 in terms of result quality and providing a fully automated means to update the underlying database. Moreover, unlike MITOS2, DeGeCI can be run in parallel on several processors to make use of modern multi-processor systems.
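The final clustering step, turning per-position mapping hits into gene predictions, can be sketched as run merging. This Python version is hypothetical: it assumes each position of the input sequence has already received a gene label (or None) from the graph mapping, merges runs of the same label separated by at most max_gap unlabeled positions, and discards clusters shorter than min_len. DeGeCI's actual clustering routine is not described in the abstract, and both parameter defaults are arbitrary.

```python
from typing import List, Optional, Tuple


def cluster_labels(labels: List[Optional[str]], max_gap: int = 2,
                   min_len: int = 3) -> List[Tuple[str, int, int]]:
    """Group per-position gene labels into (gene, start, end) predictions.

    end is exclusive. A labelled position extends the previous cluster if it
    carries the same gene and at most max_gap unlabelled positions lie in between.
    """
    clusters: List[List] = []
    for i, label in enumerate(labels):
        if label is None:
            continue
        if clusters and clusters[-1][0] == label and i - clusters[-1][2] <= max_gap:
            clusters[-1][2] = i + 1
        else:
            clusters.append([label, i, i + 1])
    return [(g, s, e) for g, s, e in clusters if e - s >= min_len]
```

The gap tolerance lets a prediction survive short stretches where mutations prevent exact k-mer matches, while the minimum length suppresses isolated spurious hits.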
