Results 1 - 20 of 72
1.
PLoS Comput Biol ; 18(8): e1010303, 2022 08.
Article in English | MEDLINE | ID: mdl-35939516

ABSTRACT

Most methods for phylogenetic tree reconstruction are based on sequence alignments; they infer phylogenies from substitutions that may have occurred at the aligned sequence positions. Gaps in alignments are usually not employed as phylogenetic signal. In this paper, we explore an alignment-free approach that uses insertions and deletions (indels) as an additional source of information for phylogeny inference. For a set of four or more input sequences, we generate so-called quartet blocks of four putative homologous segments each. For pairs of such quartet blocks involving the same four sequences, we compare the distances between the two blocks in these sequences, to obtain hints about indels that may have happened between the blocks since the respective four sequences have evolved from their last common ancestor. A prototype implementation that we call Gap-SpaM is presented to infer phylogenetic trees from these data, using a quartet-tree approach or, alternatively, under the maximum-parsimony paradigm. This approach should not be regarded as an alternative to established methods, but rather as a complementary source of phylogenetic information. Interestingly, however, our software is able to produce phylogenetic trees from putative indels alone that are comparable to trees obtained with existing alignment-free methods.


Subject(s)
INDEL Mutation, Software, Algorithms, INDEL Mutation/genetics, Phylogeny, Sequence Alignment
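
The block-distance comparison described in this abstract can be illustrated with a minimal Python sketch. It assumes a hypothetical representation in which each quartet block maps a sequence name to the start position of its segment, and it treats a clean 2|2 partition of the four inter-block distances as a hint for a quartet split. The function name and the decision rule are illustrative assumptions, not the Gap-SpaM implementation.

```python
from collections import defaultdict


def indel_split(block1, block2):
    """Return the 2|2 split suggested by two quartet blocks, or None.

    block1 and block2 map each of four sequence names to the start
    position of the block's segment in that sequence (a hypothetical
    representation chosen for this sketch).
    """
    names = sorted(block1)
    if sorted(block2) != names or len(names) != 4:
        raise ValueError("both blocks must cover the same four sequences")

    # Group the sequences by the distance between the two blocks.
    groups = defaultdict(list)
    for name in names:
        groups[block2[name] - block1[name]].append(name)

    # Exactly two distinct distances, each shared by two sequences,
    # is read here as a hint for an indel that separates the pairs.
    sides = sorted(groups.values())
    if len(sides) == 2 and len(sides[0]) == 2 and len(sides[1]) == 2:
        return tuple(sides[0]), tuple(sides[1])
    return None
```

Equal distances in all four sequences give no indel signal, so the function returns None in that case.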
2.
Microb Genom ; 7(10)2021 10.
Article in English | MEDLINE | ID: mdl-34661518

ABSTRACT

The intriguing recent discovery of Campylobacter coli strains, especially of clade 1, that (i) possess mosaic C. coli/C. jejuni alleles, (ii) demonstrate mixed multilocus sequence types (MLSTs) and (iii) have undergone genome-wide introgression has led to the speculation that these two species may be involved in an accelerated rate of horizontal gene transfer that is progressively leading to the merging of both species in a process coined 'despeciation'. In an MLST-based neighbour-joining tree of a number of C. coli and C. jejuni isolates of different clades, three prominent Campylobacter isolates formed a seemingly separate cluster besides the previously described C. coli and C. jejuni clades. In the light of the suspected, ongoing genetic introgression between the C. coli and C. jejuni species, this cluster of Campylobacter isolates is proposed to present one of the hybrid clonal complexes in the despeciation process of the genus. Specific DNA methylation as well as restriction modification systems are known to be involved in selective uptake of external DNA and their role in such genetic introgression remains to be further investigated. In this study, the phylogeny and DNA methylation of these putative C. coli/C. jejuni hybrid strains were explored, their genomic mosaic structure caused by C. jejuni introgression was demonstrated and basic phenotypic assays were used to characterize these isolates. The genomes of the three hybrid Campylobacter strains were sequenced using PacBio SMRT sequencing, followed by methylome analysis by Restriction-Modification Finder and genome analysis by Parsnp, Smash++ and BLAST. Additionally, the strains were phenotypically characterized with respect to growth behaviour, motility, eukaryotic cell invasion and adhesion, autoagglutination, biofilm formation, and water survival ability. Our analyses show that the three hybrid Campylobacter strains are clade 1 C. coli strains, which have acquired between 8.1 and 9.1% of their genome from C. jejuni. The C. jejuni genomic segments acquired are distributed over the entire genome and do not form a coherent cluster. Most of the genes originating from C. jejuni are involved in chemotaxis and motility, membrane transport, cell signalling, or the resistance to toxic compounds such as bile acids. Interspecies gene transfer from C. jejuni has contributed 8.1-9.1% to the genome of three C. coli isolates and initiated the despeciation between C. jejuni and C. coli. Based on their functional annotation, the genes originating from C. jejuni enable the adaptation of the three strains to an intra-intestinal habitat. The transfer of a fused type II restriction-modification system that recognizes the CAYNNNNNCTC/GAGNNNNNRTG motif seems to be the key for the recombination of the C. jejuni genetic material with C. coli genomes.


Subject(s)
Campylobacter coli/genetics, Campylobacter jejuni/genetics, Epigenome, Bacterial Genome, Phylogeny, Bacterial Typing Techniques, Campylobacter Infections/microbiology, Bacterial Chromosomes/genetics, Bacterial DNA/genetics, Horizontal Gene Transfer, Genomics, Multilocus Sequence Typing, DNA Sequence Analysis
3.
Mol Plant Pathol ; 22(8): 939-953, 2021 08.
Article in English | MEDLINE | ID: mdl-33955130

ABSTRACT

Amphidiploid fungal Verticillium longisporum strains Vl43 and Vl32 colonize the plant host Brassica napus but differ in their ability to cause disease symptoms. These strains represent two V. longisporum lineages derived from different hybridization events of haploid parental Verticillium strains. Vl32 and Vl43 carry same-sex mating-type genes derived from both parental lineages. Vl32 and Vl43 similarly colonize and penetrate plant roots, but asymptomatic Vl32 proliferation in planta is lower than virulent Vl43. The highly conserved Vl43 and Vl32 genomes include less than 1% unique genes, and the karyotypes of 15 or 16 chromosomes display changed genetic synteny due to substantial genomic reshuffling. A 20 kb Vl43 lineage-specific (LS) region apparently originating from the Verticillium dahliae-related ancestor is specific for symptomatic Vl43 and encodes seven genes, including two putative transcription factors. Either partial or complete deletion of this LS region in Vl43 did not reduce virulence but led to induction of even more severe disease symptoms in rapeseed. This suggests that the LS insertion in the genome of symptomatic V. longisporum Vl43 mediates virulence-reducing functions, limits damage on the host plant, and therefore tames Vl43 from being even more virulent.


Subject(s)
Plant Diseases, Verticillium, Ascomycota, Genomics, Plant Diseases/genetics, Verticillium/genetics, Virulence/genetics
4.
BMC Bioinformatics ; 22(1): 64, 2021 Feb 11.
Article in English | MEDLINE | ID: mdl-33573603

ABSTRACT

BACKGROUND: The advancement of SMRT technology has unfolded new opportunities of genome analysis with its longer read length and low GC bias. Alignment of the reads to their appropriate positions in the respective reference genome is the first but costliest step of any analysis pipeline based on SMRT sequencing. However, the state-of-the-art aligners often fail to identify distant homologies due to lack of conserved regions, caused by frequent genetic duplication and recombination. Therefore, we developed a novel alignment-free method of sequence mapping that is fast and accurate. RESULTS: We present a new mapper called S-conLSH that uses Spaced context based Locality Sensitive Hashing. With multiple spaced patterns, S-conLSH facilitates a gapped mapping of noisy long reads to the corresponding target locations of a reference genome. We have examined the performance of the proposed method on 5 different real and simulated datasets. S-conLSH is at least 2 times faster than the recently developed method lordFAST. It achieves a sensitivity of 99%, without using any traditional base-to-base alignment, on human simulated sequence data. By default, S-conLSH provides an alignment-free mapping in PAF format. However, it has an option of generating aligned output as SAM-file, if it is required for any downstream processing. CONCLUSIONS: S-conLSH is one of the first alignment-free reference genome mapping tools achieving a high level of sensitivity. The spaced-context is especially suitable for extracting distant similarities. The variable-length spaced-seeds or patterns add flexibility to the proposed algorithm by introducing gapped mapping of the noisy long reads. Therefore, S-conLSH may be considered as a prominent direction towards alignment-free sequence analysis.


Subject(s)
Algorithms, High-Throughput Nucleotide Sequencing, Humans, Sequence Alignment, DNA Sequence Analysis, Software
5.
Bioinform Adv ; 1(1): vbab027, 2021.
Article in English | MEDLINE | ID: mdl-36700102

ABSTRACT

Motivation: Phylogenetic placement is the task of placing a query sequence of unknown taxonomic origin into a given phylogenetic tree of a set of reference sequences. A major field of application of such methods is, for example, the taxonomic identification of reads in metabarcoding or metagenomic studies. Several approaches to phylogenetic placement have been proposed in recent years. The most accurate of them requires a multiple sequence alignment of the references as input. However, calculating multiple alignments is not only time-consuming but also limits the applicability of these approaches. Results: Herein, we propose Alignment-free phylogenetic placement algorithm based on Spaced-word Matches (App-SpaM), an efficient algorithm for the phylogenetic placement of short sequencing reads on a tree of a set of reference sequences. App-SpaM produces results of high quality that are on a par with the best available approaches to phylogenetic placement, while our software is two orders of magnitude faster than these existing methods. Our approach neither requires a multiple alignment of the reference sequences nor alignments of the queries to the references. This enables App-SpaM to perform phylogenetic placement on a broad variety of datasets. Availability and implementation: The source code of App-SpaM is freely available on Github at https://github.com/matthiasblanke/App-SpaM together with detailed instructions for installation and settings. App-SpaM is furthermore available as a Conda-package on the Bioconda channel. Contact: matthias.blanke@biologie.uni-goettingen.de. Supplementary information: Supplementary data are available at Bioinformatics Advances online.

6.
Methods Mol Biol ; 2231: 121-134, 2021.
Article in English | MEDLINE | ID: mdl-33289890

ABSTRACT

Sequence alignment is at the heart of DNA and protein sequence analysis. For the data volumes that are nowadays produced by massively parallel sequencing technologies, however, pairwise and multiple alignment methods are often too slow. Therefore, fast alignment-free approaches to sequence comparison have become popular in recent years. Most of these approaches are based on word frequencies, for words of a fixed length, or on word-matching statistics. Other approaches are using the length of maximal word matches. While these methods are very fast, most of them rely on ad hoc measures of sequences similarity or dissimilarity that are hard to interpret. In this chapter, I describe a number of alignment-free methods that we developed in recent years. Our approaches are based on spaced-word matches ("SpaM"), i.e. on inexact word matches, that are allowed to contain mismatches at certain pre-defined positions. Unlike most previous alignment-free approaches, our approaches are able to accurately estimate phylogenetic distances between DNA or protein sequences using a stochastic model of molecular evolution.


Subject(s)
Biostatistics/methods, Genomics/methods, DNA Sequence Analysis/methods, Protein Sequence Analysis/methods, Software, Algorithms, Molecular Evolution, Phylogeny, Sequence Alignment
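
A spaced-word match, as described in this abstract, can be sketched in a few lines of Python. The sketch assumes that a pattern is written as a string of '1' (match position) and '0' (don't-care position). The function name and the brute-force indexing are illustrative choices, not taken from the SpaM software.

```python
def find_spaced_word_matches(seq1, seq2, pattern):
    """Return all (i, j) where seq1[i:] and seq2[j:] form a spaced-word match.

    pattern is a string of '1' (match) and '0' (don't-care) characters.
    Two windows match if they agree at every '1' position; the '0'
    positions may hold matches or mismatches.
    """
    n = len(pattern)
    match_positions = [k for k, c in enumerate(pattern) if c == "1"]

    def spaced_word(seq, start):
        # Keep only the characters at the match positions of the pattern.
        return "".join(seq[start + k] for k in match_positions)

    # Index every spaced word of seq2 by its start position.
    index = {}
    for j in range(len(seq2) - n + 1):
        index.setdefault(spaced_word(seq2, j), []).append(j)

    # Look up every spaced word of seq1 in the index.
    return [
        (i, j)
        for i in range(len(seq1) - n + 1)
        for j in index.get(spaced_word(seq1, i), [])
    ]
```

With the pattern "1101", the windows ACGT and ACTT form a spaced-word match because they differ only at the don't-care position.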
7.
Front Microbiol ; 11: 1876, 2020.
Article in English | MEDLINE | ID: mdl-32849460

ABSTRACT

Verticillia cause a vascular wilt disease affecting a broad range of economically valuable crops. The fungus enters its host plants through the roots and colonizes the vascular system. It requires extracellular proteins for a successful plant colonization. The exoproteomes of the allodiploid Verticillium longisporum upon cultivation in different media or xylem sap extracted from its host plant Brassica napus were compared. Secreted fungal proteins were identified by label free liquid chromatography-tandem mass spectrometry screening. V. longisporum induced two main secretion patterns. One response pattern was elicited in various non-plant related environments. The second pattern includes the exoprotein responses to the plant-related media, pectin-rich simulated xylem medium and pure xylem sap, which exhibited similar but additional distinct features. These exoproteomes include a shared core set of 221 secreted and similarly enriched fungal proteins. The pectin-rich medium significantly induced the secretion of 143 proteins including a number of pectin degrading enzymes, whereas xylem sap triggered a smaller but unique fungal exoproteome pattern with 32 enriched proteins. The latter pattern included proteins with domains of known pathogenicity factors, metallopeptidases and carbohydrate-active enzymes. The most abundant proteins of these different groups are the necrosis and ethylene inducing-like proteins Nlp2 and Nlp3, the cerato-platanin proteins Cp1 and Cp2, the metallopeptidases Mep1 and Mep2 and the carbohydrate-active enzymes Gla1, Amy1 and Cbd1. Their pathogenicity contribution was analyzed in the haploid parental strain V. dahliae. Deletion of the majority of the corresponding genes caused no phenotypic changes during ex planta growth or invasion and colonization of tomato plants. However, we discovered that the MEP1, NLP2, and NLP3 deletion strains were compromised in plant infections. 
Overall, our exoproteome approach revealed that the fungus induces specific secretion responses in different environments. The fungus has a general response to non-plant related media whereas it is able to fine-tune its exoproteome in the presence of plant material. Importantly, the xylem sap-specific exoproteome pinpointed Nlp2 and Nlp3 as single effectors required for successful V. dahliae colonization.

8.
Gigascience ; 9(5)2020 05 01.
Article in English | MEDLINE | ID: mdl-32432328

ABSTRACT

BACKGROUND: The development of high-throughput sequencing technologies and, as its result, the production of huge volumes of genomic data, has accelerated biological and medical research and discovery. Study on genomic rearrangements is crucial owing to their role in chromosomal evolution, genetic disorders, and cancer. RESULTS: We present Smash++, an alignment-free and memory-efficient tool to find and visualize small- and large-scale genomic rearrangements between 2 DNA sequences. This computational solution extracts information contents of the 2 sequences, exploiting a data compression technique to find rearrangements. We also present Smash++ visualizer, a tool that allows the visualization of the detected rearrangements along with their self- and relative complexity, by generating an SVG (Scalable Vector Graphics) image. CONCLUSIONS: Tested on several synthetic and real DNA sequences from bacteria, fungi, Aves, and Mammalia, the proposed tool was able to accurately find genomic rearrangements. The detected regions were in accordance with previous studies, which took alignment-based approaches or performed FISH (fluorescence in situ hybridization) analysis. The maximum peak memory usage among all experiments was ∼1 GB, which makes Smash++ feasible to run on present-day standard computers.


Subject(s)
Computational Biology/methods, Genomics/methods, Software, Algorithms, Gene Rearrangement, Genome, High-Throughput Nucleotide Sequencing, DNA Sequence Analysis/methods
9.
PLoS One ; 15(2): e0228070, 2020.
Article in English | MEDLINE | ID: mdl-32040534

ABSTRACT

We study the number N_k of length-k word matches between pairs of evolutionarily related DNA sequences, as a function of k. We show that the Jukes-Cantor distance between two genome sequences, i.e. the number of substitutions per site that occurred since they evolved from their last common ancestor, can be estimated from the slope of a function F that depends on N_k and that is affine-linear within a certain range of k. Integers k_min and k_max can be calculated depending on the length of the input sequences, such that the slope of F in the relevant range can be estimated from the values F(k_min) and F(k_max). This approach can be generalized to so-called Spaced-word Matches (SpaM), where mismatches are allowed at positions specified by a user-defined binary pattern. Based on these theoretical results, we implemented a prototype software program for alignment-free sequence comparison called Slope-SpaM. Test runs on real and simulated sequence data show that Slope-SpaM can accurately estimate phylogenetic distances for distances up to around 0.5 substitutions per position. The statistical stability of our results is improved if spaced words are used instead of contiguous words. Unlike previous alignment-free methods that are based on the number of (spaced) word matches, Slope-SpaM produces accurate results, even if sequences share only local homologies.


Subject(s)
Phylogeny, DNA Sequence Analysis, Sequence Alignment, Nucleic Acid Sequence Homology
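
The slope idea in this abstract can be illustrated with a rough Python sketch. The abstract does not define F, so the sketch assumes a simple model: homologous word matches decay like p^k for a per-site match probability p, and random background matches contribute about len1 * len2 / 4^k. Under that assumption, the logarithm of the background-corrected count is linear in k with slope ln p, and p is converted with the standard Jukes-Cantor formula. All function names are hypothetical, and the sketch is not the Slope-SpaM implementation.

```python
import math
from collections import Counter


def count_word_matches(seq1, seq2, k):
    """Number of position pairs (i, j) with seq1[i:i+k] == seq2[j:j+k]."""
    words1 = Counter(seq1[i:i + k] for i in range(len(seq1) - k + 1))
    words2 = Counter(seq2[j:j + k] for j in range(len(seq2) - k + 1))
    return sum(n * words2[w] for w, n in words1.items())


def match_probability_from_slope(n_kmin, n_kmax, k_min, k_max, len1, len2):
    """Estimate the per-site match probability from two word-match counts.

    Assumed model (not from the abstract): the background-corrected
    count behaves like C * p**k, so its logarithm has slope ln(p).
    Raises ValueError if a count does not exceed its background term.
    """
    def f(n_k, k):
        background = len1 * len2 / 4 ** k
        return math.log(n_k - background)

    slope = (f(n_kmax, k_max) - f(n_kmin, k_min)) / (k_max - k_min)
    return math.exp(slope)


def jukes_cantor_distance(match_probability):
    """Jukes-Cantor distance: substitutions per site for a match probability."""
    mismatch = 1.0 - match_probability
    return -0.75 * math.log(1.0 - 4.0 * mismatch / 3.0)
```

In practice the two counts would come from `count_word_matches` at k_min and k_max; how those two integers are chosen from the sequence lengths is described in the paper and is not reproduced here.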
10.
NAR Genom Bioinform ; 2(1): lqz013, 2020 Mar.
Article in English | MEDLINE | ID: mdl-33575565

ABSTRACT

Word-based or 'alignment-free' methods for phylogeny inference have become popular in recent years. These methods are much faster than traditional, alignment-based approaches, but they are generally less accurate. Most alignment-free methods calculate 'pairwise' distances between nucleic-acid or protein sequences; these distance values can then be used as input for tree-reconstruction programs such as neighbor-joining. In this paper, we propose the first word-based phylogeny approach that is based on 'multiple' sequence comparison and 'maximum likelihood'. Our algorithm first samples small, gap-free alignments involving four taxa each. For each of these alignments, it then calculates a quartet tree and, finally, the program 'Quartet MaxCut' is used to infer a super tree for the full set of input taxa from the calculated quartet trees. Experimental results show that trees produced with our approach are of high quality.

11.
BMC Bioinformatics ; 20(Suppl 20): 638, 2019 Dec 17.
Article in English | MEDLINE | ID: mdl-31842735

ABSTRACT

BACKGROUND: In many fields of biomedical research, it is important to estimate phylogenetic distances between taxa based on low-coverage sequencing reads. Major applications are, for example, phylogeny reconstruction, species identification from small sequencing samples, or bacterial strain typing in medical diagnostics. RESULTS: We adapted our previously developed software program Filtered Spaced-Word Matches (FSWM) for alignment-free phylogeny reconstruction to take unassembled reads as input; we call this implementation Read-SpaM. CONCLUSIONS: Test runs on simulated reads from semi-artificial and real-world bacterial genomes show that our approach can estimate phylogenetic distances with high accuracy, even for large evolutionary distances and for very low sequencing coverage.


Subject(s)
Bacterial Genome, Sequence Alignment, DNA Sequence Analysis/methods, Software, Algorithms, Base Sequence, Escherichia coli/genetics, Phylogeny
12.
Genome Biol ; 20(1): 144, 2019 07 25.
Article in English | MEDLINE | ID: mdl-31345254

ABSTRACT

BACKGROUND: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. RESULTS: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. CONCLUSION: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions.


Subject(s)
Sequence Analysis, Benchmarking, Horizontal Gene Transfer, Internet, Phylogeny, Nucleic Acid Regulatory Sequences, Sequence Alignment, Protein Sequence Analysis, Software
13.
Bioinformatics ; 35(2): 211-218, 2019 01 15.
Article in English | MEDLINE | ID: mdl-29992260

ABSTRACT

Motivation: Most methods for pairwise and multiple genome alignment use fast local homology search tools to identify anchor points, i.e. high-scoring local alignments of the input sequences. Sequence segments between those anchor points are then aligned with slower, more sensitive methods. Finding suitable anchor points is therefore crucial for genome sequence comparison; speed and sensitivity of genome alignment depend on the underlying anchoring methods. Results: In this article, we use filtered spaced word matches to generate anchor points for genome alignment. For a given binary pattern representing match and don't-care positions, we first search for spaced-word matches, i.e. ungapped local pairwise alignments with matching nucleotides at the match positions of the pattern and possible mismatches at the don't-care positions. Those spaced-word matches that have similarity scores above some threshold value are then extended using a standard X-drop algorithm; the resulting local alignments are used as anchor points. To evaluate this approach, we used the popular multiple-genome-alignment pipeline Mugsy and replaced the exact word matches that Mugsy uses as anchor points with our spaced-word-based anchor points. For closely related genome sequences, the two anchoring procedures lead to multiple alignments of similar quality. For distantly related genomes, however, alignments calculated with our filtered-spaced-word matches are superior to alignments produced with the original Mugsy program where exact word matches are used to find anchor points. Availability and implementation: http://spacedanchor.gobics.de. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Genome, Sequence Alignment/methods, Software, Computational Biology
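
The extension step mentioned in this abstract can be illustrated with a minimal X-drop sketch in Python. It assumes an ungapped, rightward extension with hypothetical scores of +1 per match and -1 per mismatch. The abstract only says that a standard X-drop algorithm is used, so the variant and all parameters here are assumptions.

```python
def xdrop_extend_right(seq1, i, seq2, j, x_drop, match=1, mismatch=-1):
    """Extend an ungapped alignment to the right of seq1[i] and seq2[j].

    The running score is tracked column by column. Extension stops once
    the score falls more than x_drop below the best score seen so far.
    Returns (best_score, length_of_best_extension).
    """
    score = best_score = 0
    length = best_length = 0
    while i + length < len(seq1) and j + length < len(seq2):
        same = seq1[i + length] == seq2[j + length]
        score += match if same else mismatch
        length += 1
        if score > best_score:
            best_score, best_length = score, length
        elif best_score - score > x_drop:
            break
    return best_score, best_length
```

A symmetric leftward extension from the seed would work the same way with decreasing indices.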
14.
Gigascience ; 8(3)2019 03 01.
Article in English | MEDLINE | ID: mdl-30535314

ABSTRACT

Word-based or 'alignment-free' sequence comparison has become an active research area in bioinformatics. While previous word-frequency approaches calculated rough measures of sequence similarity or dissimilarity, some new alignment-free methods are able to accurately estimate phylogenetic distances between genomic sequences. One of these approaches is Filtered Spaced Word Matches. Here, we extend this approach to estimate evolutionary distances between complete or incomplete proteomes; our implementation of this approach is called Prot-SpaM. We compare the performance of Prot-SpaM to other alignment-free methods on simulated sequences and on various groups of eukaryotic and prokaryotic taxa. Prot-SpaM can be used to calculate high-quality phylogenetic trees for dozens of whole-proteome sequences in a matter of seconds or minutes and often outperforms other alignment-free approaches. The source code of our software is available through Github: https://github.com/jschellh/ProtSpaM.


Subject(s)
Phylogeny, Proteome/chemistry, Sequence Alignment/methods, Software, Amino Acid Sequence, Animals, Bacteria/classification, Protein Databases, Plants/classification
15.
Algorithms Mol Biol ; 12: 27, 2017.
Article in English | MEDLINE | ID: mdl-29238399

ABSTRACT

BACKGROUND: Various approaches to alignment-free sequence comparison are based on the length of exact or inexact word matches between pairs of input sequences. Haubold et al. (J Comput Biol 16:1487-1500, 2009) showed how the average number of substitutions per position between two DNA sequences can be estimated based on the average length of exact common substrings. RESULTS: In this paper, we study the length distribution of k-mismatch common substrings between two sequences. We show that the number of substitutions per position can be accurately estimated from the position of a local maximum in the length distribution of their k-mismatch common substrings.
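
To make the notion of a k-mismatch common substring concrete, the following Python sketch computes how far two sequences can be read in parallel from a given pair of start positions with at most k mismatches. This is a simplified illustration with a hypothetical function name. The paper's exact definition of the substrings and the estimator built on their length distribution are not reproduced here.

```python
def k_mismatch_extension_length(seq1, i, seq2, j, k):
    """Length of the longest common extension with at most k mismatches.

    Starting at seq1[i] and seq2[j], characters are compared pairwise;
    the extension ends just before mismatch number k + 1 or at the end
    of either sequence.
    """
    mismatches = 0
    length = 0
    while i + length < len(seq1) and j + length < len(seq2):
        if seq1[i + length] != seq2[j + length]:
            if mismatches == k:
                break
            mismatches += 1
        length += 1
    return length
```

Collecting these lengths over many start positions gives an empirical length distribution of the kind the abstract refers to.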

16.
Bioinformatics ; 33(7): 971-979, 2017 04 01.
Article in English | MEDLINE | ID: mdl-28073754

ABSTRACT

Motivation: Word-based or 'alignment-free' algorithms are increasingly used for phylogeny reconstruction and genome comparison, since they are much faster than traditional approaches that are based on full sequence alignments. Existing alignment-free programs, however, are less accurate than alignment-based methods. Results: We propose Filtered Spaced Word Matches (FSWM), a fast alignment-free approach to estimate phylogenetic distances between large genomic sequences. For a pre-defined binary pattern of match and don't-care positions, FSWM rapidly identifies spaced word-matches between input sequences, i.e. gap-free local alignments with matching nucleotides at the match positions and with mismatches allowed at the don't-care positions. We then estimate the number of nucleotide substitutions per site by considering the nucleotides aligned at the don't-care positions of the identified spaced-word matches. To reduce the noise from spurious random matches, we use a filtering procedure where we discard all spaced-word matches for which the overall similarity between the aligned segments is below a threshold. We show that our approach can accurately estimate substitution frequencies even for distantly related sequences that cannot be analyzed with existing alignment-free methods; phylogenetic trees constructed with FSWM distances are of high quality. A program run on a pair of eukaryotic genomes of a few hundred Mb each takes a few minutes. Availability and Implementation: The program source code for FSWM, including documentation, as well as the software that we used to generate artificial genome sequences, are freely available at http://fswm.gobics.de/. Contact: chris.leimeister@stud.uni-goettingen.de. Supplementary information: Supplementary data are available at Bioinformatics online.


Subject(s)
Algorithms, Phylogeny, Base Sequence, Computer Simulation, Bacterial Genome, Plant Genome, Genomics/methods, Sequence Alignment, DNA Sequence Analysis, Nucleic Acid Sequence Homology, Software, Time Factors
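
The filter-then-estimate procedure in this abstract can be sketched in Python as follows. The sketch assumes that spaced-word matches are already given as pairs of equal-length segments, uses a hypothetical +1/-1 scoring scheme for the filter, and applies the standard Jukes-Cantor correction to the mismatch fraction observed at the don't-care positions. The scoring scheme, threshold, and function name are assumptions and may differ from what FSWM actually uses.

```python
import math


def filtered_distance(matches, pattern, min_score=0,
                      match_score=1, mismatch_score=-1):
    """Estimate substitutions per site from filtered spaced-word matches.

    matches: list of (segment1, segment2), each as long as pattern.
    pattern: string of '1' (match) and '0' (don't-care) characters.
    Pairs scoring below min_score are discarded as likely random hits.
    """
    dont_care = [k for k, c in enumerate(pattern) if c == "0"]
    mismatches = 0
    total = 0
    for seg1, seg2 in matches:
        score = sum(match_score if a == b else mismatch_score
                    for a, b in zip(seg1, seg2))
        if score < min_score:
            continue  # filtered out
        for k in dont_care:
            total += 1
            mismatches += seg1[k] != seg2[k]
    if total == 0:
        raise ValueError("no don't-care positions left after filtering")

    p = mismatches / total
    if p >= 0.75:
        return math.inf  # Jukes-Cantor correction is undefined here
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

Only the don't-care positions enter the estimate because the match positions agree by construction and would bias the mismatch fraction towards zero.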
17.
PLoS Comput Biol ; 12(10): e1005107, 2016 Oct.
Article in English | MEDLINE | ID: mdl-27760124

ABSTRACT

Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.


Subject(s)
Algorithms, DNA/genetics, Database Management Systems, Genetic Databases, DNA Sequence Analysis/methods, Software, DNA/chemistry, DNA Mutational Analysis/methods, Data Mining/methods, Machine Learning, Automated Pattern Recognition/methods, Sequence Alignment/methods
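
The overlap complexity that this abstract refers to can be illustrated with a short Python sketch. The abstract does not state the formula, so the sketch assumes a commonly cited form: for every pair of patterns (including each pattern with itself) and every relative shift, count the positions where both patterns have a match position, and sum 2 raised to that count. Whether self-pairs and the zero shift are included is an assumption here, and the code is not taken from rasbhari.

```python
def shared_match_positions(p1, p2, shift):
    """Positions where both patterns have '1' when p2 is shifted by shift."""
    return sum(
        1
        for k, c in enumerate(p2)
        if c == "1" and 0 <= k + shift < len(p1) and p1[k + shift] == "1"
    )


def overlap_complexity(patterns):
    """Sum of 2**(shared match positions) over pattern pairs and shifts."""
    total = 0
    for a in range(len(patterns)):
        for b in range(a, len(patterns)):
            p1, p2 = patterns[a], patterns[b]
            # Every shift for which the two patterns overlap at all.
            for shift in range(1 - len(p2), len(p1)):
                total += 2 ** shared_match_positions(p1, p2, shift)
    return total
```

A hill-climbing optimizer of the kind the abstract describes would repeatedly modify one pattern and keep the change whenever this value decreases.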
18.
Nat Commun ; 6: 7822, 2015 Jul 28.
Article in English | MEDLINE | ID: mdl-26215380

ABSTRACT

Genetic screens are powerful tools to identify the genes required for a given biological process. However, for technical reasons, comprehensive screens have been restricted to very few model organisms. Therefore, although deep sequencing is revealing the genes of ever more insect species, the functional studies predominantly focus on candidate genes previously identified in Drosophila, which is biasing research towards conserved gene functions. RNAi screens in other organisms promise to reduce this bias. Here we present the results of the iBeetle screen, a large-scale, unbiased RNAi screen in the red flour beetle, Tribolium castaneum, which identifies gene functions in embryonic and postembryonic development, physiology and cell biology. The utility of Tribolium as a screening platform is demonstrated by the identification of genes involved in insect epithelial adhesion. This work transcends the restrictions of the candidate gene approach and opens fields of research not accessible in Drosophila.


Subject(s)
Embryonic Development/genetics, Insect Proteins/genetics, Biological Metamorphosis/genetics, Oogenesis/genetics, RNA Interference, Tribolium/genetics, Animals, Beetles/embryology, Beetles/genetics, Beetles/physiology, High-Throughput Nucleotide Sequencing, Larva/genetics, Pupa/genetics, Tribolium/embryology, Tribolium/physiology
19.
Metabolomics ; 11(3): 764-777, 2015.
Article in English | MEDLINE | ID: mdl-25972773

ABSTRACT

A central aim in the evaluation of non-targeted metabolomics data is the detection of intensity patterns that differ between experimental conditions as well as the identification of the underlying metabolites and their association with metabolic pathways. In this context, the identification of metabolites based on non-targeted mass spectrometry data is a major bottleneck. In many applications, this identification needs to be guided by expert knowledge and interactive tools for exploratory data analysis can significantly support this process. Additionally, the integration of data from other omics platforms, such as DNA microarray-based transcriptomics, can provide valuable hints and thereby facilitate the identification of metabolites via the reconstruction of related metabolic pathways. We here introduce the MarVis-Pathway tool, which allows the user to identify metabolites by annotation of pathways from cross-omics data. The analysis is supported by an extensive framework for pathway enrichment and meta-analysis. The tool allows the mapping of data set features by ID, name, and accurate mass, and can incorporate information from adduct and isotope correction of mass spectrometry data. MarVis-Pathway was integrated in the MarVis-Suite (http://marvis.gobics.de), which features the seamless highly interactive filtering, combination, clustering, and visualization of omics data sets. The functionality of the new software tool is illustrated using combined mass spectrometry and DNA microarray data. This application confirms jasmonate biosynthesis as important metabolic pathway that is upregulated during the wound response of Arabidopsis plants.

20.
Algorithms Mol Biol ; 10: 5, 2015.
Article in English | MEDLINE | ID: mdl-25685176

ABSTRACT

Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
