Results 1 - 20 of 23
1.
BMC Genomics ; 23(1): 477, 2022 Jun 28.
Article in English | MEDLINE | ID: mdl-35764934

ABSTRACT

BACKGROUND: Calling germline SNP variants from bisulfite-converted sequencing data poses a challenge for conventional software, which has no inherent capability to dissociate true polymorphisms from artificial mutations induced by the chemical treatment. Nevertheless, SNP data are desirable both for genotyping and for understanding the DNA methylome in the context of the genetic background. The confounding effect of bisulfite conversion can, however, be conceptually resolved by observing differences in allele counts on a per-strand basis, whereby artificial mutations are reflected by non-complementary base pairs. RESULTS: Herein, we present a computational pre-processing approach for adapting sequence alignment data, thus indirectly enabling downstream analysis on a per-strand basis using conventional variant calling software such as GATK or Freebayes. In comparison to specialised tools, the method represents a marked improvement in precision-sensitivity based on high-quality, published benchmark datasets for both human and model plant variants. CONCLUSION: The presented "double-masking" procedure represents an open source, easy-to-use method to facilitate accurate variant calling using conventional software, thus negating any dependency on specialised tools and mitigating the need to generate additional, conventional sequencing libraries alongside bisulfite sequencing experiments. The method is available at https://github.com/bio15anu/revelio and an implementation with Freebayes is available at https://github.com/EpiDiverse/SNP.
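The per-strand idea in this abstract can be illustrated with a minimal Python sketch. It is not the published double-masking implementation: it only assumes a directional library in which unmethylated cytosines read as C>T on reads from the original top strand and as G>A (in reference coordinates) on reads from the original bottom strand. The strand labels "OT"/"OB" and both function names are hypothetical.

```python
from collections import Counter

# Mismatches that bisulfite conversion can produce, in reference coordinates.
# Assumption: a directional library, with each read labelled by its strand of
# origin ("OT" = original top, "OB" = original bottom). Labels are illustrative.
BISULFITE_ARTEFACTS = {"OT": ("C", "T"), "OB": ("G", "A")}


def is_bisulfite_ambiguous(ref_base, read_base, strand):
    """Return True if this mismatch could be a conversion artefact on this strand."""
    return (ref_base.upper(), read_base.upper()) == BISULFITE_ARTEFACTS[strand]


def informative_allele_counts(observations):
    """Count read bases at one site, skipping strand-ambiguous observations.

    observations: iterable of (ref_base, read_base, strand) tuples.
    """
    counts = Counter()
    for ref_base, read_base, strand in observations:
        if not is_bisulfite_ambiguous(ref_base, read_base, strand):
            counts[read_base.upper()] += 1
    return counts
```

At a reference C, a T seen on an "OT" read is dropped as ambiguous, whereas a T seen on an "OB" read is kept as evidence for a genuine C>T polymorphism.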


Subject(s)
High-Throughput Nucleotide Sequencing; Bayes Theorem; High-Throughput Nucleotide Sequencing/methods; Humans; Sequence Alignment; Sequence Analysis, DNA/methods; Sulfites
2.
Plant Biotechnol J ; 20(5): 944-963, 2022 05.
Article in English | MEDLINE | ID: mdl-34990041

ABSTRACT

Thlaspi arvense (field pennycress) is being domesticated as a winter annual oilseed crop capable of improving ecosystems and intensifying agricultural productivity without increasing land use. It is a selfing diploid with a short life cycle and is amenable to genetic manipulations, making it an accessible field-based model species for genetics and epigenetics. The availability of a high-quality reference genome is vital for understanding pennycress physiology and for clarifying its evolutionary history within the Brassicaceae. Here, we present a chromosome-level genome assembly of var. MN106-Ref with improved gene annotation and use it to investigate gene structure differences between two accessions (MN108 and Spring32-10) that are highly amenable to genetic transformation. We describe non-coding RNAs, pseudogenes and transposable elements, and highlight tissue-specific expression and methylation patterns. Resequencing of forty wild accessions provided insights into genome-wide genetic variation, and QTL regions were identified for a seedling colour phenotype. Altogether, these data will serve as a tool for pennycress improvement in general and for translational research across the Brassicaceae.


Subject(s)
Thlaspi; Chromosomes; Ecosystem; Genome, Plant/genetics; Molecular Sequence Annotation; Thlaspi/genetics; Translational Research, Biomedical
3.
Quant Plant Biol ; 3: e19, 2022.
Article in English | MEDLINE | ID: mdl-37077980

ABSTRACT

Whole-genome bisulfite sequencing (WGBS) is the standard method for profiling DNA methylation at single-nucleotide resolution. Different tools have been developed to extract differentially methylated regions (DMRs), often built upon assumptions from mammalian data. Here, we present MethylScore, a pipeline to analyse WGBS data and to account for the substantially more complex and variable nature of plant DNA methylation. MethylScore uses an unsupervised machine learning approach to segment the genome by classification into states of high and low methylation. It processes data from genomic alignments to DMR output and is designed to be usable by novice and expert users alike. We show how MethylScore can identify DMRs from hundreds of samples and how its data-driven approach can stratify associated samples without prior information. We identify DMRs in the A. thaliana 1,001 Genomes dataset to unveil known and unknown genotype-epigenotype associations.

4.
Epigenomes ; 5(2)2021 May 04.
Article in English | MEDLINE | ID: mdl-34968299

ABSTRACT

Bisulfite sequencing is a widely used technique for determining DNA methylation and its relationship with epigenetics, genetics, and environmental parameters. Various techniques were implemented for epigenome-wide association studies (EWAS) to reveal meaningful associations; however, there are only very few plant studies available to date. Here, we developed the EpiDiverse EWAS pipeline and tested it using two plant datasets, from P. abies (Norway spruce) and Q. lobata (valley oak). Hence, we present an EWAS implementation tested for non-model plant species and describe its use.

5.
NAR Genom Bioinform ; 3(4): lqab106, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34805989

ABSTRACT

The expanding scope and scale of next generation sequencing experiments in ecological plant epigenetics brings new challenges for computational analysis. Existing tools built for model data may not address the needs of users looking to apply these techniques to non-model species, particularly on a population or community level. Here we present a toolkit suitable for plant ecologists working with whole genome bisulfite sequencing; it includes pipelines for mapping, the calling of methylation values and differential methylation between groups, epigenome-wide association studies, and a novel implementation for both variant calling and discriminating between genetic and epigenetic variation.

7.
Brief Bioinform ; 22(5)2021 09 02.
Article in English | MEDLINE | ID: mdl-33624017

ABSTRACT

Whole genome bisulfite sequencing is currently at the forefront of epigenetic analysis, facilitating the nucleotide-level resolution of 5-methylcytosine (5mC) on a genome-wide scale. Specialized software has been developed to accommodate the unique difficulties in aligning such sequencing reads to a given reference, building on the knowledge acquired from model organisms such as human or Arabidopsis thaliana. As the field of epigenetics expands its purview to non-model plant species, new challenges arise which bring into question the suitability of previously established tools. Herein, nine short-read aligners are evaluated: Bismark, BS-Seeker2, BSMAP, BWA-meth, ERNE-BS5, GEM3, GSNAP, Last and segemehl. Precision-recall of simulated alignments, in comparison to real sequencing data obtained from three natural accessions, reveals that, on balance, BWA-meth and BSMAP are able to make the best use of the data during mapping. The influence of difficult-to-map regions, characterized by deviations in sequencing depth over repeat annotations, is evaluated in terms of the mean absolute deviation of the resulting methylation calls in comparison to a realistic methylome. Downstream methylation analysis is responsive to the handling of multi-mapping reads relative to mapping quality (MAPQ), and potentially susceptible to bias arising from the increased sequence complexity of densely methylated reads.
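The two evaluation measures named in this abstract can be written out as a short Python sketch. The abstract does not say how true and false positives were counted for alignments, so the inputs here are plain counts supplied by the caller, and the mean absolute deviation is assumed to be the average absolute difference between called and reference methylation levels at matching positions. Function names are illustrative.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw counts.

    Assumption: how a read is judged correctly or incorrectly aligned is
    decided by the caller; the abstract does not define it.
    """
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


def mean_absolute_deviation(called_levels, reference_levels):
    """Average absolute difference between paired methylation levels (0..1)."""
    if len(called_levels) != len(reference_levels):
        raise ValueError("both inputs must cover the same positions")
    total = sum(abs(c - r) for c, r in zip(called_levels, reference_levels))
    return total / len(called_levels)
```

For example, 90 correctly aligned reads, 10 incorrectly aligned reads and 30 unaligned reads give a precision of 0.9 and a recall of 0.75.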


Subject(s)
Benchmarking/methods; DNA Methylation/genetics; Epigenomics/methods; Fragaria/genetics; Genome, Plant; Poaceae/genetics; Software; Sulfites/pharmacology; Thlaspi/genetics; Chromosome Mapping/methods; DNA, Plant/drug effects; DNA, Plant/genetics; Epigenesis, Genetic; Sequence Alignment/methods; Whole Genome Sequencing/methods
8.
RNA ; 23(8): 1259-1269, 2017 08.
Article in English | MEDLINE | ID: mdl-28473453

ABSTRACT

The hard tick Ixodes ricinus is an important disease vector whose salivary secretions mediate blood-feeding success on vertebrate hosts, including humans. Here we describe the expression profiles and downstream analysis of de novo-discovered microRNAs (miRNAs) expressed in I. ricinus salivary glands and saliva. Eleven tick-derived libraries were sequenced to produce 67,375,557 Illumina reads. De novo prediction yielded 67 bona fide miRNAs out of which 35 are currently not present in miRBase. We report for the first time the presence of microRNAs in tick saliva, obtaining furthermore molecular indicators that those might be of exosomal origin. Ten out of these microRNAs are at least 100 times more represented in saliva. For the four most expressed microRNAs from this subset, we analyzed their combinatorial effects upon their host transcriptome using a novel in silico target network approach. We show that only the inclusion of combinatorial effects reveals the functions in important pathways related to inflammation and pain sensing. A control set of highly abundant microRNAs in both saliva and salivary glands indicates no significant pathways and a far lower number of shared target genes. Therefore, the analysis of miRNAs from pure tick saliva strongly supports the hypothesis that tick saliva miRNAs can modulate vertebrate host homeostasis and represents the first direct evidence of tick miRNA-mediated regulation of vertebrate host gene expression at the tick-host interface. As such, the herein described miRNAs may support future drug discovery and development projects that will also experimentally question their predicted molecular targets in the vertebrate host.


Subject(s)
Gene Regulatory Networks; Host-Parasite Interactions/genetics; Ixodes/genetics; MicroRNAs/analysis; Saliva/chemistry; Tick Infestations/parasitology; Transcriptome; Animals; Computer Simulation; High-Throughput Nucleotide Sequencing/methods; MicroRNAs/genetics; Saliva/metabolism; Salivary Glands/metabolism; Tick Infestations/genetics; Vertebrates/parasitology
9.
Methods Mol Biol ; 1097: 437-56, 2014.
Article in English | MEDLINE | ID: mdl-24639171

ABSTRACT

The computational identification of novel microRNA (miRNA) genes is a challenging task in bioinformatics. Massive amounts of data describing unknown functional RNA transcripts have to be analyzed for putative miRNA candidates with automated computational pipelines. Beyond those miRNAs that meet the classical definition, high-throughput sequencing techniques have revealed additional miRNA-like molecules that are derived by alternative biogenesis pathways. Exhaustive bioinformatics analyses on such data involve statistical issues as well as precise sequence and structure inspection not only of the functional mature part but also of the whole precursor sequence of the putative miRNA. Apart from a considerable amount of species-specific miRNAs, the majority of all those genes are conserved at least among closely related organisms. Some miRNAs, however, can be traced back to very early points in the evolution of eukaryotic species. Thus, the investigation of the conservation of newly found miRNA candidates comprises an important step in the computational annotation of miRNAs. Topics covered in this chapter include a review on the obvious problem of miRNA annotation and family definition, recommended pipelines of computational miRNA annotation or detection, and an overview of current computer tools for the prediction of miRNAs and their limitations. The chapter closes discussing how those bioinformatic approaches address the problem of faithful miRNA prediction and correct annotation.


Subject(s)
Computational Biology/methods; Genomics/methods; MicroRNAs/chemistry; MicroRNAs/genetics; Databases, Nucleic Acid; Internet; Software
10.
Genome Biol ; 15(2): R34, 2014 Feb 10.
Article in English | MEDLINE | ID: mdl-24512684

ABSTRACT

Numerous high-throughput sequencing studies have focused on detecting conventionally spliced mRNAs in RNA-seq data. However, non-standard RNAs arising through gene fusion, circularization or trans-splicing are often neglected. We introduce a novel, unbiased algorithm to detect splice junctions from single-end cDNA sequences. In contrast to other methods, our approach accommodates multi-junction structures. Our method compares favorably with competing tools for conventionally spliced mRNAs and, with a gain of up to 40% of recall, systematically outperforms them on reads with multiple splits, trans-splicing and circular products. The algorithm is integrated into our mapping tool segemehl (http://www.bioinf.uni-leipzig.de/Software/segemehl/).


Subject(s)
Algorithms; RNA Splicing/genetics; RNA/genetics; Trans-Splicing/genetics; DNA, Complementary/genetics; High-Throughput Nucleotide Sequencing; RNA, Circular; RNA, Messenger/metabolism; Software
11.
Front Plant Sci ; 5: 708, 2014.
Article in English | MEDLINE | ID: mdl-25566282

ABSTRACT

High-throughput sequencing techniques have made it possible to assay an organism's entire repertoire of small non-coding RNAs (ncRNAs) in an efficient and cost-effective manner. The moderate size of small RNA-seq datasets makes it feasible to provide free web services to the research community that provide many basic features of a small RNA-seq analysis, including quality control, read normalization, ncRNA quantification, and the prediction of putative novel ncRNAs. DARIO is one such system that so far has been focussed on animals. Here we introduce an extension of this system to plant short non-coding RNAs (sncRNAs). It includes major modifications to cope with plant-specific sncRNA processing. The current version of plantDARIO covers analyses of mapping files, small RNA-seq quality control, expression analyses of annotated sncRNAs, including the prediction of novel miRNAs and snoRNAs from unknown expressed loci, and expression analyses of user-defined loci. At present Arabidopsis thaliana, Beta vulgaris, and Solanum lycopersicum are covered. The web tool links to a plant-specific visualization browser to display the read distribution of the analyzed sample. The easy-to-use platform of plantDARIO quantifies RNA expression of annotated sncRNAs from different sncRNA databases together with new sncRNAs annotated by our group. The plantDARIO website can be accessed at http://plantdario.bioinf.uni-leipzig.de/.

12.
PLoS Genet ; 9(7): e1003588, 2013.
Article in English | MEDLINE | ID: mdl-23861667

ABSTRACT

The chromosome 9p21 (Chr9p21) locus of coronary artery disease has been identified in the first surge of genome-wide association and is the strongest genetic factor of atherosclerosis known today. Chr9p21 encodes the long non-coding RNA (ncRNA) antisense non-coding RNA in the INK4 locus (ANRIL). ANRIL expression is associated with the Chr9p21 genotype and correlated with atherosclerosis severity. Here, we report on the molecular mechanisms through which ANRIL regulates target-genes in trans, leading to increased cell proliferation, increased cell adhesion and decreased apoptosis, which are all essential mechanisms of atherogenesis. Importantly, trans-regulation was dependent on Alu motifs, which marked the promoters of ANRIL target genes and were mirrored in ANRIL RNA transcripts. ANRIL bound Polycomb group proteins that were highly enriched in the proximity of Alu motifs across the genome and were recruited to promoters of target genes upon ANRIL over-expression. The functional relevance of Alu motifs in ANRIL was confirmed by deletion and mutagenesis, reversing trans-regulation and atherogenic cell functions. ANRIL-regulated networks were confirmed in 2280 individuals with and without coronary artery disease and functionally validated in primary cells from patients carrying the Chr9p21 risk allele. Our study provides a molecular mechanism for pro-atherogenic effects of ANRIL at Chr9p21 and suggests a novel role for Alu elements in epigenetic gene regulation by long ncRNAs.


Subject(s)
Alu Elements/genetics; Atherosclerosis/genetics; Coronary Artery Disease/genetics; RNA, Long Noncoding/genetics; Apoptosis/genetics; Atherosclerosis/pathology; Cell Adhesion/genetics; Cell Proliferation; Chromosomes, Human, Pair 9/genetics; Coronary Artery Disease/pathology; Epigenesis, Genetic; Gene Expression Regulation; Gene Regulatory Networks; Genetic Predisposition to Disease; Genome-Wide Association Study; HEK293 Cells; Humans; Polycomb-Group Proteins; Polymorphism, Single Nucleotide
13.
RNA Biol ; 10(7): 1204-10, 2013 Jul.
Article in English | MEDLINE | ID: mdl-23702463

ABSTRACT

Prokaryotic transcripts almost always constitute uninterrupted intervals when mapped back to the genome. Split reads, i.e., RNA-seq reads consisting of parts that only map to discontiguous loci, are thus disregarded in most analysis pipelines. There are, however, some well-known exceptions, in particular tRNA splicing and circularized small RNAs in Archaea, as well as self-splicing introns. Here, we reanalyze a series of published RNA-seq data sets, screening them specifically for non-contiguously mapping reads. We recover most of the known cases together with several novel archaeal ncRNAs associated with circularized products. In Eubacteria, only a handful of interesting candidates were obtained beyond a few previously described group I and group II introns. Most of the atypically mapping reads do not appear to correspond to well-defined, specifically processed products. Whether this diffuse background is, at least in part, an incidental by-product of prokaryotic RNA processing or whether it consists entirely of technical artifacts of reverse transcription or amplification remains unknown.


Subject(s)
Computational Biology/methods; Prokaryotic Cells/metabolism; RNA/chemistry; Sequence Analysis, RNA; Transcriptome; Archaea/genetics; Bacteria/genetics; Genomics/methods; Molecular Sequence Annotation; RNA/genetics
14.
J Exp Zool B Mol Dev Evol ; 320(1): 35-46, 2013 Jan.
Article in English | MEDLINE | ID: mdl-23165937

ABSTRACT

Canonical microRNAs are excised from their hairpin-shaped precursors by Dicer. In order to find possible exceptions to this rule and to identify additional substrates for Dicer processing, we re-evaluate the small RNA sequencing data of the Dicer knockdown experiment in MCF-7 cells originally published by Friedländer et al. [Friedländer et al., 2012, Nucleic Acids Res 40:37-52]. While the well-known non-Dicer mir-451 is not sufficiently expressed in these experiments, there are several additional Dicer-independent microRNAs, among them the important tumor suppressor mir-663a. We recover previously described examples of non-miRNA Dicer substrates such as tRNA-Gln and several snoRNAs. Interestingly, sdRNAs derived from box C/D snoRNAs are Dicer-independent, while those derived from box H/ACA snoRNAs are often Dicer-dependent. Several pol-III transcripts, in particular the vault RNAs and the great ape-specific snaRs, are processed by Dicer, while the small RNAs originating from Y RNAs seem to be Dicer-independent.


Subject(s)
DEAD-box RNA Helicases/metabolism; Genome, Human/genetics; MicroRNAs/metabolism; Ribonuclease III/metabolism; DEAD-box RNA Helicases/genetics; DNA Polymerase III/metabolism; Gene Knockdown Techniques; Humans; MCF-7 Cells; Ribonuclease III/genetics
15.
Nat Genet ; 44(12): 1316-20, 2012 Dec.
Article in English | MEDLINE | ID: mdl-23143595

ABSTRACT

Burkitt lymphoma is a mature aggressive B-cell lymphoma derived from germinal center B cells. Its cytogenetic hallmark is the Burkitt translocation t(8;14)(q24;q32) and its variants, which juxtapose the MYC oncogene with one of the three immunoglobulin loci. Consequently, MYC is deregulated, resulting in massive perturbation of gene expression. Nevertheless, MYC deregulation alone seems not to be sufficient to drive Burkitt lymphomagenesis. By whole-genome, whole-exome and transcriptome sequencing of four prototypical Burkitt lymphomas with immunoglobulin gene (IG)-MYC translocation, we identified seven recurrently mutated genes. One of these genes, ID3, mapped to a region of focal homozygous loss in Burkitt lymphoma. In an extended cohort, 36 of 53 molecularly defined Burkitt lymphomas (68%) carried potentially damaging mutations of ID3. These were strongly enriched at somatic hypermutation motifs. Only 6 of 47 other B-cell lymphomas with the IG-MYC translocation (13%) carried ID3 mutations. These findings suggest that cooperation between ID3 inactivation and IG-MYC translocation is a hallmark of Burkitt lymphomagenesis.


Subject(s)
Burkitt Lymphoma/genetics; Inhibitor of Differentiation Proteins/genetics; Mutation; Neoplasm Proteins/genetics; Transcriptome/genetics; Base Sequence; Chromosome Mapping; Chromosomes, Human, Pair 14/genetics; Chromosomes, Human, Pair 8/genetics; Cohort Studies; Female; Genes, Immunoglobulin; Genes, myc/genetics; Genome, Human; Humans; Male; Molecular Sequence Data; Sequence Analysis, DNA; Somatic Hypermutation, Immunoglobulin; Translocation, Genetic/genetics
16.
Bioinformatics ; 28(1): 17-24, 2012 Jan 01.
Article in English | MEDLINE | ID: mdl-22053076

ABSTRACT

MOTIVATION: High-throughput sequencing methods allow whole transcriptomes to be sequenced fast and cost-effectively. Short RNA sequencing provides not only quantitative expression data but also an opportunity to identify novel coding and non-coding RNAs. Many long transcripts undergo post-transcriptional processing that generates short RNA sequence fragments. Mapped back to a reference genome, they form distinctive patterns that convey information on both the structure of the parent transcript and the modalities of its processing. The miR-miR* pattern from microRNA precursors is the best-known, but by no means singular, example. RESULTS: deepBlockAlign introduces a two-step approach to align RNA-seq read patterns with the aim of quickly identifying RNAs that share similar processing footprints. Overlapping mapped reads are first merged to blocks and then closely spaced blocks are combined to block groups, each representing a locus of expression. In order to compare block groups, the constituent blocks are first compared using a modified sequence alignment algorithm to determine similarity scores for pairs of blocks. In the second stage, block patterns are compared by means of a modified Sankoff algorithm that takes both block similarities and similarities of pattern of distances within the block groups into account. Hierarchical clustering of block groups clearly separates most miRNA and tRNA, and also identifies about a dozen tRNAs clustering together with miRNA. Most of these putative Dicer-processed tRNAs, including eight cases reported to generate products with miRNA-like features in literature, exhibit read blocks distinguished by precise start position of reads. AVAILABILITY: The program deepBlockAlign is available as source code from http://rth.dk/resources/dba/. CONTACT: gorodkin@rth.dk; studla@bioinf.uni-leipzig.de SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
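The first step described in this abstract, merging overlapping reads into blocks and then combining closely spaced blocks into block groups, can be sketched in Python as follows. This is only an illustration of the interval logic, not the deepBlockAlign code: the half-open coordinate convention, the `max_gap` parameter and both function names are assumptions, and the alignment stages are not covered.

```python
def merge_reads_into_blocks(reads):
    """Merge overlapping (start, end) read intervals into blocks.

    Assumption: intervals are half-open, so reads that merely touch
    (start == previous end) are kept as separate blocks.
    """
    blocks = []
    for start, end in sorted(reads):
        if blocks and start < blocks[-1][1]:
            # Overlaps the current block: extend it if this read reaches further.
            blocks[-1][1] = max(blocks[-1][1], end)
        else:
            blocks.append([start, end])
    return [tuple(block) for block in blocks]


def group_blocks(blocks, max_gap):
    """Combine position-sorted blocks separated by at most max_gap bases."""
    groups = []
    for block in blocks:
        if groups and block[0] - groups[-1][-1][1] <= max_gap:
            groups[-1].append(block)
        else:
            groups.append([block])
    return groups
```

With reads `[(0, 20), (10, 30), (40, 60), (200, 220)]` and a hypothetical `max_gap` of 30, the first two reads merge into one block, and the resulting three blocks fall into two block groups.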


Subject(s)
Algorithms; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA/methods; Software; Base Sequence; Humans; MicroRNAs/genetics; RNA, Untranslated/analysis; RNA, Untranslated/genetics; Sequence Alignment; Transcriptome
17.
Nucleic Acids Res ; 39(Web Server issue): W112-7, 2011 Jul.
Article in English | MEDLINE | ID: mdl-21622957

ABSTRACT

Small non-coding RNAs (ncRNAs) such as microRNAs, snoRNAs and tRNAs are a diverse collection of molecules with several important biological functions. Current methods for high-throughput sequencing for the first time offer the opportunity to investigate the entire ncRNAome in an essentially unbiased way. However, there is a substantial need for methods that allow a convenient analysis of these overwhelmingly large data sets. Here, we present DARIO, a free web service that allows the study of short read data from small RNA-seq experiments. It provides a wide range of analysis features, including quality control, read normalization, ncRNA quantification and prediction of putative ncRNA candidates. The DARIO web site can be accessed at http://dario.bioinf.uni-leipzig.de/.


Subject(s)
RNA, Untranslated/chemistry; RNA, Untranslated/metabolism; Sequence Analysis, RNA; Software; High-Throughput Nucleotide Sequencing; Internet; RNA, Untranslated/analysis; User-Computer Interface
18.
Biol Chem ; 392(4): 305-13, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21345160

ABSTRACT

Many aspects of the RNA maturation leave traces in RNA sequencing data in the form of deviations from the reference genomic DNA. This includes, in particular, genomically non-encoded nucleotides and chemical modifications. The latter leave their signatures in the form of mismatches and conspicuous patterns of sequencing reads. Modified mapping procedures focusing on particular types of deviations can help to unravel post-transcriptional modification, maturation and degradation processes. Here, we focus on small RNA sequencing data that is produced in large quantities aimed at the analysis of microRNA expression. Starting from the recovery of many well known modified sites in tRNAs, we provide evidence that modified nucleotides are a pervasive phenomenon in these data sets. Regarding non-encoded nucleotides we concentrate on CCA tails, which surprisingly can be found in a diverse collection of transcripts including sub-populations of mature microRNAs. Although small RNA sequencing libraries alone are insufficient to obtain a complete picture, they can inform on many aspects of the complex processes of RNA maturation.


Subject(s)
Computational Biology; RNA Processing, Post-Transcriptional; RNA/genetics; RNA/metabolism; Sequence Analysis, DNA; Animals; Base Sequence; Gene Library; Humans; MicroRNAs/genetics; MicroRNAs/metabolism; RNA/chemistry; RNA Nucleotidyltransferases/metabolism; RNA, Small Untranslated/chemistry; RNA, Small Untranslated/genetics; RNA, Small Untranslated/metabolism; RNA, Transfer/chemistry; RNA, Transfer/genetics; RNA, Transfer/metabolism
19.
PLoS Biol ; 8(9)2010 Sep 07.
Article in English | MEDLINE | ID: mdl-20838655

ABSTRACT

A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed discovery of more than 600,000 high quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparing avian to mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in number and variety of genes of the avian immune system where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene and chromosome level assemblies of other species with agricultural, ecological, and evolutionary interest.


Subject(s)
Genome; Turkeys/genetics; Animals; Base Sequence; Chromosome Mapping; DNA/genetics; Polymorphism, Single Nucleotide; Sequence Analysis, DNA; Sequence Homology, Nucleic Acid; Species Specificity
20.
BMC Bioinformatics ; 11: 292, 2010 May 28.
Article in English | MEDLINE | ID: mdl-20509939

ABSTRACT

BACKGROUND: Virtually all currently available microRNA target site prediction algorithms require the presence of a (conserved) seed match to the 5' end of the microRNA. Recently, however, it has been shown that this requirement might be too stringent, leading to a substantial number of missed target sites. RESULTS: We developed TargetSpy, a novel computational approach for predicting target sites regardless of the presence of a seed match. It is based on machine learning and automatic feature selection using a wide spectrum of compositional, structural, and base pairing features covering current biological knowledge. Our model does not rely on evolutionary conservation, which allows the detection of species-specific interactions and makes TargetSpy suitable for analyzing unconserved genomic sequences. In order to allow for an unbiased comparison of TargetSpy to other methods, we classified all algorithms into three groups: I) no seed match requirement, II) seed match requirement, and III) conserved seed match requirement. TargetSpy predictions for classes II and III are generated by appropriate postfiltering. On a human dataset revealing fold-change in protein production for five selected microRNAs, our method shows superior performance in all classes. In Drosophila melanogaster, not only are our class II and III predictions on par with other algorithms, but notably the class I (no-seed) predictions are just marginally less accurate. We estimate that TargetSpy predicts between 26 and 112 functional target sites without a seed match per microRNA that are missed by all other currently available algorithms. CONCLUSION: Only a few algorithms can predict target sites without demanding a seed match, and TargetSpy demonstrates a substantial improvement in prediction accuracy in that class. Furthermore, when conservation and the presence of a seed match are required, the performance is comparable with state-of-the-art algorithms. TargetSpy was trained on mouse and performs well in human and Drosophila, suggesting that it may be applicable to a broad range of species. Moreover, we have demonstrated that the application of machine learning techniques in combination with upcoming deep sequencing data results in a powerful microRNA target site prediction tool (http://www.targetspy.org).
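The seed match requirement that this abstract relaxes can be made concrete with a small Python sketch. It is not part of TargetSpy. It assumes one common definition of a seed match, namely perfect Watson-Crick complementarity to microRNA nucleotides 2-8, and ignores G:U wobble pairs; the function names are hypothetical, and the let-7a sequence is used only as a familiar example.

```python
# RNA complement table; DNA-style input is normalised to RNA first.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_site(mirna, first=2, last=8):
    """Return the target-side sequence (5'->3') complementary to the seed.

    Assumption: the seed is nucleotides first..last (1-based, inclusive) of
    the microRNA, and only Watson-Crick pairs count.
    """
    seed = mirna.upper().replace("T", "U")[first - 1:last]
    return seed.translate(COMPLEMENT)[::-1]


def has_seed_match(mirna, target):
    """True if the target contains a perfect match to the microRNA seed."""
    return seed_site(mirna) in target.upper().replace("T", "U")


LET7A = "UGAGGUAGUAGGUUGUAUAGUU"  # hsa-let-7a-5p, used only as an example
```

A class II or III predictor in the abstract's terminology would reject any candidate site for which a check like `has_seed_match` fails, whereas a class I predictor scores such sites as well.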


Subject(s)
Artificial Intelligence; MicroRNAs/chemistry; Software; Animals; Drosophila; Proteins/chemistry; RNA, Messenger/genetics