Results 1 - 7 of 7
2.
Bioinformatics ; 31(24): 3897-905, 2015 Dec 15.
Article in English | MEDLINE | ID: mdl-26315901

ABSTRACT

MOTIVATION: Long non-coding RNAs (lncRNAs), which are non-coding RNAs of length above 200 nucleotides, play important biological functions such as gene expression regulation. To fully reveal the functions of lncRNAs, a fundamental step is to annotate them in various species. However, as lncRNAs tend to encode one or multiple open reading frames, it is not trivial to distinguish these long non-coding transcripts from protein-coding genes in transcriptomic data. RESULTS: In this work, we design a new tool that calculates the coding potential of a transcript using a machine learning model (random forest) based on multiple features including sequence characteristics of putative open reading frames, translation scores based on ribosomal coverage, and conservation against characterized protein families. The experimental results show that our tool competes favorably with existing coding potential computation tools in lncRNA identification. AVAILABILITY AND IMPLEMENTATION: The scripts and data can be downloaded at https://github.com/zhangy72/LncRNA-ID.
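
The abstract lists sequence characteristics of putative open reading frames among the model's features. As a minimal sketch of what one such feature might look like (this is not the LncRNA-ID implementation; the function names `longest_orf` and `orf_features`, and the choice of ORF length and coverage as features, are illustrative assumptions), the following scans the three forward reading frames for the longest ATG-to-stop ORF:

```python
# Hypothetical helper names; an illustration only, not code from LncRNA-ID.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end, length) of the longest ATG..stop ORF on the forward strand.

    Coordinates are 0-based, end-exclusive, and the length includes the stop codon.
    Returns (0, 0, 0) when no complete ORF is found.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = (0, 0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                length = i + 3 - start
                if length > best[2]:
                    best = (start, i + 3, length)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Compute two simple ORF-based features for a transcript sequence."""
    _, _, length = longest_orf(seq)
    coverage = length / len(seq) if seq else 0.0
    return {"orf_length": length, "orf_coverage": coverage}
```

Features of this kind could then be passed to a random forest classifier together with the translation and conservation scores the abstract describes; those other features are not sketched here.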


Subject(s)
Machine Learning , RNA, Long Noncoding/genetics , Software , Animals , Humans , Mice , Open Reading Frames , Proteins/genetics , Ribosomes/metabolism
3.
Plant Physiol ; 164(2): 513-24, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24306534

ABSTRACT

We have optimized and extended the widely used annotation engine MAKER in order to better support plant genome annotation efforts. New features include better parallelization for large repeat-rich plant genomes, noncoding RNA annotation capabilities, and support for pseudogene identification. We have benchmarked the resulting software tool kit, MAKER-P, using the Arabidopsis (Arabidopsis thaliana) and maize (Zea mays) genomes. Here, we demonstrate the ability of the MAKER-P tool kit to automatically update, extend, and revise the Arabidopsis annotations in light of newly available data and to annotate pseudogenes and noncoding RNAs absent from The Arabidopsis Information Resource 10 build. Our results demonstrate that MAKER-P can be used to manage and improve the annotations of even Arabidopsis, perhaps the best-annotated plant genome. We have also installed and benchmarked MAKER-P on the Texas Advanced Computing Center. We show that this public resource can de novo annotate the entire Arabidopsis and maize genomes in less than 3 h and produce annotations of comparable quality to those of the current The Arabidopsis Information Resource 10 and maize V2 annotation builds.


Subject(s)
Arabidopsis/genetics , Computational Biology/methods , Genome, Plant/genetics , Molecular Sequence Annotation/methods , Software , Zea mays/genetics , Alternative Splicing/genetics , Exons/genetics , Genes, Plant/genetics , Pseudogenes/genetics , Repetitive Sequences, Nucleic Acid/genetics , Reproducibility of Results
4.
PLoS Genet ; 8(11): e1003064, 2012.
Article in English | MEDLINE | ID: mdl-23166516

ABSTRACT

Unicellular marine algae have promise for providing sustainable and scalable biofuel feedstocks, although no single species has emerged as a preferred organism. Moreover, adequate molecular and genetic resources prerequisite for the rational engineering of marine algal feedstocks are lacking for most candidate species. Heterokonts of the genus Nannochloropsis naturally have high cellular oil content and are already in use for industrial production of high-value lipid products. First success in applying reverse genetics by targeted gene replacement makes Nannochloropsis oceanica an attractive model to investigate the cell and molecular biology and biochemistry of this fascinating organism group. Here we present the assembly of the 28.7 Mb genome of N. oceanica CCMP1779. RNA sequencing data from nitrogen-replete and nitrogen-depleted growth conditions support a total of 11,973 genes, of which in addition to automatic annotation some were manually inspected to predict the biochemical repertoire for this organism. Among others, more than 100 genes putatively related to lipid metabolism, 114 predicted transcription factors, and 109 transcriptional regulators were annotated. Comparison of the N. oceanica CCMP1779 gene repertoire with the recently published N. gaditana genome identified 2,649 genes likely specific to N. oceanica CCMP1779. Many of these N. oceanica-specific genes have putative orthologs in other species or are supported by transcriptional evidence. However, because similarity-based annotations are limited, functions of most of these species-specific genes remain unknown. Aside from the genome sequence and its analysis, protocols for the transformation of N. oceanica CCMP1779 are provided. 
The availability of genomic and transcriptomic data for Nannochloropsis oceanica CCMP1779, along with efficient transformation protocols, provides a blueprint for future detailed gene functional analysis and genetic engineering of Nannochloropsis species by a growing academic community focused on this genus.


Subject(s)
Genome , Molecular Sequence Annotation , Stramenopiles/genetics , Base Sequence , Genomics , Nitrogen/administration & dosage , Nitrogen/metabolism , Sequence Analysis, DNA , Sequence Analysis, RNA/methods , Species Specificity , Stramenopiles/growth & development , Transformation, Genetic
5.
BMC Bioinformatics ; 14 Suppl 2: S1, 2013.
Article in English | MEDLINE | ID: mdl-23369147

ABSTRACT

BACKGROUND: Accurate secondary structure prediction provides important information for understanding the tertiary structures and thus the functions of ncRNAs. However, the accuracy of the native structure derivation of ncRNAs is still not satisfactory, especially on sequences containing pseudoknots. It was recently shown that using abstract shapes, which retain adjacency and nesting of structural features but disregard the length details of helix and loop regions, can improve the performance of structure prediction. In this work, we use SVM-based feature selection to derive the consensus abstract shape of homologous ncRNAs and apply the predicted shape to structure prediction including pseudoknots. RESULTS: Our approach was applied to predict shapes and secondary structures on hundreds of ncRNA data sets with and without pseudoknots. The experimental results show that we can achieve 18% higher accuracy in shape prediction than the state-of-the-art consensus shape prediction tools. Using predicted shapes in structure prediction allows us to achieve approximately 29% higher sensitivity and 10% higher positive predictive value than other pseudoknot prediction tools. CONCLUSIONS: Extensive analysis of RNA properties based on SVM allows us to identify important properties of sequences and structures related to their shapes. The combination of mass data analysis and SVM-based feature selection makes our approach a promising method for shape and structure prediction. The implemented tools, KnotShape and KnotStructure, are open source software and can be downloaded at: http://www.cse.msu.edu/~achawana/KnotShape.
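
To illustrate what an abstract shape that keeps adjacency and nesting but drops helix and loop lengths can look like, here is a minimal sketch. It is not the KnotShape method: it assumes a pseudoknot-free structure in dot-bracket notation, the function name `abstract_shape` is hypothetical, and it uses the simplest possible abstraction, in which unpaired bases are ignored and any chain of directly nested base pairs collapses into one bracket pair.

```python
def abstract_shape(dot_bracket):
    """Reduce a pseudoknot-free dot-bracket string to a coarse abstract shape.

    Assumption: only '(', ')' and '.' appear. Unpaired bases are dropped, and
    a pair that encloses exactly one other pair is merged with it, so helix
    and loop lengths do not affect the result.
    """
    root = []        # pairs at the outermost level
    stack = [root]   # stack[-1] is the list of children of the open pair
    for ch in dot_bracket:
        if ch == "(":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced structure: unexpected ')'")
            stack.pop()
        elif ch != ".":
            raise ValueError(f"unsupported character: {ch!r}")
    if len(stack) != 1:
        raise ValueError("unbalanced structure: missing ')'")

    def render(node):
        # Collapse chains of single-child pairs (stacks, bulges, internal loops).
        while len(node) == 1:
            node = node[0]
        return "[" + "".join(render(child) for child in node) + "]"

    return "".join(render(child) for child in root)
```

Under these assumptions, a single hairpin with an internal loop such as `((..((...))..))` reduces to `[]`, while two adjacent hairpins inside one enclosing helix, `((.((..)).((..)).))`, reduce to `[[][]]`. Handling pseudoknots, as the abstract describes, would need a richer representation than this sketch provides.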


Subject(s)
Nucleic Acid Conformation , RNA, Untranslated/chemistry , Software , Support Vector Machine , Computational Biology , RNA, Untranslated/genetics
6.
Mamm Genome ; 24(11-12): 484-99, 2013 Dec.
Article in English | MEDLINE | ID: mdl-24202129

ABSTRACT

The diversity of dog breeds makes the domestic dog a valuable model for identifying genes responsible for many phenotypic and behavioral traits. The brain, in particular, is a region of interest for the analysis of molecular changes that are involved in dog-specific behavioral phenotypes. However, such studies are handicapped due to incomplete annotation of the dog genome. We present a high-coverage transcriptome of the dog brain using RNA-Seq. Two areas of the brain, hypothalamus and cerebral cortex, were selected for their roles in cognition, emotion, and neuroendocrine functions. We detected many novel features of the dog transcriptome, including 13,799 novel exons, 51,357 exons with unique 5' or 3' modifications, and many novel alternative splicing events. We provide some examples of novel features in genes that are related to domestication, including ADCY8, SMOC2, and PRNP. We also found 247 novel protein-coding genes and 328 noncoding RNAs, including 57 long noncoding RNAs that represent the first empirical evidence for a large fraction of noncoding RNAs in the dog. In addition, we analyze both gene expression and alternative splicing differences between the hypothalamus and cerebral cortex and find that there is very little overlap between genes that are differentially alternatively spliced and genes that are differentially expressed. We thereby suggest that researchers who want to pinpoint the genetic causes for dog breed-specific traits and diseases should not confine their studies to gene expression alone, but should consider other factors such as alternative splicing and changes in untranslated regions.


Subject(s)
Cerebral Cortex/metabolism , Dogs/genetics , Hypothalamus/metabolism , Transcriptome , Alternative Splicing , Animals , Brain/metabolism , Cerebral Cortex/chemistry , Dogs/metabolism , Exons , Male , RNA, Untranslated/genetics , RNA, Untranslated/metabolism
7.
J Bioinform Comput Biol ; 9(2): 317-37, 2011 Apr.
Article in English | MEDLINE | ID: mdl-21523935

ABSTRACT

Many noncoding RNAs (ncRNAs) function through both their sequences and secondary structures. Thus, secondary structure derivation is an important issue in today's RNA research. The state-of-the-art structure annotation tools are based on comparative analysis, which derives consensus structure of homologous ncRNAs. Despite promising results from existing ncRNA aligning and consensus structure derivation tools, there is a need for more efficient and accurate ncRNA secondary structure modeling and alignment methods. In this work, we introduce a consensus structure derivation approach based on grammar string, a novel ncRNA secondary structure representation that encodes an ncRNA's sequence and secondary structure in the parameter space of a context-free grammar (CFG) and a full RNA grammar including pseudoknots. Being a string defined on a special alphabet constructed from a grammar, grammar string converts ncRNA alignment into sequence alignment. We derive consensus secondary structures from hundreds of ncRNA families from BraliBase 2.1 and 25 families containing pseudoknots using grammar string alignment. Our experiments have shown that grammar string-based structure derivation competes favorably in consensus structure quality with Murlet and RNASampler. Source code and experimental data are available at http://www.cse.msu.edu/~yannisun/grammar-string.


Subject(s)
Nucleic Acid Conformation , RNA, Untranslated/chemistry , RNA, Untranslated/genetics , Sequence Alignment/statistics & numerical data , Computational Biology , Computer Simulation , Consensus Sequence , Genome, Human , Humans , Models, Molecular , Thermodynamics