Results 1 - 5 of 5
1.
BMC Plant Biol ; 24(1): 417, 2024 May 17.
Article in English | MEDLINE | ID: mdl-38760756

ABSTRACT

BACKGROUND: The Polygonaceae is a family well known for its weeds and for the edible plants Fagopyrum (buckwheat) and Rheum (rhubarb), which are primarily herbaceous and temperate in distribution. Yet the family also contains a number of lineages that are principally distributed in the tropics and subtropics. Notably, these lineages are woody, unlike their temperate relatives. To date, full-genome sequencing has focused on the temperate and herbaceous taxa. In an effort to increase the breadth of genetic knowledge of the Polygonaceae, we here present six fully assembled and annotated chloroplast genomes from six of the tropical, woody genera: Coccoloba rugosa (a narrow and endangered Puerto Rican endemic), Gymnopodium floribundum, Neomillspaughia emarginata, Podopterus mexicanus, Ruprechtia coriacea, and Triplaris cumingiana. RESULTS: These assemblies represent the first publicly available assembled and annotated plastomes for the genera Podopterus, Gymnopodium, and Neomillspaughia, and the first assembled and annotated plastomes for the species Coccoloba rugosa, Ruprechtia coriacea, and Triplaris cumingiana. We found the assembled chloroplast genomes to be above the median size of Polygonaceae plastomes, but otherwise to exhibit features typical of the family. The greatest sequence variation is found among the ndh genes and in the small single copy (SSC) region of the plastome. The inverted repeats show high GC content and little sequence variation across genera. When placed in a phylogenetic context, our sequences were resolved within the Eriogonoideae. CONCLUSIONS: These six plastomes from among the tropical woody Polygonaceae appear typical within the family. The plastome assembly of Ruprechtia coriacea presented here calls into question the sequence identity of a previously published plastome assembly of R. albida.


Subject(s)
Chloroplast Genome, Polygonaceae, Polygonaceae/genetics, Polygonaceae/classification, Phylogeny, Molecular Sequence Annotation
2.
RNA Biol ; 20(1): 48-58, 2023 01.
Article in English | MEDLINE | ID: mdl-36727270

ABSTRACT

Automated genome annotation is essential for extracting biological information from sequence data. The identification and annotation of tRNA genes is frequently performed by the software package tRNAscan-SE, the output of which is listed for selected genomes in the Genomic tRNA database (GtRNAdb). Here, we highlight a pervasive error in prokaryotic tRNA gene sets on GtRNAdb: the mis-categorization of partial, non-canonical tRNA genes as standard, canonical tRNA genes. Firstly, we demonstrate the issue using the tRNA gene sets of 20 organisms from the archaeal taxon Thermococcaceae. According to GtRNAdb, these organisms collectively deviate from the expected set of tRNA genes in 15 instances, including the listing of eleven putative canonical tRNA genes. However, after detailed manual annotation, only one of these eleven remains; the others are either partial, non-canonical tRNA genes resulting from the integration of genetic elements or CRISPR-Cas activity (seven instances), or attributable to ambiguities in input sequences (three instances). Secondly, we show that similar examples of the mis-categorization of predicted tRNA sequences occur throughout the prokaryotic sections of GtRNAdb. While both canonical and non-canonical prokaryotic tRNA gene sequences identified by tRNAscan-SE are biologically interesting, the challenge of reliably distinguishing between them remains. We recommend employing a combination of (i) screening input sequences for ambiguities and for the genetic elements typically associated with non-canonical tRNA genes, (ii) activating the tRNAscan-SE automated pseudogene detection function, and (iii) scrutinizing predicted tRNA genes with low isotype scores. These measures greatly reduce manual annotation efforts and lead to improved prokaryotic tRNA gene set predictions.
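
The first half of recommendation (i), screening input sequences for ambiguities, can be illustrated with a minimal Python sketch. It is not part of tRNAscan-SE or GtRNAdb. The function names, the `flank` parameter, and the choice to treat every non-ACGT character as ambiguous are assumptions made here for illustration only.

```python
import re

# Assumption: any character outside A, C, G, T counts as ambiguous
# (N, R, Y and the other IUPAC codes). Adjust to the input at hand.
AMBIGUOUS = re.compile(r"[^ACGT]+")


def find_ambiguous_runs(seq):
    """Return 0-based, half-open (start, end) intervals of ambiguous bases."""
    return [(m.start(), m.end()) for m in AMBIGUOUS.finditer(seq.upper())]


def overlaps_ambiguity(gene_start, gene_end, runs, flank=0):
    """True if [gene_start, gene_end), widened by `flank` bases on each
    side, intersects any ambiguous run. Coordinates are 0-based, half-open."""
    lo, hi = gene_start - flank, gene_end + flank
    return any(start < hi and end > lo for start, end in runs)


if __name__ == "__main__":
    runs = find_ambiguous_runs("ACGTNNNNACGTRACGT")
    print(runs)
    # A hypothetical predicted gene at positions 8..12, checked with 1 bp flank
    print(overlaps_ambiguity(8, 12, runs, flank=1))
```

A predicted tRNA gene that overlaps or directly borders an ambiguous run would then be a candidate for manual inspection rather than automatic acceptance.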


Subject(s)
Genome, Transfer RNA, Transfer RNA/genetics
3.
Hum Hered ; 83(3): 163-172, 2018.
Article in English | MEDLINE | ID: mdl-30685762

ABSTRACT

BACKGROUND: tRNAscan-SE is the leading tool for transfer RNA (tRNA) annotation and has been widely used in the field. However, tRNAscan-SE can return a significant number of false positives when applied to large sequences. Recently, conventional machine learning methods have been proposed to address this issue, but their efficiency can still be limited by their dependency on handcrafted features. With the growing availability of large-scale genomic datasets, deep learning methods, especially convolutional neural networks, have demonstrated excellent power in characterizing sequence patterns in genomic sequences. Thus, we hypothesize that deep learning may bring further improvement for tRNA prediction. METHODS: We proposed a new computational approach based on deep neural networks to predict tRNA gene sequences. We designed and investigated various deep neural network architectures. We used the tRNA sequences as positive samples, and the false-positive tRNA sequences predicted by tRNAscan-SE in coding sequences as negative samples, to train and evaluate the proposed models by comparison with the conventional machine learning methods and popular tRNA prediction tools. RESULTS: Using the one-hot encoding method, our proposed models can extract features without involving extensive manual feature engineering. Our proposed best model outperformed the existing methods under different performance metrics. CONCLUSION: The proposed deep learning methods can substantially reduce the false positive output by the state-of-the-art tool tRNAscan-SE. Coupled with tRNAscan-SE, they can serve as a useful complementary tool for tRNA annotation. The application to tRNA prediction demonstrates the superiority of deep learning in automatic feature generation for characterizing sequence patterns.
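
The abstract mentions one-hot encoding without giving details. As a minimal sketch of the general technique (not the authors' implementation), each nucleotide can be mapped to a 4-element indicator vector. The A, C, G, T column order, the mapping of U to T, and the all-zero row for any other character are assumptions made here.

```python
BASES = "ACGT"  # assumed column order; the abstract does not specify one


def one_hot(seq):
    """Encode a nucleotide string as a list of 4-element 0/1 rows.

    U is treated as T so RNA and DNA spellings encode identically.
    Characters outside A, C, G, T (for example N) become an all-zero row.
    """
    index = {base: i for i, base in enumerate(BASES)}
    rows = []
    for base in seq.upper().replace("U", "T"):
        row = [0, 0, 0, 0]
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    for base, row in zip("GCAUN", one_hot("GCAUN")):
        print(base, row)
```

The resulting length-by-4 matrix is the kind of input a convolutional network can consume directly, which is why no handcrafted features are needed.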


Subject(s)
Algorithms, Deep Learning, Transfer RNA/genetics, Area Under Curve, Humans, Theoretical Models, Neural Networks (Computer), ROC Curve, Software
4.
Bioinformation ; 14(7): 357-360, 2018.
Article in English | MEDLINE | ID: mdl-30262972

ABSTRACT

Whole genome sequences (DNA sequences) of four uncultured archaeon clones (1B6:CR626858.1, 4B7:CR626856.1, 22i07:JQ768096.1 and 19c08:JQ768095.1) were collected from the NCBI BioSample database for the construction of digital data on tRNA. tRNAscan-SE 2.0 and ENDMEMO tools were used, respectively, to identify and sketch tRNA structure and to calculate the Guanine-Cytosine (GC) percentage. Eight true/functional tRNAs with no variable loop and a cove score greater than 20% were identified from the four sequences above. The tRNAs from the uncultured archaeon clones were classified as Ala, Arg, Ile, Thr, Pro and Val type tRNAs, with cove scores ranging from 34.22% to 79.03%. The GC content of the sequences ranged from 42.89% to 56.91%, while the GC content contributed by tRNA ranged from 52% to 64.86% of the total GC content in these sequences. The data generated in this study could be very useful for studying the diversity of tRNA among prokaryotes.
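
For orientation, GC percentage is the share of G and C bases among the bases counted. The sketch below shows that arithmetic in Python. It does not reproduce the ENDMEMO tool, whose exact handling of ambiguous characters is not described in the abstract; excluding non-ACGT characters from the denominator is an assumption made here.

```python
def gc_percent(seq):
    """Return GC content as a percentage of the unambiguous bases.

    Assumption: characters other than A, C, G, T are left out of both
    numerator and denominator. An input with no such bases returns 0.0.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / counted


if __name__ == "__main__":
    # Toy sequence, not taken from the study
    print(gc_percent("GGCCAATT"))
```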

5.
Mol Inform ; 34(11-12): 761-70, 2015 11.
Article in English | MEDLINE | ID: mdl-27491037

ABSTRACT

tRNAscan-SE is a tRNA detection program that is widely used for tRNA annotation; however, the false positive rate of tRNAscan-SE is unacceptable for large sequences. Here, we used a machine learning method to try to improve the tRNAscan-SE results. A new predictor, tRNA-Predict, was designed. We obtained real and pseudo-tRNA sequences as training data sets using tRNAscan-SE and constructed three different tRNA feature sets. We then set up an ensemble classifier, LibMutil, to predict tRNAs from the training data. The positive data set of 623 tRNA sequences was obtained from tRNAdb 2009, and the negative data set consisted of the false positive tRNAs predicted by tRNAscan-SE. Our in silico experiments revealed a prediction accuracy rate of 95.1% for tRNA-Predict using 10-fold cross-validation. tRNA-Predict was developed to distinguish functional tRNAs from pseudo-tRNAs rather than to predict tRNAs from a genome-wide scan. However, tRNA-Predict can work with the output of tRNAscan-SE, which is a genome-wide scanning method, to improve the tRNAscan-SE annotation results. The tRNA-Predict web server is accessible at http://datamining.xmu.edu.cn/~gjs/tRNA-Predict.
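
The 10-fold cross-validation mentioned above follows a standard pattern: the data are split into ten disjoint folds, and each fold serves once as the held-out test set while the other nine are used for training. The Python sketch below shows only that index bookkeeping. It is not the tRNA-Predict code; the function name, the shuffling seed, and the round-robin fold assignment are assumptions.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation.

    Indices are shuffled once with a fixed seed, then dealt round-robin
    into k folds so that fold sizes differ by at most one.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


if __name__ == "__main__":
    # 25 is a placeholder sample count, not a figure from the study
    for train, test in k_fold_indices(25, k=10):
        print(len(train), len(test))
```

The reported accuracy would then be an aggregate over the ten held-out folds; how the original study aggregated it is not stated in the abstract.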


Subject(s)
Nucleic Acid Databases, Molecular Sequence Annotation/methods, Transfer RNA/genetics, DNA Sequence Analysis/methods, Software