2.
Nucleic Acids Res ; 52(D1): D1661-D1667, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37650644

ABSTRACT

The genus Camellia consists of about 200 species, many of which are economically important and widely used for making tea, as ornamental flowers and for edible oil. Here, we present an updated tea plant information archive for Camellia genomics (TPIA2; http://tpia.teaplants.cn) that integrates new large-scale genomic, transcriptomic, metabolic and genetic variation datasets as well as a variety of useful tools. Specifically, TPIA2 hosts all 10 currently available, well-assembled Camellia genomes and their comprehensive annotations from three major sections of Camellia. A collection of 15 million SNPs and 950 950 small indels from large-scale genome resequencing of 350 diverse tea accessions was newly incorporated, followed by the implementation of a novel 'Variation' module to facilitate data retrieval and analysis of the functionally annotated variome. Moreover, 116 Camellia transcriptomes were newly assembled and added, extending the expression profiles of Camellia genes to 13 developmental stages and eight abiotic/biotic treatments. An updated 'Expression' function has also been implemented to provide a comprehensive gene expression atlas for Camellia. Two novel analytic tools (Gene ID Convert and Population Genetic Analysis) were specifically designed to facilitate data exchange and population genomics in Camellia. Collectively, TPIA2 provides diverse, updated genomic resources and powerful functions, and will continue to be an important gateway for functional genomics and population genetic studies in Camellia.


Subject(s)
Camellia , Databases, Genetic , Camellia/genetics , Camellia sinensis/genetics , Camellia sinensis/metabolism , Genome, Plant , Genomics , Tea/metabolism
3.
BMC Bioinformatics ; 23(1): 162, 2022 May 05.
Article in English | MEDLINE | ID: mdl-35513802

ABSTRACT

BACKGROUND: Orphan genes play an important role in the response to environmental stresses in many species, and identifying them is a critical step toward understanding their biological functions. Moso bamboo has high ecological, economic and cultural value. Studies have shown that the growth of moso bamboo is influenced by various stresses. Several traditional identification methods are time-consuming and inefficient. Hence, the development of efficient and high-accuracy computational methods for predicting orphan genes is of great significance. RESULTS: In this paper, we propose a novel deep learning model (CNN + Transformer) for identifying orphan genes in moso bamboo. It uses a convolutional neural network in combination with a transformer neural network to capture amino acid k-mers and features between k-mers in protein sequences. The experimental results show that the average balanced accuracy of CNN + Transformer on the moso bamboo dataset can reach 0.875, and the average Matthews Correlation Coefficient (MCC) can reach 0.471. On the same testing set, the Balanced Accuracy (BA), Geometric Mean (GM), Bookmaker Informedness (BM), and MCC values of the recurrent neural network, long short-term memory, gated recurrent unit, and transformer models are all lower than those of CNN + Transformer, which indicates that the model has a strong ability to identify orphan genes (OGs) in moso bamboo. CONCLUSIONS: The CNN + Transformer model is feasible and yields credible predictive results. It may also provide a valuable reference for other related research. To our knowledge, this is the first model to adopt deep learning techniques for identifying orphan genes in plants.
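The four metrics named in the abstract above can all be derived from a binary confusion matrix. The following is a minimal sketch, assuming the common definitions (GM as the geometric mean of sensitivity and specificity, BM as sensitivity + specificity - 1); the abstract does not state which definitions the authors used. The function name and the counts are hypothetical and are not taken from the paper.

```python
import math


def imbalance_metrics(tp, fn, tn, fp):
    """Return BA, GM, BM and MCC from binary confusion-matrix counts.

    Assumes the usual textbook definitions; the paper may define GM differently.
    """
    sensitivity = tp / (tp + fn)  # true positive rate (recall on orphan genes)
    specificity = tn / (tn + fp)  # true negative rate (recall on non-orphan genes)
    ba = (sensitivity + specificity) / 2
    gm = math.sqrt(sensitivity * specificity)
    bm = sensitivity + specificity - 1
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"BA": ba, "GM": gm, "BM": bm, "MCC": mcc}


# Hypothetical counts for an imbalanced test set (50 orphan, 950 non-orphan genes)
scores = imbalance_metrics(tp=40, fn=10, tn=900, fp=50)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions, BA and BM carry the same information (BM = 2·BA - 1), while MCC also reflects precision. That may be one reason a model can score high on BA and much lower on MCC when one class is rare.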


Subject(s)
Deep Learning , Gene Expression Regulation, Plant , Amino Acids/metabolism , Poaceae/genetics
4.
Front Genet ; 11: 820, 2020.
Article in English | MEDLINE | ID: mdl-33133122

ABSTRACT

Orphan genes are associated with regulatory patterns, but experimental methods for identifying orphan genes are both time-consuming and expensive. Designing an accurate and robust classification model to distinguish orphan from non-orphan genes in datasets with unbalanced class distributions is particularly challenging. Synthetic minority over-sampling (SMOTE) algorithms are selected in a preliminary step to deal with unbalanced gene datasets. To identify orphan genes in balanced and unbalanced Arabidopsis thaliana gene datasets, SMOTE algorithms were then combined with traditional and advanced ensemble classification algorithms respectively, using Support Vector Machine, Random Forest (RF), AdaBoost (adaptive boosting), GBDT (gradient boosting decision tree), and XGBoost (extreme gradient boosting). After comparing the performance of these ensemble models, SMOTE algorithms with XGBoost achieved an F1 score of 0.94 with the balanced A. thaliana gene datasets, but a lower score with the unbalanced datasets. The proposed ensemble method combines different data-balancing algorithms, including Borderline SMOTE (BSMOTE), Adaptive Synthetic Sampling (ADASYN), SMOTE-Tomek, and SMOTE-ENN, with the XGBoost model separately. The SMOTE-ENN-XGBoost model, which combines over-sampling and under-sampling algorithms with XGBoost, achieved higher predictive accuracy than the other balancing algorithms combined with XGBoost. Thus, SMOTE-ENN-XGBoost provides a theoretical basis for developing evaluation criteria for identifying orphan genes in unbalanced biological datasets.
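The over-sampling idea behind SMOTE can be illustrated with a short, dependency-free sketch. It shows only the basic interpolation step: new minority samples are placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' pipeline. The ENN cleaning step and the XGBoost classifier are not shown, and the function name, parameters and data points are hypothetical.

```python
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples by SMOTE-style interpolation.

    minority: list of feature vectors (lists of floats), at least two samples.
    k: number of nearest minority neighbours to choose from.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D minority-class samples (e.g. orphan genes in feature space)
minority = [[1.0, 1.0], [1.2, 0.9], [0.8, 1.1], [1.1, 1.3]]
new_points = smote_oversample(minority, n_new=6)
print(len(new_points), "synthetic samples generated")
```

Because each synthetic point lies between two existing minority samples, it stays inside the region the minority class already occupies. Hybrid schemes such as SMOTE-ENN follow this step with an under-sampling pass that removes samples whose neighbours mostly belong to the other class.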
