1.
Sensors (Basel) ; 23(24), 2023 Dec 16.
Article in English | MEDLINE | ID: mdl-38139713

ABSTRACT

The accurate measurement of soil organic matter (SOM) is vital for maintaining soil quality. We present an innovative model for SOM prediction by integrating spectral and profile features. We use PCA, Lasso, and SCARS methods to extract important spectral features and combine them with profile data. This hybrid approach significantly improves SOM prediction across various models, including Random Forest, ExtraTrees, and XGBoost, boosting the coefficient of determination (R²) by up to 26%. Notably, the ExtraTrees model, enriched with PCA-extracted features, achieves the highest accuracy with an R² of 0.931 and an RMSE of 0.068. Compared with single-feature models, this approach improves the R² by 17% and 26% for PCA features of full-band spectra and profile features, respectively. Our findings highlight the potential of feature integration, especially the ExtraTrees model with PCA-extracted features and profile features, as a stable and accurate tool for SOM prediction in extensive study areas.
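As a rough illustration of the two ideas in this abstract, the sketch below concatenates per-sample spectral-derived features with profile features and computes the two reported metrics, R² and RMSE. The function names, the feature values, and the observed and predicted numbers are all hypothetical; the abstract does not say how the authors built their feature matrix or which software they used.

```python
import math


def combine_features(spectral, profile):
    """Concatenate each sample's spectral-derived and profile feature vectors.

    Assumption: feature integration is plain per-sample concatenation.
    """
    return [list(s) + list(p) for s, p in zip(spectral, profile)]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(squared / len(y_true))


# Hypothetical data, for illustration only.
combined = combine_features([[0.1, 0.2]], [[5.0]])
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(combined, r2_score(observed, predicted), rmse(observed, predicted))
```

The combined matrix would then be passed to whichever regressor is being evaluated, and the two metrics compared against a model trained on either feature set alone.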

2.
Phytochemistry ; 200: 113222, 2022 Aug.
Article in English | MEDLINE | ID: mdl-35561852

ABSTRACT

In crops, RNA editing is one of the most important post-transcriptional processes, in which specific cytidines (C) in virtually all mitochondrial protein-coding genes are converted to uridines (U). Despite extensive recent research on RNA editing, efficiently exploring all C-to-U editing events on the genomic scale remains challenging. Developing accurate prediction methods for the detection of RNA editing sites would dramatically reduce the need for experimental determination. Therefore, we propose a novel method, iPReditor-CMG (improved predictive RNA editor for crop mitochondrial genomes), to predict crop mitochondrial editing sites using genome sequence and an optimised support vector machine (SVM). We first selected three mitochondrial genomes with known RNA editing sites from Arabidopsis thaliana, Brassica napus and Oryza sativa, released by NCBI, as the training and test sets. The genes and their transcripts from self-sequenced tobacco mitochondrial ATPase were selected as the validation set. iPReditor-CMG first coded the genome sequences as numerical vectors and then performed an efficient feature selection on the high-dimensional feature space, where the SVM was employed in both feature selection and subsequent modelling. The average independent prediction accuracy of intraspecific editing sites across three species was 0.85, and up to 0.91 in A. thaliana, which outperformed the reference models. For the interspecific independent prediction, the prediction accuracy between dicotyledons was 0.78 and the accuracy between dicotyledons and monocotyledons was 0.56, which implies that there might be similarity in the C-to-U editing mechanism in close relatives. Finally, the best model was identified with an independent test accuracy of 0.91 and an AUC of 0.88, which suggested that five unreported feature sequences, i.e. TGACA, ACAAC, GTAGA, CCGTT and TAACA, are closely associated with the editing phenomenon. Multiple tests supported the conclusion that iPReditor-CMG can be effectively applied to predict editing sites in crop mitochondria, which may further contribute to understanding the mechanisms of site editing and post-transcriptional events in crop mitochondria.


Subjects
Arabidopsis; Genome, Mitochondrial; Arabidopsis/genetics; Arabidopsis/metabolism; Genome, Mitochondrial/genetics; Genomics; RNA/genetics; RNA/metabolism; RNA Editing; Support Vector Machine
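The abstract of this record says that genome sequences were coded as numerical vectors before feature selection, and it reports 5-base feature sequences such as TGACA. One common way to do such coding is to count overlapping k-mers in a window around each candidate cytidine. The sketch below assumes that scheme; the abstract does not specify the actual encoding, window size, or k, so the function names, the example sequence, and the parameters are hypothetical.

```python
from itertools import product


def window_around(sequence, site, flank=3):
    """Return the bases within `flank` positions either side of index `site`."""
    start = max(0, site - flank)
    return sequence[start:site + flank + 1]


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers over the alphabet ACGT, in a fixed order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    vector = [0] * len(kmers)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        pos = index.get(seq[i:i + k])
        if pos is not None:  # skip k-mers containing ambiguous bases such as N
            vector[pos] += 1
    return kmers, vector


# Hypothetical sequence with a candidate cytidine at index 5.
genome = "GGTGACAACCGTT"
window = window_around(genome, site=5, flank=3)
kmers, vector = kmer_counts(window, k=5)
print(window, vector[kmers.index("TGACA")])
```

Each candidate site then becomes one row of a feature matrix with 4^k columns, which is the kind of high-dimensional space on which a feature-selection step would operate.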
3.
Front Genet ; 11: 257, 2020.
Article in English | MEDLINE | ID: mdl-32265988

ABSTRACT

BACKGROUND: Cytoplasmic male sterility (CMS) is a complex phenomenon of plant sterility that can produce non-functional pollen. It is caused by mutation, rearrangement or recombination in the mitochondrial genome. So far, the systematic structural characteristics of the changes in the mitochondrial genome from the maintainer lines to the CMS lines have not been reported in tobacco.
RESULTS: The mitochondrial genomes of the flower buds from both CMS lines and maintainer lines of two Nicotiana tabacum cultivars (YY85, sYY85, ZY90, and sZY90) were sequenced using PacBio and Illumina HiSeq technology, and comparative analysis based on the de novo sequencing produced several findings. (1) The genomes of the CMS lines were larger, and the regions that differed were mostly non-coding. (2) A large number of rearrangement regions were detected in the CMS lines, with many translocation regions. (3) Thirteen gene clusters were shared by the four mitochondrial genomes, among which two of the gene clusters, nad2-sdh3 and nad6-rps4, were far from each other in the CMS lines. (4) Thirty-three protein-coding genes were conserved in the four mitochondrial genomes. However, one additional copy of nad3 was detected in the maintainer lines, and sequence differences were revealed in the four candidate genes (atp6, cox2, nad2, and sdh3). Importantly, the evolutionary tree based on the four genes distinguished the CMS lines from the maintainer lines well for the sequenced tobacco mitochondrial genomes. (5) Sixteen CMS-specific open reading frames (ORFs) were found, three of which (orf91, orf115b, and orf100) were previously reported. (6) The differences in the intensity of the protein-protein interaction (PPI) in ATP6 were further verified using yeast two-hybrid analysis.
CONCLUSION: Although the majority of the sequences, genes and gene clusters were shared by the mitochondrial genomes of the maintainer and the CMS lines in tobacco, extensive structural variations identified by comprehensive analysis of the mitochondrial genomes, including rearrangement, gene order, and mitochondrial genome expansion and shrinkage events, might be related to CMS. Additionally, the candidate protein-coding genes and CMS-specific ORFs were closely associated with the CMS mechanism. Verification experiments on one of the candidate genes were performed, and the validity of our research results was supported.

4.
Front Genet ; 10: 1410, 2019.
Article in English | MEDLINE | ID: mdl-32082366

ABSTRACT

For precision medicine, there is a need to identify genes that accurately distinguish the physiological state or response to a particular therapy, but this can be challenging. Many methods of analyzing differential expression have been established and applied to this problem, such as the t-test, edgeR, and DESeq2. A common feature of these methods is their focus on a linear relationship (differential expression) between gene expression and phenotype. However, they may overlook nonlinear relationships due to various factors, such as the degree of disease progression, sex, age, ethnicity, and environmental factors. The maximal information coefficient (MIC) was proposed to capture a wide range of associations between two variables, both linear and nonlinear. However, with MIC it is difficult to highlight genes with nonlinear expression patterns, because the genes giving the most strongly supported hits are linearly expressed, especially for noisy data. It is thus important to also efficiently identify nonlinearly expressed genes in order to unravel the molecular basis of disease and to reveal new therapeutic targets. We propose a novel nonlinearity measure called normalized differential correlation (NDC) to efficiently highlight nonlinearly expressed genes in transcriptome datasets. Validation using six real-world cancer datasets revealed that the NDC method could highlight nonlinearly expressed genes that could not be highlighted by the t-test, MIC, edgeR, and DESeq2, although MIC could capture nonlinear correlations. The classification accuracy indicated that analysis of these genes could adequately distinguish cancer and paracarcinoma tissue samples. Furthermore, the results of biological interpretation of the identified genes suggested that some of them were involved in key functional pathways associated with cancer progression and metastasis. All of this evidence suggests that these nonlinearly expressed genes may play a central role in regulating cancer progression.
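The point that linear measures can overlook nonlinear relationships can be seen in a toy example. The sketch below computes the Pearson correlation for a perfectly linear and a perfectly U-shaped relationship. It is not an implementation of NDC or MIC, whose definitions are not given in this abstract; the data are hypothetical, with `covariate` standing in for something like a centred measure of disease progression.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical covariate and two hypothetical expression profiles.
covariate = [-3, -2, -1, 0, 1, 2, 3]
linear_gene = [2 * v + 1 for v in covariate]   # expression rises steadily
u_shaped_gene = [v * v for v in covariate]     # high at both extremes

r_linear = pearson(covariate, linear_gene)
r_u_shaped = pearson(covariate, u_shaped_gene)
print(r_linear, r_u_shaped)
```

The U-shaped gene depends entirely on the covariate, yet its linear correlation is zero, which is the kind of pattern a purely linear screen would rank at the bottom.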
