ABSTRACT
BACKGROUND: With the significant reduction in the cost of high-throughput sequencing, genomic selection has developed rapidly in plant breeding. Although numerous genomic selection methods have been proposed, existing methods still suffer from poor prediction accuracy in practical applications. RESULTS: This paper proposes MSXFGP, a genomic prediction method that uses a multi-strategy improved sparrow search algorithm (SSA) to optimize XGBoost parameters and feature selection. First, logistic chaos mapping, elite learning, adaptive parameter adjustment, Levy flight, and an early stop strategy are incorporated into the SSA. This integration enhances the global and local search capabilities of the algorithm, thereby improving its convergence accuracy and stability. The improved SSA is then used to optimize XGBoost parameters and feature selection concurrently, yielding a new genomic selection method, MSXFGP. Using both the coefficient of determination (R²) and the Pearson correlation coefficient as evaluation metrics, MSXFGP was evaluated against six existing genomic selection models across six datasets. The results show that the prediction accuracy of MSXFGP is comparable to or better than that of widely used genomic selection methods, and that it achieves better accuracy when R² is used as the evaluation metric. Additionally, this research provides a user-friendly Python utility to help breeders apply the method effectively. MSXFGP is accessible at https://github.com/DIBreeding/MSXFGP . CONCLUSIONS: The experimental results show that the prediction accuracy of MSXFGP is comparable to or better than that of existing genomic selection methods, providing a new approach for plant genomic selection.
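As an illustration of one of the strategies named above, the following is a minimal sketch of logistic chaos mapping used to initialize a search population. It is not taken from the MSXFGP repository: the function names, the seed `x0 = 0.7`, the control parameter `mu = 4.0`, and the three-dimensional search box are all assumptions chosen for the example.

```python
def logistic_sequence(length, x0=0.7, mu=4.0):
    """Iterate the logistic map x <- mu * x * (1 - x) and collect `length` values."""
    values = []
    x = x0
    for _ in range(length):
        x = mu * x * (1.0 - x)
        values.append(x)
    return values


def chaotic_population(n_agents, lower, upper, x0=0.7):
    """Scale one chaotic sequence onto the box [lower, upper], one chunk per agent."""
    dim = len(lower)
    seq = logistic_sequence(n_agents * dim, x0=x0)
    population = []
    for i in range(n_agents):
        chunk = seq[i * dim:(i + 1) * dim]
        agent = [lo + c * (hi - lo) for c, lo, hi in zip(chunk, lower, upper)]
        population.append(agent)
    return population


# Hypothetical search box for three tuning parameters (illustrative values only).
lower = [0.01, 2.0, 0.5]
upper = [0.30, 10.0, 1.0]
for agent in chaotic_population(5, lower, upper):
    print(agent)
```

Compared with uniform random sampling, a chaotic sequence is deterministic for a given seed, which is one common motivation for using it to spread the initial agents over the search space.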
Subject(s)
Genome, Plant , Genomics , Algorithms , Benchmarking , Correlation of Data
ABSTRACT
Pixel-level information from remote sensing images is of great value in many fields. CNNs have a strong ability to extract image backbone features, but because the convolution operation is local, it is challenging for them to directly obtain global feature information and contextual semantic interaction. This makes it difficult for a pure CNN model to achieve higher precision in semantic segmentation of remote sensing images. Inspired by the Swin Transformer and its global feature coding capability, we design a two-branch multi-scale semantic segmentation network (TMNet) for remote sensing images. The network adopts a structure with a dual encoder and a single decoder. The Swin Transformer is used to improve the extraction of global feature information. A multi-scale feature fusion module (MFM) is designed to merge shallow spatial features from images of different scales into deep features. In addition, a feature enhancement module (FEM) and a channel enhancement module (CEM) are proposed and added to the dual encoder to enhance feature extraction. Experiments were conducted on the WHDLD and Potsdam datasets to verify the performance of TMNet.
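The locality of convolution mentioned above can be made concrete with the standard receptive-field recurrence for a stack of convolution layers. The sketch below is a generic illustration, not part of TMNet; the function name and the example layer stacks are assumptions, and dilation and padding are ignored.

```python
def receptive_field(layers):
    """Return the receptive field, in input pixels, of stacked conv layers.

    `layers` is a list of (kernel_size, stride) pairs applied in order.
    """
    field = 1  # a unit before any layer sees exactly one input pixel
    jump = 1   # spacing, in input pixels, between neighbouring units
    for kernel_size, stride in layers:
        field += (kernel_size - 1) * jump
        jump *= stride
    return field


print(receptive_field([(3, 1), (3, 1), (3, 1)]))  # three 3x3 convs, stride 1
print(receptive_field([(3, 2), (3, 2), (3, 2)]))  # same kernels, stride 2
```

Each unit sees only a bounded window of the input, and that window grows gradually with depth, whereas self-attention relates every position in its attention window within a single layer.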
Subject(s)
Remote Sensing Technology , Semantics , Electric Power Supplies , Spine , Image Processing, Computer-Assisted
ABSTRACT
In modern breeding practice, genomic prediction (GP) uses high-density single nucleotide polymorphism (SNP) markers to predict genomic estimated breeding values (GEBVs) for crucial phenotypes, thereby speeding up the selective breeding process and shortening generation intervals. However, because genotype data typically contain far fewer samples than SNP markers, overfitting commonly arises during model training. To address this, the present study builds on the Least Squares Twin Support Vector Regression (LSTSVR) model by incorporating a Lasso regularization term; the resulting model is named ILSTSVR. Because parameter tuning is complex and differs across datasets, the subtraction average based optimizer (SABO) is further introduced to optimize ILSTSVR, yielding the GP model named SABO-ILSTSVR. Experiments conducted on four different crop datasets demonstrate that SABO-ILSTSVR outperforms or matches widely used genomic prediction methods. Source code and data are available at: https://github.com/MLBreeding/SABO-ILSTSVR.
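To show how a Lasso term counters overfitting when markers outnumber samples, here is a minimal sketch of ordinary Lasso regression solved by cyclic coordinate descent. It is plain Lasso, not ILSTSVR or the authors' code; the function names, the toy genotype matrix, and the penalty values are assumptions made for illustration.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; values inside the band become 0."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_sweeps=100):
    """Minimise (1 / 2n) * ||y - Xw||^2 + penalty * ||w||_1 by cyclic updates."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            col_norm = sum(X[i][j] ** 2 for i in range(n)) / n
            if col_norm == 0.0:
                continue  # a marker with no variation carries no signal
            # Correlation of marker j with the residual that excludes marker j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            rho /= n
            w[j] = soft_threshold(rho, penalty) / col_norm
    return w


# Toy data: 4 samples, 6 markers coded 0/1/2 (hypothetical values).
X = [
    [0.0, 1.0, 2.0, 0.0, 1.0, 2.0],
    [1.0, 1.0, 0.0, 2.0, 0.0, 1.0],
    [2.0, 0.0, 1.0, 1.0, 2.0, 0.0],
    [0.0, 2.0, 1.0, 0.0, 1.0, 1.0],
]
y = [1.0, 2.5, 4.0, 0.5]
for penalty in (0.05, 0.5, 5.0):
    print(penalty, lasso_coordinate_descent(X, y, penalty))
```

The soft-thresholding step is what sets weak marker effects exactly to zero, so raising the penalty leaves fewer markers in the model.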
ABSTRACT
Accurate inference and prediction of gene regulatory networks are very important for understanding dynamic cellular processes. Large-scale time-series genomics data help reveal the molecular dynamics and dynamic biological processes of complex biological systems. First, we collected time-series data from rat pineal gland tissue in its natural state at a fixed sampling rate and performed whole-genome sequencing. The resulting large-scale time-series sequencing dataset of the rat pineal gland includes 480 time points, with a 3 min interval between adjacent time points and a 24 h sampling period. We then proposed a new method for constructing gene expression regulatory networks, named the gene regulatory network based on time series data and entropy transfer (GRNTSTE) method. The method uses transfer entropy and large-scale time-series gene expression data to infer causal regulatory relationships between genes in a data-driven manner. Comparative experiments show that GRNTSTE performs better than dynamical gene network inference with ensemble of trees (dynGENIE3) and SCRIBE, and performs similarly to TENET. On the BEELINE dataset, the performance of GRNTSTE is slightly lower than that of the SINCERITIES method and better than that of the other gene regulatory network construction methods in the BEELINE framework. Finally, we constructed the rat pineal rhythm gene expression regulatory network with the GRNTSTE method, which provides an important reference for the study of the pineal rhythm mechanism.
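The transfer entropy underlying this kind of inference can be sketched for two discretized series. The example below is a simple plug-in estimator with a history length of one step, not the GRNTSTE implementation; the function name, the binary discretization, and the synthetic series are assumptions for illustration.

```python
import math
import random
from collections import Counter


def transfer_entropy(source, target):
    """Plug-in estimate of TE(source -> target) in bits, history length 1."""
    triples = Counter()  # counts of (target_next, target_now, source_now)
    for t in range(len(target) - 1):
        triples[(target[t + 1], target[t], source[t])] += 1
    total = sum(triples.values())

    pairs_now = Counter()   # (target_now, source_now)
    pairs_next = Counter()  # (target_next, target_now)
    singles = Counter()     # target_now
    for (nxt, now, src), count in triples.items():
        pairs_now[(now, src)] += count
        pairs_next[(nxt, now)] += count
        singles[now] += count

    te = 0.0
    for (nxt, now, src), count in triples.items():
        p_joint = count / total
        p_next_given_both = count / pairs_now[(now, src)]
        p_next_given_own = pairs_next[(nxt, now)] / singles[now]
        te += p_joint * math.log2(p_next_given_both / p_next_given_own)
    return te


# Synthetic example: y copies x with a one-step delay, so x drives y.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(2000)]
y = [0] + x[:-1]
print("TE(x -> y):", transfer_entropy(x, y))
print("TE(y -> x):", transfer_entropy(y, x))
```

In this synthetic case the estimate from `x` to `y` should be far larger than the reverse direction, and it is this asymmetry that lets transfer entropy suggest a direction for a regulatory edge.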
Subject(s)
Pineal Gland , Animals , Circadian Rhythm/genetics , Entropy , Epistasis, Genetic , Gene Regulatory Networks , Pineal Gland/metabolism , Rats , Time Factors
ABSTRACT
Understanding the complete map of melatonin synthesis, that is, the information transfer network among circadian genes in the pineal gland, promises to resolve outstanding issues in endocrine systems and to improve the clinical diagnosis and treatment of insomnia, immune disease, and hysterical depression. Landmark studies have revealed some of the genes that regulate the circadian rhythm associated with melatonin synthesis. However, these studies do not give a complete map of melatonin synthesis, because the information transferred among circadian genes in the pineal gland is lost. A new biotechnology that integrates dynamic sequential omics with a multiplexed imaging method has been used to visualize the complete process of melatonin synthesis. Two highly significant information transfer processes are found to be involved in melatonin synthesis. In the first stage, as the light intensity decreases, the melatonin synthesis mechanism starts, which is reflected in the circadian genes Rel, Polr2A, Mafk, and Srbf1 becoming active. In the second stage, the circadian genes Hif1a, Bach1, Clock, E2f6, and Per2 are regulated simultaneously by the four genes Rel, Polr2A, Mafk, and Srbf1 and contribute genetic information to Aanat. The rapid development of this technique offers a reference for an overall understanding of the gene-to-gene regulatory relationships among circadian genes in the pineal gland. In this study, dynamic sequential omics and the accompanying analysis process present the current state and future perspectives for better diagnosing and curing diseases associated with melatonin synthesis disorders.