Búsqueda | Biblioteca Virtual en Salud Odontología. Uruguay

1.

PQSDC: a parallel lossless compressor for quality scores data via sequences partition and run-length prediction mapping.

Sun, Hui; Zheng, Yingfeng; Xie, Haonan; Ma, Huidong; Zhong, Cheng; Yan, Meng; Liu, Xiaoguang; Wang, Gang.

Bioinformatics ; 40(5)2024 May 02.

Artículo en Inglés | MEDLINE | ID: mdl-38759114

RESUMEN

MOTIVATION: The quality scores data (QSD) account for 70% in compressed FastQ files obtained from the short and long reads sequencing technologies. Designing effective compressors for QSD that counterbalance compression ratio, time cost, and memory consumption is essential in scenarios such as large-scale genomics data sharing and long-term data backup. This study presents a novel parallel lossless QSD-dedicated compression algorithm named PQSDC, which fulfills the above requirements well. PQSDC is based on two core components: a parallel sequences-partition model designed to reduce peak memory consumption and time cost during compression and decompression processes, as well as a parallel four-level run-length prediction mapping model to enhance compression ratio. Besides, the PQSDC algorithm is also designed to be highly concurrent using multicore CPU clusters. RESULTS: We evaluate PQSDC and four state-of-the-art compression algorithms on 27 real-world datasets, including 61.857 billion QSD characters and 632.908 million QSD sequences. (1) For short reads, compared to baselines, the maximum improvement of PQSDC reaches 7.06% in average compression ratio, and 8.01% in weighted average compression ratio. During compression and decompression, the maximum total time savings of PQSDC are 79.96% and 84.56%, respectively; the maximum average memory savings are 68.34% and 77.63%, respectively. (2) For long reads, the maximum improvement of PQSDC reaches 12.51% and 13.42% in average and weighted average compression ratio, respectively. The maximum total time savings during compression and decompression are 53.51% and 72.53%, respectively; the maximum average memory savings are 19.44% and 17.42%, respectively. (3) Furthermore, PQSDC ranks second in compression robustness among the tested algorithms, indicating that it is less affected by the probability distribution of the QSD collections. Overall, our work provides a promising solution for QSD parallel compression, which balances storage cost, time consumption, and memory occupation primely. AVAILABILITY AND IMPLEMENTATION: The proposed PQSDC compressor can be downloaded from https://github.com/fahaihi/PQSDC.

Asunto(s)

Algoritmos , Compresión de Datos , Compresión de Datos/métodos , Genómica/métodos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Análisis de Secuencia de ADN/métodos , Programas Informáticos , Humanos

2.

cnnLSV: detecting structural variants by encoding long-read alignment information and convolutional neural network.

Ma, Huidong; Zhong, Cheng; Chen, Danyang; He, Haofa; Yang, Feng.

BMC Bioinformatics ; 24(1): 119, 2023 Mar 28.

Artículo en Inglés | MEDLINE | ID: mdl-36977976

RESUMEN

BACKGROUND: Genomic structural variant detection is a significant and challenging issue in genome analysis. The existing long-read based structural variant detection methods still have space for improvement in detecting multi-type structural variants. RESULTS: In this paper, we propose a method called cnnLSV to obtain detection results with higher quality by eliminating false positives in the detection results merged from the callsets of existing methods. We design an encoding strategy for four types of structural variants to represent long-read alignment information around structural variants into images, input the images into a constructed convolutional neural network to train a filter model, and load the trained model to remove the false positives to improve the detection performance. We also eliminate mislabeled training samples in the training model phase by using principal component analysis algorithm and unsupervised clustering algorithm k-means. Experimental results on both simulated and real datasets show that our proposed method outperforms existing methods overall in detecting insertions, deletions, inversions, and duplications. The program of cnnLSV is available at https://github.com/mhuidong/cnnLSV . CONCLUSIONS: The proposed cnnLSV can detect structural variants by using long-read alignment information and convolutional neural network to achieve overall higher performance, and effectively eliminate incorrectly labeled samples by using the principal component analysis and k-means algorithms in training model stage.

Asunto(s)

Secuenciación de Nucleótidos de Alto Rendimiento , Programas Informáticos , Secuenciación de Nucleótidos de Alto Rendimiento/métodos , Algoritmos , Genoma , Redes Neurales de la Computación

3.

PMFFRC: a large-scale genomic short reads compression optimizer via memory modeling and redundant clustering.

Sun, Hui; Zheng, Yingfeng; Xie, Haonan; Ma, Huidong; Liu, Xiaoguang; Wang, Gang.

BMC Bioinformatics ; 24(1): 454, 2023 Nov 30.

Artículo en Inglés | MEDLINE | ID: mdl-38036969

RESUMEN

BACKGROUND: Genomic sequencing reads compressors are essential for balancing high-throughput sequencing short reads generation speed, large-scale genomic data sharing, and infrastructure storage expenditure. However, most existing short reads compressors rarely utilize big-memory systems and duplicative information between diverse sequencing files to achieve a higher compression ratio for conserving reads data storage space. RESULTS: We employ compression ratio as the optimization objective and propose a large-scale genomic sequencing short reads data compression optimizer, named PMFFRC, through novelty memory modeling and redundant reads clustering technologies. By cascading PMFFRC, in 982 GB fastq format sequencing data, with 274 GB and 3.3 billion short reads, the state-of-the-art and reference-free compressors HARC, SPRING, Mstcom, and FastqCLS achieve 77.89%, 77.56%, 73.51%, and 29.36% average maximum compression ratio gains, respectively. PMFFRC saves 39.41%, 41.62%, 40.99%, and 20.19% of storage space sizes compared with the four unoptimized compressors. CONCLUSIONS: PMFFRC rational usage big-memory of compression server, effectively saving the sequencing reads data storage space sizes, which relieves the basic storage facilities costs and community sharing transmitting overhead. Our work furnishes a novel solution for improving sequencing reads compression and saving storage space. The proposed PMFFRC algorithm is packaged in a same-name Linux toolkit, available un-limited at https://github.com/fahaihi/PMFFRC .

Asunto(s)

Compresión de Datos , Programas Informáticos , Algoritmos , Genómica , Secuenciación de Nucleótidos de Alto Rendimiento , Análisis por Conglomerados , Análisis de Secuencia de ADN

4.

Linear Characteristics of the Differences in Phase Tangents of Triple-Coil Electromagnetic Sensors and Their Application in Nonmagnetic Metal Classification.

Wang, Dong; Zhang, Zhijie; Yin, Wuliang; Chen, Haoze; Ma, Huidong; Zhou, Guangyu; Zhang, Yuchen.

Sensors (Basel) ; 22(19)2022 Oct 03.

Artículo en Inglés | MEDLINE | ID: mdl-36236610

RESUMEN

Metal sorting is the first step in scrap metal recycling. The traditional magnetic separation method can classify ferromagnetic metals, but it is not applicable to some nonmagnetic metals with higher value. To address this situation, we propose an eddy current testing (ECT) technology-based method for classifying nonmagnetic metals. In this study, a triple-coil electromagnetic sensor, which works as two coil pairs, is tested. By analyzing the physical model of the sensor, a feature related to the conductivity of the sample under test is obtained as the difference in the tangent of the impedance changes in the two coil pairs. Additionally, we derive a linear relationship between this feature and the lift-off height, which is verified experimentally and will help to solve the classification error caused by the variation in the lift-off height. In addition, we find that the excitation frequency does not affect this linear feature. Moreover, in this study, the spectrum scanning method is converted into a single-frequency measurement, and the time consumption is greatly reduced, which improves the efficiency of the real-time metal classification system.

5.

Conductivity Classification of Multi-Shape Nonmagnetic Metal Considering Spatial Position Drift Effect with a Triple-Coil Electromagnetic Sensor.

Wang, Dong; Zhang, Zhijie; Yin, Wuliang; Chen, Haoze; Ma, Huidong; Zhou, Guangyu; Zhang, Yuchen.

Sensors (Basel) ; 22(15)2022 Jul 29.

Artículo en Inglés | MEDLINE | ID: mdl-35957251

RESUMEN

The primary step in metal recovery is metal classification. During eddy current testing (ECT), the shape of the sample can have an impact on the measurement results. To classify nonmagnetic metals in three shapes-planar, cylindrical, and spherical-a triple-coil electromagnetic sensor that operates as two coil pairs is used, and the difference in the phase tangent of the impedance change of the two coil pairs is used as a feature for the classification. The effect of spatial position drift between the sensor and the sample divided into lift-off vertically and horizontal drift horizontally on this feature is considered. Experimental results prove that there is a linear relationship between the feature and lift-off regardless of the metal shape, whereas horizontal drift has no effect on this feature. In addition, the slope of the curve between the feature and the lift-off is different for different shapes. Finally, a classification method eliminating the effect of lift-off variation has been constructed, and the classification accuracy of Cu-Al-Zn-Ti metals reached 96.3%, 96.3%, 92.6%, and 100%, respectively, with an overall correct classification rate of 96.3%.

6.

Modeling the Effect of Stress Ratio, Loading Frequency and Fiber Orientation on the Fatigue Response of Composite Materials.

Ma, Huidong; Bai, Xuezong; Ran, Yawei; Wei, Xubing; An, Zongwen.

Polymers (Basel) ; 14(14)2022 Jul 06.

Artículo en Inglés | MEDLINE | ID: mdl-35890547

RESUMEN

Fatigue life models are widely used to predict the fatigue behavior at arbitrary cycle counts of composite structures subjected to cyclic or highly dynamic loads. However, their predictive capacity and determination of model parameters are strongly dependent on loading conditions and large experimental efforts. This research aims to develop a new model which uses a single model parameter to predict the variation trend and distribution pattern of fatigue experimental data points subjected to different stress ratios, loading frequencies and fiber orientations. Validation of the model with several sets of experimental data shows that the proposed model is capable of adequately considering the effects of stress ratio, loading frequency and fiber orientation on the fatigue behavior of composite materials and correctly predicting the variation trend of the experimental data points using only one set of model parameters regardless of stress ratios, loading frequencies and fiber orientations.

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

Asunto(s)

RESUMEN

RESUMEN

RESUMEN

ENVIAR RESULTADO:

SELECCIÓN DE REFERENCIAS

DETALLE DE LA BÚSQUEDA