Results 1 - 20 of 1,573
1.
Proc Natl Acad Sci U S A ; 121(28): e2320870121, 2024 Jul 09.
Article in English | MEDLINE | ID: mdl-38959033

ABSTRACT

Efficient storage and sharing of massive biomedical data would make them widely accessible to different institutions and disciplines. However, compressors tailored for natural photos/videos quickly reach their limits on biomedical data, while emerging deep learning-based methods demand huge training datasets and generalize poorly. Here, we propose to conduct Biomedical data compRession with Implicit nEural Function (BRIEF) by representing the target data with compact neural networks, which are data specific and thus have no generalization issues. Benefiting from the strong representation capability of implicit neural functions, BRIEF achieves 2-3 orders of magnitude compression on diverse biomedical data at significantly higher fidelity than existing techniques. Moreover, BRIEF delivers consistent performance across the whole data volume and supports customized, spatially varying fidelity. BRIEF's multifold advantages also support reliable downstream tasks at low bandwidth. Our approach will facilitate low-bandwidth data sharing and promote collaboration and progress in the biomedical field.
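As a rough illustration of the implicit-neural-representation idea (not BRIEF's architecture or training recipe), the sketch below fits a small coordinate MLP to a 2D array with PyTorch; the compressed representation is simply the network's weights. Layer sizes, learning rate, and step count are assumptions.

```python
# Minimal sketch of implicit neural compression: fit a small MLP that maps
# normalized (x, y) coordinates to pixel/voxel intensities, then keep only the
# network weights as the "compressed" representation. Illustrative only; the
# architecture and training schedule are NOT those of BRIEF.
import torch
import torch.nn as nn

def fit_implicit_function(image: torch.Tensor, hidden: int = 64, steps: int = 2000):
    """image: 2D float tensor in [0, 1]. Returns a trained coordinate MLP."""
    h, w = image.shape
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h),
                            torch.linspace(-1, 1, w), indexing="ij")
    coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)      # (H*W, 2)
    targets = image.reshape(-1, 1)                             # (H*W, 1)

    net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(),
                        nn.Linear(hidden, hidden), nn.ReLU(),
                        nn.Linear(hidden, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = nn.functional.mse_loss(net(coords), targets)
        loss.backward()
        opt.step()
    return net  # storing net.state_dict() instead of the raw image is the compression

# Decoding amounts to re-evaluating the trained net on the same coordinate grid.
net = fit_implicit_function(torch.rand(32, 32), steps=200)
```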


Subject(s)
Information Dissemination , Neural Networks, Computer , Humans , Information Dissemination/methods , Data Compression/methods , Deep Learning , Biomedical Research/methods
2.
J Comput Biol ; 31(6): 524-538, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38820168

ABSTRACT

An essential task in computational genomics involves transforming input sequences into their constituent k-mers. The quest for an efficient representation of k-mer sets is crucial for enhancing the scalability of bioinformatic analyses. One widely used method involves converting the k-mer set into a de Bruijn graph (dBG) and then seeking a compact graph representation via the smallest path cover. This study introduces USTAR* (Unitig STitch Advanced constRuction), a tool designed to compress both a set of k-mers and their associated counts. USTAR leverages the connectivity and density of dBGs, enabling a more efficient path selection for constructing the path cover. The efficacy of USTAR is demonstrated through its application in compressing real read datasets. USTAR improves the compression achieved by UST (Unitig STitch), the best existing algorithm, by 2.3% to 26.4%, depending on the k-mer size, and is up to 7× faster.
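The sketch below illustrates the underlying dBG/path-cover idea in plain Python: k-mers become edges between (k-1)-mer nodes, and a greedy walk produces a small set of covering strings. The greedy extension is a baseline placeholder, not USTAR's connectivity- and density-aware selection.

```python
# Toy illustration of the de Bruijn graph / path-cover idea behind tools like
# UST and USTAR: nodes are (k-1)-mers, each k-mer is an edge, and a set of
# paths covering all k-mers yields a compact string representation.
from collections import defaultdict

def path_cover(kmers):
    out = defaultdict(list)
    for km in kmers:
        out[km[:-1]].append(km[1:])          # edge: prefix node -> suffix node
    unused = {km: True for km in kmers}
    paths = []
    for km in kmers:
        if not unused[km]:
            continue
        unused[km] = False
        path, node = km, km[1:]
        while True:                          # greedily extend with any unused edge
            nxt = next((s for s in out[node] if unused.get(node + s[-1], False)), None)
            if nxt is None:
                break
            unused[node + nxt[-1]] = False
            path += nxt[-1]
            node = nxt
        paths.append(path)
    return paths                             # strings whose k-mers cover the input set

print(path_cover(["ACG", "CGT", "GTA", "TAC"]))  # -> ['ACGTAC']
```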


Subject(s)
Algorithms , Data Compression , Genomics , Data Compression/methods , Genomics/methods , Software , Computational Biology/methods , Humans , Sequence Analysis, DNA/methods
3.
Magn Reson Med ; 92(3): 1232-1247, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38748852

ABSTRACT

PURPOSE: We present SCAMPI (Sparsity Constrained Application of deep Magnetic resonance Priors for Image reconstruction), an untrained deep neural network for MRI reconstruction that requires no prior training on datasets. It expands the Deep Image Prior approach with a multidomain, sparsity-enforcing loss function to achieve higher image quality at faster convergence than previously reported methods. METHODS: Two-dimensional MRI data from the FastMRI dataset with Cartesian undersampling in the phase-encoding direction were reconstructed at different acceleration rates for single-coil and multicoil data. RESULTS: The performance of our architecture was compared to state-of-the-art compressed sensing methods and ConvDecoder, another untrained neural network for two-dimensional MRI reconstruction. SCAMPI outperforms these by better reducing undersampling artifacts and yielding lower error metrics in multicoil imaging. Compared with ConvDecoder, the U-Net architecture combined with an elaborate loss function allows much faster convergence at higher image quality. SCAMPI can reconstruct multicoil data without explicit knowledge of coil sensitivity profiles. Moreover, it is a novel tool for reconstructing undersampled single-coil k-space data. CONCLUSION: Our approach avoids the overfitting to dataset features that can occur in neural networks trained on databases, because the network parameters are tuned only on the reconstruction data. It yields better results and faster reconstruction than the baseline untrained neural network approach.
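A minimal PyTorch sketch of the kind of multidomain, sparsity-regularized objective used by untrained-network reconstruction methods follows; the Fourier data-consistency term and the total-variation surrogate for sparsity are illustrative assumptions, not SCAMPI's published loss or network.

```python
# Deep-Image-Prior style objective with an added sparsity term, the general
# flavor of untrained-network MRI reconstruction. Forward model, regularizer,
# and weight lam are illustrative; SCAMPI's actual loss is described in the paper.
import torch

def dip_loss(net_output: torch.Tensor, kspace_meas: torch.Tensor,
             mask: torch.Tensor, lam: float = 1e-3) -> torch.Tensor:
    """net_output: complex image (H, W); kspace_meas/mask: (H, W) undersampled k-space."""
    k_pred = torch.fft.fft2(net_output)
    data_consistency = torch.mean(torch.abs(mask * (k_pred - kspace_meas)) ** 2)
    # anisotropic total variation as a simple sparsity-promoting stand-in
    tv = torch.mean(torch.abs(torch.diff(net_output.real, dim=0))) + \
         torch.mean(torch.abs(torch.diff(net_output.real, dim=1)))
    return data_consistency + lam * tv
```

In an untrained setting, this loss is minimized over the parameters of a generator network (e.g., a U-Net) for the single scan being reconstructed, with no external training data.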


Subject(s)
Algorithms , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Neural Networks, Computer , Magnetic Resonance Imaging/methods , Humans , Image Processing, Computer-Assisted/methods , Artifacts , Brain/diagnostic imaging , Data Compression/methods
4.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38759114

ABSTRACT

MOTIVATION: Quality scores data (QSD) account for about 70% of compressed FastQ files obtained from short- and long-read sequencing technologies. Designing effective QSD compressors that balance compression ratio, time cost, and memory consumption is essential in scenarios such as large-scale genomics data sharing and long-term data backup. This study presents a novel parallel lossless QSD-dedicated compression algorithm named PQSDC, which fulfills the above requirements well. PQSDC is based on two core components: a parallel sequences-partition model designed to reduce peak memory consumption and time cost during compression and decompression, and a parallel four-level run-length prediction mapping model to enhance the compression ratio. In addition, the PQSDC algorithm is designed to be highly concurrent on multicore CPU clusters. RESULTS: We evaluate PQSDC and four state-of-the-art compression algorithms on 27 real-world datasets, comprising 61.857 billion QSD characters and 632.908 million QSD sequences. (1) For short reads, compared to the baselines, the maximum improvement of PQSDC reaches 7.06% in average compression ratio and 8.01% in weighted average compression ratio. During compression and decompression, the maximum total time savings of PQSDC are 79.96% and 84.56%, respectively; the maximum average memory savings are 68.34% and 77.63%, respectively. (2) For long reads, the maximum improvement of PQSDC reaches 12.51% and 13.42% in average and weighted average compression ratio, respectively. The maximum total time savings during compression and decompression are 53.51% and 72.53%, respectively; the maximum average memory savings are 19.44% and 17.42%, respectively. (3) Furthermore, PQSDC ranks second in compression robustness among the tested algorithms, indicating that it is less affected by the probability distribution of the QSD collections. Overall, our work provides a promising solution for parallel QSD compression that balances storage cost, time consumption, and memory occupation well. AVAILABILITY AND IMPLEMENTATION: The proposed PQSDC compressor can be downloaded from https://github.com/fahaihi/PQSDC.
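To illustrate the general "partition, then exploit runs" idea (not PQSDC's four-level run-length prediction mapping or its parallel model), a plain run-length encoder over partitioned quality strings might look like this:

```python
# Plain run-length encoding applied to partitioned quality-score strings.
# Only illustrates the general idea; PQSDC's mapping and concurrency model
# are more involved.
from itertools import groupby

def run_length_encode(qualities: str):
    return [(ch, sum(1 for _ in grp)) for ch, grp in groupby(qualities)]

def partition(records, n_parts):
    """Split a list of quality strings into chunks for parallel workers."""
    size = (len(records) + n_parts - 1) // n_parts
    return [records[i:i + size] for i in range(0, len(records), size)]

reads = ["IIIIIHHHGGF", "IIIIIIIIHGF", "FFFFGGHHHII"]
for chunk in partition(reads, 2):
    print([run_length_encode(q) for q in chunk])
```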


Subject(s)
Algorithms , Data Compression , Data Compression/methods , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Software , Humans
5.
J Neural Eng ; 21(3)2024 May 16.
Article in English | MEDLINE | ID: mdl-38718785

ABSTRACT

Objective. Recently, demand for wearable devices using electroencephalography (EEG) has increased rapidly in many fields. Because of volume and computation constraints, wearable devices usually compress EEG and transmit it to external devices for analysis. However, current EEG compression algorithms are not tailored to wearable devices with limited computing and storage. First, their huge number of parameters makes them difficult to deploy on wearable devices; second, the low signal-to-noise ratio makes it hard to learn the distribution of EEG signals, which leads to excessive reconstruction error and suboptimal compression performance. Approach. Here, a feature-enhanced asymmetric encoding-decoding network is proposed. EEG is encoded with a lightweight model and subsequently decoded with a multi-level feature fusion network that deeply extracts the encoded features and reconstructs the signal through a two-branch structure. Main results. On public EEG datasets covering motor imagery and event-related potentials, experimental results show that the proposed method achieves state-of-the-art compression performance. In addition, the neural representation analysis and the classification performance of the reconstructed EEG signals show that our method tends to retain more task-related information as the compression ratio increases and preserves reliable discriminative information after compression. Significance. This paper tailors an asymmetric EEG compression method for wearable devices that achieves state-of-the-art compression performance in a lightweight manner, paving the way for the application of EEG-based wearable devices.
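A minimal sketch of an asymmetric encoder-decoder in PyTorch, with a deliberately lightweight encoder and a heavier decoder, is given below; the layer sizes and single-window 1D setting are assumptions, not the paper's feature-enhanced architecture.

```python
# Asymmetric autoencoder sketch: a lightweight encoder (what a wearable device
# would run) and a heavier decoder (run off-device). Illustrative sizes only.
import torch
import torch.nn as nn

class AsymmetricAE(nn.Module):
    def __init__(self, n_samples: int = 256, code: int = 32):
        super().__init__()
        self.encoder = nn.Linear(n_samples, code)            # cheap: one projection
        self.decoder = nn.Sequential(                        # heavier reconstruction path
            nn.Linear(code, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, n_samples))

    def forward(self, x):                                    # x: (batch, n_samples)
        return self.decoder(self.encoder(x))

model = AsymmetricAE()
recon = model(torch.randn(4, 256))                          # (4, 256) reconstructed windows
print(recon.shape)
```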


Subject(s)
Data Compression , Electroencephalography , Electroencephalography/methods , Data Compression/methods , Humans , Wearable Electronic Devices , Neural Networks, Computer , Algorithms , Signal Processing, Computer-Assisted , Imagination/physiology
6.
Sci Rep ; 14(1): 10560, 2024 05 08.
Article in English | MEDLINE | ID: mdl-38720020

ABSTRACT

Research on video analytics, especially human behavior recognition, has become increasingly popular in recent years. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. In particular, the use of 3D convolution for human behavior recognition has attracted growing interest. However, the increased dimensionality leads to challenges such as a dramatic increase in the number of parameters, higher time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction. Training can be considerably slow without powerful GPU hardware. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within video data. The ATC module effectively reduces GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
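The sketch below shows one naive way to drop temporally redundant frames before feeding a 3D CNN; the mean-absolute-difference metric and the threshold are assumptions, not the ATC module's adaptive rule.

```python
# Toy frame-difference filter in the spirit of "drop redundant frames first":
# keep a frame only if it differs enough from the last kept frame.
import numpy as np

def drop_redundant_frames(video: np.ndarray, threshold: float = 5.0):
    """video: (T, H, W) grayscale frames. Returns indices of kept frames."""
    kept = [0]
    for t in range(1, len(video)):
        diff = np.mean(np.abs(video[t].astype(float) - video[kept[-1]].astype(float)))
        if diff > threshold:
            kept.append(t)
    return kept

frames = np.repeat(np.random.randint(0, 256, (4, 8, 8)), 5, axis=0)  # 20 frames, many repeats
print(drop_redundant_frames(frames))   # far fewer than 20 indices survive
```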


Subject(s)
Algorithms , Data Compression , Video Recording , Humans , Data Compression/methods , Human Activities , Deep Learning , Image Processing, Computer-Assisted/methods , Pattern Recognition, Automated/methods
7.
Commun Biol ; 7(1): 553, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724695

ABSTRACT

For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard have so far not been openly available, limiting fair scientific assessment of the standard and therefore hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder, independent of its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.


Subject(s)
Data Compression , Genomics , Software , Genomics/methods , Data Compression/methods , Humans
8.
J Biomed Opt ; 29(Suppl 1): S11529, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38650979

ABSTRACT

Significance: Compressed sensing (CS) uses special measurement designs combined with powerful mathematical algorithms to reduce the amount of data to be collected while maintaining image quality. This is relevant to almost any imaging modality, and in this paper we focus on CS in photoacoustic projection imaging (PAPI) with integrating line detectors (ILDs). Aim: Our previous research involved rather general CS measurements, where each ILD can contribute to any measurement. In the real world, however, the design of CS measurements is subject to practical constraints. In this research, we aim at a CS-PAPI system where each measurement involves only a subset of ILDs, and which can be implemented in a cost-effective manner. Approach: We extend the existing PAPI with a self-developed CS unit. The system provides structured CS matrices for which the existing recovery theory cannot be applied directly. A random search strategy is applied to select the CS measurement matrix within this class for which we obtain exact sparse recovery. Results: We implement a CS PAPI system for a compression factor of 4:3, where specific measurements are made on separate groups of 16 ILDs. We algorithmically design optimal CS measurements that have proven sparse CS capabilities. Numerical experiments are used to support our results. Conclusions: CS with proven sparse recovery capabilities can be integrated into PAPI, and numerical results support this setup. Future work will focus on applying it to experimental data and utilizing data-driven approaches to enhance the compression factor and generalize the signal class.
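As a loose illustration of searching over structured measurement designs, the sketch below randomly generates block-diagonal matrices (each measurement touching only one group of detectors) and keeps the one with the lowest mutual coherence. Coherence is only a convenient proxy here, whereas the paper verifies exact sparse recovery directly; the block size and dimensions are assumptions.

```python
# Random search over block-structured CS measurement matrices, scored by
# mutual coherence as a simple proxy for recovery quality (illustrative only).
import numpy as np

def coherence(A):
    A = A / np.linalg.norm(A, axis=0, keepdims=True)
    G = np.abs(A.T @ A)
    np.fill_diagonal(G, 0.0)
    return G.max()

def search_structured_matrix(n_groups=4, group_size=16, m_per_group=12, trials=200, seed=0):
    rng = np.random.default_rng(seed)
    best, best_mu = None, np.inf
    for _ in range(trials):
        A = np.zeros((n_groups * m_per_group, n_groups * group_size))
        for i in range(n_groups):               # each measurement touches one group only
            block = rng.choice([-1.0, 1.0], size=(m_per_group, group_size))
            A[i*m_per_group:(i+1)*m_per_group, i*group_size:(i+1)*group_size] = block
        mu = coherence(A)
        if mu < best_mu:
            best, best_mu = A, mu
    return best, best_mu

A, mu = search_structured_matrix()
print(A.shape, round(mu, 3))
```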


Subject(s)
Algorithms , Equipment Design , Image Processing, Computer-Assisted , Photoacoustic Techniques , Photoacoustic Techniques/methods , Photoacoustic Techniques/instrumentation , Image Processing, Computer-Assisted/methods , Data Compression/methods , Phantoms, Imaging
9.
J Proteome Res ; 23(5): 1702-1712, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38640356

ABSTRACT

Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impact of precision losses on MS data processing has not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precisions (32-bit and 64-bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy and lossless files under different conditions. Overall, we found that the discrepancy between 32-bit and 64-bit precision was under 1%. We propose an absolute m/z error of 10⁻⁴ and a relative intensity error of 2 × 10⁻², adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10⁻⁵ and a relative intensity error of 2 × 10⁻³ are advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
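One possible way to realize such precision-lossy transformations is uniform quantization on an absolute grid for m/z and on a logarithmic grid for intensities, as sketched below with the abstract's recommended 5%-threshold tolerances; the paper's exact truncation procedure may differ.

```python
# Quantization-based stand-in for the two kinds of precision loss studied:
# an absolute-error grid for m/z and a relative-error grid for intensities.
import numpy as np

def truncate_absolute(mz: np.ndarray, abs_err: float = 1e-4) -> np.ndarray:
    """Quantize m/z so each value moves by at most abs_err."""
    step = 2.0 * abs_err
    return np.round(mz / step) * step

def truncate_relative(intensity: np.ndarray, rel_err: float = 2e-2) -> np.ndarray:
    """Quantize intensities on a log grid so the relative change stays under rel_err."""
    step = np.log1p(rel_err)
    return np.exp(np.round(np.log(np.maximum(intensity, 1e-12)) / step) * step)

mz = np.array([500.123456, 1200.987654])
inten = np.array([1.0e3, 2.5e6])
print(truncate_absolute(mz), truncate_relative(inten))
```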


Subject(s)
Data Compression , Mass Spectrometry , Metabolomics , Mass Spectrometry/methods , Metabolomics/methods , Metabolomics/statistics & numerical data , Data Compression/methods , Software , Humans , Algorithms
10.
Genome Biol ; 25(1): 106, 2024 04 25.
Article in English | MEDLINE | ID: mdl-38664753

ABSTRACT

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
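A highly simplified run-block idea on a plain bit vector is sketched below: uniform blocks are stored as a flag, mixed blocks verbatim, and per-block prefix counts keep rank queries fast. Centrifuger's scheme operates on BWT runs and is considerably more refined; the block size and layout here are assumptions.

```python
# Simplified run-block storage of a bit vector with O(block) rank queries.
class RunBlockBits:
    def __init__(self, bits, block=8):
        self.block = block
        self.blocks, self.prefix = [], [0]
        for i in range(0, len(bits), block):
            chunk = bits[i:i + block]
            ones = sum(chunk)
            if ones == 0 or ones == len(chunk):
                self.blocks.append(("run", chunk[0] if chunk else 0))   # uniform block
            else:
                self.blocks.append(("raw", chunk))                      # mixed block kept verbatim
            self.prefix.append(self.prefix[-1] + ones)

    def rank1(self, pos):
        """Number of 1s in bits[0:pos]."""
        b, off = divmod(pos, self.block)
        count = self.prefix[b]
        if off:
            blk = self.blocks[b]
            count += blk[1] * off if blk[0] == "run" else sum(blk[1][:off])
        return count

bits = [1] * 20 + [0, 1, 0, 1] + [0] * 20
rb = RunBlockBits(bits)
print(rb.rank1(22), sum(bits[:22]))   # both 21
```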


Subject(s)
Data Compression , Metagenomics , Data Compression/methods , Metagenomics/methods , Software , Genome, Microbial , Genome, Bacterial , Sequence Analysis, DNA/methods
11.
Sci Rep ; 14(1): 5168, 2024 03 02.
Article in English | MEDLINE | ID: mdl-38431641

ABSTRACT

Magnetic resonance imaging is a medical imaging technique used to create comprehensive images of the tissues and organs in the body. This study presents an advanced approach for storing and compressing Neuroimaging Informatics Technology Initiative (NIfTI) files, a standard format in magnetic resonance imaging. It is designed to enhance telemedicine services by facilitating efficient and high-quality communication between healthcare practitioners and patients. The proposed downsampling approach begins by opening the NIfTI file as volumetric data and then partitioning it into several slice images. Then, the quantization hiding technique is applied to each pair of consecutive slice images to generate a stego slice of the same size. This involves the following major steps: normalization, microblock generation, and discrete cosine transformation. Finally, the resultant stego slice images are assembled to produce the final NIfTI file as volumetric data. The upsampling process, designed to be completely blind, reverses the downsampling steps to reconstruct the subsequent image slice accurately. The efficacy of the proposed method was evaluated using a magnetic resonance imaging dataset, focusing on peak signal-to-noise ratio, signal-to-noise ratio, structural similarity index, and entropy as key performance metrics. The results demonstrate that the proposed approach not only significantly reduces file sizes but also maintains high image quality.


Subject(s)
Data Compression , Telemedicine , Humans , Data Compression/methods , Magnetic Resonance Imaging/methods , Neuroimaging , Signal-To-Noise Ratio
12.
IEEE Trans Image Process ; 33: 2502-2513, 2024.
Article in English | MEDLINE | ID: mdl-38526904

ABSTRACT

Residual coding has gained prevalence in lossless compression, where a lossy layer is initially employed and the reconstruction errors (i.e., residues) are then losslessly compressed. The underlying principle of the residual coding revolves around the exploration of priors based on context modeling. Herein, we propose a residual coding framework for 3D medical images, involving the off-the-shelf video codec as the lossy layer and a Bilateral Context Modeling based Network (BCM-Net) as the residual layer. The BCM-Net is proposed to achieve efficient lossless compression of residues through exploring intra-slice and inter-slice bilateral contexts. In particular, a symmetry-based intra-slice context extraction (SICE) module is proposed to mine bilateral intra-slice correlations rooted in the inherent anatomical symmetry of 3D medical images. Moreover, a bi-directional inter-slice context extraction (BICE) module is designed to explore bilateral inter-slice correlations from bi-directional references, thereby yielding representative inter-slice context. Experiments on popular 3D medical image datasets demonstrate that the proposed method can outperform existing state-of-the-art methods owing to efficient redundancy reduction. Our code will be available on GitHub for future research.
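The overall lossy-layer-plus-residual-layer structure can be sketched in a few lines, with uniform quantization standing in for the video codec and zlib standing in for the learned bilateral context model; both stand-ins are assumptions, and only the framework's round-trip structure is reproduced.

```python
# Bare-bones residual coding: crude lossy layer + lossless residue compression.
import numpy as np
import zlib

def encode(volume: np.ndarray, q: int = 16):
    lossy = (volume // q * q).astype(np.int16)               # stand-in for the lossy codec
    residue = (volume.astype(np.int16) - lossy).astype(np.int16)
    return lossy, zlib.compress(residue.tobytes())

def decode(lossy: np.ndarray, residue_bytes: bytes):
    residue = np.frombuffer(zlib.decompress(residue_bytes), dtype=np.int16)
    return (lossy.ravel() + residue).reshape(lossy.shape).astype(np.uint16)

vol = np.random.randint(0, 4096, size=(4, 32, 32), dtype=np.uint16)   # toy 3D scan
lossy, res = encode(vol)
assert np.array_equal(decode(lossy, res), vol)                         # lossless round trip
```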


Subject(s)
Data Compression , Data Compression/methods , Imaging, Three-Dimensional/methods
13.
Sci Rep ; 14(1): 5087, 2024 03 01.
Article in English | MEDLINE | ID: mdl-38429300

ABSTRACT

When EEG signals are collected at the Nyquist rate, long recordings produce a large amount of data. At the same time, limited bandwidth, end-to-end delay, and memory space put great pressure on effective data transmission. Compressed sensing alleviates this transmission pressure. However, iterative compressed sensing reconstruction algorithms for EEG signals involve complex computation and slow data processing, limiting the application of compressed sensing in rapid EEG monitoring systems. This paper therefore presents a non-iterative, fast algorithm for reconstructing EEG signals using compressed sensing and deep learning techniques. The algorithm uses an improved residual network model, extracts feature information from the EEG signal with one-dimensional dilated convolution, and directly learns the nonlinear mapping between the measurements and the original signal, so it can reconstruct the EEG signal quickly and accurately. The method proposed in this paper has been verified by simulation on the open BCI contest dataset. Overall, the results show that the proposed method achieves higher reconstruction accuracy and faster reconstruction speed than traditional CS reconstruction algorithms and existing deep learning reconstruction algorithms. In addition, it enables rapid reconstruction of EEG signals.
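A minimal PyTorch sketch of the non-iterative idea follows: a fixed random sensing matrix produces measurements, and a small network with 1D dilated convolutions maps them back to the signal in a single forward pass. The sizes, dilations, and omission of a training loop are assumptions; the paper's improved residual network is more elaborate.

```python
# Non-iterative CS reconstruction sketch: measure, then decode with a small
# 1D dilated-convolution network in one forward pass (untrained here).
import torch
import torch.nn as nn

N, M = 256, 64                                   # signal length, number of measurements
phi = torch.randn(M, N) / M ** 0.5               # fixed random sensing matrix

class CSDecoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.lift = nn.Linear(M, N)              # initial estimate from measurements
        self.refine = nn.Sequential(
            nn.Conv1d(1, 16, 3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(16, 16, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(16, 1, 3, padding=4, dilation=4))

    def forward(self, y):                        # y: (batch, M)
        x0 = self.lift(y).unsqueeze(1)           # (batch, 1, N)
        return (x0 + self.refine(x0)).squeeze(1) # residual refinement

x = torch.randn(8, N)                            # toy EEG windows
y = x @ phi.T                                    # compressed measurements
x_hat = CSDecoder()(y)                           # one-shot reconstruction
print(x_hat.shape)                               # torch.Size([8, 256])
```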


Subject(s)
Data Compression , Deep Learning , Signal Processing, Computer-Assisted , Data Compression/methods , Algorithms , Electroencephalography/methods
14.
BMC Genomics ; 25(1): 266, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38461245

ABSTRACT

BACKGROUND: DNA storage has the advantages of large capacity, long-term stability, and low power consumption relative to other storage mediums, making it a promising new storage medium for multimedia information such as images. However, DNA storage has a low coding density and weak error correction ability. RESULTS: To achieve more efficient DNA storage image reconstruction, we propose DNA-QLC (QRes-VAE and Levenshtein code (LC)), which uses the quantized ResNet VAE (QRes-VAE) model and LC for image compression and DNA sequence error correction, thus improving both the coding density and error correction ability. Experimental results show that the DNA-QLC encoding method can not only obtain DNA sequences that meet the combinatorial constraints, but also have a net information density that is 2.4 times higher than DNA Fountain. Furthermore, at a higher error rate (2%), DNA-QLC achieved image reconstruction with an SSIM value of 0.917. CONCLUSIONS: The results indicate that the DNA-QLC encoding scheme guarantees the efficiency and reliability of the DNA storage system and improves the application potential of DNA storage for multimedia information such as images.
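As a toy illustration of mapping compressed bits to constrained DNA sequences (without the paper's QRes-VAE codec or Levenshtein error correction), one might write the following; the 2-bit mapping and the specific homopolymer/GC constraints are assumptions.

```python
# Toy bits-to-bases mapping with a simple combinatorial-constraint check.
BASES = "ACGT"

def bytes_to_dna(data: bytes) -> str:
    seq = []
    for byte in data:
        for shift in (6, 4, 2, 0):                 # two bits per nucleotide
            seq.append(BASES[(byte >> shift) & 0b11])
    return "".join(seq)

def satisfies_constraints(seq: str, max_run: int = 3, gc_lo=0.4, gc_hi=0.6) -> bool:
    run, longest = 1, 1
    for a, b in zip(seq, seq[1:]):
        run = run + 1 if a == b else 1
        longest = max(longest, run)
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    return longest <= max_run and gc_lo <= gc <= gc_hi

dna = bytes_to_dna(b"\x1b\xe4")
print(dna, satisfies_constraints(dna))   # ACGTTGCA True
```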


Subject(s)
Algorithms , Data Compression , Reproducibility of Results , DNA/genetics , Data Compression/methods , Image Processing, Computer-Assisted/methods
15.
Eur J Radiol ; 175: 111418, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38490130

ABSTRACT

PURPOSE: To investigate the potential of combining Compressed Sensing (CS) and a newly developed AI-based super-resolution reconstruction prototype consisting of a series of convolutional neural networks (CNN) for a complete five-minute 2D knee MRI protocol. METHODS: In this prospective study, 20 volunteers were examined using a 3T MRI scanner (Ingenia Elition X, Philips). Similar to clinical practice, the protocol consists of a fat-saturated 2D proton-density sequence in coronal, sagittal, and transversal orientation as well as a sagittal T1-weighted sequence. The sequences were acquired at two different resolutions (standard and low resolution), and the raw data were reconstructed with two different reconstruction algorithms: a conventional Compressed SENSE (CS) and a new CNN-based algorithm that denoises and subsequently interpolates to increase image sharpness (CS-SuperRes). Subjective image quality was evaluated by two blinded radiologists reviewing 8 criteria on a 5-point Likert scale, with signal-to-noise ratio calculated as an objective parameter. RESULTS: The protocol reconstructed with CS-SuperRes received higher ratings than the time-equivalent CS reconstructions, statistically significant especially for low-resolution acquisitions (e.g., overall image impression: 4.3 ± 0.4 vs. 3.4 ± 0.4, p < 0.05). CS-SuperRes reconstructions of the low-resolution acquisition were comparable to traditional CS reconstructions at standard resolution for all parameters, achieving a scan time reduction from 11:01 min to 4:46 min (57%) for the complete protocol (e.g., overall image impression: 4.3 ± 0.4 vs. 4.0 ± 0.5, p < 0.05). CONCLUSION: The newly developed AI-based reconstruction algorithm CS-SuperRes reduces scan time by 57% while maintaining image quality comparable to the conventional CS reconstruction.


Subject(s)
Algorithms , Healthy Volunteers , Knee Joint , Magnetic Resonance Imaging , Humans , Magnetic Resonance Imaging/methods , Male , Female , Prospective Studies , Adult , Knee Joint/diagnostic imaging , Data Compression/methods , Neural Networks, Computer , Middle Aged , Signal-To-Noise Ratio , Image Interpretation, Computer-Assisted/methods , Young Adult
16.
Eur J Radiol ; 175: 111445, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38537605

ABSTRACT

PURPOSE: To evaluate the feasibility of a free-breathing sequence (4D FreeBreathing) combined with Compressed SENSE in dynamic contrast-enhanced pancreatic MRI and compare it with a breath-holding sequence (eTHRIVE). METHOD: Patients who underwent pancreatic MRI with either eTHRIVE or 4D FreeBreathing from April 2022 to November 2023 were included in this retrospective study. Two radiologists, blinded to the scan sequence, independently and in random order reviewed the images at the precontrast, pancreatic, portal venous, and equilibrium phases and assigned confidence scores for motion and streaking artifacts, pancreatic sharpness, and overall image quality using a 5-point scale. Furthermore, the radiologists assessed the appropriateness of the scan timing of the pancreatic phase. Mann-Whitney U and Fisher's exact tests were conducted to compare the confidence scores and the adequacy of pancreatic-phase scan timing between eTHRIVE and 4D FreeBreathing. RESULTS: Overall, 48 patients (median age, 71 years; interquartile range, 64-77 years; 24 women) were included. Among them, 20 patients (42%) were scanned using 4D FreeBreathing. 4D FreeBreathing showed moderate streaking artifacts but reduced motion artifacts (P < .001-.17) at all phases. Pancreatic sharpness and overall image quality were comparable between the two sequences (P = .17-.96). All 20 examinations with 4D FreeBreathing showed appropriate pancreatic phase images, but only 16 (57%; P < .001 for reviewer 1) and 18 (64%; P = .003 for reviewer 2) examinations did so with eTHRIVE. CONCLUSION: The use of 4D FreeBreathing combined with Compressed SENSE was feasible in pancreatic MRI and provided appropriate pancreatic phase images in all examinations.


Subject(s)
Contrast Media , Feasibility Studies , Magnetic Resonance Imaging , Humans , Female , Male , Middle Aged , Aged , Retrospective Studies , Magnetic Resonance Imaging/methods , Artifacts , Respiration , Image Enhancement/methods , Breath Holding , Data Compression/methods , Pancreatic Neoplasms/diagnostic imaging , Pancreas/diagnostic imaging , Pancreatic Diseases/diagnostic imaging
17.
Article in English | MEDLINE | ID: mdl-38358865

ABSTRACT

Revolutionary advances in DNA sequencing technologies are fundamentally changing the nature of genomics. Today's sequencing technologies have led to an explosion in genomic data volume. These data can be used in various applications where long-term storage and analysis of genomic sequence data are required. Data-specific compression algorithms can effectively manage such large volumes of data. In recent times, deep learning has achieved great success in many compression tools and is gradually being used in genomic sequence compression. In particular, autoencoders have been applied to dimensionality reduction, compact representation of data, and generative model learning. An autoencoder can use convolutional layers to learn essential features from input data, which suits image and sequential data well. However, an autoencoder reconstructs the input data with some loss of information. Since accuracy is critical for genomic data, compressed genomic data must be decompressed without any information loss. We introduce a new scheme to address the loss incurred in the decompressed output of the autoencoder. This paper proposes a novel algorithm called GenCoder for reference-free compression of genomic sequences that uses a convolutional autoencoder, regenerates the genomic sequences from the latent code produced by the autoencoder, and retrieves the original data losslessly. Performance evaluation is conducted on various genomes and benchmark datasets. The experimental results on the tested data demonstrate that the deep learning model used in the proposed compression algorithm generalizes well to genomic sequence data and achieves a compression gain of 27% over the best state-of-the-art method.


Subject(s)
Algorithms , Data Compression , Genomics , Neural Networks, Computer , Sequence Analysis, DNA , Data Compression/methods , Genomics/methods , Sequence Analysis, DNA/methods , Humans , Deep Learning
18.
IEEE Trans Pattern Anal Mach Intell ; 46(8): 5820-5834, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38386571

ABSTRACT

To cost-effectively transmit high-quality dynamic 3D human images in immersive multimedia applications, efficient data compression is crucial. Unlike existing methods that focus on reducing signal-level reconstruction errors, we propose the first dynamic 3D human compression framework based on human priors. The layered coding architecture significantly enhances the perceptual quality while also supporting a variety of downstream tasks, including visual analysis and content editing. Specifically, a high-fidelity pose-driven Avatar is generated from the original frames as the basic structure layer to implicitly represent the human shape. Then, human movements between frames are parameterized via a commonly-used human prior model, i.e., the Skinned Multi-Person Linear Model (SMPL), to form the motion layer and drive the Avatar. Furthermore, the normals are also introduced as an enhancement layer to preserve fine-grained geometric details. Finally, the Avatar, SMPL parameters, and normal maps are efficiently compressed into layered semantic bitstreams. Extensive qualitative and quantitative experiments show that the proposed framework remarkably outperforms other state-of-the-art 3D codecs in terms of subjective quality with only a few bits. More notably, as the size or frame number of the 3D human sequence increases, the superiority of our framework in perceptual quality becomes more significant while saving more bitrates.


Subject(s)
Data Compression , Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Data Compression/methods , Algorithms , Posture/physiology
19.
Gene ; 907: 148235, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38342250

ABSTRACT

Next Generation Sequencing (NGS) technology generates massive amounts of genome sequence data, and the volume increases rapidly over time. As a result, there is a growing need for efficient compression algorithms to facilitate the processing, storage, transmission, and analysis of large-scale genome sequences. Over the past 31 years, numerous state-of-the-art compression algorithms have been developed. The performance of any compression algorithm is measured by three main metrics: compression ratio, time, and memory usage. Existing k-mer hash indexing systems take more time because their decision-making process is based on compression results. In this paper, we propose a two-phase reference genome compression algorithm using the optimal k-mer length (RGCOK). Reference-based compression takes advantage of the inter-similarity between chromosomes of the same species. RGCOK achieves this by finding the optimal k-mer length for matching, using a randomization method and hashing. The performance of RGCOK was evaluated on three different benchmark data sets: novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Homo sapiens, and other species sequences, using an Amazon AWS virtual cloud machine. Experiments showed that the optimal k-mer finding time of RGCOK is around 45.28 min, whereas the time for the existing state-of-the-art algorithms HiRGC, SCCG, and HRCM ranges from 58 min to 8.97 h.
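A baseline reference-based matcher built on a k-mer hash index is sketched below; the fixed k, the greedy match extension, and the (position, length)/literal output format are assumptions, and RGCOK's actual contribution (finding the optimal k via randomization, plus its two-phase parallel design) is not reproduced.

```python
# Minimal reference-based matcher: index the reference by k-mers, then encode
# the target as (reference position, match length) pairs plus literal bases.
from collections import defaultdict

def compress_against_reference(target: str, reference: str, k: int = 4):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)

    ops, i = [], 0
    while i < len(target):
        seed = target[i:i + k]
        positions = index.get(seed, [])
        if len(seed) == k and positions:
            best_pos, best_len = positions[0], k
            for p in positions:                      # extend each candidate match
                L = k
                while (i + L < len(target) and p + L < len(reference)
                       and target[i + L] == reference[p + L]):
                    L += 1
                if L > best_len:
                    best_pos, best_len = p, L
            ops.append(("match", best_pos, best_len))
            i += best_len
        else:
            ops.append(("literal", target[i]))
            i += 1
    return ops

ref = "ACGTACGTGGTTACGTAA"
tgt = "ACGTGGTTACGTAAACGT"
print(compress_against_reference(tgt, ref))   # [('match', 4, 14), ('match', 0, 4)]
```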


Subject(s)
Data Compression , Software , Humans , Data Compression/methods , Algorithms , Genome , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods
20.
IUCrJ ; 11(Pt 2): 190-201, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38327201

ABSTRACT

Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this advancement also entails a notable increase in the volume of collected data. Today, up to 2 PB of data per experiment could be easily obtained under efficient operating conditions. The combined costs associated with storing data from multiple experiments provide a compelling incentive to develop strategies that effectively reduce the amount of data stored on disk while maintaining the quality of scientific outcomes. Lossless data-compression methods are designed to preserve the information content of the data but often struggle to achieve a high compression ratio when applied to experimental data that contain noise. Conversely, lossy compression methods offer the potential to greatly reduce the data volume. Nonetheless, it is vital to thoroughly assess the impact of data quality and scientific outcomes when employing lossy compression, as it inherently involves discarding information. The evaluation of lossy compression effects on data requires proper data quality metrics. In our research, we assess various approaches for both lossless and lossy compression techniques applied to SX data, and equally importantly, we describe metrics suitable for evaluating SX data quality.


Subject(s)
Algorithms , Data Compression , Crystallography , Data Compression/methods , Tomography, X-Ray Computed