Results 1 - 20 of 1,567
1.
Commun Biol ; 7(1): 553, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38724695

ABSTRACT

For the last two decades, the amount of genomic data produced by scientific and medical applications has been growing at a rapid pace. To enable software solutions that analyze, process, and transmit these data in an efficient and interoperable way, ISO and IEC released the first version of the compression standard MPEG-G in 2019. However, non-proprietary implementations of the standard have so far not been openly available, limiting fair scientific assessment of the standard and therefore hindering its broad adoption. In this paper, we present Genie, to the best of our knowledge the first open-source encoder that compresses genomic data according to the MPEG-G standard. We demonstrate that Genie reaches state-of-the-art compression ratios while offering interoperability with any other standard-compliant decoder, independent of its manufacturer. Finally, the ISO/IEC ecosystem ensures the long-term sustainability and decodability of the compressed data through the ISO/IEC-supported reference decoder.


Subjects
Data Compression; Genomics; Software; Genomics/methods; Data Compression/methods; Humans
2.
Sci Rep ; 14(1): 10560, 2024 05 08.
Article in English | MEDLINE | ID: mdl-38720020

ABSTRACT

Research on video analytics, especially human behavior recognition, has become increasingly popular in recent years. It is widely applied in virtual reality, video surveillance, and video retrieval. With the advancement of deep learning algorithms and computer hardware, the conventional two-dimensional convolution technique for training video models has been replaced by three-dimensional convolution, which enables the extraction of spatio-temporal features. In particular, the use of 3D convolution in human behavior recognition has attracted growing interest. However, the increased dimensionality brings challenges such as a dramatic increase in the number of parameters, higher time complexity, and a strong dependence on GPUs for effective spatio-temporal feature extraction; training can be considerably slow without powerful GPU hardware. To address these issues, this study proposes an Adaptive Time Compression (ATC) module. Functioning as an independent component, ATC can be seamlessly integrated into existing architectures and achieves data compression by eliminating redundant frames within video data. The ATC module effectively reduces GPU computing load and time complexity with negligible loss of accuracy, thereby facilitating real-time human behavior recognition.
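
The ATC module's internals are not described here in enough detail to reproduce, but the underlying idea of discarding temporally redundant frames before the 3D CNN can be sketched with a simple mean-absolute-difference test; the threshold, frame shapes, and toy clip below are assumptions, not values from the paper.

```python
import numpy as np

def drop_redundant_frames(frames, threshold=8.0):
    """Keep a frame only if it differs enough from the last kept frame.

    frames: array of shape (T, H, W, C), e.g. uint8 video frames.
    threshold: mean absolute pixel difference below which a frame is
        treated as redundant (illustrative value, not from the paper).
    """
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.float32) - kept[-1].astype(np.float32)).mean()
        if diff >= threshold:
            kept.append(frame)
    return np.stack(kept)

# Toy clip: 16 frames built from only 4 distinct key frames.
key_frames = np.random.randint(0, 256, (4, 32, 32, 3), dtype=np.uint8)
clip = np.repeat(key_frames, 4, axis=0)
compressed = drop_redundant_frames(clip)
print(clip.shape, "->", compressed.shape)   # fewer frames fed to the 3D CNN
```

The compacted clip is what a 3D convolutional backbone would then consume, which is where the reported savings in GPU load would come from.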


Subjects
Algorithms; Data Compression; Video Recording; Humans; Data Compression/methods; Human Activities; Deep Learning; Image Processing, Computer-Assisted/methods; Pattern Recognition, Automated/methods
3.
J Neural Eng ; 21(3)2024 May 16.
Article in English | MEDLINE | ID: mdl-38718785

ABSTRACT

Objective. Recently, the demand for wearable devices using electroencephalography (EEG) has increased rapidly in many fields. Due to volume and computation constraints, wearable devices usually compress and transmit EEG to external devices for analysis. However, current EEG compression algorithms are not tailor-made for wearable devices with limited computing and storage: first, the huge number of parameters makes them difficult to deploy on wearable devices; second, it is tricky to learn the distribution of EEG signals due to their low signal-to-noise ratio, which leads to excessive reconstruction error and suboptimal compression performance. Approach. Here, a feature-enhanced asymmetric encoding-decoding network is proposed. EEG is encoded with a lightweight model and subsequently decoded with a multi-level feature fusion network that extracts the encoded features deeply and reconstructs the signal through a two-branch structure. Main results. On public EEG datasets (motor imagery and event-related potentials), experimental results show that the proposed method achieves state-of-the-art compression performance. In addition, the neural representation analysis and the classification performance of the reconstructed EEG signals show that our method tends to retain more task-related information as the compression ratio increases and retains reliable discriminative information after EEG compression. Significance. This paper tailors an asymmetric EEG compression method for wearable devices that achieves state-of-the-art compression performance in a lightweight manner, paving the way for the application of EEG-based wearable devices.
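
For readers who want a concrete picture of what "asymmetric" means here, the toy PyTorch sketch below pairs a small encoder (the kind of model a wearable could run) with a much larger decoder. The layer sizes, channel counts, and 16x temporal downsampling are illustrative assumptions, not the architecture from the paper.

```python
import torch
import torch.nn as nn

class LightEncoder(nn.Module):
    """Small 1D conv encoder, standing in for the on-device (wearable) side."""
    def __init__(self, in_ch=1, latent_ch=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_ch, 8, kernel_size=7, stride=4, padding=3),
            nn.ReLU(),
            nn.Conv1d(8, latent_ch, kernel_size=7, stride=4, padding=3),
        )
    def forward(self, x):
        return self.net(x)

class HeavyDecoder(nn.Module):
    """Deeper decoder run off-device, reconstructing the EEG segment."""
    def __init__(self, latent_ch=4, out_ch=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose1d(latent_ch, 32, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.ConvTranspose1d(32, 16, kernel_size=8, stride=4, padding=2),
            nn.ReLU(),
            nn.Conv1d(16, out_ch, kernel_size=5, padding=2),
        )
    def forward(self, z):
        return self.net(z)

enc, dec = LightEncoder(), HeavyDecoder()
x = torch.randn(2, 1, 1024)     # batch of single-channel EEG segments
z = enc(x)                      # compact representation transmitted off-device
x_hat = dec(z)                  # heavier reconstruction on the receiving side
print(x.shape, z.shape, x_hat.shape)
print(sum(p.numel() for p in enc.parameters()), "encoder params vs",
      sum(p.numel() for p in dec.parameters()), "decoder params")
```

Counting the parameters of the two modules makes the asymmetry explicit: almost all of the capacity sits on the decoding side, away from the wearable.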


Subjects
Data Compression; Electroencephalography; Electroencephalography/methods; Data Compression/methods; Humans; Wearable Electronic Devices; Neural Networks, Computer; Algorithms; Signal Processing, Computer-Assisted; Imagination/physiology
4.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38759114

ABSTRACT

MOTIVATION: Quality scores data (QSD) account for 70% of the size of compressed FastQ files obtained from short- and long-read sequencing technologies. Designing effective QSD compressors that balance compression ratio, time cost, and memory consumption is essential in scenarios such as large-scale genomics data sharing and long-term data backup. This study presents a novel parallel lossless QSD-dedicated compression algorithm named PQSDC, which fulfills these requirements well. PQSDC is based on two core components: a parallel sequences-partition model designed to reduce peak memory consumption and time cost during compression and decompression, and a parallel four-level run-length prediction mapping model to enhance the compression ratio. In addition, the PQSDC algorithm is designed to be highly concurrent on multicore CPU clusters. RESULTS: We evaluate PQSDC and four state-of-the-art compression algorithms on 27 real-world datasets, comprising 61.857 billion QSD characters and 632.908 million QSD sequences. (1) For short reads, compared to the baselines, the maximum improvement of PQSDC reaches 7.06% in average compression ratio and 8.01% in weighted average compression ratio. During compression and decompression, the maximum total time savings of PQSDC are 79.96% and 84.56%, respectively; the maximum average memory savings are 68.34% and 77.63%, respectively. (2) For long reads, the maximum improvement of PQSDC reaches 12.51% and 13.42% in average and weighted average compression ratio, respectively. The maximum total time savings during compression and decompression are 53.51% and 72.53%, respectively; the maximum average memory savings are 19.44% and 17.42%, respectively. (3) Furthermore, PQSDC ranks second in compression robustness among the tested algorithms, indicating that it is less affected by the probability distribution of the QSD collections. Overall, our work provides a promising solution for parallel QSD compression that balances storage cost, time consumption, and memory occupation well. AVAILABILITY AND IMPLEMENTATION: The PQSDC compressor can be downloaded from https://github.com/fahaihi/PQSDC.
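
PQSDC's four-level run-length prediction mapping is more involved than this, but a minimal run-length round trip on a hypothetical quality string makes the basic source of redundancy in QSD concrete.

```python
from itertools import groupby

def run_length_encode(qualities):
    """Collapse consecutive identical quality characters into (char, count) pairs."""
    return [(ch, len(list(group))) for ch, group in groupby(qualities)]

def run_length_decode(runs):
    return "".join(ch * count for ch, count in runs)

qsd = "FFFFFFF:::FFFFF,,FFFFFFFFFF"   # hypothetical Phred+33 quality string
runs = run_length_encode(qsd)
assert run_length_decode(runs) == qsd  # lossless round trip
print(len(qsd), "characters ->", len(runs), "runs:", runs)
```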


Subjects
Algorithms; Data Compression; Data Compression/methods; Genomics/methods; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods; Software; Humans
5.
J Biomed Opt ; 29(Suppl 1): S11529, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38650979

ABSTRACT

Significance: Compressed sensing (CS) uses special measurement designs combined with powerful mathematical algorithms to reduce the amount of data to be collected while maintaining image quality. This is relevant to almost any imaging modality, and in this paper we focus on CS in photoacoustic projection imaging (PAPI) with integrating line detectors (ILDs). Aim: Our previous research involved rather general CS measurements, where each ILD can contribute to any measurement. In the real world, however, the design of CS measurements is subject to practical constraints. In this research, we aim at a CS-PAPI system where each measurement involves only a subset of ILDs, and which can be implemented in a cost-effective manner. Approach: We extend the existing PAPI with a self-developed CS unit. The system provides structured CS matrices for which the existing recovery theory cannot be applied directly. A random search strategy is applied to select the CS measurement matrix within this class for which we obtain exact sparse recovery. Results: We implement a CS PAPI system for a compression factor of 4:3, where specific measurements are made on separate groups of 16 ILDs. We algorithmically design optimal CS measurements that have proven sparse CS capabilities. Numerical experiments are used to support our results. Conclusions: CS with proven sparse recovery capabilities can be integrated into PAPI, and numerical results support this setup. Future work will focus on applying it to experimental data and utilizing data-driven approaches to enhance the compression factor and generalize the signal class.
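
For context, this is what the generic CS pipeline looks like in a toy setting: a sparse signal, a random measurement matrix with fewer rows than unknowns, and greedy recovery by orthogonal matching pursuit. It is a generic numpy illustration, not the structured ILD measurement design or the recovery theory developed in this work; sizes and sparsity level are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 128, 48, 5                      # signal length, measurements, sparsity

x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.normal(size=k)   # k-sparse signal
A = rng.normal(size=(m, n)) / np.sqrt(m)                   # random measurement matrix
y = A @ x                                                  # compressed measurements

# Orthogonal matching pursuit: greedily pick atoms, re-fit by least squares.
support, residual = [], y.copy()
for _ in range(k):
    support.append(int(np.argmax(np.abs(A.T @ residual))))
    coeffs, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
    residual = y - A[:, support] @ coeffs

x_hat = np.zeros(n)
x_hat[support] = coeffs
print("relative reconstruction error:", np.linalg.norm(x_hat - x) / np.linalg.norm(x))
```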


Subjects
Algorithms; Equipment Design; Image Processing, Computer-Assisted; Photoacoustic Techniques; Photoacoustic Techniques/methods; Photoacoustic Techniques/instrumentation; Image Processing, Computer-Assisted/methods; Data Compression/methods; Phantoms, Imaging
6.
Genome Biol ; 25(1): 106, 2024 Apr 25.
Article in English | MEDLINE | ID: mdl-38664753

ABSTRACT

Centrifuger is an efficient taxonomic classification method that compares sequencing reads against a microbial genome database. In Centrifuger, the Burrows-Wheeler transformed genome sequences are losslessly compressed using a novel scheme called run-block compression. Run-block compression achieves sublinear space complexity and is effective at compressing diverse microbial databases like RefSeq while supporting fast rank queries. Combining this compression method with other strategies for compacting the Ferragina-Manzini (FM) index, Centrifuger reduces the memory footprint by half compared to other FM-index-based approaches. Furthermore, the lossless compression and the unconstrained match length help Centrifuger achieve greater accuracy than competing methods at lower taxonomic levels.
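
Run-block compression and the full FM-index are beyond a short snippet; the sketch below only illustrates the primitive they must support, a rank query answered directly from a run-length representation of a BWT string without decompressing it (the toy string and the linear scan over runs are simplifications of the actual sublinear-space structure).

```python
from itertools import groupby

class RunLengthRank:
    """Answer rank(c, i) = number of occurrences of c in text[:i]
    directly from a run-length representation (no decompression)."""
    def __init__(self, text):
        self.runs = [(ch, len(list(grp))) for ch, grp in groupby(text)]

    def rank(self, c, i):
        count, pos = 0, 0
        for ch, length in self.runs:
            if pos >= i:
                break
            if ch == c:
                count += min(length, i - pos)
            pos += length
        return count

bwt = "AAAACCCGGTTTTTAAAA"                 # toy BWT with long runs
rlr = RunLengthRank(bwt)
assert rlr.rank("A", 10) == bwt[:10].count("A")
assert rlr.rank("T", 15) == bwt[:15].count("T")
print([rlr.rank(c, len(bwt)) for c in "ACGT"])
```

Fast rank over the compressed representation is exactly what lets an FM-index walk backwards through a query read without ever expanding the BWT.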


Subjects
Data Compression; Metagenomics; Data Compression/methods; Metagenomics/methods; Software; Genome, Microbial; Genome, Bacterial; Sequence Analysis, DNA/methods
7.
J Proteome Res ; 23(5): 1702-1712, 2024 May 03.
Article in English | MEDLINE | ID: mdl-38640356

ABSTRACT

Several lossy compressors have achieved superior compression rates for mass spectrometry (MS) data at the cost of storage precision. Currently, the impacts of precision losses on MS data processing have not been thoroughly evaluated, which is critical for the future development of lossy compressors. We first evaluated different storage precision (32 bit and 64 bit) in lossless mzML files. We then applied 10 truncation transformations to generate precision-lossy files: five relative errors for intensities and five absolute errors for m/z values. MZmine3 and XCMS were used for feature detection and GNPS for compound annotation. Lastly, we compared Precision, Recall, F1-score, and file sizes between lossy files and lossless files under different conditions. Overall, we revealed that the discrepancy between 32 bit and 64 bit precision was under 1%. We proposed an absolute m/z error of 10^-4 and a relative intensity error of 2 × 10^-2, adhering to a 5% error threshold (F1-scores above 95%). For a stricter 1% error threshold (F1-scores above 99%), an absolute m/z error of 2 × 10^-5 and a relative intensity error of 2 × 10^-3 were advised. This guidance aims to help researchers improve lossy compression algorithms and minimize the negative effects of precision losses on downstream data processing.
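
As a rough illustration of the kind of truncation transformation evaluated here, the snippet below quantizes hypothetical m/z values on a fixed absolute-error grid and intensities on a logarithmic (relative-error) grid, using the error levels recommended for the 5% threshold. The quantization scheme is one possible implementation, not necessarily the authors'.

```python
import numpy as np

rng = np.random.default_rng(1)
mz = rng.uniform(100.0, 1500.0, size=5)          # hypothetical m/z values
intensity = rng.uniform(1e3, 1e7, size=5)        # hypothetical intensities

abs_err, rel_err = 1e-4, 2e-2                    # thresholds recommended in the abstract

# Grid quantization bounds the absolute error by abs_err / 2.
mz_trunc = np.round(mz / abs_err) * abs_err
# Rounding in log space bounds the *relative* error (roughly rel_err / 2).
step = np.log1p(rel_err)
intensity_trunc = np.exp(np.round(np.log(intensity) / step) * step)

print("max abs m/z error:      ", np.max(np.abs(mz_trunc - mz)))
print("max rel intensity error:", np.max(np.abs(intensity_trunc - intensity) / intensity))
```

Quantized values like these contain long runs of identical low-order digits, which is what downstream lossless coders exploit to shrink the files.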


Subjects
Data Compression; Mass Spectrometry; Metabolomics; Mass Spectrometry/methods; Metabolomics/methods; Metabolomics/statistics & numerical data; Data Compression/methods; Software; Humans; Algorithms
8.
Sci Rep ; 14(1): 5168, 2024 03 02.
Article in English | MEDLINE | ID: mdl-38431641

ABSTRACT

Magnetic resonance imaging is a medical imaging technique used to create comprehensive images of the tissues and organs of the body. This study presents an advanced approach for storing and compressing Neuroimaging Informatics Technology Initiative (NIfTI) files, a standard format in magnetic resonance imaging. It is designed to enhance telemedicine services by facilitating efficient and high-quality communication between healthcare practitioners and patients. The proposed downsampling approach begins by opening the NIfTI file as volumetric data and slicing it into several slice images. The quantization hiding technique is then applied to each pair of consecutive slice images to generate a stego slice of the same size. This involves the following major steps: normalization, microblock generation, and discrete cosine transformation. Finally, the resultant stego slice images are assembled to produce the final NIfTI file as volumetric data. The upsampling process, designed to be completely blind, reverses the downsampling steps to reconstruct the subsequent image slice accurately. The efficacy of the proposed method was evaluated using a magnetic resonance imaging dataset, focusing on peak signal-to-noise ratio, signal-to-noise ratio, structural similarity index, and entropy as key performance metrics. The results demonstrate that the proposed approach not only significantly reduces file sizes but also maintains high image quality.
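
The quantization-hiding step is not reproduced here; the sketch below only walks through the first stages named in the abstract (normalization, microblock generation, and the discrete cosine transform) on a synthetic slice, with the 8 x 8 block size and the slice shape as assumptions.

```python
import numpy as np
from scipy.fft import dctn, idctn

def normalize(slice_img):
    """Scale a slice to [0, 1]; a simple stand-in for the normalization step."""
    lo, hi = slice_img.min(), slice_img.max()
    return (slice_img - lo) / (hi - lo + 1e-8)

def microblocks(slice_img, size=8):
    """Split a 2D slice into non-overlapping size x size microblocks."""
    h, w = slice_img.shape
    return [slice_img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

slice_img = np.random.rand(64, 64).astype(np.float32)   # stand-in for one MRI slice
blocks = microblocks(normalize(slice_img))
coeffs = [dctn(b, norm="ortho") for b in blocks]          # frequency-domain microblocks
recon = [idctn(c, norm="ortho") for c in coeffs]          # the DCT itself is invertible
print(len(blocks), "microblocks, round trip ok:", np.allclose(blocks[0], recon[0]))
```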


Subjects
Data Compression; Telemedicine; Humans; Data Compression/methods; Magnetic Resonance Imaging/methods; Neuroimaging; Signal-To-Noise Ratio
9.
Sci Rep ; 14(1): 5087, 2024 03 01.
Article in English | MEDLINE | ID: mdl-38429300

ABSTRACT

When EEG signals are collected at the Nyquist rate, long-term recordings produce a large amount of data, while limited bandwidth, end-to-end delay, and memory space put great pressure on the effective transmission of these data. Compressed sensing alleviates this transmission pressure. However, using an iterative compressed sensing reconstruction algorithm for EEG signal reconstruction involves complex calculations and slow data processing, limiting the application of compressed sensing in rapid EEG monitoring systems. This paper therefore presents a non-iterative and fast algorithm for reconstructing EEG signals using compressed sensing and deep learning techniques. The algorithm uses an improved residual network model, extracts the feature information of the EEG signal by one-dimensional dilated convolution, directly learns the nonlinear mapping between the measured values and the original signal, and can quickly and accurately reconstruct the EEG signal. The proposed method has been verified by simulation on the open BCI contest dataset. Overall, the results show that the proposed method has higher reconstruction accuracy and faster reconstruction speed than the traditional CS reconstruction algorithms and existing deep learning reconstruction algorithms, and can achieve rapid reconstruction of EEG signals.


Subjects
Data Compression; Deep Learning; Signal Processing, Computer-Assisted; Data Compression/methods; Algorithms; Electroencephalography/methods
10.
IEEE Trans Image Process ; 33: 2502-2513, 2024.
Article in English | MEDLINE | ID: mdl-38526904

ABSTRACT

Residual coding has gained prevalence in lossless compression, where a lossy layer is initially employed and the reconstruction errors (i.e., residues) are then losslessly compressed. The underlying principle of the residual coding revolves around the exploration of priors based on context modeling. Herein, we propose a residual coding framework for 3D medical images, involving the off-the-shelf video codec as the lossy layer and a Bilateral Context Modeling based Network (BCM-Net) as the residual layer. The BCM-Net is proposed to achieve efficient lossless compression of residues through exploring intra-slice and inter-slice bilateral contexts. In particular, a symmetry-based intra-slice context extraction (SICE) module is proposed to mine bilateral intra-slice correlations rooted in the inherent anatomical symmetry of 3D medical images. Moreover, a bi-directional inter-slice context extraction (BICE) module is designed to explore bilateral inter-slice correlations from bi-directional references, thereby yielding representative inter-slice context. Experiments on popular 3D medical image datasets demonstrate that the proposed method can outperform existing state-of-the-art methods owing to efficient redundancy reduction. Our code will be available on GitHub for future research.


Subjects
Data Compression; Data Compression/methods; Imaging, Three-Dimensional/methods
11.
Eur J Radiol ; 175: 111418, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38490130

ABSTRACT

PURPOSE: To investigate the potential of combining Compressed Sensing (CS) and a newly developed AI-based super-resolution reconstruction prototype consisting of a series of convolutional neural networks (CNN) for a complete five-minute 2D knee MRI protocol. METHODS: In this prospective study, 20 volunteers were examined using a 3T MRI scanner (Ingenia Elition X, Philips). Similar to clinical practice, the protocol consisted of a fat-saturated 2D proton-density sequence in coronal, sagittal, and transversal orientation as well as a sagittal T1-weighted sequence. The sequences were acquired at two different resolutions (standard and low resolution) and the raw data reconstructed with two different reconstruction algorithms: conventional Compressed SENSE (CS) and a new CNN-based algorithm that denoises and subsequently interpolates the data to increase image sharpness (CS-SuperRes). Subjective image quality was evaluated by two blinded radiologists reviewing 8 criteria on a 5-point Likert scale, and signal-to-noise ratio was calculated as an objective parameter. RESULTS: The protocol reconstructed with CS-SuperRes received higher ratings than the time-equivalent CS reconstructions, statistically significant especially for low-resolution acquisitions (e.g., overall image impression: 4.3 ± 0.4 vs. 3.4 ± 0.4, p < 0.05). CS-SuperRes reconstructions of the low-resolution acquisition were comparable to traditional CS reconstructions at standard resolution for all parameters, achieving a scan time reduction from 11:01 min to 4:46 min (57%) for the complete protocol (e.g., overall image impression: 4.3 ± 0.4 vs. 4.0 ± 0.5, p < 0.05). CONCLUSION: The newly developed AI-based reconstruction algorithm CS-SuperRes allows scan time to be reduced by 57% while maintaining image quality comparable to the conventional CS reconstruction.


Subjects
Algorithms; Healthy Volunteers; Knee Joint; Magnetic Resonance Imaging; Humans; Magnetic Resonance Imaging/methods; Male; Female; Prospective Studies; Adult; Knee Joint/diagnostic imaging; Data Compression/methods; Neural Networks, Computer; Middle Aged; Signal-To-Noise Ratio; Image Interpretation, Computer-Assisted/methods; Young Adult
12.
Eur J Radiol ; 175: 111445, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38537605

ABSTRACT

PURPOSE: To evaluate the feasibility of a free-breathing sequence (4D FreeBreathing) combined with Compressed SENSE in dynamic contrast-enhanced pancreatic MRI and compare it with a breath-holding sequence (eTHRIVE). METHOD: Patients who underwent pancreatic MRI with either eTHRIVE or 4D FreeBreathing from April 2022 to November 2023 were included in this retrospective study. Two radiologists, blinded to the scan sequence, independently reviewed the images in random order at the precontrast, pancreatic, portal venous, and equilibrium phases and assigned confidence scores for motion and streaking artifacts, pancreatic sharpness, and overall image quality on a 5-point scale. The radiologists also assessed the appropriateness of the scan timing of the pancreatic phase. Mann-Whitney U and Fisher's exact tests were conducted to compare the confidence scores and the adequacy of pancreatic phase scan timing between eTHRIVE and 4D FreeBreathing. RESULTS: Overall, 48 patients (median age, 71 years; interquartile range, 64-77 years; 24 women) were included. Among them, 20 patients (42%) were scanned using 4D FreeBreathing. 4D FreeBreathing showed moderate streaking artifacts but improved motion artifacts (P < .001-.17) at all phases. Pancreatic sharpness and overall image quality were comparable between the two sequences (P = .17-.96). All 20 examinations with 4D FreeBreathing showed appropriate pancreatic phase images, but only 16 (57%; P < .001 for reviewer 1) and 18 (64%; P = .003 for reviewer 2) examinations did so with eTHRIVE. CONCLUSION: The use of 4D FreeBreathing combined with Compressed SENSE was feasible in pancreatic MRI and provided appropriate pancreatic phase images in all examinations.


Subjects
Contrast Media; Feasibility Studies; Magnetic Resonance Imaging; Humans; Female; Male; Middle Aged; Aged; Retrospective Studies; Magnetic Resonance Imaging/methods; Artifacts; Respiration; Image Enhancement/methods; Breath Holding; Data Compression/methods; Pancreatic Neoplasms/diagnostic imaging; Pancreas/diagnostic imaging; Pancreatic Diseases/diagnostic imaging
13.
BMC Genomics ; 25(1): 266, 2024 Mar 09.
Article in English | MEDLINE | ID: mdl-38461245

ABSTRACT

BACKGROUND: DNA storage has the advantages of large capacity, long-term stability, and low power consumption relative to other storage media, making it a promising new storage medium for multimedia information such as images. However, DNA storage has a low coding density and weak error-correction ability. RESULTS: To achieve more efficient DNA storage image reconstruction, we propose DNA-QLC (QRes-VAE and Levenshtein code (LC)), which uses the quantized ResNet VAE (QRes-VAE) model and LC for image compression and DNA sequence error correction, thus improving both the coding density and the error-correction ability. Experimental results show that the DNA-QLC encoding method can not only obtain DNA sequences that meet the combinatorial constraints, but also achieve a net information density 2.4 times higher than DNA Fountain. Furthermore, at a higher error rate (2%), DNA-QLC achieved image reconstruction with an SSIM value of 0.917. CONCLUSIONS: The results indicate that the DNA-QLC encoding scheme guarantees the efficiency and reliability of the DNA storage system and improves the application potential of DNA storage for multimedia information such as images.
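
DNA-QLC itself (QRes-VAE plus Levenshtein code) cannot be condensed into a snippet; the sketch below only shows the textbook 2-bits-per-nucleotide mapping that serves as the reference point for coding-density figures, so the reported net-density comparison with DNA Fountain has a concrete baseline (the mapping is a common convention, not the paper's code).

```python
BASE_FOR_BITS = {"00": "A", "01": "C", "10": "G", "11": "T"}
BITS_FOR_BASE = {b: k for k, b in BASE_FOR_BITS.items()}

def bytes_to_dna(payload: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in payload)
    return "".join(BASE_FOR_BITS[bits[i:i + 2]] for i in range(0, len(bits), 2))

def dna_to_bytes(seq: str) -> bytes:
    bits = "".join(BITS_FOR_BASE[b] for b in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

data = b"compressed image payload"
seq = bytes_to_dna(data)
assert dna_to_bytes(seq) == data
print(len(data) * 8 / len(seq), "bits per nucleotide")   # 2.0 = raw ceiling
```

Practical schemes land below this 2 bits-per-base ceiling because of combinatorial constraints (e.g., GC content, homopolymer limits) and error-correction overhead, which is what net information density accounts for.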


Subjects
Algorithms; Data Compression; Reproducibility of Results; DNA/genetics; Data Compression/methods; Image Processing, Computer-Assisted/methods
14.
IUCrJ ; 11(Pt 2): 190-201, 2024 Mar 01.
Article in English | MEDLINE | ID: mdl-38327201

ABSTRACT

Serial crystallography (SX) has become an established technique for protein structure determination, especially when dealing with small or radiation-sensitive crystals and investigating fast or irreversible protein dynamics. The advent of newly developed multi-megapixel X-ray area detectors, capable of capturing over 1000 images per second, has brought about substantial benefits. However, this advancement also entails a notable increase in the volume of collected data. Today, up to 2 PB of data per experiment could be easily obtained under efficient operating conditions. The combined costs associated with storing data from multiple experiments provide a compelling incentive to develop strategies that effectively reduce the amount of data stored on disk while maintaining the quality of scientific outcomes. Lossless data-compression methods are designed to preserve the information content of the data but often struggle to achieve a high compression ratio when applied to experimental data that contain noise. Conversely, lossy compression methods offer the potential to greatly reduce the data volume. Nonetheless, it is vital to thoroughly assess the impact of data quality and scientific outcomes when employing lossy compression, as it inherently involves discarding information. The evaluation of lossy compression effects on data requires proper data quality metrics. In our research, we assess various approaches for both lossless and lossy compression techniques applied to SX data, and equally importantly, we describe metrics suitable for evaluating SX data quality.
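
To make the point about noise concrete, the toy example below compresses a synthetic noisy detector frame losslessly with zlib and then again after a simple lossy quantization step; the frame statistics and the quantization step are arbitrary and are not meant to model real SX data or the codecs assessed in the paper.

```python
import zlib
import numpy as np

rng = np.random.default_rng(0)
# Synthetic noisy frame: a constant baseline plus Gaussian noise, stored as float32.
signal = rng.normal(loc=1000.0, scale=20.0, size=(256, 256)).astype(np.float32)

raw = signal.tobytes()
lossless = zlib.compress(raw, 9)                  # noise-dominated mantissas barely shrink

quant_step = 1.0                                  # arbitrary lossy quantization step
quantized = np.round(signal / quant_step).astype(np.int16)
lossy = zlib.compress(quantized.tobytes(), 9)     # quantized data compresses far better

print("raw:       ", len(raw), "bytes")
print("lossless:  ", len(lossless), "bytes")
print("lossy+zlib:", len(lossy), "bytes")
print("max abs error after lossy round trip:",
      np.max(np.abs(quantized.astype(np.float32) * quant_step - signal)))
```

The experiment mirrors the trade-off the abstract describes: the lossy path discards information (bounded here by half the quantization step), so its acceptability has to be judged with proper data-quality metrics rather than file size alone.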


Subjects
Algorithms; Data Compression; Crystallography; Data Compression/methods; Tomography, X-Ray Computed
15.
Magn Reson Imaging ; 108: 116-128, 2024 May.
Article in English | MEDLINE | ID: mdl-38325727

ABSTRACT

This work aims to improve the efficiency of multi-coil data compression and to recover the compressed image reversibly, increasing the potential for applying the proposed method in medical scenarios. A deep learning algorithm is employed for MR coil compression. The approach introduces a variable augmentation network for invertible coil compression (VAN-ICC), which exploits the inherent reversibility of normalizing-flow-based models. By applying variable augmentation technology to image/k-space variables from multiple coils, VAN-ICC trains an invertible network by finding an invertible and bijective function that maps the original data to its compressed counterpart and vice versa. Experiments conducted on both fully sampled and under-sampled data verified the effectiveness and flexibility of VAN-ICC. Quantitative and qualitative comparisons with traditional non-deep-learning approaches demonstrated that VAN-ICC achieves much stronger compression. By exploiting the reversibility of normalizing-flow-based models, the proposed method addresses the shortcomings of traditional coil compression methods, and the variable augmentation technology ensures that the network remains invertible. In short, VAN-ICC offers a competitive advantage over traditional coil compression algorithms.
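
VAN-ICC's network is not specified here in reproducible detail; the snippet below only demonstrates the additive-coupling construction that makes normalizing-flow-style networks invertible and bijective by design, which is the property the method builds on (the shapes and the inner function f are arbitrary stand-ins).

```python
import numpy as np

def coupling_forward(x1, x2, f):
    """Additive coupling: invertible by construction, whatever f is."""
    return x1, x2 + f(x1)

def coupling_inverse(y1, y2, f):
    return y1, y2 - f(y1)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8))                  # fixed, arbitrary weights
f = lambda u: np.tanh(u @ W)                 # f itself need not be invertible

x1, x2 = rng.normal(size=(4, 8)), rng.normal(size=(4, 8))
y1, y2 = coupling_forward(x1, x2, f)
x1_rec, x2_rec = coupling_inverse(y1, y2, f)
print(np.allclose(x1, x1_rec), np.allclose(x2, x2_rec))   # True True
```

Because the forward map can be undone exactly, a compressor built from such blocks can map multi-coil data to a compressed representation and recover the original without residual loss.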


Subjects
Data Compression; Data Compression/methods; Magnetic Resonance Imaging/methods; Algorithms; Image Processing, Computer-Assisted/methods
16.
Bioinformatics ; 40(3)2024 Mar 04.
Article in English | MEDLINE | ID: mdl-38377404

ABSTRACT

MOTIVATION: Seeding is a rate-limiting stage in sequence alignment for next-generation sequencing reads. Existing optimization algorithms typically rely on hardware and machine-learning techniques to accelerate seeding. However, an efficient solution provided by professional next-generation sequencing compressors has so far been largely overlooked. In addition to achieving remarkable compression ratios by reordering reads, these compressors provide valuable insights for downstream alignment, revealing that repetitive computations account for more than 50% of the seeding procedure in the commonly used short-read aligner BWA-MEM at typical sequencing coverage. Nevertheless, this redundancy information has not been fully exploited. RESULTS: In this study, we present a compressive seeding algorithm, named CompSeed, to fill this gap. CompSeed, in collaboration with existing reordering-based compression tools, finishes the BWA-MEM seeding process in about half the time by caching all intermediate seeding results in compact trie structures to directly answer repetitive inquiries that frequently cause random memory accesses. Furthermore, CompSeed performs better as sequencing coverage increases, as it focuses solely on the small informative portion of the sequencing reads after compression. This strategy highlights the promising potential of integrating sequence compression and alignment to tackle the ever-growing volume of sequencing data. AVAILABILITY AND IMPLEMENTATION: CompSeed is available at https://github.com/i-xiaohu/CompSeed.
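
CompSeed's compact tries and BWA-MEM integration are not reproduced here; the sketch below only demonstrates the underlying observation that, once reads are grouped by a reordering compressor, seed queries repeat heavily and can be answered from a cache instead of re-running the index search. The seed_lookup function is a hypothetical stand-in for an FM-index search, and the reads, k, and cache policy are assumptions.

```python
from functools import lru_cache

FM_INDEX_CALLS = 0

@lru_cache(maxsize=None)
def seed_lookup(seed: str) -> int:
    """Hypothetical stand-in for an FM-index seed search; only cache misses run it."""
    global FM_INDEX_CALLS
    FM_INDEX_CALLS += 1
    return hash(seed) % 1_000_003            # dummy placeholder for a hit list

def seeds(read: str, k: int = 19):
    return [read[i:i + k] for i in range(0, len(read) - k + 1, k)]

# After reordering-based compression, neighbouring reads share long substrings,
# so the same seeds are queried over and over.
reads = ["ACGTACGTACGTACGTACGTACGTACGT" * 3] * 50 + ["TTTTCCCCGGGGAAAA" * 6] * 50
queries = 0
for read in reads:
    for s in seeds(read):
        seed_lookup(s)
        queries += 1
print("seed queries issued:", queries,
      "| index searches actually executed:", FM_INDEX_CALLS)
```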


Subjects
Data Compression; Software; Sequence Analysis, DNA/methods; Algorithms; Data Compression/methods; Computers; High-Throughput Nucleotide Sequencing/methods
17.
Gene ; 907: 148235, 2024 May 20.
Article in English | MEDLINE | ID: mdl-38342250

ABSTRACT

Next-generation sequencing (NGS) technology generates massive amounts of genome sequence data, and the volume grows rapidly over time. As a result, there is a growing need for efficient compression algorithms to facilitate the processing, storage, transmission, and analysis of large-scale genome sequences. Over the past 31 years, numerous state-of-the-art compression algorithms have been developed. The performance of any compression algorithm is measured by three main metrics: compression ratio, time, and memory usage. Existing k-mer hash indexing systems take more time because their decision-making process is based on compression results. In this paper, we propose a two-phase reference genome compression algorithm using an optimal k-mer length (RGCOK). Reference-based compression takes advantage of the inter-similarity between chromosomes of the same species. RGCOK achieves this by finding the optimal k-mer length for matching, using a randomization method and hashing. The performance of RGCOK was evaluated on three different benchmark data sets: novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), Homo sapiens, and other species sequences, using an Amazon AWS virtual cloud machine. Experiments showed that the optimal k-mer finding time of RGCOK is around 45.28 min, whereas the time for the existing state-of-the-art algorithms HiRGC, SCCG, and HRCM ranges from 58 min to 8.97 h.
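
RGCOK's optimal-k search and randomization are not shown here; the snippet below is only a bare-bones sketch of the primitive it builds on, reference-based matching through a k-mer hash index, with k, the sequences, and the greedy match extension all being illustrative choices.

```python
from collections import defaultdict

def build_kmer_index(reference, k):
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def encode_against_reference(target, reference, k=11):
    """Greedy reference-based encoding: emit (ref_pos, length) matches or literals."""
    index = build_kmer_index(reference, k)
    out, i = [], 0
    while i < len(target):
        positions = index.get(target[i:i + k], [])
        if positions:
            p, length = positions[0], k
            while (i + length < len(target) and p + length < len(reference)
                   and target[i + length] == reference[p + length]):
                length += 1
            out.append(("match", p, length))
            i += length
        else:
            out.append(("literal", target[i]))
            i += 1
    return out

reference = "ACGTACGTGGGTTTACGTACGTAAACCCGGGTTT"
target    = "ACGTACGTGGGTTTACGAACGTAAACCCGGGTTT"   # one substitution vs. reference
print(encode_against_reference(target, reference))
```

The choice of k drives the trade-off the paper optimizes: small k finds more matches but costs more lookups, while large k misses matches around variants.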


Subjects
Data Compression; Software; Humans; Data Compression/methods; Algorithms; Genome; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, DNA/methods
18.
Article in English | MEDLINE | ID: mdl-38358865

ABSTRACT

Revolutionary advances in DNA sequencing technologies are fundamentally changing the nature of genomics. Today's sequencing technologies have led to an explosion in genomic data volume. These data can be used in various applications where long-term storage and analysis of genomic sequence data are required. Data-specific compression algorithms can effectively manage such large volumes of data. In recent times, deep learning has achieved great success in many compression tools and is gradually being used in genomic sequence compression. In particular, autoencoders have been applied to dimensionality reduction, compact representation of data, and generative model learning. They can use convolutional layers to learn essential features from input data, which works well for image and series data. An autoencoder, however, reconstructs the input data with some loss of information. Since accuracy is critical for genomic data, compressed genomic data must be decompressed without any information loss. We introduce a new scheme to address the loss incurred in the decompressed data of the autoencoder. This paper proposes a novel algorithm called GenCoder for reference-free compression of genomic sequences using a convolutional autoencoder, regenerating the genomic sequences from a latent code produced by the autoencoder, and retrieving the original data losslessly. Performance evaluation is conducted on various genomes and benchmark datasets. The experimental results on the tested data demonstrate that the deep learning model used in the proposed compression algorithm generalizes well for genomic sequence data and achieves a compression gain of 27% over the best state-of-the-art method.
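
GenCoder's convolutional autoencoder is replaced here by a trivial quantizer, purely to illustrate the general recipe the abstract describes: compress lossily, additionally store the small reconstruction residual, and recover the original exactly. Everything below is a simplified stand-in, not the paper's model.

```python
import numpy as np
import zlib

def lossy_encode(x, step=8):
    """Stand-in for the autoencoder: coarse quantization loses information."""
    return (x // step).astype(np.uint8)

def lossy_decode(code, step=8):
    return code.astype(np.int16) * step

original = np.frombuffer(b"GATTACA" * 1000, dtype=np.uint8).astype(np.int16)
code = lossy_encode(original)
approx = lossy_decode(code)
residual = (original - approx).astype(np.int8)      # small values -> compress well

stored = len(zlib.compress(code.tobytes())) + len(zlib.compress(residual.tobytes()))
restored = lossy_decode(code) + residual            # exact, lossless reconstruction
assert np.array_equal(restored, original)
print(original.size, "symbols ->", stored, "bytes stored (code + residual)")
```

The better the lossy model predicts the data, the smaller the residual stream becomes, which is where a well-trained autoencoder earns its compression gain while the residual keeps the round trip lossless.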


Subjects
Algorithms; Data Compression; Genomics; Neural Networks, Computer; Sequence Analysis, DNA; Data Compression/methods; Genomics/methods; Sequence Analysis, DNA/methods; Humans; Deep Learning
19.
IEEE Trans Pattern Anal Mach Intell ; 46(7): 4579-4596, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38252583

ABSTRACT

Almost all digital videos are coded into compact representations before being transmitted. Such compact representations need to be decoded back to pixels before being displayed to humans and - as usual - before being enhanced/analyzed by machine vision algorithms. Intuitively, it is more efficient to enhance/analyze the coded representations directly without decoding them into pixels. Therefore, we propose a versatile neural video coding (VNVC) framework, which targets learning compact representations to support both reconstruction and direct enhancement/analysis, thereby being versatile for both human and machine vision. Our VNVC framework has a feature-based compression loop. In the loop, one frame is encoded into compact representations and decoded to an intermediate feature that is obtained before performing reconstruction. The intermediate feature can be used as reference in motion compensation and motion estimation through feature-based temporal context mining and cross-domain motion encoder-decoder to compress the following frames. The intermediate feature is directly fed into video reconstruction, video enhancement, and video analysis networks to evaluate its effectiveness. The evaluation shows that our framework with the intermediate feature achieves high compression efficiency for video reconstruction and satisfactory task performances with lower complexities.


Subjects
Algorithms; Data Compression; Image Processing, Computer-Assisted; Neural Networks, Computer; Video Recording; Humans; Image Processing, Computer-Assisted/methods; Data Compression/methods
20.
IEEE Trans Biomed Circuits Syst ; 18(3): 691-701, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38285576

ABSTRACT

Conventional in vivo neural signal processing involves extracting spiking activity within the recorded signals from an ensemble of neurons and transmitting only spike counts over an adequate interval. However, for brain-computer interface (BCI) applications utilizing continuous local field potentials (LFPs) for cognitive decoding, the volume of neural data to be transmitted to a computer imposes relatively high data rate requirements. This is particularly true for BCIs employing high-density intracortical recordings with hundreds or thousands of electrodes. This article introduces the first autoencoder-based compression digital circuit for the efficient transmission of LFP neural signals. Various algorithmic and architectural-level optimizations are implemented to significantly reduce the computational complexity and memory requirements of the designed in vivo compression circuit. This circuit employs an autoencoder-based neural network, providing a robust signal reconstruction. The application-specific integrated circuit (ASIC) of the in vivo compression logic occupies the smallest silicon area and consumes the lowest power among the reported state-of-the-art compression ASICs. Additionally, it offers a higher compression rate and a superior signal-to-noise and distortion ratio.


Subjects
Algorithms; Brain-Computer Interfaces; Data Compression; Neural Networks, Computer; Signal Processing, Computer-Assisted; Data Compression/methods; Animals; Neurons/physiology; Electroencephalography/methods