1.
Heliyon ; 10(12): e33184, 2024 Jun 30.
Article in English | MEDLINE | ID: mdl-39005912

ABSTRACT

Long pulse thermography (LPT) and shearography have been developed as primary methods for detecting debonding or delamination defects in composites due to their full-field imaging, non-contact operation, and high detection efficiency. Both methods utilize halogen lamps as the excitation source for thermal loading. However, the defects detected by the two techniques differ due to their distinct inspection mechanisms. In this study, LPT and shearography are employed to evaluate internal damage in various composite structures. The experimental results demonstrate that LPT, when combined with thermal signal processing algorithms, can clearly detect debonding defects in rubber-to-metal bonded plates, whereas excessive adhesive defects can only be identified by shearography. Flat-bottom holes in the CFRP panel can only be detected by LPT, and shearography is particularly effective for detecting composite materials with a metal skin. For the quantitative measurement of defect sizes, the average errors of the rubber-to-metal bonded plate and CFRP panel using LPT are 4.9 % and 2.2 %, respectively, whereas the average errors of the rubber-to-metal bonded plate and aluminum honeycomb panel using shearography are 15.12 % and 95.4 %, respectively. This indicates that LPT is superior to shearography in quantitatively measuring defect sizes. These two nondestructive testing methods, based on different principles, each have their own advantages and disadvantages. Employing a multi-modal inspection method can leverage their complementary advantages, preventing misdetection and leakage of internal defects in composites.

2.
Methods ; 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38972499

ABSTRACT

Molecular dynamics (MD) simulation is a crucial research domain within the life sciences, focusing on comprehending the mechanisms of biomolecular interactions at atomic scales. Protein simulation, a critical subfield, often relies on MD, and its trajectory data play a pivotal role in drug discovery. With the advancement of high-performance computing and deep learning, predicting protein properties from vast trajectory data has become popular and important, but it poses challenges in extracting features from complicated simulation data and in dimensionality reduction. At the same time, it is essential to provide a meaningful explanation of the biological mechanism behind the reduced dimensions. To tackle these challenges, we propose a new unsupervised model named RevGraphVAMP to intelligently analyze simulation trajectories. This model is based on the variational approach for Markov processes (VAMP) and integrates graph convolutional neural networks and physical constraint optimization to enhance learning performance. Additionally, we introduce an attention mechanism to assess the importance of key interaction regions, facilitating the interpretation of molecular mechanisms. Compared with other VAMPNets models, our model shows competitive performance and improved accuracy in state transition prediction, as demonstrated through its application to two public datasets and the Shank3-Rap1 complex, which is associated with autism spectrum disorder. Moreover, it enhances the discrimination of different substates after dimensionality reduction and provides interpretable results for protein structural characterization.

3.
Article in English | MEDLINE | ID: mdl-38905083

ABSTRACT

The amount of genetic data generated by Next Generation Sequencing (NGS) technologies grows faster than Moore's law. This necessitates the development of efficient NGS data processing and analysis algorithms. A filter before the computationally costly analysis step can significantly reduce the run time of NGS data analysis. As GPUs are orders of magnitude more powerful than CPUs, this paper proposes a GPU-friendly pre-alignment filtering algorithm named SeedHit for the fast processing of NGS data. Inspired by BLAST, SeedHit counts seed hits between two sequences to determine their similarity. In SeedHit, each nucleotide in a gene sequence is represented in binary format. By packing data and generating a lookup table that fits into the L1 cache, SeedHit is GPU-friendly and high-throughput. Using three 16S rRNA datasets from Greengenes as input, SeedHit can reject 84%-89% of dissimilar sequence pairs on average when the similarity is 0.9-0.99. SeedHit achieved a throughput of 1 T/s (tera bases per second) on a 3080 Ti. Compared with two other GPU-based filtering algorithms, GateKeeper and SneakySnake, SeedHit has the highest rejection rate and throughput. By incorporating SeedHit into our in-house clustering algorithm nGIA, the modified nGIA achieved a 1.6-2.1 times speedup compared to the original version.
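
The seed-counting idea described in this abstract can be illustrated with a minimal CPU-side sketch. It is not the SeedHit implementation: the 2-bit encoding, the seed length `K`, the rule that a hit is any query seed present in the reference, and all function names are assumptions made only for illustration.

```python
# Illustrative sketch only; not the authors' GPU code.
# Assumptions: 2 bits per base, a hypothetical seed length K, and a "hit"
# defined as a query seed that occurs anywhere in the reference.

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # other symbols (e.g. N) raise KeyError
K = 4  # hypothetical seed length


def pack_seeds(seq, k=K):
    """Return the 2-bit packed integer code of every k-mer in seq."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases
    code = 0
    seeds = []
    for i, base in enumerate(seq):
        code = ((code << 2) | ENCODE[base]) & mask
        if i >= k - 1:
            seeds.append(code)
    return seeds


def count_seed_hits(reference, query, k=K):
    """Count query seeds that are present in the reference."""
    table = bytearray(4 ** k)  # lookup table indexed by packed seed
    for code in pack_seeds(reference, k):
        table[code] = 1
    return sum(table[code] for code in pack_seeds(query, k))


def passes_filter(reference, query, min_hits, k=K):
    """Keep the pair for the costly alignment step only if enough seeds match."""
    return count_seed_hits(reference, query, k) >= min_hits
```

With `K = 4` the table has 256 one-byte entries, which shows why a small seed length keeps the lookup table cache-resident; the actual table layout and threshold used by SeedHit are not stated in the abstract.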

4.
Biosens Bioelectron ; 259: 116403, 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-38776802

ABSTRACT

Robust encapsulation and controllable release of biomolecules have wide biomedical applications ranging from biosensing, drug delivery to information storage. However, conventional biomolecule encapsulation strategies have limitations in complicated operations, optical instability, and difficulty in decapsulation. Here, we report a simple, robust, and solvent-free biomolecule encapsulation strategy based on gallium liquid metal featuring low-temperature phase transition, self-healing, high hermetic sealing, and intrinsic resistance to optical damage. We sandwiched the biomolecules with the solid gallium films followed by low-temperature welding of the films for direct sealing. The gallium can not only protect DNA and enzymes from various physical and chemical damages but also allow the on-demand release of biomolecules by applying vibration to break the liquid gallium. We demonstrated that a DNA-coded image file can be recovered with up to 99.9% sequence retention after an accelerated aging test. We also showed the practical applications of the controllable release of bioreagents in a one-pot RPA-CRISPR/Cas12a reaction for SARS-COV-2 screening with a low detection limit of 10 copies within 40 min. This work may facilitate the development of robust and stimuli-responsive biomolecule capsules by using low-melting metals for biotechnology.


Subject(s)
Biosensing Techniques , Phase Transition , SARS-CoV-2 , Biosensing Techniques/methods , SARS-CoV-2/isolation & purification , COVID-19/virology , Gallium/chemistry , Humans , DNA/chemistry , CRISPR-Cas Systems , Capsules/chemistry
5.
Sci Total Environ ; 928: 172592, 2024 Jun 10.
Article in English | MEDLINE | ID: mdl-38642768

ABSTRACT

Submerged plants affect nitrogen cycling in aquatic ecosystems. However, whether and how submerged plants change nitrous oxide (N2O) production mechanism and emissions flux remains controversial. Current research primarily focuses on the feedback from N2O release to variation of substrate level and microbial communities. It is deficient in connecting the relative contribution of individual N2O production processes (i.e., the N2O partition). Here, we attempted to offer a comprehensive understanding of the N2O mitigation mechanism in aquatic ecosystems on the Changjiang River Delta according to stable isotopic techniques, metagenome-assembly genome analysis, and statistical analysis. We found that the submerged plant reduced 45 % of N2O emissions by slowing down the dissolved inorganic nitrogen conversion velocity to N2O in sediment (Vf-[DIN]sed). It was attributed to changing the N2O partition and suppressing the potential capacity of net N2O production (i.e., nor/nosZ). The dominated production processes showed a shift with increasing excess N2O. Meanwhile, distinct shift thresholds of planted and unplanted habitats reflected different mechanisms of stimulated N2O production. The hotspot zone of N2O production corresponded to high nor/nosZ and unsaturated oxygen (O2) in unplanted habitat. In contrast, planted habitat hotspot has lower nor/nosZ and supersaturated O2. O2 from photosynthesis critically impacted the activities of N2O producers and consumers. In summary, the presence of submerged plants is beneficial to mitigate N2O emissions from aquatic ecosystems.


Subject(s)
Ecosystem , Nitrous Oxide , Rivers , China , Rivers/chemistry , Nitrous Oxide/analysis , Plants , Environmental Monitoring , Air Pollutants/analysis
6.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38581420

ABSTRACT

Protein-ligand interaction prediction presents a significant challenge in drug design. Numerous machine learning and deep learning (DL) models have been developed to accurately identify docking poses of ligands and active compounds against specific targets. However, current models often suffer from inadequate accuracy or lack practical physical significance in their scoring systems. In this research paper, we introduce IGModel, a novel approach that utilizes the geometric information of protein-ligand complexes as input for predicting the root mean square deviation of docking poses and the binding strength (pKd, the negative value of the logarithm of binding affinity) within the same prediction framework. This ensures that the output scores carry intuitive meaning. We extensively evaluate the performance of IGModel on various docking power test sets, including the CASF-2016 benchmark, PDBbind-CrossDocked-Core and DISCO set, consistently achieving state-of-the-art accuracies. Furthermore, we assess IGModel's generalizability and robustness by evaluating it on unbiased test sets and sets containing target structures generated by AlphaFold2. The exceptional performance of IGModel on these sets demonstrates its efficacy. Additionally, we visualize the latent space of protein-ligand interactions encoded by IGModel and conduct interpretability analysis, providing valuable insights. This study presents a novel framework for DL-based prediction of protein-ligand interactions, contributing to the advancement of this field. The IGModel is available at GitHub repository https://github.com/zchwang/IGModel.


Subject(s)
Deep Learning , Proteins , Proteins/chemistry , Protein Binding , Ligands , Drug Design
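
The two quantities that the IGModel abstract says are predicted, the root mean square deviation (RMSD) of a docking pose and pKd, can be made concrete with a short sketch. It shows only the standard definitions, not the model itself; the function names are hypothetical, pKd is assumed to be the negative base-10 logarithm of the dissociation constant in mol/L, and the RMSD assumes the two poses list the same atoms in the same order with no superposition step.

```python
import math


def pkd_from_kd(kd_molar):
    """pKd = -log10(Kd), with Kd given in mol/L (assumed unit)."""
    return -math.log10(kd_molar)


def rmsd(pose, reference):
    """RMSD between two poses given as equal-length lists of (x, y, z) tuples."""
    if len(pose) != len(reference):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(
        (a - b) ** 2
        for atom_p, atom_r in zip(pose, reference)
        for a, b in zip(atom_p, atom_r)
    )
    return math.sqrt(squared / len(pose))
```

For example, a 1 nM binder (`kd_molar = 1e-9`) has a pKd of about 9, and a pose shifted by 1 Å along one axis for every atom has an RMSD of 1 Å.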
7.
Methods ; 224: 35-46, 2024 Apr.
Article in English | MEDLINE | ID: mdl-38373678

ABSTRACT

Bivalent Smac mimetics have been shown to possess binding affinity and pro-apoptotic activity similar to or more potent than that of native Smac, a protein dimer able to neutralize the anti-apoptotic activity of an inhibitor of caspase enzymes, XIAP, which endows cancer cells with resistance to anticancer drugs. We design five new bivalent Smac mimetics, which are formed by various linkers tethering two diazabicyclic cores being the IAP binding motifs. We built in silico models of the five mimetics by the TwistDock workflow and evaluated their conformational tendency, which suggests that compound 3, whose linker is n-hexylene, possess the highest binding potency among the five. After synthesis of these compounds, their ability in tumour cell growth inhibition and apoptosis induction displayed in experiments with SK-OV-3 and MDA-MB-231 cancer cell lines confirms our prediction. Among the five mimetics, compound 3 displays promising pro-apoptotic activity and deserves further optimization.


Subject(s)
Antineoplastic Agents , Neoplasms , Humans , Inhibitor of Apoptosis Proteins/metabolism , Inhibitor of Apoptosis Proteins/pharmacology , X-Linked Inhibitor of Apoptosis Protein/metabolism , X-Linked Inhibitor of Apoptosis Protein/pharmacology , Antineoplastic Agents/pharmacology , Antineoplastic Agents/chemistry , Molecular Conformation , Apoptosis , Cell Line, Tumor
8.
Int J Mol Sci ; 24(22)2023 Nov 07.
Article in English | MEDLINE | ID: mdl-38003217

ABSTRACT

The automatic detection of cells in microscopy image sequences is a significant task in biomedical research. However, routine microscopy images with cells, which are taken during the process whereby constant division and differentiation occur, are notoriously difficult to detect due to changes in their appearance and number. Recently, convolutional neural network (CNN)-based methods have made significant progress in cell detection and tracking. However, these approaches require many manually annotated data for fully supervised training, which is time-consuming and often requires professional researchers. To alleviate such tiresome and labor-intensive costs, we propose a novel weakly supervised learning cell detection and tracking framework that trains the deep neural network using incomplete initial labels. Our approach uses incomplete cell markers obtained from fluorescent images for initial training on the Induced Pluripotent Stem (iPS) cell dataset, which is rarely studied for cell detection and tracking. During training, the incomplete initial labels were updated iteratively by combining detection and tracking results to obtain a model with better robustness. Our method was evaluated using two fields of the iPS cell dataset, along with the cell detection accuracy (DET) evaluation metric from the Cell Tracking Challenge (CTC) initiative, and it achieved 0.862 and 0.924 DET, respectively. The transferability of the developed model was tested using the public dataset Fluo-N2DH-GOWT1, which was taken from CTC; this contains two datasets with reference annotations. We randomly removed parts of the annotations in each labeled data to simulate the initial annotations on the public dataset. After training the model on the two datasets, with labels that comprise 10% cell markers, the DET improved from 0.130 to 0.903 and 0.116 to 0.877. When trained with labels that comprise 60% cell markers, the performance was better than the model trained using the supervised learning method. This outcome indicates that the model's performance improved as the quality of the labels used for training increased.


Subject(s)
Neural Networks, Computer , Supervised Machine Learning , Image Processing, Computer-Assisted/methods
9.
Brief Bioinform ; 24(6)2023 09 22.
Article in English | MEDLINE | ID: mdl-37833842

ABSTRACT

Recent studies have shed light on the potential of circular RNA (circRNA) as a biomarker for disease diagnosis and as a nucleic acid vaccine. The exploration of these functionalities requires correct circRNA full-length sequences; however, existing assembly tools can only correctly assemble some circRNAs, and their performance can be further improved. Here, we introduce a novel feature known as the junction contig (JC), which is an extension of the back-splice junction (BSJ). Leveraging the strengths of both BSJ and JC, we present a novel method called JCcirc (https://github.com/cbbzhang/JCcirc). It enables efficient reconstruction of all types of circRNA full-length sequences and their alternative isoforms using splice graphs and fragment coverage. Our findings demonstrate the superiority of JCcirc over existing methods on human simulation datasets, and its average F1 score surpasses CircAST by 0.40 and outperforms both CIRI-full and circRNAfull by 0.13. For circRNAs below 400 bp, 400-800 bp, 800 bp-1200 bp and above 1200 bp, the correct assembly rates are 0.13, 0.09, 0.04 and 0.03 higher, respectively, than those achieved by existing methods. Moreover, JCcirc also outperforms existing assembly tools on other five model species datasets and real sequencing datasets. These results show that JCcirc is a robust tool for accurately assembling circRNA full-length sequences, laying the foundation for the functional analysis of circRNAs.


Subject(s)
RNA, Circular , RNA , Humans , RNA, Circular/genetics , Sequence Analysis, RNA/methods , Protein Isoforms/genetics , RNA/genetics
10.
Proteins ; 91(12): 1837-1849, 2023 Dec.
Article in English | MEDLINE | ID: mdl-37606194

ABSTRACT

We introduce a deep learning-based ligand pose scoring model called zPoseScore for predicting protein-ligand complexes in the 15th Critical Assessment of Protein Structure Prediction (CASP15). Our contributions are threefold: first, we generate six training and evaluation data sets by employing advanced data augmentation and sampling methods. Second, we redesign the "zFormer" module, inspired by AlphaFold2's Evoformer, to efficiently describe protein-ligand interactions. This module enables the extraction of protein-ligand paired features that lead to accurate predictions. Finally, we develop the zPoseScore framework with zFormer for scoring and ranking ligand poses, allowing for atomic-level protein-ligand feature encoding and fusion to output refined ligand poses and ligand per-atom deviations. Our results demonstrate excellent performance on various testing data sets, achieving Pearson's correlation R = 0.783 and 0.659 for ranking docking decoys generated based on experimental and predicted protein structures of CASF-2016 protein-ligand complexes. Additionally, we obtain an averaged local distance difference test (lDDT pli = 0.558) of AIchemy LIG2 in CASP15 for de novo protein-ligand complex structure predictions. Detailed analysis shows that accurate ligand binding site prediction and side-chain orientation are crucial for achieving better prediction performance. Our proposed model is one of the most accurate protein-ligand pose prediction models and could serve as a valuable tool in small molecule drug discovery.


Subject(s)
Proteins , Ligands , Protein Binding , Proteins/chemistry , Binding Sites , Molecular Docking Simulation
11.
J Comput Biol ; 30(9): 951-960, 2023 09.
Article in English | MEDLINE | ID: mdl-37585615

ABSTRACT

Spiking neural network (SNN) simulators play an important role in neural system modeling and brain function research. They help scientists reproduce and explore neuronal activity in brain regions for neuroscience, brain-like computing, and related fields, and can also be applied to artificial intelligence and machine learning. At present, many simulators using central processing units (CPUs) or graphics processing units (GPUs) have been developed. However, the randomness of connections between neurons and of spiking events in SNN simulation leads to substantial memory access time. To alleviate this problem, we developed an SNN simulator, SWsnn, based on the new Sunway SW26010pro processor. The SW26010pro processor consists of six core groups, each with 16 MB of local data memory (LDM). LDM offers high-speed reads and writes, which makes it suitable for simulation tasks such as SNNs. Experimental results show that SWsnn runs faster than other mainstream GPU-based simulators when simulating a certain scale of neural network, showing a strong performance advantage. To conduct larger scale simulations, SWsnn designed a simulation computation based on a large shared model of Sunway processor and developed a multiprocessor version of SWsnn based on this mode, achieving larger scale SNN simulations.


Subject(s)
Artificial Intelligence , Neural Networks, Computer , Computer Simulation , Neurons/physiology , Brain
13.
Front Genet ; 14: 1248519, 2023.
Article in English | MEDLINE | ID: mdl-37485341

ABSTRACT

[This corrects the article DOI: 10.3389/fgene.2022.816825.].

14.
Front Mol Biosci ; 10: 1249019, 2023.
Article in English | MEDLINE | ID: mdl-37469706

ABSTRACT

[This corrects the article DOI: 10.3389/fmolb.2022.857320.].

15.
Methods ; 216: 39-50, 2023 08.
Article in English | MEDLINE | ID: mdl-37330158

ABSTRACT

Assessing the quality of sequencing data plays a crucial role in downstream data analysis. However, existing tools often achieve sub-optimal efficiency, especially when dealing with compressed files or performing complicated quality control operations such as over-representation analysis and error correction. We present RabbitQCPlus, an ultra-efficient quality control tool for modern multi-core systems. RabbitQCPlus uses vectorization, memory copy reduction, parallel (de)compression, and optimized data structures to achieve substantial performance gains. It is 1.1 to 5.4 times faster when performing basic quality control operations compared to state-of-the-art applications yet requires fewer compute resources. Moreover, RabbitQCPlus is at least 4 times faster than other applications when processing gzip-compressed FASTQ files and 1.3 times faster with the error correction module turned on. Furthermore, it takes less than 4 minutes to process 280 GB of plain FASTQ sequencing data, while other applications take at least 22 minutes on a 48-core server when enabling the per-read over-representation analysis. C++ sources are available at https://github.com/RabbitBio/RabbitQCPlus.


Subject(s)
Data Compression , Software , High-Throughput Nucleotide Sequencing , Quality Control , Algorithms , Sequence Analysis, DNA
16.
Genome Biol ; 24(1): 121, 2023 05 17.
Article in English | MEDLINE | ID: mdl-37198663

ABSTRACT

We present RabbitTClust, a fast and memory-efficient genome clustering tool based on sketch-based distance estimation. Our approach enables efficient processing of large-scale datasets by combining dimensionality reduction techniques with streaming and parallelization on modern multi-core platforms. 113,674 complete bacterial genome sequences from RefSeq, 455 GB in FASTA format, can be clustered within less than 6 min and 1,009,738 GenBank assembled bacterial genomes, 4.0 TB in FASTA format, within only 34 min on a 128-core workstation. Our results further identify 1269 redundant genomes, with identical nucleotide content, in the RefSeq bacterial genomes database.


Subject(s)
Genome , Software , Databases, Nucleic Acid , Cluster Analysis , Bacteria , Algorithms , Genome, Bacterial
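
The phrase "sketch-based distance estimation" in the RabbitTClust abstract can be illustrated with a bottom-k MinHash sketch, one common way to estimate the Jaccard similarity of two k-mer sets from small summaries. The abstract does not specify the sketching scheme, so the choice of MinHash, the hash function, the k-mer length, the sketch size, and every function name below are assumptions for illustration only.

```python
import hashlib


def kmers(seq, k):
    """Set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sketch(seq, k=5, size=8):
    """Bottom-k sketch: the `size` smallest 64-bit hashes of the k-mer set."""
    hashes = {
        int.from_bytes(hashlib.blake2b(km.encode(), digest_size=8).digest(), "big")
        for km in kmers(seq, k)
    }
    return sorted(hashes)[:size]


def jaccard_estimate(sketch_a, sketch_b, size=8):
    """Estimate Jaccard similarity from two bottom-k sketches."""
    merged = sorted(set(sketch_a) | set(sketch_b))[:size]
    if not merged:
        return 0.0
    shared = set(sketch_a) & set(sketch_b)
    return sum(1 for h in merged if h in shared) / len(merged)
```

Comparing two genomes then costs time proportional to the sketch size rather than the genome length, which is the general reason sketching suits clustering at the scale the abstract reports.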
17.
Environ Res ; 227: 115710, 2023 06 15.
Article in English | MEDLINE | ID: mdl-36933634

ABSTRACT

Vegetation restoration projects can not only improve water quality by absorbing and transferring pollutants and nutrients from non-vegetation sources, but also protect biodiversity by providing habitat for biological growth. However, the mechanism of the protistan and bacterial assembly processes in vegetation restoration projects has rarely been explored. To address this, based on 18S rRNA and 16S rRNA high-throughput sequencing, we investigated the mechanism of protistan and bacterial community assembly processes, environmental conditions, and microbial interactions in rivers with and without vegetation restoration. The results indicated that the deterministic process dominated the protistan and bacterial community assembly (94.29% and 92.38%), influenced by biotic and abiotic factors. For biotic factors, microbial network connectivity was higher in the vegetation zone (average degree = 20.34) than in the bare zone (average degree = 11.00). For abiotic factors, the concentration of dissolved organic carbon ([DOC]) was the most important environmental factor affecting the microbial community composition. [DOC] was significantly lower in the vegetation zone (18.65 ± 6.34 mg/L) than in the bare zone (28.22 ± 4.82 mg/L). In overlying water, vegetation restoration upregulated the protein-like fluorescence components (C1 and C2) by 1.26- and 1.01-fold and downregulated the terrestrial humic-like fluorescence components (C3 and C4) by 0.54- and 0.55-fold, respectively. The different DOM components guided bacteria and protists to select different interactive relationships. The protein-like DOM components led to bacterial competition, whereas the humus-like DOM components resulted in protistan competition. Finally, the structural equation model was established to explain that DOM components can affect protistan and bacterial diversity by providing substrates, facilitating microbial interactions, and promoting nutrient input. In general, our study provides insights into the responses of vegetation-restored ecosystems to the dynamics and interactions in the anthropogenically influenced river and evaluates the ecological restoration performance of vegetation restoration from a molecular biology perspective.


Subject(s)
Dissolved Organic Matter , Microbiota , Rivers/chemistry , Water Quality , Bacteria/genetics , Spectrometry, Fluorescence
18.
J Chem Inf Model ; 63(3): 835-845, 2023 02 13.
Article in English | MEDLINE | ID: mdl-36724090

ABSTRACT

Many bioactive peptides demonstrated therapeutic effects over complicated diseases, such as antiviral, antibacterial, anticancer, etc. It is possible to generate a large number of potentially bioactive peptides using deep learning in a manner analogous to the generation of de novo chemical compounds using the acquired bioactive peptides as a training set. Such generative techniques would be significant for drug development since peptides are much easier and cheaper to synthesize than compounds. Despite the limited availability of deep learning-based peptide-generating models, we have built an LSTM model (called LSTM_Pep) to generate de novo peptides and fine-tuned the model to generate de novo peptides with specific prospective therapeutic benefits. Remarkably, the Antimicrobial Peptide Database has been effectively utilized to generate various kinds of potential active de novo peptides. We proposed a pipeline for screening those generated peptides for a given target and used the main protease of SARS-COV-2 as a proof-of-concept. Moreover, we have developed a deep learning-based protein-peptide prediction model (DeepPep) for rapid screening of the generated peptides for the given targets. Together with the generating model, we have demonstrated that iteratively fine-tuning training, generating, and screening peptides for higher-predicted binding affinity peptides can be achieved. Our work sheds light on developing deep learning-based methods and pipelines to effectively generate and obtain bioactive peptides with a specific therapeutic effect and showcases how artificial intelligence can help discover de novo bioactive peptides that can bind to a particular target.


Subject(s)
COVID-19 , Deep Learning , Humans , Artificial Intelligence , Drug Design , SARS-CoV-2 , Peptides/pharmacology
19.
Brief Bioinform ; 24(1)2023 01 19.
Article in English | MEDLINE | ID: mdl-36502369

ABSTRACT

The recently reported machine learning- or deep learning-based scoring functions (SFs) have shown exciting performance in predicting protein-ligand binding affinities with fruitful application prospects. However, differentiating between highly similar ligand conformations, including the native binding pose (the global energy minimum state), remains challenging, although doing so could greatly enhance docking. In this work, we propose a fully differentiable, end-to-end framework for ligand pose optimization based on a hybrid SF called DeepRMSD+Vina, which combines a multi-layer perceptron (DeepRMSD) with the traditional AutoDock Vina SF. DeepRMSD+Vina, which combines (1) the root mean square deviation (RMSD) of the docking pose with respect to the native pose and (2) the AutoDock Vina score, is fully differentiable; it is thus capable of optimizing the ligand binding pose to the lowest-energy conformation. Evaluated on the CASF-2016 docking power dataset, DeepRMSD+Vina reaches a success rate of 94.4%, which outperforms most reported SFs to date. We evaluated the ligand conformation optimization framework in practical molecular docking scenarios (redocking and cross-docking tasks), revealing the high potential of this framework in drug design and discovery. Structural analysis shows that this framework has the ability to identify key physical interactions in protein-ligand binding, such as hydrogen bonding. Our work provides a paradigm for optimizing ligand conformations based on deep learning algorithms. The DeepRMSD+Vina model and the optimization framework are available at GitHub repository https://github.com/zchwang/DeepRMSD-Vina_Optimization.


Subject(s)
Deep Learning , Ligands , Molecular Docking Simulation , Proteins/chemistry , Drug Design , Protein Binding
20.
IEEE/ACM Trans Comput Biol Bioinform ; 20(3): 2341-2348, 2023.
Article in English | MEDLINE | ID: mdl-36327193

ABSTRACT

The continuous growth of generated sequencing data leads to the development of a variety of associated bioinformatics tools. However, many of them are not able to fully exploit the resources of modern multi-core systems since they are bottlenecked by parsing files leading to slow execution times. This motivates the design of an efficient method for parsing sequencing data that can exploit the power of modern hardware, especially for modern CPUs with fast storage devices. We have developed RabbitFX, a fast, efficient, and easy-to-use framework for processing biological sequencing data on modern multi-core platforms. It can efficiently read FASTA and FASTQ files by combining a lightweight parsing method by means of an optimized formatting implementation. Furthermore, we provide user-friendly and modularized C++ APIs that can be easily integrated into applications in order to increase their file parsing speed. As proof-of-concept, we have integrated RabbitFX into three I/O-intensive applications: fastp, Ktrim, and Mash. Our evaluation shows that the inclusion of RabbitFX leads to speedups of at least 11.6 (6.6), 2.4 (2.4), and 3.7 (3.2) compared to the original versions on plain (gzip-compressed) files, respectively. These case studies demonstrate that RabbitFX can be easily integrated into a variety of NGS analysis tools to significantly reduce associated runtimes. It is open source software available at https://github.com/RabbitBio/RabbitFX.


Subject(s)
Computational Biology , Software , High-Throughput Nucleotide Sequencing