Search | VHL Regional Portal

1.

Parallel Maximum Cardinality Matching for General Graphs on GPUs.

Schwing, Gregory; Grosu, Daniel; Schwiebert, Loren.

IEEE Int Symp Parallel Distrib Process Workshops Phd Forum ; 2024: 880-889, 2024 May.

Article in English | MEDLINE | ID: mdl-39119593

ABSTRACT

The matching problem formulated as Maximum Cardinality Matching in General Graphs (MCMGG) finds the largest matching on graphs without restrictions. The Micali-Vazirani algorithm has the best asymptotic complexity for solving MCMGG when the graphs are sparse. Parallelizing matching in general graphs on the GPU is difficult for multiple reasons. First, the augmenting path procedure is highly recursive, and NVIDIA GPUs use registers to store kernel arguments, which eventually spill into cached device memory, with a performance penalty. Second, extracting parallelism from the matching process requires partitioning the graph to avoid any overlapping augmenting paths. We propose an implementation of the Micali-Vazirani algorithm which identifies bridge edges using thread-parallel breadth-first search, followed by block-parallel path augmentation and blossom contraction. Augmenting path and Union-find methods were implemented as stack-based iterative methods, with a stack allocated in shared memory. Our experimentation shows that compared to the serial implementation, our approach results in up to 15-fold speed-up for very sparse regular graphs, up to 5-fold slowdown for denser regular graphs, and finally a 50-fold slowdown for power-law distributed Kronecker graphs. This implementation has been open-sourced for further research on developing combinatorial graph algorithms on GPUs.

2.

FDI: A MATLAB tool for computing the fractal dimension index of sources reconstructed from EEG data.

Ruiz de Miras, Juan; Casali, Adenauer G; Massimini, Marcello; Ibáñez-Molina, Antonio J; Soriano, María F; Iglesias-Parro, Sergio.

Comput Biol Med ; 179: 108871, 2024 Sep.

Article in English | MEDLINE | ID: mdl-39002315

ABSTRACT

BACKGROUND: The fractal dimension (FD) is a valuable tool for analysing the complexity of neural structures and functions in the human brain. To assess the spatiotemporal complexity of brain activations derived from electroencephalogram (EEG) signals, the fractal dimension index (FDI) was developed. This measure integrates two distinct complexity metrics: 1) integration FD, which calculates the FD of the spatiotemporal coordinates of all significantly active EEG sources (4DFD); and 2) differentiation FD, determined by the complexity of the temporal evolution of the spatial distribution of cortical activations (3DFD), estimated via the Higuchi FD [HFD(3DFD)]. The final FDI value is the product of these two measurements: 4DFD × HFD(3DFD). Although FDI has shown utility in various research on neurological and neurodegenerative disorders, existing literature lacks standardized implementation methods and accessible coding resources, limiting wider adoption within the field. METHODS: We introduce an open-source MATLAB software named FDI for measuring FDI values in EEG datasets. RESULTS: By using CUDA for leveraging the GPU massive parallelism to optimize performance, our software facilitates efficient processing of large-scale EEG data while ensuring compatibility with pre-processed data from widely used tools such as Brainstorm and EEGLab. Additionally, we illustrate the applicability of FDI by demonstrating its usage in two neuroimaging studies. Access to the MATLAB source code and a precompiled executable for Windows system is provided freely. CONCLUSIONS: With these resources, neuroscientists can readily apply FDI to investigate cortical activity complexity within their own studies.

Subject(s)

Electroencephalography , Fractals , Signal Processing, Computer-Assisted , Software , Humans , Electroencephalography/methods , Brain/physiology , Algorithms

3.

A Parallel Compression Pipeline for Improving GPU Virtualization Data Transfers.

Peñaranda, Cristian; Reaño, Carlos; Silla, Federico.

Sensors (Basel) ; 24(14)2024 Jul 17.

Article in English | MEDLINE | ID: mdl-39066047

ABSTRACT

GPUs are commonly used to accelerate the execution of applications in domains such as deep learning. Deep learning applications are applied to an increasing variety of scenarios, with edge computing being one of them. However, edge devices present severe computing power and energy limitations. In this context, the use of remote GPU virtualization solutions is an efficient way to address these concerns. Nevertheless, the limited network bandwidth might be an issue. This limitation can be alleviated by leveraging on-the-fly compression within the communication layer of remote GPU virtualization solutions. In this way, data exchanged with the remote GPU is transparently compressed before being transmitted, thus increasing network bandwidth in practice. In this paper, we present the implementation of a parallel compression pipeline designed to be used within remote GPU virtualization solutions. A thorough performance analysis shows that network bandwidth can be increased by a factor of up to 2×.

4.

GPU-accelerated parallel image reconstruction strategies for magnetic particle imaging.

Quelhas, Klaus N; Henn, Mark-Alexander; Farias, Ricardo; Tew, Weston L; Woods, Solomon I.

Phys Med Biol ; 69(13)2024 Jun 24.

Article in English | MEDLINE | ID: mdl-38843809

ABSTRACT

Objective. Image reconstruction is a fundamental step in magnetic particle imaging (MPI). One of the main challenges is the fact that the reconstructions are computationally intensive and time-consuming, so choosing an algorithm presents a compromise between accuracy and execution time, which depends on the application. This work proposes a method that provides both fast and accurate image reconstructions.Approach. Image reconstruction algorithms were implemented to be executed in parallel ingraphics processing units(GPUs) using the CUDA framework. The calculation of the model-based MPI calibration matrix was also implemented in GPU to allow both fast and flexible reconstructions.Main results. The parallel algorithms were able to accelerate the reconstructions by up to about6,100times in comparison to the serial Kaczmarz algorithm executed in the CPU, allowing for real-time applications. Reconstructions using the OpenMPIData dataset validated the proposed algorithms and demonstrated that they are able to provide both fast and accurate reconstructions. The calculation of the calibration matrix was accelerated by up to about 37 times.Significance. The parallel algorithms proposed in this work can provide single-frame MPI reconstructions in real time, with frame rates greater than 100 frames per second. The parallel calculation of the calibration matrix can be combined with the parallel reconstruction to deliver images in less time than the serial Kaczmarz reconstruction, potentially eliminating the need of storing the calibration matrix in the main memory, and providing the flexibility of redefining scanning and reconstruction parameters during execution.

Subject(s)

Image Processing, Computer-Assisted , Image Processing, Computer-Assisted/methods , Algorithms , Computer Graphics , Time Factors , Molecular Imaging/methods , Calibration

5.

3D Model of Carbon Diffusion during Diffusional Phase Transformations.

Lach, Lukasz; Svyetlichnyy, Dmytro.

Materials (Basel) ; 17(3)2024 Jan 30.

Article in English | MEDLINE | ID: mdl-38591517

ABSTRACT

The microstructure plays a crucial role in determining the properties of metallic materials, in terms of both their strength and functionality in various conditions. In the context of the formation of microstructure, phase transformations that occur in materials are highly significant. These are processes during which the structure of a material undergoes changes, most commonly as a result of variations in temperature, pressure, or chemical composition. The study of phase transformations is a broad and rapidly evolving research area that encompasses both experimental investigations and modeling studies. A foundational understanding of carbon diffusion and phase transformations in materials science is essential for comprehending the behavior of materials under different conditions. This understanding forms the basis for the development and optimization of materials with desired properties. The aim of this paper is to create a three-dimensional model for carbon diffusion in the context of modeling diffusional phase transformations occurring in carbon steels. The proposed model relies on the utilization of the LBM (Lattice Boltzmann Method) and CUDA architecture. The resultant carbon diffusion model is intricately linked with a microstructure evolution model grounded in FCA (Frontal Cellular Automata). This manuscript provides a concise overview of the LBM and the FCA method. It outlines the structure of the developed three-dimensional model for carbon diffusion, details its correlation with the microstructure evolution model, and presents the developed algorithm for simulating carbon diffusion. Demonstrative examples of simulation results, illustrating the growth of the emerging phase and affected by various model parameters within particular planes of the 3D calculation domain, are also presented.

6.

GPU-accelerated Bloch simulations and MR-STAT reconstructions using the Julia programming language.

van der Heide, Oscar; van den Berg, Cornelis A T; Sbrizzi, Alessandro.

Magn Reson Med ; 92(2): 618-630, 2024 Aug.

Article in English | MEDLINE | ID: mdl-38441315

ABSTRACT

PURPOSE: MR-STAT is a relatively new multiparametric quantitative MRI technique in which quantitative paramater maps are obtained by solving a large-scale nonlinear optimization problem. Managing reconstruction times is one of the main challenges of MR-STAT. In this work we leverage GPU hardware to reduce MR-STAT reconstruction times. A highly optimized, GPU-compatible Bloch simulation toolbox is developed as part of this work that can be utilized for other quantitative MRI techniques as well. METHODS: The Julia programming language was used to develop a flexible yet highly performant and GPU-compatible Bloch simulation toolbox called BlochSimulators.jl. The runtime performance of the toolbox is benchmarked against other Bloch simulation toolboxes. Furthermore, a (partially matrix-free) modification of a previously presented (matrix-free) MR-STAT reconstruction algorithm is proposed and implemented using the Julia language on GPU hardware. The proposed algorithm is combined with BlochSimulators.jl and the resulting MR-STAT reconstruction times on GPU hardware are compared to previously presented MR-STAT reconstruction times. RESULTS: The BlochSimulators.jl package demonstrates superior runtime performance on both CPU and GPU hardware when compared to other existing Bloch simulation toolboxes. The GPU-accelerated partially matrix-free MR-STAT reconstruction algorithm, which relies on BlochSimulators.jl, allows for reconstructions of 68 seconds per two-dimensional (2D slice). CONCLUSION: By combining the proposed Bloch simulation toolbox and the partially matrix-free reconstruction algorithm, 2D MR-STAT reconstructions can be performed in the order of one minute on a modern GPU card. The Bloch simulation toolbox can be utilized for other quantitative MRI techniques as well, for example for online dictionary generation for MR Fingerprinting.

Subject(s)

Algorithms , Computer Simulation , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Programming Languages , Magnetic Resonance Imaging/methods , Humans , Image Processing, Computer-Assisted/methods , Computer Graphics , Brain/diagnostic imaging , Phantoms, Imaging , Software , Image Interpretation, Computer-Assisted/methods , Reproducibility of Results

7.

Implementation and analysis of a parallel kalman filter algorithm for lidar localization based on CUDA technology.

Mochurad, Lesia.

Front Robot AI ; 11: 1341689, 2024.

Article in English | MEDLINE | ID: mdl-38371349

ABSTRACT

Introduction: Navigation satellite systems can fail to work or work incorrectly in a number of conditions: signal shadowing, electromagnetic interference, atmospheric conditions, and technical problems. All of these factors can significantly affect the localization accuracy of autonomous driving systems. This emphasizes the need for other localization technologies, such as Lidar. Methods: The use of the Kalman filter in combination with Lidar can be very effective in various applications due to the synergy of their capabilities. The Kalman filter can improve the accuracy of lidar measurements by taking into account the noise and inaccuracies present in the measurements. Results: In this paper, we propose a parallel Kalman algorithm in three-dimensional space to speed up the computational speed of Lidar localization. At the same time, the initial localization accuracy of the latter is preserved. A distinctive feature of the proposed approach is that the Kalman localization algorithm itself is parallelized, rather than the process of building a map for navigation. The proposed algorithm allows us to obtain the result 3.8 times faster without compromising the localization accuracy, which was 3% for both cases, making it effective for real-time decision-making. Discussion: The reliability of this result is confirmed by a preliminary theoretical estimate of the acceleration rate based on Ambdahl's law. Accelerating the Kalman filter with CUDA for Lidar localization can be of significant practical value, especially in real-time and in conditions where large amounts of data from Lidar sensors need to be processed.

8.

Application value of improved Cuda-based volume rendering algorithm in CT vascular imaging three-dimensional reconstruction / 实用放射学杂志

Liping FANG; Ying YUAN; Xiaofeng TAO.

Journal of Practical Radiology ; (12): 659-662, 2024.

Article in Chinese | WPRIM (Western Pacific) | ID: wpr-1020278

ABSTRACT

Objective To explore the application value of an improved volume rendering algorithm based on Cuda in CT vascular imaging three-dimensional reconstruction.Methods Five cases of head and neck vascular computed tomography angiography(CTA)examinations and five cases of coronary CTA examinations were selected.The traditional Bounding-Box algorithm and the Cuda-based volume rendering reconstruction method were used for vascular three-dimensional reconstruction.The reconstruction speed and quality of the two algorithms were compared.Results Running the traditional algorithm on an RTX2060 graphics card took 50-60 ms per frame,while running the algorithm described in this study took 25-35 ms per frame,resulting in approximately a 1x speed improvement.On an RTX3060,the algorithm described in this study took 18-23 ms per frame,resulting in approximately a 1x speed improvement.The reconstruction results from all ten cases demonstrated that the algorithm described in this study provided clearer visualization of small blood vessels in the head and neck region and the distal coronary arteries.Conclusion The Cuda-based development framework achieves faster rendering speed and better image quality compared to the traditional Bounding-Box algorithm.

9.

Compressed SVD-based L + S model to reconstruct undersampled dynamic MRI data using parallel architecture.

Shafique, Muhammad; Qazi, Sohaib Ayaz; Omer, Hammad.

MAGMA ; 2023 Nov 18.

Article in English | MEDLINE | ID: mdl-37978992

ABSTRACT

BACKGROUND: Magnetic Resonance Imaging (MRI) is a highly demanded medical imaging system due to high resolution, large volumetric coverage, and ability to capture the dynamic and functional information of body organs e.g. cardiac MRI is employed to assess cardiac structure and evaluate blood flow dynamics through the cardiac valves. Long scan time is the main drawback of MRI, which makes it difficult for the patients to remain still during the scanning process. OBJECTIVE: By collecting fewer measurements, MRI scan time can be shortened, but this undersampling causes aliasing artifacts in the reconstructed images. Advanced image reconstruction algorithms have been used in literature to overcome these undersampling artifacts. These algorithms are computationally expensive and require a long time for reconstruction which makes them infeasible for real-time clinical applications e.g. cardiac MRI. However, exploiting the inherent parallelism in these algorithms can help to reduce their computation time. METHODS: Low-rank plus sparse (L+S) matrix decomposition model is a technique used in literature to reconstruct the highly undersampled dynamic MRI (dMRI) data at the expense of long reconstruction time. In this paper, Compressed Singular Value Decomposition (cSVD) model is used in L+S decomposition model (instead of conventional SVD) to reduce the reconstruction time. The results provide improved quality of the reconstructed images. Furthermore, it has been observed that cSVD and other parts of the L+S model possess highly parallel operations; therefore, a customized GPU based parallel architecture of the modified L+S model has been presented to further reduce the reconstruction time. RESULTS: Four cardiac MRI datasets (three different cardiac perfusion acquired from different patients and one cardiac cine data), each with different acceleration factors of 2, 6 and 8 are used for experiments in this paper. Experimental results demonstrate that using the proposed parallel architecture for the reconstruction of cardiac perfusion data provides a speed-up factor up to 19.15× (with memory latency) and 70.55× (without memory latency) in comparison to the conventional CPU reconstruction with no compromise on image quality. CONCLUSION: The proposed method is well-suited for real-time clinical applications, offering a substantial reduction in reconstruction time.

10.

Reference-based genome compression using the longest matched substrings with parallelization consideration.

Lu, Zhiwen; Guo, Lu; Chen, Jianhua; Wang, Rongshu.

BMC Bioinformatics ; 24(1): 369, 2023 Sep 30.

Article in English | MEDLINE | ID: mdl-37777730

ABSTRACT

BACKGROUND: A large number of researchers have devoted to accelerating the speed of genome sequencing and reducing the cost of genome sequencing for decades, and they have made great strides in both areas, making it easier for researchers to study and analyze genome data. However, how to efficiently store and transmit the vast amount of genome data generated by high-throughput sequencing technologies has become a challenge for data compression researchers. Therefore, the research of genome data compression algorithms to facilitate the efficient representation of genome data has gradually attracted the attention of these researchers. Meanwhile, considering that the current computing devices have multiple cores, how to make full use of the advantages of the computing devices and improve the efficiency of parallel processing is also an important direction for designing genome compression algorithms. RESULTS: We proposed an algorithm (LMSRGC) based on reference genome sequences, which uses the suffix array (SA) and the longest common prefix (LCP) array to find the longest matched substrings (LMS) for the compression of genome data in FASTA format. The proposed algorithm utilizes the characteristics of SA and the LCP array to select all appropriate LMSs between the genome sequence to be compressed and the reference genome sequence and then utilizes LMSs to compress the target genome sequence. To speed up the operation of the algorithm, we use GPUs to parallelize the construction of SA, while using multiple threads to parallelize the creation of the LCP array and the filtering of LMSs. CONCLUSIONS: Experiment results demonstrate that our algorithm is competitive with the current state-of-the-art algorithms in compression ratio and compression time.

Subject(s)

Data Compression , Data Compression/methods , Sequence Analysis, DNA/methods , Algorithms , Genome , Software , High-Throughput Nucleotide Sequencing/methods

11.

Optimizing Voronoi-based quantifications for reaching interactive analysis of 3D localizations in the million range.

Levet, Florian.

Front Bioinform ; 3: 1249291, 2023.

Article in English | MEDLINE | ID: mdl-37600969

ABSTRACT

Over the last decade, single-molecule localization microscopy (SMLM) has revolutionized cell biology, making it possible to monitor molecular organization and dynamics with spatial resolution of a few nanometers. Despite being a relatively recent field, SMLM has witnessed the development of dozens of analysis methods for problems as diverse as segmentation, clustering, tracking or colocalization. Among those, Voronoi-based methods have achieved a prominent position for 2D analysis as robust and efficient implementations were available for generating 2D Voronoi diagrams. Unfortunately, this was not the case for 3D Voronoi diagrams, and existing methods were therefore extremely time-consuming. In this work, we present a new hybrid CPU-GPU algorithm for the rapid generation of 3D Voronoi diagrams. Voro3D allows creating Voronoi diagrams of datasets composed of millions of localizations in minutes, making any Voronoi-based analysis method such as SR-Tesseler accessible to life scientists wanting to quantify 3D datasets. In addition, we also improve ClusterVisu, a Voronoi-based clustering method using Monte-Carlo simulations, by demonstrating that those costly simulations can be correctly approximated by a customized gamma probability distribution function.

12.

3D Model of Heat Flow during Diffusional Phase Transformations.

Lach, Lukasz; Svyetlichnyy, Dmytro.

Materials (Basel) ; 16(13)2023 Jul 06.

Article in English | MEDLINE | ID: mdl-37445179

ABSTRACT

The structure of metallic materials has a significant impact on their properties. One of the most popular methods to form the properties of metal alloys is heat treatment, which uses thermally activated transformations that take place in metals to achieve the required mechanical or physicochemical properties. The phase transformation in steel results from the fact that one state becomes less durable than the other due to a change in conditions, for example, temperature. Phase transformations are an extensive field of research that is developing very dynamically both in the sphere of experimental and model research. The objective of this paper is the development of a 3D heat flow model to model heat transfer during diffusional phase transformations in carbon steels. This model considers the two main factors that influence the transformation: the temperature and the enthalpy of transformation. The proposed model is based on the lattice Boltzmann method (LBM) and uses CUDA parallel computations. The developed heat flow model is directly related to the microstructure evolution model, which is based on frontal cellular automata (FCA). This paper briefly presents information on the FCA, LBM, CUDA, and diffusional phase transformation in carbon steels. The structures of the 3D model of heat flow and their connection with the microstructure evolution model as well as the algorithm for simulation of heat transfer with consideration of the enthalpy of transformation are shown. Examples of simulation results of the growth of the new phase that are determined by the overheating/overcooling and different model parameters in the selected planes of the 3D calculation domain are also presented.

13.

SNS-Toolbox: An Open Source Tool for Designing Synthetic Nervous Systems and Interfacing Them with Cyber-Physical Systems.

Nourse, William R P; Jackson, Clayton; Szczecinski, Nicholas S; Quinn, Roger D.

Biomimetics (Basel) ; 8(2)2023 Jun 10.

Article in English | MEDLINE | ID: mdl-37366842

ABSTRACT

One developing approach for robotic control is the use of networks of dynamic neurons connected with conductance-based synapses, also known as Synthetic Nervous Systems (SNS). These networks are often developed using cyclic topologies and heterogeneous mixtures of spiking and non-spiking neurons, which is a difficult proposition for existing neural simulation software. Most solutions apply to either one of two extremes, the detailed multi-compartment neural models in small networks, and the large-scale networks of greatly simplified neural models. In this work, we present our open-source Python package SNS-Toolbox, which is capable of simulating hundreds to thousands of spiking and non-spiking neurons in real-time or faster on consumer-grade computer hardware. We describe the neural and synaptic models supported by SNS-Toolbox, and provide performance on multiple software and hardware backends, including GPUs and embedded computing platforms. We also showcase two examples using the software, one for controlling a simulated limb with muscles in the physics simulator Mujoco, and another for a mobile robot using ROS. We hope that the availability of this software will reduce the barrier to entry when designing SNS networks, and will increase the prevalence of SNS networks in the field of robotic control.

14.

Source code and simulation data for the prediction of the electrodeposition mechanism of nanostructured metallic coatings.

Rosano-Ortega, G; Bedolla-Hernández, M; Sánchez-Ruiz, F J; Bedolla-Hernández, J; Schabes-Retchkiman, P S; Vega-Lebrún, C A; Vargas-Viveros, E.

Data Brief ; 48: 109269, 2023 Jun.

Article in English | MEDLINE | ID: mdl-37383780

ABSTRACT

This data article presents a simulation model based on quantum mechanics and energy potentials for obtaining simulation data that allows, from the perspective of materials informatics, the prediction of the electrodeposition mechanism for forming nanostructured metallic coatings. The development of the research is divided into two parts i) the formulation (Quantum mechanical model and Corrected model for electron prediction; using a modified Schrödinger equation) and ii) the implementation of the theoretical prediction model (Discretization of the model). For the simulation process, the finite element method (FEM) was used considering the equation of electric potential and electroneutrality with and without the inclusion of quantum leap. We also provide the code to perform QM simulations in CUDA®, and COMSOL® software, the simulation parameters, and data for two metallic arrangements of chromium nanoparticles (CrNPs) electrodeposited on commercial steel substrate. (CrNPs-AISI 1020 steel and CrNPs-A618 steel). Data collection shows the direct relationship between applied potential (VDC), current (A), concentration (ppm), and time (s) for the homogeneous formation of the coating during the electrodeposition process, as estimated by the theoretical model developed. Their potential reuse data is done to establish the precision of the theoretical model in predicting the formation and growth of nanostructured surface coatings with metallic nanoparticles to give surface-mechanical properties.

15.

A multi-GPU and CUDA-aware MPI-based spectral element formulation for ultrasonic wave propagation in solid media.

Li, Feilong; Zou, Fangxin; Rao, Jing.

Ultrasonics ; 134: 107049, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37290255

ABSTRACT

In this paper, we introduce a new multi-GPU-based spectral element (SE) formulation for simulating ultrasonic wave propagation in solids. To maximize communication efficiency, we purposely developed, based on CUDA-aware MPI, two novel message exchange strategies which allow the common nodal forces of different subdomains to be shared between different GPUs in a direct manner, as opposed to via CPU hosts, during central difference-based time integration steps. The new multi-GPU and CUDA-aware MPI-based formulation is benchmarked against a multi-CPU core and classical MPI-based counterpart, demonstrating a remarkable acceleration in each and every stage of the computation of ultrasonic wave propagation, namely matrix assembly, time integration and message exchange. More importantly, both the computational efficiency and the degree-of-freedom limit of the new formulation are actually scalable with the number of GPUs used, potentially allowing larger structures to be computed and higher computational speeds to be realized. Finally, the new formulation was used to simulate the interaction between Lamb waves and randomly shaped thickness loss defects on plates, showing its potential to become an efficient, accurate and robust technique for addressing the propagation of ultrasonic waves in realistic engineering structures.

16.

A 3D Approach Using a Control Algorithm to Minimize the Effects on the Healthy Tissue in the Hyperthermia for Cancer Treatment.

Fatigate, Gustavo Resende; Lobosco, Marcelo; Reis, Ruy Freitas.

Entropy (Basel) ; 25(4)2023 Apr 19.

Article in English | MEDLINE | ID: mdl-37190473

ABSTRACT

According to the World Health Organization, cancer is a worldwide health problem. Its high mortality rate motivates scientists to study new treatments. One of these new treatments is hyperthermia using magnetic nanoparticles. This treatment consists in submitting the target region with a low-frequency magnetic field to increase its temperature over 43 °C, as the threshold for tissue damage and leading the cells to necrosis. This paper uses an in silico three-dimensional Pennes' model described by a set of partial differential equations (PDEs) to estimate the percentage of tissue damage due to hyperthermia. Differential evolution, an optimization method, suggests the best locations to inject the nanoparticles to maximize tumor cell death and minimize damage to healthy tissue. Three different scenarios were performed to evaluate the suggestions obtained by the optimization method. The results indicate the positive impact of the proposed technique: a reduction in the percentage of healthy tissue damage and the complete damage of the tumors were observed. In the best scenario, the optimization method was responsible for decreasing the healthy tissue damage by 59% when the nanoparticles injection sites were located in the non-intuitive points indicated by the optimization method. The numerical solution of the PDEs is computationally expensive. This work also describes the implemented parallel strategy based on CUDA to reduce the computational costs involved in the PDEs resolution. Compared to the sequential version executed on the CPU, the proposed parallel implementation was able to speed the execution time up to 84.4 times.

17.

Programming Quantum Neural Networks on NISQ Systems: An Overview of Technologies and Methodologies.

Markidis, Stefano.

Entropy (Basel) ; 25(4)2023 Apr 20.

Article in English | MEDLINE | ID: mdl-37190482

ABSTRACT

Noisy Intermediate-Scale Quantum (NISQ) systems and associated programming interfaces make it possible to explore and investigate the design and development of quantum computing techniques for Machine Learning (ML) applications. Among the most recent quantum ML approaches, Quantum Neural Networks (QNN) emerged as an important tool for data analysis. With the QNN advent, higher-level programming interfaces for QNN have been developed. In this paper, we survey the current state-of-the-art high-level programming approaches for QNN development. We discuss target architectures, critical QNN algorithmic components, such as the hybrid workflow of Quantum Annealers and Parametrized Quantum Circuits, QNN architectures, optimizers, gradient calculations, and applications. Finally, we overview the existing programming QNN frameworks, their software architecture, and associated quantum simulators.

18.

HXPY: A High-Performance Data Processing Package for Financial Time-Series Data.

Guo, Jiadong; Peng, Jingshu; Yuan, Hang; Ni, Lionel Ming-Shuan.

J Comput Sci Technol ; 38(1): 3-24, 2023.

Article in English | MEDLINE | ID: mdl-37016601

ABSTRACT

A tremendous amount of data has been generated by global financial markets everyday, and such time-series data needs to be analyzed in real time to explore its potential value. In recent years, we have witnessed the successful adoption of machine learning models on financial data, where the importance of accuracy and timeliness demands highly effective computing frameworks. However, traditional financial time-series data processing frameworks have shown performance degradation and adaptation issues, such as the outlier handling with stock suspension in Pandas and TA-Lib. In this paper, we propose HXPY, a high-performance data processing package with a C++/Python interface for financial time-series data. HXPY supports miscellaneous acceleration techniques such as the streaming algorithm, the vectorization instruction set, and memory optimization, together with various functions such as time window functions, group operations, down-sampling operations, cross-section operations, row-wise or column-wise operations, shape transformations, and alignment functions. The results of benchmark and incremental analysis demonstrate the superior performance of HXPY compared with its counterparts. From MiBs to GiBs data, HXPY significantly outperforms other in-memory dataframe computing rivals even up to hundreds of times. Supplementary Information: The online version contains supplementary material available at 10.1007/s11390-023-2879-5.

19.

Performance evaluation of spatial fuzzy C-means clustering algorithm on GPU for image segmentation.

Ali, Noureddine Ait; El Abbassi, Ahmed; Bouattane, Omar.

Multimed Tools Appl ; 82(5): 6787-6805, 2023.

Article in English | MEDLINE | ID: mdl-35968411

ABSTRACT

Image processing by segmentation technique is an important phase in medical imaging such as MRI. Its objective is to analyze the different tissues in human body. In research area, Fuzzy set is one of the most successful techniques that guarantees a robust classification. Spatial FCM (SFCM); one of the fuzzy c-means variants; considers spatial information to deal with the noisy images. To reduce this iterative algorithm's execution time, a hard SIMD architecture has been planted named the Graphical Processing Unit (GPU). In this work, a great contribution has been done to diagnose, confront and implement three different parallel implementations on GPU. A parallel implementations' extensive study of SFCM entitled PSFCM using 3 × 3 window is presented, and the experiments illustrate a significant decrease in terms of running time of this algorithm known by its high complexity. The experimental results indicate that the parallel version's execution time is about 9.46 times faster than the sequential implementation on image segmentation. This gain in terms of speed-up is achieved on the Nvidia GeForce GT 740 m GPU.

20.

A fast GPU-accelerated Monte Carlo engine for calculation of MLC-collimated electron fields.

Brost, Eric E; Wan Chan Tseung, H; Antolak, John A.

Med Phys ; 50(1): 600-618, 2023 Jan.

Article in English | MEDLINE | ID: mdl-35986907

ABSTRACT

BACKGROUND: Although intensity-modulated radiation therapy and volumetric arc therapy have revolutionized photon external beam therapies, the technological advances associated with electron beam therapy have fallen behind. Modern linear accelerators contain technologies that would allow for more advanced forms of electron treatments, such as beam collimation, using the conventional photon multi-leaf collimator (MLC); however, no commercial solutions exist that calculate dose from such beam delivery modes. Additionally, for clinical adoption to occur, dose calculation times would need to be on par with that of modern dose calculation algorithms. PURPOSE: This work developed a graphics processing unit (GPU)-accelerated Monte Carlo (MC) engine incorporating the Varian TrueBeam linac head geometry for a rapid calculation of electron beams collimated using the conventional photon MLC. METHODS: A compute unified device architecture framework was created for the following: (1) transport of electrons and photons through the linac head geometry, considering multiple scattering, Bremsstrahlung, Møller, Compton, and pair production interactions; (2) electron and photon propagation through the CT geometry, considering all interactions plus the photoelectric effect; and (3) secondary particle cascades through the linac head and within the CT geometry. The linac head collimating geometry was modeled according to the specifications provided by the vendor, who also provided phase-space files. The MC was benchmarked against EGSnrc/DOSXYZnrc/GEANT by simulating individual interactions with simple geometries, pencil, and square beam dose calculations in various phantoms. MC-calculated dose distributions for MLC and jaw-collimated electron fields were compared to measurements in a water phantom and with radiochromic film. RESULTS: Pencil and square beam dose distributions are in good agreement with DOSXYZnrc. Angular and spatial distributions for multiple scattering and secondary particle production in thin slab geometries are in good agreement with EGSnrc and GEANT. Dose profiles for MLC and jaw-collimated 6-20-MeV electron beams showed an average absolute difference of 1.1 and 1.9 mm for the FWHM and 80%-20% penumbra from measured profiles. Percent depth doses showed differences of <5% for as compared to measurement. The computation time on an NVIDIA Tesla V100 card was 2.5 min to achieve a dose uncertainty of <1%, which is â¼300 times faster than published results in a similar geometry using a single-CPU core. CONCLUSIONS: The GPU-based MC can quickly calculate dose for electron fields collimated using the conventional photon MLC. The fast calculation times will allow for a rapid calculation of electron fields for mixed photon and electron particle therapy.

Subject(s)

Electrons , Radiotherapy, Intensity-Modulated , Algorithms , Radiotherapy Dosage , Radiotherapy, Intensity-Modulated/methods , Radiotherapy Planning, Computer-Assisted/methods , Phantoms, Imaging , Particle Accelerators , Monte Carlo Method , Photons

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

ABSTRACT

Subject(s)

SEND TO:

SELECTION OF CITATIONS

SEARCH DETAIL