Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 19 de 19
Filtrar
1.
Sensors (Basel) ; 24(1)2024 Jan 03.
Artigo em Inglês | MEDLINE | ID: mdl-38203143

RESUMO

New artificial intelligence scenarios, such as high-precision online industrial detection, unmanned driving, etc., are constantly emerging and have resulted in an increasing demand for real-time image processing with high frame rates and low power consumption. Histogram equalization (HE) is a very effective and commonly used image preprocessing algorithm designed to improve the quality of image processing results. However, most existing HE acceleration methods, whether run on general-purpose CPUs or dedicated embedded systems, require further improvement in their frame rate to meet the needs of more complex scenarios. In this paper, we propose an HE acceleration method for FPGAs based on a two-dimensional configurable pipeline architecture. We first optimize the parallelizability of HE with a fully configurable two-dimensional pipeline architecture according to the principle of adapting the algorithm to the hardware, where one dimension can compute the cumulative histogram in parallel and the other dimension can process multiple inputs simultaneously. This optimization also helps in the construction of a simple architecture that achieves a higher frequency when implementing HE on FPGAs, which consist of configurable input units, calculation units, and output units. Finally, we optimize the pipeline and critical path of the calculation units. In the experiments, we deploy the optimized HE on a VCU118 test board and achieve a maximum frequency of 891 MHz (which is up to 22.6 times more acceleration than CPU implementations), as well as a frame rate of 1899 frames per second for 1080p images.

2.
J Synchrotron Radiat ; 30(Pt 1): 227-234, 2023 Jan 01.
Artigo em Inglês | MEDLINE | ID: mdl-36601941

RESUMO

The JUNGFRAU 4-megapixel (4M) charge-integrating pixel-array detector, when operated at a full 2 kHz frame rate, streams data at a rate of 17 GB s-1. To operate this detector for macromolecular crystallography beamlines, a data-acquisition system called Jungfraujoch was developed. The system, running on a single server with field-programmable gate arrays and general-purpose graphics processing units, is capable of handling data produced by the JUNGFRAU 4M detector, including conversion of raw pixel readout to photon counts, compression and on-the-fly spot finding. It was also demonstrated that 30 GB s-1 can be handled in performance tests, indicating that the operation of even larger and faster detectors will be achievable in the future. The source code is available from a public repository.


Assuntos
Software , Síncrotrons , Raios X , Radiografia , Cristalografia por Raios X
3.
Sensors (Basel) ; 23(14)2023 Jul 18.
Artigo em Inglês | MEDLINE | ID: mdl-37514790

RESUMO

Convolutional neural networks (CNNs) have been extensively employed in remote sensing image detection and have exhibited impressive performance over the past few years. However, the abovementioned networks are generally limited by their complex structures, which make them difficult to deploy with power-sensitive and resource-constrained remote sensing edge devices. To tackle this problem, this study proposes a lightweight remote sensing detection network suitable for edge devices and an energy-efficient CNN accelerator based on field-programmable gate arrays (FPGAs). First, a series of network weight reduction and optimization methods are proposed to reduce the size of the network and the difficulty of hardware deployment. Second, a high-energy-efficiency CNN accelerator is developed. The accelerator employs a reconfigurable and efficient convolutional processing engine to perform CNN computations, and hardware optimization was performed for the proposed network structure. The experimental results obtained with the Xilinx ZYNQ Z7020 show that the network achieved higher accuracy with a smaller size, and the CNN accelerator for the proposed network exhibited a throughput of 29.53 GOPS and power consumption of only 2.98 W while consuming only 113 DSPs. In comparison with relevant work, DSP efficiency at an identical level of energy consumption was increased by 1.1-2.5 times, confirming the superiority of the proposed solution and its potential for deployment with remote sensing edge devices.

4.
Sensors (Basel) ; 22(5)2022 Feb 24.
Artigo em Inglês | MEDLINE | ID: mdl-35270919

RESUMO

In the near future, LHC experiments will continue future upgrades by overcoming the technological obsolescence of the detectors and the readout capabilities. Therefore, after the conclusion of a data collection period, CERN will have to face a long shutdown to improve overall performance, by updating the experiments, and implementing more advanced technologies and infrastructures. In particular, the largest LHC experiment, i.e., ATLAS, will upgrade parts of the detector, the trigger, and the data acquisition system. In addition, the ATLAS experiment will complete the implementation of new strategies, algorithms for data handling, and transmission to the final storage apparatus. This paper presents an overview of an upgrade planned for the second half of this decade for the ATLAS experiment. In particular, we show a study of a novel pattern recognition algorithm used in the trigger system, which is a device designed to provide the information needed to select physical events from unnecessary background data. The idea is to use a well known mathematical transform, the Hough transform, as the algorithm for the detection of particle trajectories. The effectiveness of the algorithm has already been validated in the past, regardless of particle physics applications, to recognize generic shapes within images. On the contrary, here, we first propose a software emulation tool, and a subsequent hardware implementation of the Hough transform, for particle physics applications. Until now, the Hough transform has never been implemented on electronics in particle physics experiments, and since a hardware implementation would provide benefits in terms of overall Latency, we complete the studies by comparing the simulated data with a physical system implemented on a Xilinx hardware accelerator (FELIX-II card). In more detail, we have implemented a low-abstraction RTL design of the Hough transform on Xilinx UltraScale+ FPGAs as target devices for filtering applications.

5.
Sensors (Basel) ; 22(12)2022 Jun 07.
Artigo em Inglês | MEDLINE | ID: mdl-35746100

RESUMO

Convolution Neural Networks (CNNs) are gaining ground in deep learning and Artificial Intelligence (AI) domains, and they can benefit from rapid prototyping in order to produce efficient and low-power hardware designs. The inference process of a Deep Neural Network (DNN) is considered a computationally intensive process that requires hardware accelerators to operate in real-world scenarios due to the low latency requirements of real-time applications. As a result, High-Level Synthesis (HLS) tools are gaining popularity since they provide attractive ways to reduce design time complexity directly in register transfer level (RTL). In this paper, we implement a MobileNetV2 model using a state-of-the-art HLS tool in order to conduct a design space exploration and to provide insights on complex hardware designs which are tailored for DNN inference. Our goal is to combine design methodologies with sparsification techniques to produce hardware accelerators that achieve comparable error metrics within the same order of magnitude with the corresponding state-of-the-art systems while also significantly reducing the inference latency and resource utilization. Toward this end, we apply sparse matrix techniques on a MobileNetV2 model for efficient data representation, and we evaluate our designs in two different weight pruning approaches. Experimental results are evaluated with respect to the CIFAR-10 data set using several different design methodologies in order to fully explore their effects on the performance of the model under examination.


Assuntos
Inteligência Artificial , Voo Espacial , Computadores , Redes Neurais de Computação
6.
Entropy (Basel) ; 24(9)2022 Aug 23.
Artigo em Inglês | MEDLINE | ID: mdl-36141063

RESUMO

Entropy is one of the most fundamental notions for understanding complexity. Among all the methods to calculate the entropy, sample entropy (SampEn) is a practical and common method to estimate time-series complexity. Unfortunately, SampEn is a time-consuming method growing in quadratic times with the number of elements, which makes this method unviable when processing large data series. In this work, we evaluate hardware SampEn architectures to offload computation weight, using improved SampEn algorithms and exploiting reconfigurable technologies, such as field-programmable gate arrays (FPGAs), a reconfigurable technology well-known for its high performance and power efficiency. In addition to the fundamental disclosed straightforward SampEn (SF) calculation method, this study evaluates optimized strategies, such as bucket-assist (BA) SampEn and lightweight SampEn based on BubbleSort (BS-LW) and MergeSort (MS-LW) on an embedded CPU, a high-performance CPU and on an FPGA using simulated data and real-world electrocardiograms (ECG) as input data. Irregular storage space and memory access of enhanced algorithms is also studied and estimated in this work. These fast SampEn algorithms are evaluated and profiled using metrics such as execution time, resource use, power and energy consumption based on input data length. Finally, although the implementation of fast SampEn is not significantly faster than versions running on a high-performance CPU, FPGA implementations consume one or two orders of magnitude less energy than a high-performance CPU.

7.
Sensors (Basel) ; 19(14)2019 Jul 11.
Artigo em Inglês | MEDLINE | ID: mdl-31373307

RESUMO

Connected Component Analysis (CCA) plays an important role in several image analysis and pattern recognition algorithms. Being one of the most time-consuming tasks in such applications, specific hardware accelerator for the CCA are highly desirable. As its main characteristic, the design of such an accelerator must be able to complete a run-time process of the input image frame without suspending the input streaming data-flow, by using a reasonable amount of hardware resources. This paper presents a new approach that allows virtually any feature of interest to be extracted in a single-pass from the input image frames. The proposed method has been validated by a proper system hardware implemented in a complete heterogeneous design, within a Xilinx Zynq-7000 Field Programmable Gate Array (FPGA) System on Chip (SoC) device. For processing 640 × 480 input image resolution, only 760 LUTs and 787 FFs were required. Moreover, a frame-rate of ~325 fps and a throughput of 95.37 Mp/s were achieved. When compared to several recent competitors, the proposed design exhibits the most favorable performance-resources trade-off.

8.
Sensors (Basel) ; 18(6)2018 Jun 08.
Artigo em Inglês | MEDLINE | ID: mdl-29890644

RESUMO

Cyber-Physical Systems are experiencing a paradigm shift in which processing has been relocated to the distributed sensing layer and is no longer performed in a centralized manner. This approach, usually referred to as Edge Computing, demands the use of hardware platforms that are able to manage the steadily increasing requirements in computing performance, while keeping energy efficiency and the adaptability imposed by the interaction with the physical world. In this context, SRAM-based FPGAs and their inherent run-time reconfigurability, when coupled with smart power management strategies, are a suitable solution. However, they usually fail in user accessibility and ease of development. In this paper, an integrated framework to develop FPGA-based high-performance embedded systems for Edge Computing in Cyber-Physical Systems is presented. This framework provides a hardware-based processing architecture, an automated toolchain, and a runtime to transparently generate and manage reconfigurable systems from high-level system descriptions without additional user intervention. Moreover, it provides users with support for dynamically adapting the available computing resources to switch the working point of the architecture in a solution space defined by computing performance, energy consumption and fault tolerance. Results show that it is indeed possible to explore this solution space at run time and prove that the proposed framework is a competitive alternative to software-based edge computing platforms, being able to provide not only faster solutions, but also higher energy efficiency for computing-intensive algorithms with significant levels of data-level parallelism.

9.
Sensors (Basel) ; 17(7)2017 Jul 03.
Artigo em Inglês | MEDLINE | ID: mdl-28671632

RESUMO

A current trend in the development of assistive devices for rehabilitation, for example exoskeletons or active orthoses, is to utilize physiological data to enhance their functionality and usability, for example by predicting the patient's upcoming movements using electroencephalography (EEG) or electromyography (EMG). However, these modalities have different temporal properties and classification accuracies, which results in specific advantages and disadvantages. To use physiological data analysis in rehabilitation devices, the processing should be performed in real-time, guarantee close to natural movement onset support, provide high mobility, and should be performed by miniaturized systems that can be embedded into the rehabilitation device. We present a novel Field Programmable Gate Array (FPGA) -based system for real-time movement prediction using physiological data. Its parallel processing capabilities allows the combination of movement predictions based on EEG and EMG and additionally a P300 detection, which is likely evoked by instructions of the therapist. The system is evaluated in an offline and an online study with twelve healthy subjects in total. We show that it provides a high computational performance and significantly lower power consumption in comparison to a standard PC. Furthermore, despite the usage of fixed-point computations, the proposed system achieves a classification accuracy similar to systems with double precision floating-point precision.


Assuntos
Movimento , Interfaces Cérebro-Computador , Eletroencefalografia , Eletromiografia , Humanos , Aparelhos Ortopédicos , Tecnologia Assistiva
10.
J Med Syst ; 41(2): 35, 2017 Feb.
Artigo em Inglês | MEDLINE | ID: mdl-28063016

RESUMO

The goal of this paper is to implement the secretion mechanism of the Thyroid Hormone (TH) based on bio-mathematical differential eqs. (DE) on an FPGA chip. Hardware Descriptive Language (HDL) is used to develop a behavioral model of the mechanism derived from the DE. The Thyroid Hormone secretion mechanism is simulated with the interaction of the related stimulating and inhibiting hormones. Synthesis of the simulation is done with the aid of CAD tools and downloaded on a Field Programmable Gate Arrays (FPGAs) Chip. The chip output shows identical behavior to that of the designed algorithm through simulation. It is concluded that the chip mimics the Thyroid Hormone secretion mechanism. The chip, operating in real-time, is computer-independent stand-alone system.


Assuntos
Simulação por Computador , Modelos Biológicos , Hormônios Tireóideos/metabolismo , Algoritmos , Humanos , Tiroxina/metabolismo , Tri-Iodotironina/metabolismo , Interface Usuário-Computador
11.
Sensors (Basel) ; 16(2): 181, 2016 Feb 01.
Artigo em Inglês | MEDLINE | ID: mdl-26840321

RESUMO

Resistive sensor arrays are formed by a large number of individual sensors which are distributed in different ways. This paper proposes a direct connection between an FPGA and a resistive array distributed in M rows and N columns, without the need of analog-to-digital converters to obtain resistance values in the sensor and where the conditioning circuit is reduced to the use of a capacitor in each of the columns of the matrix. The circuit allows parallel measurements of the N resistors which form each of the rows of the array, eliminating the resistive crosstalk which is typical of these circuits. This is achieved by an addressing technique which does not require external elements to the FPGA. Although the typical resistive crosstalk between resistors which are measured simultaneously is eliminated, other elements that have an impact on the measurement of discharge times appear in the proposed architecture and, therefore, affect the uncertainty in resistance value measurements; these elements need to be studied. Finally, the performance of different calibration techniques is assessed experimentally on a discrete resistor array, obtaining for a new model of calibration, a maximum relative error of 0.066% in a range of resistor values which correspond to a tactile sensor.

12.
Sensors (Basel) ; 16(2): 149, 2016 Jan 25.
Artigo em Inglês | MEDLINE | ID: mdl-26821024

RESUMO

One of the most suitable ways of distributing a resistive sensor array for reading is an array with M rows and N columns. This allows reduced wiring and a certain degree of parallelism in the implementation, although it also introduces crosstalk effects. Several types of circuits can carry out the analogue-digital conversion of this type of sensors. This article focuses on the use of operational amplifiers with capacitive feedback and FPGAs for this task. Specifically, modifications of a previously reported circuit are proposed to reduce the errors due to the non-idealities of the amplifiers and the I/O drivers of the FPGA. Moreover, calibration algorithms are derived from the analysis of the proposed circuitry to reduce the crosstalk error and improve the accuracy. Finally, the performances of the proposals is evaluated experimentally on an array of resistors and for different ranges.

13.
Sensors (Basel) ; 15(12): 31762-80, 2015 Dec 16.
Artigo em Inglês | MEDLINE | ID: mdl-26694403

RESUMO

Direct sensor-digital device interfaces measure time dependent variables of simple circuits to implement analog-to-digital conversion. Field Programmable Gate Arrays (FPGAs) are devices whose hardware can be reconfigured to work in parallel. They usually do not have analog-to-digital converters, but have many general purpose I/O pins. Therefore, direct sensor-FPGA connection is a good choice in complex systems with many sensors because several capture modules can be implemented to perform parallel analog data acquisition. The possibility to work in parallel and with high frequency clock signals improves the bandwidth compared to sequential devices such as conventional microcontrollers. The price to pay is usually the resolution of measurements. This paper proposes capture modules implemented in an FPGA which are able to perform smart acquisition that filter noise and achieve high precision. A calibration technique is also proposed to improve accuracy. Resolutions of 12 effective number of bits are obtained for the reading of resistors in the range of an example piezoresistive tactile sensor.


Assuntos
Computadores , Eletrônica/instrumentação , Modelos Teóricos , Desenho de Equipamento
14.
PeerJ Comput Sci ; 9: e1300, 2023.
Artigo em Inglês | MEDLINE | ID: mdl-37346607

RESUMO

One of the fundamental requirements of a real-time system (RTS) is the need to guarantee re-al-time determinism for critical tasks. Task execution rates, operating system (OS) overhead, and task context switching times are just a few of the parameters that can cause jitter and missed deadlines in RTS with soft schedulers. Control systems that are susceptible to jitter can be used in the control of HARD RTS as long as the cumulative value of periodicity deviation and worst-case response time is less than the response time required by that application. This artcle presents field-programmable gate array (FPGA) soft-core processors integration based on different instruction set architectures (ISA), custom central processing unit (CPU) datapath, dedicated hardware thread context, and hardware real-time operating system (RTOS) implementations. Based on existing work problems, one parameter that can negatively influence the performance of an RTS is the additional costs due to the operating system. The scheduling and thread context switching operations can significantly degrade the programming limit for RTS, where the task switching frequency is high. In parallel with the improvement of software scheduling algorithms, their implementation in hardware has been proposed and validated to relieve the processor of scheduling overhead and reduce RTOS-specific overhead.

15.
Sensors (Basel) ; 12(1): 585-611, 2012.
Artigo em Inglês | MEDLINE | ID: mdl-22368487

RESUMO

Background subtraction is considered the first processing stage in video surveillance systems, and consists of determining objects in movement in a scene captured by a static camera. It is an intensive task with a high computational cost. This work proposes an embedded novel architecture on FPGA which is able to extract the background on resource-limited environments and offers low degradation (produced because of the hardware-friendly model modification). In addition, the original model is extended in order to detect shadows and improve the quality of the segmentation of the moving objects. We have analyzed the resource consumption and performance in Spartan3 Xilinx FPGAs and compared to others works available on the literature, showing that the current architecture is a good trade-off in terms of accuracy, performance and resources utilization. With less than a 65% of the resources utilization of a XC3SD3400 Spartan-3A low-cost family FPGA, the system achieves a frequency of 66.5 MHz reaching 32.8 fps with resolution 1,024 × 1,024 pixels, and an estimated power consumption of 5.76 W.


Assuntos
Eletrônica/instrumentação , Modelos Teóricos , Técnica de Subtração/instrumentação , Algoritmos , Simulação por Computador , Computadores , Software , Fatores de Tempo
16.
Front Big Data ; 5: 828666, 2022.
Artigo em Inglês | MEDLINE | ID: mdl-35402906

RESUMO

The determination of charged particle trajectories in collisions at the CERN Large Hadron Collider (LHC) is an important but challenging problem, especially in the high interaction density conditions expected during the future high-luminosity phase of the LHC (HL-LHC). Graph neural networks (GNNs) are a type of geometric deep learning algorithm that has successfully been applied to this task by embedding tracker data as a graph-nodes represent hits, while edges represent possible track segments-and classifying the edges as true or fake track segments. However, their study in hardware- or software-based trigger applications has been limited due to their large computational cost. In this paper, we introduce an automated translation workflow, integrated into a broader tool called hls4ml, for converting GNNs into firmware for field-programmable gate arrays (FPGAs). We use this translation tool to implement GNNs for charged particle tracking, trained using the TrackML challenge dataset, on FPGAs with designs targeting different graph sizes, task complexites, and latency/throughput requirements. This work could enable the inclusion of charged particle tracking GNNs at the trigger level for HL-LHC experiments.

17.
Neural Netw ; 123: 372-380, 2020 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-31901566

RESUMO

Modeling and implementation of biological neurons are key to the fundamental understanding of neural network architectures in the brain and its cognitive behavior. Synchronization of neuronal models play a significant role in neural signal processing as it is very difficult to identify the actual interaction between neurons in living brain. Therefore, the synchronization study of these neuronal architectures has received extensive attention from researchers. Higher biological accuracy of these neuronal units demands more computational overhead and requires more hardware resources for implementation. This paper presents a two coupled hardware implementation of Hindmarsh Rose neuron model which is mathematically simpler model and yet mimics several behaviors of a real biological neuron. These neurons are synchronized using an exponential function. The coupled system shows several behaviors depending upon the parameters of HR model and coupling function. An approximation of coupling function is also provided to reduce the hardware cost. Both simulations and a low cost hardware implementations of exponential synaptic coupling function and its approximation are carried out for comparison. Hardware implementation on field programmable gate array (FPGA) of approximated coupling function shows that the coupled network produces different dynamical behaviors with acceptable error. Hardware implementation shows that the approximated coupling function has significantly lower implementation cost. A spiking neural network based on HR neuron is also shown as a practical application of this coupled HR neural networks. The spiking network successfully encodes and decodes a time varying input.


Assuntos
Modelos Neurológicos , Redes Neurais de Computação , Encéfalo/fisiologia , Análise por Conglomerados , Humanos , Neurônios/fisiologia , Processamento de Sinais Assistido por Computador
18.
Front Neurosci ; 12: 198, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-29692700

RESUMO

This paper presents a digital implementation of the Cascade of Asymmetric Resonators with Fast-Acting Compression (CAR-FAC) cochlear model. The CAR part simulates the basilar membrane's (BM) response to sound. The FAC part models the outer hair cell (OHC), the inner hair cell (IHC), and the medial olivocochlear efferent system functions. The FAC feeds back to the CAR by moving the poles and zeros of the CAR resonators automatically. We have implemented a 70-section, 44.1 kHz sampling rate CAR-FAC system on an Altera Cyclone V Field Programmable Gate Array (FPGA) with 18% ALM utilization by using time-multiplexing and pipeline parallelizing techniques and present measurement results here. The fully digital reconfigurable CAR-FAC system is stable, scalable, easy to use, and provides an excellent input stage to more complex machine hearing tasks such as sound localization, sound segregation, speech recognition, and so on.

19.
J Signal Process Syst ; 87(1): 139-156, 2017.
Artigo em Inglês | MEDLINE | ID: mdl-32226579

RESUMO

With security and surveillance, there is an increasing need to process image data efficiently and effectively either at source or in a large data network. Whilst a Field-Programmable Gate Array has been seen as a key technology for enabling this, the design process has been viewed as problematic in terms of the time and effort needed for implementation and verification. The work here proposes a different approach of using optimized FPGA-based soft-core processors which allows the user to exploit the task and data level parallelism to achieve the quality of dedicated FPGA implementations whilst reducing design time. The paper also reports some preliminary progress on the design flow to program the structure. An implementation for a Histogram of Gradients algorithm is also reported which shows that a performance of 328 fps can be achieved with this design approach, whilst avoiding the long design time, verification and debugging steps associated with conventional FPGA implementations.

SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa