ABSTRACT
Neuronal reconstruction, a process that transforms image volumes into 3D geometries and skeletons of cells, bottlenecks the study of brain function, connectomics and pathology. Domain scientists need exact and complete segmentations to study subtle topological differences. Existing methods are disk-bound, dense-access, coupled, single-threaded and algorithmically unscalable, and they require manual cropping of small windows and proofreading of skeletons due to low topological accuracy. Designing a data-intensive parallel solution suited to a neuron's shape, topology and far-ranging connectivity is particularly challenging due to I/O and load balance, yet by abstracting these vision tasks into strategically ordered specializations of search, we progressively lower memory by 4 orders of magnitude. This enables 1 mouse brain to be fully processed in-memory on a single server, at 67× the scale with 870× less memory while having 78% higher automated yield than APP2, the previous state of the art in performant reconstruction.
ABSTRACT
Morphology is a cardinal feature of a neuron that mediates its functions, but profiling neuronal morphologies at scale remains a formidable challenge. Here we describe a generalizable pipeline for large-scale brainwide study of dendritic morphology of genetically-defined single neurons in the mouse brain. We generated a dataset of 3,762 3D-reconstructed and reference-atlas mapped striatal D1- and D2- medium spiny neurons (MSNs). Integrative morphometric analyses reveal distinct impacts of anatomical locations and D1/D2 genetic types on MSN morphologies. To analyze striatal regional features of MSN dendrites without prior anatomical constraints, we assigned MSNs to a lattice of cubic boxes in the reference brain atlas, and summarized morphometric representation ("eigen-morph") for each box and clustered boxes with shared morphometry. This analysis reveals 6 modules with characteristic dendritic features and spanning contiguous striatal territories, each receiving distinct corticostriatal inputs. Finally, we found aging confers robust dendritic length and branching defects in MSNs, while Huntington's disease (HD) mice exhibit selective length-related defects. Together, our study demonstrates a systems-biology approach to profile dendritic morphology of genetically-defined single-neurons; and defines novel striatal D1/D2-MSN morphological territories and aging- or HD-associated pathologies.
ABSTRACT
Epifluorescence miniature microscopes ('miniscopes') are widely used for in vivo calcium imaging of neural population activity. Imaging data are typically collected during a behavioral task and stored for later offline analysis, but emerging techniques for online imaging can support novel closed-loop experiments in which neural population activity is decoded in real time to trigger neurostimulation or sensory feedback. To achieve short feedback latencies, online imaging systems must be optimally designed to maximize computational speed and efficiency while minimizing errors in population decoding. Here we introduce DeCalciOn, an open-source device for real-time imaging and population decoding of in vivo calcium signals that is hardware compatible with all miniscopes that use the UCLA Data Acquisition (DAQ) interface. DeCalciOn performs online motion stabilization, neural enhancement, calcium trace extraction, and decoding of up to 1024 traces per frame at latencies of <50 ms after fluorescence photons arrive at the miniscope image sensor. We show that DeCalciOn can accurately decode the position of rats (n = 12) running on a linear track from calcium fluorescence in the hippocampal CA1 layer, and can categorically classify behaviors performed by rats (n = 2) during an instrumental task from calcium fluorescence in orbitofrontal cortex. DeCalciOn achieves high decoding accuracy at short latencies using innovations such as field-programmable gate array hardware for real-time image processing and contour-free methods to efficiently extract calcium traces from sensor images. In summary, our system offers an affordable plug-and-play solution for real-time calcium imaging experiments in behaving animals.
Subjects
Calcium; Computers; Rats; Animals; Microscopy
ABSTRACT
Miniaturized calcium imaging is an emerging neural recording technique that has been widely used for monitoring neural activity on a large scale at a specific brain region of rats or mice. Most existing calcium-image analysis pipelines operate offline. This results in long processing latency, making it difficult to realize closed-loop feedback stimulation for brain research. In recent work, we have proposed an FPGA-based real-time calcium image processing pipeline for closed-loop feedback applications. It can perform real-time calcium image motion correction, enhancement, fast trace extraction, and real-time decoding from extracted traces. Here, we extend this work by proposing a variety of neural network based methods for real-time decoding and evaluate the tradeoff among these decoding methods and accelerator designs. We introduce the implementation of the neural network based decoders on the FPGA, and show their speedup against the implementation on the ARM processor. Our FPGA implementation enables the real-time calcium image decoding with sub-ms processing latency for closed-loop feedback applications.
Subjects
Calcium; Neural Networks, Computer; Rats; Mice; Animals; Feedback; Brain/physiology
ABSTRACT
The single-source shortest path (SSSP) problem is one of the most important and well-studied graph problems widely used in many application domains, such as road navigation, neural image reconstruction, and social network analysis. Although we have known various SSSP algorithms for decades, implementing one for large-scale power-law graphs efficiently is still highly challenging today, because (1) a work-efficient SSSP algorithm requires priority-order traversal of graph data, (2) the priority queue needs to be scalable both in throughput and capacity, and (3) priority-order traversal requires extensive random memory accesses on graph data. In this paper, we present SPLAG to accelerate SSSP for power-law graphs on FPGAs. SPLAG uses a coarse-grained priority queue (CGPQ) to enable high-throughput priority-order graph traversal with a large frontier. To mitigate the high-volume random accesses, SPLAG employs a customized vertex cache (CVC) to reduce off-chip memory access and improve the throughput to read and update vertex data. Experimental results on various synthetic and real-world datasets show up to a 4.9× speedup over state-of-the-art SSSP accelerators, a 2.6× speedup over a 32-thread CPU running at 4.4 GHz, and a 0.9× speedup over an A100 GPU that has 4.1× the power budget and 3.4× the HBM bandwidth. Such a high performance would place SPLAG in the 14th position of the Graph 500 benchmark for data intensive applications (the highest using a single FPGA) with only a 45 W power budget. SPLAG is written in high-level synthesis C++ and is fully parameterized, which means it can be easily ported to various FPGAs with different configurations. SPLAG is open-source at https://github.com/UCLA-VAST/splag.
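To make the idea of coarse-grained priority ordering concrete, here is a minimal software sketch. It is not SPLAG's CGPQ or its FPGA design: the function names, the `bucket_width` parameter, and the toy graph are assumptions made for illustration. A standard heap-based Dijkstra pops one vertex at a time in exact priority order, while the bucketed variant groups tentative distances into coarse buckets and processes a whole bucket as one frontier, trading some redundant relaxations for a larger batch of work.

```python
import heapq
from collections import defaultdict

INF = float("inf")

def dijkstra(graph, source):
    """Exact priority-order SSSP using a binary heap (fine-grained queue)."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, INF):
            continue  # stale queue entry: u was already improved
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, INF):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def bucketed_sssp(graph, source, bucket_width=1.0):
    """SSSP with coarse priorities: entries are ordered only by bucket index.

    Assumes non-negative edge weights. Entries inside one bucket are
    processed in arbitrary order, so a vertex may be relaxed more than once.
    """
    dist = {source: 0.0}
    buckets = defaultdict(list)
    buckets[0].append((0.0, source))
    while buckets:
        b = min(buckets)           # lowest non-empty bucket
        frontier = buckets.pop(b)  # the whole bucket forms one frontier
        for d, u in frontier:
            if d > dist.get(u, INF):
                continue  # stale entry
            for v, w in graph.get(u, []):
                nd = d + w
                if nd < dist.get(v, INF):
                    dist[v] = nd
                    buckets[int(nd // bucket_width)].append((nd, v))
    return dist

# Hypothetical toy graph: vertex -> list of (neighbor, weight)
toy_graph = {
    "a": [("b", 4.0), ("c", 1.0)],
    "b": [("d", 1.0)],
    "c": [("b", 2.0), ("d", 7.0)],
    "d": [],
}
```

Both functions return the same distances on the toy graph; a wider `bucket_width` exposes a larger frontier per step at the cost of more repeated relaxations.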
ABSTRACT
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers can now exploit unprecedented external memory bandwidth. This allows more memory-bounded applications to benefit from FPGA acceleration. However, fully utilizing the available bandwidth may not be an easy task. If an application requires multiple processing elements to access multiple HBM channels, we observe a significant drop in the effective bandwidth. The existing high-level synthesis (HLS) programming environment has limitations in producing an efficient communication architecture. In order to solve this problem, we propose HBM Connect, a high-performance customized interconnect for FPGA HBM boards. Novel HLS-based optimization techniques are introduced to increase the throughput of AXI bus masters and switching elements. We also present a high-performance customized crossbar that may replace the built-in crossbar. The effectiveness of HBM Connect is demonstrated using Xilinx's Alveo U280 HBM board. Based on bucket sort and merge sort case studies, we explore several design spaces and find the design point with the best resource-performance trade-off. The results show that HBM Connect improves the resource-performance metrics by 6.5×-211×.
ABSTRACT
C/C++/OpenCL-based high-level synthesis (HLS) has become increasingly popular for field-programmable gate array (FPGA) accelerators in many application domains in recent years, thanks to its competitive quality of results (QoR) and short development cycles compared with the traditional register-transfer level design approach. Yet, limited by the sequential C semantics, it remains challenging to adopt the same highly productive high-level programming approach in many other application domains, where coarse-grained tasks run in parallel and communicate with each other at a fine-grained level. While current HLS tools do support task-parallel programs, the productivity is greatly limited (1) in the code development cycle due to the poor programmability, (2) in the correctness verification cycle due to restricted software simulation, and (3) in the QoR tuning cycle due to slow code generation. Such limited productivity often defeats the purpose of HLS and hinders programmers from adopting HLS for task-parallel FPGA accelerators. In this paper, we extend the HLS C++ language and present a fully automated framework with programmer-friendly interfaces, unconstrained software simulation, and fast hierarchical code generation to overcome these limitations and demonstrate how task-parallel programs can be productively supported in HLS. Experimental results based on a wide range of real-world task-parallel programs show that, on average, the lines of kernel and host code are reduced by 22% and 51%, respectively, which considerably improves the programmability. The correctness verification and the iterative QoR tuning cycles are both greatly shortened, by 3.2× and 6.8×, respectively. Our work is open-source at https://github.com/UCLA-VAST/tapa/.
ABSTRACT
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity advantages, there remains a significant gap in the achievable frequency between an HLS design and a handcrafted RTL one. A key factor that limits the timing quality of the HLS outputs is the difficulty in accurately estimating the interconnect delay at the HLS level. This problem becomes even worse when large HLS designs are implemented on the latest multi-die FPGAs. To tackle this challenge, we propose AutoBridge, an automated framework that couples a coarse-grained floorplanning step with pipelining during HLS compilation. First, our approach provides HLS with a view on the global physical layout of the design, allowing HLS to more easily identify and pipeline the long wires, especially those crossing the die boundaries. Second, by exploiting the flexibility of HLS pipelining, the floorplanner is able to distribute the design logic across multiple dies on the FPGA device without degrading clock frequency. This prevents the placer from aggressively packing the logic on a single die which often results in local routing congestion that eventually degrades timing. Since pipelining may introduce additional latency, we further present analysis and algorithms to ensure the added latency will not compromise the overall throughput. AutoBridge can be integrated into the existing CAD toolflow for Xilinx FPGAs. In our experiments with a total of 43 design configurations, we improve the average frequency from 147 MHz to 297 MHz (a 102% improvement) with no loss of throughput and a negligible change in resource utilization. Notably, in 16 experiments we make the originally unroutable designs achieve 274 MHz on average. The tool is available at https://github.com/Licheng-Guo/AutoBridge.
ABSTRACT
Stencil kernels are an important type of kernel used extensively in many application domains. Over the years, researchers have been studying the optimizations on parallelization, communication reuse, and computation reuse for various target platforms. However, challenges still exist, especially on the computation reuse problem for accelerators, due to the lack of complete design-space exploration and effective design-space pruning. In this paper, we present solutions to the above challenges for a wide range of stencil kernels (i.e., stencils with reduction operations), where the computation reuse patterns are extremely flexible due to the commutative and associative properties. We formally define the complete design space, based on which we present a provably optimal dynamic programming algorithm and a heuristic beam search algorithm that provides near-optimal solutions under an architecture-aware model. Experimental results show that for synthesizing stencil kernels to FPGAs, compared with a state-of-the-art stencil compiler without computation reuse capability, our proposed algorithm can reduce the look-up table (LUT) and digital signal processor (DSP) usage by 58.1% and 54.6% on average, respectively, which leads to an average speedup of 2.3× for compute-intensive kernels, outperforming the latest CPU/GPU results.
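A small example may help show what computation reuse in a reduction stencil means. The sketch below is not the paper's dynamic programming or beam search algorithm; it is a hand-picked reuse pattern for a hypothetical 4-point sum stencil, and the function names are assumptions. Because addition is associative and commutative, the pair sum `a[i] + a[i+1]` can be computed once and shared by two neighboring outputs, which lowers the steady-state cost from three additions per output to two.

```python
def box_sum_naive(a):
    """4-point sum stencil, recomputing every addition for each output."""
    out, adds = [], 0
    for i in range(len(a) - 3):
        s = a[i]
        for k in range(1, 4):
            s += a[i + k]
            adds += 1
        out.append(s)
    return out, adds

def box_sum_reuse(a):
    """Same stencil, sharing pair sums between overlapping windows."""
    # pairs[i] = a[i] + a[i+1]; each pair sum feeds out[i] and out[i-2]
    pairs = [a[i] + a[i + 1] for i in range(len(a) - 1)]
    adds = len(pairs)
    # out[i] = (a[i] + a[i+1]) + (a[i+2] + a[i+3])
    out = [pairs[i] + pairs[i + 2] for i in range(len(a) - 3)]
    adds += len(out)
    return out, adds

data = list(range(10))
naive_out, naive_adds = box_sum_naive(data)
reuse_out, reuse_adds = box_sum_reuse(data)
```

With integer inputs both versions agree exactly; with floating-point inputs the reassociated sums can differ in the last bits, which is one reason the choice of reuse pattern is a design decision rather than a free transformation.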
ABSTRACT
Computed tomography (CT) is a widely used screening and diagnostic tool that allows clinicians to obtain a high-resolution, volumetric image of internal structures in a non-invasive manner. Increasingly, efforts have been made to improve the image quality of low-dose CT (LDCT) to reduce the cumulative radiation exposure of patients undergoing routine screening exams. The resurgence of deep learning has yielded a new approach for noise reduction by training deep multi-layer convolutional neural networks (CNNs) to map low-dose to normal-dose CT images. However, CNN-based methods heavily rely on convolutional kernels, which use fixed-size filters to process one local neighborhood within the receptive field at a time. As a result, they are not efficient at retrieving structural information across large regions. In this paper, we propose a novel 3D self-attention convolutional neural network for the LDCT denoising problem. Our 3D self-attention module leverages the 3D volume of CT images to capture a wide range of spatial information both within CT slices and between CT slices. With the help of the 3D self-attention module, CNNs are able to leverage pixels with stronger relationships regardless of their distance and achieve better denoising results. In addition, we propose a self-supervised learning scheme to train a domain-specific autoencoder as the perceptual loss function. We combine these two methods and demonstrate their effectiveness on both CNN-based neural networks and WGAN-based neural networks with comprehensive experiments. Tested on the AAPM-Mayo Clinic Low Dose CT Grand Challenge data set, our experiments demonstrate that the self-attention (SA) module and the autoencoder (AE) perceptual loss function can efficiently enhance traditional CNNs and can achieve comparable or better results than the state-of-the-art methods.
Subjects
Image Processing, Computer-Assisted; Neural Networks, Computer; Attention; Humans; Signal-To-Noise Ratio; Tomography, X-Ray Computed
ABSTRACT
Migration and invasion of cancer cells constitute fundamental processes in tumor progression and metastasis. Migratory cancer cells commonly upregulate expression of plasminogen activator inhibitor 1 (PAI1), and PAI1 correlates with poor prognosis in breast cancer. However, mechanisms by which PAI1 promotes migration of cancer cells remain incompletely defined. Here we show that increased PAI1 drives rearrangement of the actin cytoskeleton, mitochondrial fragmentation, and glycolytic metabolism in triple-negative breast cancer (TNBC) cells. In two-dimensional environments, both stable expression of PAI1 and treatment with recombinant PAI1 increased migration, which could be blocked with the specific inhibitor tiplaxtinin. PAI1 also promoted invasion into the extracellular matrix from coculture spheroids with human mammary fibroblasts in fibrin gels. Elevated cellular PAI1 enhanced cytoskeletal features associated with migration, actin-rich migratory structures, and reduced actin stress fibers. In orthotopic tumor xenografts, we discovered that TNBC cells with elevated PAI1 show collagen fibers aligned perpendicular to the tumor margin, an established marker of invasive breast tumors. Further studies revealed that PAI1 activates ERK signaling, a central regulator of motility, and promotes mitochondrial fragmentation. Consistent with known effects of mitochondrial fragmentation on metabolism, fluorescence lifetime imaging microscopy of endogenous NADH showed that PAI1 promotes glycolysis in cell-based assays, orthotopic tumor xenografts, and lung metastases. Together, these data demonstrate for the first time that PAI1 regulates cancer cell metabolism and suggest targeting metabolism to block motility and tumor progression. IMPLICATIONS: We identified a novel mechanism through which cancer cells alter their metabolism to promote tumor progression.
Subjects
Actin Cytoskeleton/metabolism; Lung Neoplasms/pathology; Lung Neoplasms/secondary; Plasminogen Activator Inhibitor 1/genetics; Plasminogen Activator Inhibitor 1/metabolism; Triple Negative Breast Neoplasms/pathology; Animals; Cell Line, Tumor; Cell Movement; Female; Gene Expression Profiling/methods; Gene Expression Regulation, Neoplastic; Glycolysis; High-Throughput Nucleotide Sequencing; Humans; Lung Neoplasms/genetics; Lung Neoplasms/metabolism; MAP Kinase Signaling System; Mice; Neoplasm Transplantation; Triple Negative Breast Neoplasms/genetics; Triple Negative Breast Neoplasms/metabolism; Up-Regulation; Whole Genome Sequencing
ABSTRACT
Molecular analysis of circulating tumor cells (CTCs) at single-cell resolution offers great promise for cancer diagnostics and therapeutics from simple liquid biopsy. Recent development of massively parallel single-cell RNA-sequencing (scRNA-seq) provides a powerful method to resolve the cellular heterogeneity from gene expression and pathway regulation analysis. However, the scarcity of CTCs and the massive contamination of blood cells limit the utility of currently available technologies. Here, we present Hydro-Seq, a scalable hydrodynamic scRNA-seq barcoding technique, for high-throughput CTC analysis. The high cell-capture efficiency and contamination removal capability of Hydro-Seq enable successful scRNA-seq of 666 CTCs from 21 breast cancer patient samples at high throughput. We identified breast cancer drug targets for hormone and targeted therapies and tracked individual cells that express markers of cancer stem cells (CSCs) as well as of epithelial/mesenchymal cell state transitions. Transcriptome analysis of these cells provides insights into monitoring target therapeutics and processes underlying tumor metastasis.
Subjects
Breast Neoplasms/pathology; High-Throughput Nucleotide Sequencing/methods; Neoplastic Cells, Circulating/pathology; Neoplastic Stem Cells/pathology; Animals; Biomarkers, Tumor/genetics; Biomarkers, Tumor/isolation & purification; Breast Neoplasms/blood; Breast Neoplasms/diagnosis; Breast Neoplasms/genetics; Cell Line; Epithelial-Mesenchymal Transition; Female; Gene Expression Profiling/instrumentation; Gene Expression Profiling/methods; High-Throughput Nucleotide Sequencing/instrumentation; High-Throughput Screening Assays/instrumentation; High-Throughput Screening Assays/methods; Humans; Liquid Biopsy/instrumentation; Liquid Biopsy/methods; Mice; Sequence Analysis, RNA/instrumentation; Sequence Analysis, RNA/methods; Single-Cell Analysis/instrumentation; Single-Cell Analysis/methods
ABSTRACT
Isolation of tumor-initiating cells currently relies on markers that do not reflect essential biologic functions of these cells. We proposed to overcome this limitation by isolating tumor-initiating cells based on enhanced migration, a function tightly linked to tumor-initiating potential through epithelial-to-mesenchymal transition (EMT). We developed a high-throughput microfluidic migration platform with automated cell tracking software and facile recovery of cells for downstream functional and genetic analyses. Using this device, we isolated a small subpopulation of migratory cells with significantly greater tumor formation and metastasis in mouse models. Whole transcriptome sequencing of migratory versus non-migratory cells from two metastatic breast cancer cell lines revealed a unique set of genes as key regulators of tumor-initiating cells. We focused on phosphatidylserine decarboxylase (PISD), a gene downregulated by 8-fold in migratory cells. Breast cancer cells overexpressing PISD exhibited reduced tumor-initiating potential in a high-throughput microfluidic mammosphere device and mouse xenograft model. PISD regulated multiple aspects of mitochondria, highlighting mitochondrial functions as therapeutic targets against cancer stem cells. This research establishes not only a novel microfluidic technology for functional isolation of tumor-initiating cells regardless of cancer type, but also a new approach to identify essential regulators of these cells as targets for drug development.
Subjects
Carboxy-Lyases/metabolism; Cell Separation; Microfluidic Analytical Techniques; Neoplastic Stem Cells/metabolism; Animals; Carboxy-Lyases/genetics; Cell Line, Tumor; Cell Movement/genetics; Cell Separation/methods; Gene Expression Profiling; Gene Expression Regulation, Neoplastic; High-Throughput Nucleotide Sequencing; Humans; Lab-On-A-Chip Devices; Mice; Mitochondria/metabolism; Phenotype; Transcriptome
ABSTRACT
Reducing radiation doses is one of the key concerns in computed tomography (CT) based 3D reconstruction. Although iterative methods such as the expectation maximization (EM) algorithm can be used to address this issue, applying this algorithm in practice is difficult due to the long execution time. Our goal is to decrease this long execution time to the order of a few minutes, so that low-dose 3D reconstruction can be performed even in time-critical events. In this paper we introduce a novel parallel scheme that takes advantage of numerous block RAMs on field-programmable gate arrays (FPGAs). Also, an external memory bandwidth reduction strategy is presented to reuse both the sinogram and the voxel intensity. Moreover, a customized processing engine based on the FPGA is presented to increase overall throughput while reducing the logic consumption. Finally, a hardware and software flow is proposed to quickly construct a design for various CT machines. The complete reconstruction system is implemented on an FPGA-based server-class node. Experiments on actual patient data show that a 26.9× speedup can be achieved over a 16-thread multicore CPU implementation.
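For readers unfamiliar with EM-based reconstruction, the following is a minimal sketch of the textbook maximum-likelihood EM (MLEM) update on a tiny dense system. It is an assumption that this generic form resembles the iteration the paper accelerates; it does not reproduce the paper's parallel scheme, memory reuse strategy, or system geometry. The system matrix `A`, the measurements `y`, and the function name `mlem` are hypothetical.

```python
def mlem(A, y, iterations=50):
    """Textbook MLEM: x_j <- x_j / s_j * sum_i A[i][j] * y[i] / (A x)[i].

    A is a list of rows (measurements x voxels) with non-negative entries,
    y holds the measured values (the sinogram in CT terms), and
    s_j = sum_i A[i][j] is the sensitivity of voxel j.
    """
    n_meas, n_vox = len(A), len(A[0])
    sens = [sum(A[i][j] for i in range(n_meas)) for j in range(n_vox)]
    x = [1.0] * n_vox  # strictly positive initial estimate
    for _ in range(iterations):
        # Forward projection of the current estimate
        proj = [sum(A[i][j] * x[j] for j in range(n_vox)) for i in range(n_meas)]
        # Ratio of measured to estimated projections
        ratio = [y[i] / proj[i] if proj[i] > 0 else 0.0 for i in range(n_meas)]
        # Back projection of the ratios, normalized by sensitivity
        x = [
            x[j] / sens[j] * sum(A[i][j] * ratio[i] for i in range(n_meas))
            if sens[j] > 0 else 0.0
            for j in range(n_vox)
        ]
    return x

# Toy problem: two voxels, three noiseless measurements of x = [2, 3]
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [2.0, 3.0, 5.0]
estimate = mlem(A, y)
```

Each iteration needs one forward projection and one back projection over the whole system, which is why the per-iteration cost dominates for realistic volume sizes.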
Subjects
Cone-Beam Computed Tomography/methods; Lung Neoplasms/diagnostic imaging; Algorithms; Cone-Beam Computed Tomography/instrumentation; Humans; Phantoms, Imaging; Radiation Dosage; Signal Processing, Computer-Assisted
ABSTRACT
With the end of CPU core scaling due to dark silicon limitations, customized accelerators on FPGAs have gained increased attention in modern datacenters due to their lower power, high performance and energy efficiency. Evidenced by Microsoft's FPGA deployment in its Bing search engine and Intel's $16.7 billion acquisition of Altera, integrating FPGAs into datacenters is considered one of the most promising approaches to sustain future datacenter growth. However, it is quite challenging for existing big data computing systems, like Apache Spark and Hadoop, to access the performance and energy benefits of FPGA accelerators. In this paper we design and implement Blaze to provide programming and runtime support for enabling easy and efficient deployments of FPGA accelerators in datacenters. In particular, Blaze abstracts FPGA accelerators as a service (FaaS) and provides a set of clean programming APIs for big data processing applications to easily utilize those accelerators. Our Blaze runtime implements an FaaS framework to efficiently share FPGA accelerators among multiple heterogeneous threads on a single node, and extends Hadoop YARN with accelerator-centric scheduling to efficiently share them among multiple computing tasks in the cluster. Experimental results using four representative big data applications demonstrate that Blaze greatly reduces the programming efforts to access FPGA accelerators in systems like Apache Spark and YARN, and improves the system throughput by 1.7× to 3× (and energy efficiency by 1.5× to 2.7×) compared to a conventional CPU-only cluster.
ABSTRACT
Computer-aided detection and diagnosis (CAD) has been widely investigated to improve radiologists' diagnostic accuracy in detecting and characterizing lung disease, as well as to assist with the processing of increasingly sizable volumes of imaging. Lung segmentation is a requisite preprocessing step for most CAD schemes. This paper proposes a parameter-free lung segmentation algorithm with the aim of improving lung nodule detection accuracy, focusing on juxtapleural nodules. A bidirectional chain coding method combined with a support vector machine (SVM) classifier is used to selectively smooth the lung border while minimizing the over-segmentation of adjacent regions. This automated method was tested on 233 computed tomography (CT) studies from the lung imaging database consortium (LIDC), representing 403 juxtapleural nodules. The approach obtained a 92.6% re-inclusion rate. Segmentation accuracy was further validated on 10 randomly selected CT series, finding a 0.3% average over-segmentation ratio and 2.4% under-segmentation rate when compared to manually segmented reference standards done by an expert.
Subjects
Lung/diagnostic imaging; Radiographic Image Interpretation, Computer-Assisted/methods; Solitary Pulmonary Nodule/diagnostic imaging; Humans; Support Vector Machine
ABSTRACT
Theories of neural coding seek to explain how states of the world are mapped onto states of the brain. Here, we compare how an animal's location in space can be encoded by two different kinds of brain states: population vectors stored by patterns of neural firing rates, versus synchronization vectors stored by patterns of synchrony among neural oscillators. It has previously been shown that a population code stored by spatially tuned 'grid cells' can exhibit desirable properties such as high storage capacity and strong fault tolerance; here it is shown that similar properties are attainable with a synchronization code stored by rhythmically bursting 'theta cells' that lack spatial tuning. Simulations of a ring attractor network composed from theta cells suggest how a synchronization code might be implemented using fewer neurons and synapses than a population code with similar storage capacity. It is conjectured that reciprocal connections between grid and theta cells might control phase noise to correct two kinds of errors that can arise in the code: path integration and teleportation errors. Based upon these analyses, it is proposed that a primary function of spatially tuned neurons might be to couple the phases of neural oscillators in a manner that allows them to encode spatial locations as patterns of neural synchrony.
Subjects
Biological Clocks/physiology; Models, Neurological; Nerve Net/physiology; Neurons/physiology; Space Perception/physiology; Theta Rhythm/physiology; Animals; Computer Simulation; Humans
ABSTRACT
Cardiovascular disease (CVD) is a major public health issue. It contributes 41% to the Chinese death rate each year. This huge loss encouraged us to develop a Wearable Efficient teleCARdiology systEm (WE-CARE) for early warning and prevention of CVD risks in real time. WE-CARE is expected to work 24/7 online for mobile health (mHealth) applications. Unfortunately, this purpose is often disrupted in system experiments and clinical trials, even if related enabling technologies work properly. This phenomenon is rooted in the overload issue of complex Electrocardiogram (ECG) data in terms of system integration. In this study, our main objective is to develop a light-loading technology for the system that enables mHealth with a benchmarked ECG anomaly recognition rate. To achieve this objective, we propose an approach to purify clinical features from ECG raw data based on manifold learning, called the Manifold-based ECG-feature Purification algorithm. Our clinical trials verify that our proposal can detect anomalies with a recognition rate of up to 94%, which is highly valuable in daily public health-risk alert applications based on clinical criteria. Most importantly, the experimental results demonstrate that the WE-CARE system enabled by our proposal can enhance system reliability by at least two times, reduce false negative rates to 0.76%, and extend the battery life by 40.54% at the system integration level.
Subjects
Electrocardiography, Ambulatory/methods; Telemedicine/methods; Algorithms; Cardiovascular Diseases/diagnosis; Early Diagnosis; Electrocardiography, Ambulatory/instrumentation; Humans; Medical Informatics Applications; Signal Processing, Computer-Assisted; Telemedicine/instrumentation
ABSTRACT
In this paper we describe an FPGA-based platform for high-performance and low-power simulation of neural microcircuits composed from integrate-and-fire (IAF) neurons. Based on high-level synthesis, our platform uses design templates to map hierarchies of neuron models to logic fabrics. This approach bypasses high design complexity and enables easy optimization and design space exploration. We demonstrate the benefits of our platform by simulating a variety of neural microcircuits that perform oscillatory path integration, which evidence suggests may be a critical building block of the navigation system inside a rodent's brain. Experiments show that our FPGA simulation engine for oscillatory neural microcircuits can achieve up to 39× speedup compared to software benchmarks on a commodity CPU, and 232× energy reduction compared to an embedded ARM core.
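As a point of reference for what simulating an integrate-and-fire neuron involves, here is a minimal software sketch. It assumes a leaky variant with forward-Euler integration; the abstract does not state which IAF variant or numerical scheme the platform uses, and the function name and all parameter values (time step, time constant, threshold, reset) are hypothetical.

```python
def simulate_lif(input_current, dt=1e-3, tau=20e-3, v_rest=0.0,
                 v_thresh=1.0, v_reset=0.0, resistance=1.0):
    """Leaky integrate-and-fire neuron, forward-Euler integration.

    Membrane equation (assumed): tau * dv/dt = -(v - v_rest) + R * I(t).
    Returns the list of step indices at which the neuron spiked and the
    membrane potential recorded after each step.
    """
    v = v_rest
    spikes, trace = [], []
    for step, i_in in enumerate(input_current):
        # Integrate: leak toward rest, driven by the input current
        v += dt / tau * (-(v - v_rest) + resistance * i_in)
        # Fire: emit a spike and reset when the threshold is crossed
        if v >= v_thresh:
            spikes.append(step)
            v = v_reset
        trace.append(v)
    return spikes, trace

# Constant drive above threshold produces regular spiking;
# a drive whose steady state stays below threshold produces none.
strong_spikes, _ = simulate_lif([2.0] * 100)
weak_spikes, _ = simulate_lif([0.5] * 100)
```

The per-step update is a handful of multiply-add operations plus a comparison, and every neuron in a microcircuit runs the same update each time step, which is the kind of regular, parallel workload that maps naturally onto hardware templates.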