ABSTRACT
Many scientific problems, such as classifier training or medical image reconstruction, can be expressed as the minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of the gradients of such cost functions when they are implemented as computer programs. To backpropagate adjoint derivatives, a potentially excessive amount of memory is required to store the intermediate partial derivatives in a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated on many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of lightweight threads and the limited memory, both in total and per thread. We show how these limitations can be mitigated if the cost function is expressed using GPU-accelerated vector and matrix operations, which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed the use of optimized parallel libraries during the forward and reverse passes, which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
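The difference between naive taping and taping of vectorized intrinsics can be illustrated with a minimal sketch. It is not the AAD software described above: the `Tape` class and the functions `gradient_naive` and `gradient_intrinsic` are hypothetical names, the cost function f(x) = 0.5 * ||A x - b||^2 is chosen only for illustration, and plain Python loops stand in for the optimized CPU or GPU library calls that a real implementation would presumably use inside each intrinsic.

```python
class Tape:
    """Hypothetical tape: stores one backward closure per recorded operation."""

    def __init__(self):
        self.entries = []

    def record(self, backward):
        self.entries.append(backward)

    def backpropagate(self):
        # Adjoints are propagated in reverse order of recording.
        for backward in reversed(self.entries):
            backward()


def gradient_naive(A, x, b):
    """Tape every scalar operation of f(x) = 0.5 * ||A x - b||^2."""
    tape = Tape()
    m, n = len(A), len(x)
    x_bar = [0.0] * n  # adjoints of x (the gradient)
    r_bar = [0.0] * m  # adjoints of the residual r = A x - b
    r = [0.0] * m
    for i in range(m):
        for j in range(n):
            r[i] += A[i][j] * x[j]

            def backward(i=i, j=j):
                x_bar[j] += A[i][j] * r_bar[i]

            tape.record(backward)  # one entry per multiply-add
        r[i] -= b[i]  # b is constant, so nothing is taped here
    cost = 0.0
    for i in range(m):
        cost += 0.5 * r[i] * r[i]

        def backward(i=i, r_i=r[i]):
            r_bar[i] += r_i  # d(0.5 * r_i^2)/d r_i = r_i, seed adjoint is 1

        tape.record(backward)
    tape.backpropagate()
    return cost, x_bar, len(tape.entries)


def gradient_intrinsic(A, x, b):
    """Tape the matrix-vector product and the squared norm as two intrinsics."""
    tape = Tape()
    m, n = len(A), len(x)
    x_bar = [0.0] * n
    r_bar = [0.0] * m
    # Forward pass of the matrix-vector intrinsic (a library call in practice).
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]

    def matvec_backward():
        # Adjoint of r = A x - b with respect to x: x_bar += A^T r_bar
        for j in range(n):
            x_bar[j] += sum(A[i][j] * r_bar[i] for i in range(m))

    tape.record(matvec_backward)
    cost = 0.5 * sum(v * v for v in r)

    def norm_backward():
        # Adjoint of 0.5 * ||r||^2 with respect to r: r_bar += r
        for i in range(m):
            r_bar[i] += r[i]

    tape.record(norm_backward)
    tape.backpropagate()
    return cost, x_bar, len(tape.entries)
```

For an m-by-n matrix, the naive variant in this sketch records m*n + m tape entries, whereas the intrinsic variant records two entries regardless of the problem size and reuses A and r for the reverse pass. Both return the same gradient, A^T (A x - b).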
ABSTRACT
Small-animal micro-computed tomography (µCT) is an important tool in cancer research and is used to quantify liver and lung tumors. A type of cancer that is intensively investigated with µCT is hepatocellular carcinoma (HCC). µCT scans acquire projections from different angles of the gantry, which rotates the X-ray source and detector around the animal. Motion of the animal causes inconsistencies between the projections, which lead to artifacts in the resulting image. This is problematic in HCC research, where respiratory motion degrades image quality by causing hypodense intensity at the liver edge and smearing out small structures such as tumors. Dealing with respiratory motion is particularly difficult in a high-throughput setting, when multiple mice are scanned together and projection removal by retrospective respiratory gating may compromise image quality and dose efficiency. In mice, inhalation anesthesia leads to regular respiration with short gasps and long phases of negligible motion. Using this effect and an iterative reconstruction that can cope with missing angles, we discard the relatively few projections in which the gasping motion occurs. Moreover, since gated acquisition, i.e., acquiring multiple projections from a single gantry angle, is not a requirement, this method can be applied to existing scans. We applied our method in a high-throughput setting in which four mice with HCC tumors were scanned simultaneously in a multi-mouse bed. To establish a ground truth, we manually selected projections with visible respiratory motion. Our automated intrinsic breathing projection selection achieved an agreement of 97% with manual selection. We reconstructed volumetric images and demonstrated that our intrinsic gating method significantly reduces the hypodense depiction at the cranial liver edge and improves the detectability of small tumors.
Furthermore, we show that projection removal in a four-mouse scan discards only 7.5% more projections than in a single-mouse setting, i.e., four-mouse scanning does not substantially compromise dose efficiency or image quality. To the best of our knowledge, no comparable method that combines multi-mouse scans for high throughput, intrinsic respiratory gating, and an available iterative reconstruction has been described for liver tumor imaging before.