Search | VHL Regional Portal

1.

Bimodal Visualization of Industrial X-Ray and Neutron Computed Tomography Data.

Huang, Xuan; Miao, Haichao; Kim, Hyojin; Townsend, Andrew; Champley, Kyle; Tringe, Joseph; Pascucci, Valerio; Bremer, Peer-Timo.

IEEE Trans Vis Comput Graph ; PP, 2024 Apr 04.

Article in English | MEDLINE | ID: mdl-38578849

ABSTRACT

Advanced manufacturing creates increasingly complex objects with material compositions that are often difficult to characterize by a single modality. Our collaborating domain scientists are going beyond traditional methods by employing both X-ray and neutron computed tomography to obtain complementary representations expected to better resolve material boundaries. However, the use of two modalities creates its own challenges for visualization, requiring either complex adjustments of bimodal transfer functions or the need for multiple views. Together with experts in nondestructive evaluation, we designed a novel interactive bimodal visualization approach to create a combined view of the co-registered X-ray and neutron acquisitions of industrial objects. Using an automatic topological segmentation of the bivariate histogram of X-ray and neutron values as a starting point, the system provides a simple yet effective interface to easily create, explore, and adjust a bimodal visualization. We propose a widget with simple brushing interactions that enables the user to quickly correct the segmented histogram results. Our semiautomated system enables domain experts to intuitively explore large bimodal datasets without the need for either advanced segmentation algorithms or knowledge of visualization techniques. We demonstrate our approach using synthetic examples, industrial phantom objects created to stress bimodal scanning techniques, and real-world objects, and we discuss expert feedback.

2.

MaPPeRTrac: A Massively Parallel, Portable, and Reproducible Tractography Pipeline.

Cai, Lanya T; Moon, Joseph; Camacho, Paul B; Anderson, Aaron T; Chwa, Won Jong; Sutton, Bradley P; Markowitz, Amy J; Palacios, Eva M; Rodriguez, Alexis; Manley, Geoffrey T; Shankar, Shivsundaram; Bremer, Peer-Timo; Mukherjee, Pratik; Madduri, Ravi K.

Neuroinformatics ; 22(2): 177-191, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38446357

ABSTRACT

Large-scale diffusion MRI tractography remains a significant challenge. Users must orchestrate a complex sequence of instructions that requires many software packages with complex dependencies and high computational costs. We developed MaPPeRTrac, an edge-centric tractography pipeline that simplifies and accelerates this process in a wide range of high-performance computing (HPC) environments. It fully automates either probabilistic or deterministic tractography, starting from a subject's magnetic resonance imaging (MRI) data, including structural and diffusion MRI images, to the edge density image (EDI) of their structural connectomes. Dependencies are containerized with Singularity (now called Apptainer) and decoupled from code to enable rapid prototyping and modification. Data derivatives are organized with the Brain Imaging Data Structure (BIDS) to ensure that they are findable, accessible, interoperable, and reusable following FAIR principles. The pipeline takes full advantage of HPC resources using the Parsl parallel programming framework, resulting in the creation of connectome datasets of unprecedented size. MaPPeRTrac is publicly available and tested on commercial and scientific hardware, so it can accelerate brain connectome research for a broader user community. MaPPeRTrac is available at: https://github.com/LLNL/mappertrac .

Subject(s)

Connectome, Magnetic Resonance Imaging, Magnetic Resonance Imaging/methods, Diffusion Magnetic Resonance Imaging/methods, Brain/diagnostic imaging, Connectome/methods

3.

Machine Learning-Driven Multiscale Modeling: Bridging the Scales with a Next-Generation Simulation Infrastructure.

Ingólfsson, Helgi I; Bhatia, Harsh; Aydin, Fikret; Oppelstrup, Tomas; López, Cesar A; Stanton, Liam G; Carpenter, Timothy S; Wong, Sergio; Di Natale, Francesco; Zhang, Xiaohua; Moon, Joseph Y; Stanley, Christopher B; Chavez, Joseph R; Nguyen, Kien; Dharuman, Gautham; Burns, Violetta; Shrestha, Rebika; Goswami, Debanjan; Gulten, Gulcin; Van, Que N; Ramanathan, Arvind; Van Essen, Brian; Hengartner, Nicolas W; Stephen, Andrew G; Turbyville, Thomas; Bremer, Peer-Timo; Gnanakaran, S; Glosli, James N; Lightstone, Felice C; Nissley, Dwight V; Streitz, Frederick H.

J Chem Theory Comput ; 19(9): 2658-2675, 2023 May 09.

Article in English | MEDLINE | ID: mdl-37075065

ABSTRACT

Interdependence across time and length scales is common in biology, where atomic interactions can impact larger-scale phenomena. Such dependence is especially true for a well-known cancer signaling pathway, where the membrane-bound RAS protein binds an effector protein called RAF. To capture the driving forces that bring RAS and RAF (represented as two domains, RBD and CRD) together on the plasma membrane, simulations with the ability to calculate atomic detail while reaching long time scales and large length scales are needed. The Multiscale Machine-Learned Modeling Infrastructure (MuMMI) is able to resolve RAS/RAF protein-membrane interactions that identify specific lipid-protein fingerprints that enhance protein orientations viable for effector binding. MuMMI is a fully automated, ensemble-based multiscale approach connecting three resolution scales: (1) the coarsest scale is a continuum model able to simulate milliseconds of time for a 1 µm² membrane, (2) the middle scale is a coarse-grained (CG) Martini bead model to explore protein-lipid interactions, and (3) the finest scale is an all-atom (AA) model capturing specific interactions between lipids and proteins. MuMMI dynamically couples adjacent scales in a pairwise manner using machine learning (ML). The dynamic coupling allows for better sampling of the refined scale from the adjacent coarse scale (forward) and on-the-fly feedback to improve the fidelity of the coarser scale from the adjacent refined scale (backward). MuMMI operates efficiently at any scale, from a few compute nodes to the largest supercomputers in the world, and is generalizable to simulate different systems. As computing resources continue to increase and multiscale methods continue to advance, fully automated multiscale simulations (like MuMMI) will be commonly used to address complex science questions.

Subject(s)

Membrane Proteins, Molecular Dynamics Simulation, Membrane Proteins/chemistry, Cell Membrane/metabolism, Machine Learning, Lipids

4.

The confluence of machine learning and multiscale simulations.

Bhatia, Harsh; Aydin, Fikret; Carpenter, Timothy S; Lightstone, Felice C; Bremer, Peer-Timo; Ingólfsson, Helgi I; Nissley, Dwight V; Streitz, Frederick H.

Curr Opin Struct Biol ; 80: 102569, 2023 06.

Article in English | MEDLINE | ID: mdl-36966691

ABSTRACT

Multiscale modeling has a long history of use in structural biology, as computational biologists strive to overcome the time- and length-scale limits of atomistic molecular dynamics. Contemporary machine learning techniques, such as deep learning, have promoted advances in virtually every field of science and engineering and are revitalizing the traditional notions of multiscale modeling. Deep learning has found success in various approaches for distilling information from fine-scale models, such as building surrogate models and guiding the development of coarse-grained potentials. However, perhaps its most powerful use in multiscale modeling is in defining latent spaces that enable efficient exploration of conformational space. This confluence of machine learning and multiscale simulation with modern high-performance computing promises a new era of discovery and innovation in structural biology.

Subject(s)

Molecular Dynamics Simulation, Molecular Conformation

5.

Scalable Comparative Visualization of Ensembles of Call Graphs.

Kesavan, Suraj P; Bhatia, Harsh; Bhatele, Abhinav; Brink, Stephanie; Pearce, Olga; Gamblin, Todd; Bremer, Peer-Timo; Ma, Kwan-Liu.

IEEE Trans Vis Comput Graph ; 29(3): 1691-1704, 2023 Mar.

Article in English | MEDLINE | ID: mdl-34797765

ABSTRACT

Optimizing the performance of large-scale parallel codes is critical for efficient utilization of computing resources. Code developers often explore various execution parameters, such as hardware configurations, system software choices, and application parameters, and are interested in detecting and understanding bottlenecks in different executions. They often collect hierarchical performance profiles represented as call graphs, which combine performance metrics with their execution contexts. The crucial task of exploring multiple call graphs together is tedious and challenging because of the many structural differences in the execution contexts and significant variability in the collected performance metrics (e.g., execution runtime). In this paper, we present Ensemble CallFlow to support the exploration of ensembles of call graphs using new types of visualizations, analysis, graph operations, and features. We introduce ensemble-Sankey, a new visual design that combines the strengths of resource-flow (Sankey) and box-plot visualization techniques. Whereas the resource-flow visualization can easily and intuitively describe the graphical nature of the call graph, the box plots overlaid on the nodes of Sankey convey the performance variability within the ensemble. Our interactive visual interface provides linked views to help explore ensembles of call graphs, e.g., by facilitating the analysis of structural differences, and identifying similar or distinct call graphs. We demonstrate the effectiveness and usefulness of our design through case studies on large-scale parallel codes.

6.

Tailored approach to study Legionella infection using a lattice light sheet microscope (LLSM).

Yi, Xiyu; Miao, Haichao; Lo, Jacky Kai-Yin; Elsheikh, Maher M; Lee, Tek-Hyung; Jiang, Chenfanfu; Zhang, Yuliang; Segelke, Brent W; Overton, K Wesley; Bremer, Peer-Timo; Laurence, Ted A.

Biomed Opt Express ; 13(8): 4134-4159, 2022 Aug 01.

Article in English | MEDLINE | ID: mdl-36032581

ABSTRACT

Legionella is a genus of ubiquitous environmental pathogens found in freshwater systems, moist soil, and composted materials. More than four decades of Legionella research have provided important insights into Legionella pathogenesis. Although standard commercial microscopes have led to significant advances in understanding Legionella pathogenesis, great potential exists in the deployment of more advanced imaging techniques to provide additional insights. The lattice light sheet microscope (LLSM) is a recently developed microscope for 4D live cell imaging with high resolution and minimal photo-damage. We built an LLSM with an improved optical layout that uses two path-stretching mirror sets and a novel reconfigurable galvanometer scanner (RGS) module to improve the reproducibility and reliability of the alignment and maintenance of the LLSM. We commissioned this LLSM to study Legionella pneumophila infection with a tailored workflow spanning instrumentation, experiments, and data processing methods. Our results indicate that Legionella pneumophila infection is correlated with a series of morphological signatures such as smoothness, migration pattern, and polarity, both statistically and dynamically. Our work demonstrates the benefits of using LLSM for studying long-term questions in bacterial infection. Our free-for-use modifications and workflow designs for the LLSM system contribute to the adoption and promotion of state-of-the-art LLSM technology for both academic and commercial applications.

7.

Exploring CRD mobility during RAS/RAF engagement at the membrane.

Nguyen, Kien; López, Cesar A; Neale, Chris; Van, Que N; Carpenter, Timothy S; Di Natale, Francesco; Travers, Timothy; Tran, Timothy H; Chan, Albert H; Bhatia, Harsh; Frank, Peter H; Tonelli, Marco; Zhang, Xiaohua; Gulten, Gulcin; Reddy, Tyler; Burns, Violetta; Oppelstrup, Tomas; Hengartner, Nick; Simanshu, Dhirendra K; Bremer, Peer-Timo; Chen, De; Glosli, James N; Shrestha, Rebika; Turbyville, Thomas; Streitz, Frederick H; Nissley, Dwight V; Ingólfsson, Helgi I; Stephen, Andrew G; Lightstone, Felice C; Gnanakaran, Sandrasegaram.

Biophys J ; 121(19): 3630-3650, 2022 10 04.

Article in English | MEDLINE | ID: mdl-35778842

ABSTRACT

During the activation of mitogen-activated protein kinase (MAPK) signaling, the RAS-binding domain (RBD) and cysteine-rich domain (CRD) of RAF bind to active RAS at the plasma membrane. The orientation of RAS at the membrane may be critical for formation of the RAS-RBDCRD complex and subsequent signaling. To explore how RAS membrane orientation relates to the protein dynamics within the RAS-RBDCRD complex, we perform multiscale coarse-grained and all-atom molecular dynamics (MD) simulations of KRAS4b bound to the RBD and CRD domains of RAF-1, both in solution and anchored to a model plasma membrane. Solution MD simulations describe dynamic KRAS4b-CRD conformations, suggesting that the CRD has sufficient flexibility in this environment to substantially change its binding interface with KRAS4b. In contrast, when the ternary complex is anchored to the membrane, the mobility of the CRD relative to KRAS4b is restricted, resulting in fewer distinct KRAS4b-CRD conformations. These simulations implicate membrane orientations of the ternary complex that are consistent with NMR measurements. While a crystal structure-like conformation is observed in both solution and membrane simulations, a particular intermolecular rearrangement of the ternary complex is observed only when it is anchored to the membrane. This configuration emerges when the CRD hydrophobic loops are inserted into the membrane and helices α3-5 of KRAS4b are solvent exposed. This membrane-specific configuration is stabilized by KRAS4b-CRD contacts that are not observed in the crystal structure. These results suggest a modulatory interplay between the CRD and plasma membrane that correlates with RAS/RAF complex structure and dynamics, and potentially influences subsequent steps in the activation of MAPK signaling.

Subject(s)

Cysteine, Proto-Oncogene Proteins c-raf, Binding Sites, Cell Membrane/metabolism, Cysteine/metabolism, Mitogen-Activated Protein Kinases/metabolism, Protein Binding, Proto-Oncogene Proteins c-raf/chemistry, Proto-Oncogene Proteins c-raf/metabolism, Proto-Oncogene Proteins p21(ras)/metabolism, Solvents/metabolism

8.

The Case for Optimized Edge-Centric Tractography at Scale.

Moon, Joseph Y; Mukherjee, Pratik; Madduri, Ravi K; Markowitz, Amy J; Cai, Lanya T; Palacios, Eva M; Manley, Geoffrey T; Bremer, Peer-Timo.

Front Neuroinform ; 16: 752471, 2022.

Article in English | MEDLINE | ID: mdl-35651721

ABSTRACT

The anatomic validity of structural connectomes remains a significant uncertainty in neuroimaging. Edge-centric tractography reconstructs streamlines in bundles between each pair of cortical or subcortical regions. Although edge bundles provide a stronger anatomic embedding than traditional connectomes, calculating them for each region-pair requires exponentially greater computation. We observe that major speedup can be achieved by reducing the number of streamlines used by probabilistic tractography algorithms. To ensure this does not degrade connectome quality, we calculate the identifiability of edge-centric connectomes between test and re-test sessions as a proxy for information content. We find that running PROBTRACKX2 with as few as 1 streamline per voxel per region-pair has no significant impact on identifiability. Variation in identifiability caused by streamline count is overshadowed by variation due to subject demographics. This finding holds true even for an entirely different tractography algorithm using MRTrix. Incidentally, we observe that Jaccard similarity is more effective than Pearson correlation in calculating identifiability for our subject population.
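
The comparison of Jaccard similarity and Pearson correlation can be illustrated with a minimal sketch. The abstract does not say how identifiability is computed, so this sketch assumes one common formulation: the mean similarity between a subject's test and re-test connectomes minus the mean similarity between different subjects. The function names, the flattened edge-weight vectors, and the toy data are all hypothetical.

```python
from math import sqrt

def jaccard(a, b, threshold=0.0):
    """Jaccard similarity of the sets of edges whose weight exceeds threshold."""
    set_a = {i for i, w in enumerate(a) if w > threshold}
    set_b = {i for i, w in enumerate(b) if w > threshold}
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 1.0

def pearson(a, b):
    """Pearson correlation of two equally long edge-weight vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def identifiability(test, retest, similarity):
    """Assumed definition: mean same-subject similarity minus mean cross-subject similarity."""
    n = len(test)
    same = [similarity(test[i], retest[i]) for i in range(n)]
    other = [similarity(test[i], retest[j])
             for i in range(n) for j in range(n) if i != j]
    return sum(same) / len(same) - sum(other) / len(other)

# Toy data (hypothetical): two subjects, four edges each, two sessions.
test_session = [[3.0, 2.0, 0.0, 0.0], [0.0, 0.0, 4.0, 1.0]]
retest_session = [[2.0, 3.0, 0.0, 0.0], [0.0, 0.0, 1.0, 4.0]]

jaccard_score = identifiability(test_session, retest_session, jaccard)
pearson_score = identifiability(test_session, retest_session, pearson)
```

In this toy case the edge sets match exactly within each subject even though the weights differ between sessions, so the Jaccard-based score is unaffected by weight noise while the Pearson-based score is not. This shows how the two measures can disagree; it does not reproduce the study's result.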

9.

AMM: Adaptive Multilinear Meshes.

Bhatia, Harsh; Hoang, Duong; Morrical, Nate; Pascucci, Valerio; Bremer, Peer-Timo; Lindstrom, Peter.

IEEE Trans Vis Comput Graph ; 28(6): 2350-2363, 2022 Jun.

Article in English | MEDLINE | ID: mdl-35394910

ABSTRACT

Adaptive representations are increasingly indispensable for reducing the in-memory and on-disk footprints of large-scale data. Usual solutions are designed broadly along two themes: reducing data precision, e.g., through compression, or adapting data resolution, e.g., using spatial hierarchies. Recent research suggests that combining the two approaches, i.e., adapting both resolution and precision simultaneously, can offer significant gains over using them individually. However, there currently exist no practical solutions to creating and evaluating such representations at scale. In this work, we present a new resolution-precision-adaptive representation to support hybrid data reduction schemes and offer an interface to existing tools and algorithms. Through novelties in spatial hierarchy, our representation, Adaptive Multilinear Meshes (AMM), provides considerable reduction in the mesh size. AMM creates a piecewise multilinear representation of uniformly sampled scalar data and can selectively relax or enforce constraints on conformity, continuity, and coverage, delivering a flexible adaptive representation. AMM also supports representing the function using mixed-precision values to further the achievable gains in data reduction. We describe a practical approach to creating AMM incrementally using arbitrary orderings of data and demonstrate AMM on six types of resolution and precision datastreams. By interfacing with state-of-the-art rendering tools through VTK, we demonstrate the practical and computational advantages of our representation for visualization techniques. With an open-source release of our tool to create AMM, we make such evaluation of data reduction accessible to the community, which we hope will foster new opportunities and future data reduction schemes.

10.

Machine learning-driven multiscale modeling reveals lipid-dependent dynamics of RAS signaling proteins.

Ingólfsson, Helgi I; Neale, Chris; Carpenter, Timothy S; Shrestha, Rebika; López, Cesar A; Tran, Timothy H; Oppelstrup, Tomas; Bhatia, Harsh; Stanton, Liam G; Zhang, Xiaohua; Sundram, Shiv; Di Natale, Francesco; Agarwal, Animesh; Dharuman, Gautham; Kokkila Schumacher, Sara I L; Turbyville, Thomas; Gulten, Gulcin; Van, Que N; Goswami, Debanjan; Jean-Francois, Frantz; Agamasu, Constance; Chen, De; Hettige, Jeevapani J; Travers, Timothy; Sarkar, Sumantra; Surh, Michael P; Yang, Yue; Moody, Adam; Liu, Shusen; Van Essen, Brian C; Voter, Arthur F; Ramanathan, Arvind; Hengartner, Nicolas W; Simanshu, Dhirendra K; Stephen, Andrew G; Bremer, Peer-Timo; Gnanakaran, S; Glosli, James N; Lightstone, Felice C; McCormick, Frank; Nissley, Dwight V; Streitz, Frederick H.

Proc Natl Acad Sci U S A ; 119(1), 2022 01 04.

Article in English | MEDLINE | ID: mdl-34983849

ABSTRACT

RAS is a signaling protein associated with the cell membrane that is mutated in up to 30% of human cancers. RAS signaling has been proposed to be regulated by dynamic heterogeneity of the cell membrane. Investigating such a mechanism requires near-atomistic detail at macroscopic temporal and spatial scales, which is not possible with conventional computational or experimental techniques. We demonstrate here a multiscale simulation infrastructure that uses machine learning to create a scale-bridging ensemble of over 100,000 simulations of active wild-type KRAS on a complex, asymmetric membrane. Initialized and validated with experimental data (including a new structure of active wild-type KRAS), these simulations represent a substantial advance in the ability to characterize RAS-membrane biology. We report distinctive patterns of local lipid composition that correlate with interfacially promiscuous RAS multimerization. These lipid fingerprints are coupled to RAS dynamics, predicted to influence effector binding, and therefore may be a mechanism for regulating cell signaling cascades.

Subject(s)

Cell Membrane/enzymology, Lipids/chemistry, Machine Learning, Molecular Dynamics Simulation, Protein Multimerization, Proto-Oncogene Proteins p21(ras)/chemistry, Signal Transduction, Humans

11.

Towards replacing physical testing of granular materials with a Topology-based Model.

Venkat, Aniketh; Gyulassy, Attila; Kosiba, Graham; Maiti, Amitesh; Reinstein, Henry; Gee, Richard; Bremer, Peer-Timo; Pascucci, Valerio.

IEEE Trans Vis Comput Graph ; 28(1): 76-85, 2022 Jan.

Article in English | MEDLINE | ID: mdl-34882553

ABSTRACT

In the study of packed granular materials, the performance of a sample (e.g., the detonation of a high-energy explosive) often correlates to measurements of a fluid flowing through it. The "effective surface area," the surface area accessible to the airflow, is typically measured using a permeametry apparatus that relates the flow conductance to the permeable surface area via the Carman-Kozeny equation. This equation allows calculating the flow rate of a fluid flowing through the granules packed in the sample for a given pressure drop. However, Carman-Kozeny makes inherent assumptions about tunnel shapes and flow paths that may not accurately hold in situations where the particles possess a wide distribution in shapes, sizes, and aspect ratios, as is true with many powdered systems of technological and commercial interest. To address this challenge, we replicate these measurements virtually on micro-CT images of the powdered material, introducing a new Pore Network Model based on the skeleton of the Morse-Smale complex. Pores are identified as basins of the complex, their incidence encodes adjacency, and the conductivity of the capillary between them is computed from the cross-section at their interface. We build and solve a resistive network to compute an approximate laminar fluid flow through the pore structure. We provide two means of estimating flow-permeable surface area: (i) by direct computation of conductivity, and (ii) by identifying dead-ends in the flow coupled with isosurface extraction and the application of the Carman-Kozeny equation, with the aim of establishing consistency over a range of particle shapes, sizes, porosity levels, and void distribution patterns.
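
The resistive-network step can be sketched generically. This is not the authors' Morse-Smale-based pore network model: the sketch assumes pores and throat conductances are already given, and simply applies Kirchhoff's law (net flow is zero at every interior pore) with a unit pressure drop between a single inlet and outlet pore. All names are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x

def effective_conductance(num_pores, throats, inlet, outlet):
    """Flow through the network for a unit pressure drop from inlet to outlet.

    throats is a list of (pore_i, pore_j, conductance) triples. Assumes every
    pore is connected to the inlet or outlet; otherwise the system is singular.
    """
    fixed = {inlet: 1.0, outlet: 0.0}
    free = [p for p in range(num_pores) if p not in fixed]
    index = {p: k for k, p in enumerate(free)}
    A = [[0.0] * len(free) for _ in free]
    b = [0.0] * len(free)
    for i, j, g in throats:
        for a, c in ((i, j), (j, i)):
            if a in index:
                A[index[a]][index[a]] += g
                if c in index:
                    A[index[a]][index[c]] -= g
                else:
                    b[index[a]] += g * fixed[c]
    pressure = dict(fixed)
    if free:
        pressure.update(zip(free, solve_linear(A, b)))
    # Sum the flow leaving the inlet pore through each attached throat.
    return sum(g * (pressure[i] - pressure[j]) * (1 if i == inlet else -1)
               for i, j, g in throats if inlet in (i, j))

# Two throats in series (conductance 2 each) behave like one of conductance 1.
series = effective_conductance(3, [(0, 1, 2.0), (1, 2, 2.0)], inlet=0, outlet=2)
# Two throats in parallel add their conductances.
parallel = effective_conductance(2, [(0, 1, 1.0), (0, 1, 3.0)], inlet=0, outlet=1)
```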

12.

A Visual Comparison of Silent Error Propagation.

Li, Zhimin; Menon, Harshitha; Mohror, Kathryn; Liu, Shusen; Guo, Luanzheng; Bremer, Peer-Timo; Pascucci, Valerio.

IEEE Trans Vis Comput Graph ; PP, 2022 Dec 20.

Article in English | MEDLINE | ID: mdl-37015425

ABSTRACT

High-performance computing (HPC) systems play a critical role in facilitating scientific discoveries. Their scale and complexity (e.g., the number of computational units and software stack) continue to grow as new systems are expected to process increasingly more data and reduce computing time. However, with more processing elements, the probability that these systems will experience a random bit-flip error that corrupts a program's output also increases, which is often recognized as silent data corruption. Analyzing the resiliency of HPC applications in extreme-scale computing to silent data corruption is crucial but difficult. An HPC application often contains a large number of computation units that need to be tested, and error propagation caused by data corruption is complex and difficult to interpret. To address this challenge, we propose an interactive visualization system that helps HPC researchers understand the resiliency of HPC applications and compare their error propagation. Our system models an application's error propagation to study a program's resiliency by constructing and visualizing its fault tolerance boundary. By coordinating multiple interactive designs, our system enables domain experts to efficiently explore the complicated spatial and temporal correlation between error propagations. Finally, the system integrates a nonmonotonic error propagation analysis with an adjustable graph propagation visualization to help domain experts examine the details of error propagation and answer such questions as why an error is mitigated or amplified by program execution.

13.

MARGIN: Uncovering Deep Neural Networks Using Graph Signal Analysis.

Anirudh, Rushil; Thiagarajan, Jayaraman J; Sridhar, Rahul; Bremer, Peer-Timo.

Front Big Data ; 4: 589417, 2021.

Article in English | MEDLINE | ID: mdl-34337397

ABSTRACT

Interpretability has emerged as a crucial aspect of building trust in machine learning systems, aimed at providing insights into the working of complex neural networks that are otherwise opaque to a user. There is a plethora of existing solutions addressing various aspects of interpretability, ranging from identifying prototypical samples in a dataset to explaining image predictions or explaining mis-classifications. While all of these diverse techniques address seemingly different aspects of interpretability, we hypothesize that a large family of interpretability tasks are variants of the same central problem, which is identifying relative change in a model's prediction. This paper introduces MARGIN, a simple yet general approach to address a large set of interpretability tasks. MARGIN exploits ideas rooted in graph signal analysis to determine influential nodes in a graph, which are defined as those nodes that maximally describe a function defined on the graph. By carefully defining task-specific graphs and functions, we demonstrate that MARGIN outperforms existing approaches in a number of disparate interpretability challenges.

14.

Visualizing Hierarchical Performance Profiles of Parallel Codes Using CallFlow.

Nguyen, Huu Tan; Bhatele, Abhinav; Jain, Nikhil; Kesavan, Suraj P; Bhatia, Harsh; Gamblin, Todd; Ma, Kwan-Liu; Bremer, Peer-Timo.

IEEE Trans Vis Comput Graph ; 27(4): 2455-2468, 2021 Apr.

Article in English | MEDLINE | ID: mdl-31751276

ABSTRACT

Calling context trees (CCTs) couple performance metrics with call paths, helping understand the execution and performance of parallel programs. To identify performance bottlenecks, programmers and performance analysts visually explore CCTs to form and validate hypotheses regarding degraded performance. However, due to the complexity of parallel programs, existing visual representations do not scale to applications running on a large number of processors. We present CallFlow, an interactive visual analysis tool that provides a high-level overview of CCTs together with semantic refinement operations to progressively explore CCTs. Using a flow-based metaphor, we visualize a CCT by treating execution time as a resource spent during the call chain, and demonstrate the effectiveness of our design with case studies on large-scale, production simulation codes.
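
The flow-based metaphor can be sketched on a toy calling context tree. The nested-dictionary layout and function names below are hypothetical and are not CallFlow's data model. The sketch only shows how inclusive time, treated as a resource passed from caller to callee, yields the link weights of a Sankey-style diagram.

```python
def inclusive_time(node):
    """Exclusive time of this call path plus the inclusive time of its callees."""
    return node["exclusive"] + sum(inclusive_time(child)
                                   for child in node.get("children", []))

def flows(node):
    """Yield (caller, callee, inclusive time) triples in depth-first order."""
    for child in node.get("children", []):
        yield node["name"], child["name"], inclusive_time(child)
        yield from flows(child)

# Toy CCT (hypothetical timings in seconds).
cct = {
    "name": "main", "exclusive": 1.0,
    "children": [
        {"name": "solve", "exclusive": 5.0,
         "children": [{"name": "mpi_wait", "exclusive": 3.0}]},
        {"name": "io", "exclusive": 2.0},
    ],
}

total = inclusive_time(cct)
links = list(flows(cct))
```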

15.

Coverage-Based Designs Improve Sample Mining and Hyperparameter Optimization.

Muniraju, Gowtham; Kailkhura, Bhavya; Thiagarajan, Jayaraman J; Bremer, Peer-Timo; Tepedelenlioglu, Cihan; Spanias, Andreas.

IEEE Trans Neural Netw Learn Syst ; 32(3): 1241-1253, 2021 03.

Article in English | MEDLINE | ID: mdl-32305942

ABSTRACT

Sampling one or more effective solutions from large search spaces is a recurring idea in machine learning (ML), and sequential optimization has become a popular solution. Typical examples include data summarization, sample mining for predictive modeling, and hyperparameter optimization. Existing solutions attempt to adaptively trade off between global exploration and local exploitation, in which the initial exploratory sample is critical to their success. While discrepancy-based samples have become the de facto approach for exploration, results from computer graphics suggest that coverage-based designs, e.g., Poisson disk sampling, can be a superior alternative. In order to successfully adapt coverage-based sample designs, which were originally developed for 2-D image analysis, to ML applications, we propose fundamental advances by constructing a parameterized family of designs with provably improved coverage characteristics and developing algorithms for effective sample synthesis. Using experiments in sample mining and hyperparameter optimization for supervised learning, we show that our approach consistently outperforms the existing exploratory sampling methods in both blind exploration and sequential search with Bayesian optimization.
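
For readers unfamiliar with Poisson disk sampling, the basic idea can be sketched with naive dart throwing: draw uniform random candidates in the unit hypercube and keep only those at least a minimum distance from every sample accepted so far. This is the textbook baseline, not the parameterized family of designs the paper proposes. The function name and parameters are hypothetical.

```python
import random
from math import dist

def poisson_disk_sample(num_candidates, min_dist, dim=2, seed=0):
    """Naive dart throwing: accept a candidate only if it keeps min_dist to all samples."""
    rng = random.Random(seed)
    samples = []
    for _ in range(num_candidates):
        candidate = tuple(rng.random() for _ in range(dim))
        if all(dist(candidate, s) >= min_dist for s in samples):
            samples.append(candidate)
    return samples

# For example, an initial exploratory design over two hyperparameters scaled to [0, 1].
design = poisson_disk_sample(num_candidates=2000, min_dist=0.15)
```

The quadratic cost of the distance checks is acceptable for small exploratory designs, but larger or higher-dimensional designs would need a spatial index or a more careful synthesis algorithm.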

16.

Efficient and Flexible Hierarchical Data Layouts for a Unified Encoding of Scalar Field Precision and Resolution.

Hoang, Duong; Summa, Brian; Bhatia, Harsh; Lindstrom, Peter; Klacansky, Pavol; Usher, Will; Bremer, Peer-Timo; Pascucci, Valerio.

IEEE Trans Vis Comput Graph ; 27(2): 603-613, 2021 Feb.

Article in English | MEDLINE | ID: mdl-33048707

ABSTRACT

To address the problem of ever-growing scientific data sizes making data movement a major hindrance to analysis, we introduce a novel encoding for scalar fields: a unified tree of resolution and precision, specifically constructed so that valid cuts correspond to sensible approximations of the original field in the precision-resolution space. Furthermore, we introduce a highly flexible encoding of such trees that forms a parameterized family of data hierarchies. We discuss how different parameter choices lead to different trade-offs in practice, and show how specific choices result in known data representation schemes such as zfp [52], idx [58], and jpeg2000 [76]. Finally, we provide system-level details and empirical evidence on how such hierarchies facilitate common approximate queries with minimal data movement and time, using real-world data sets ranging from a few gigabytes to nearly a terabyte in size. Experiments suggest that our new strategy of combining reductions in resolution and precision is competitive with state-of-the-art compression techniques with respect to data quality, while being significantly more flexible and orders of magnitude faster, and requiring significantly reduced resources.

17.

SpotSDC: Revealing the Silent Data Corruption Propagation in High-Performance Computing Systems.

Li, Zhimin; Menon, Harshitha; Maljovec, Dan; Livnat, Yarden; Liu, Shusen; Mohror, Kathryn; Bremer, Peer-Timo; Pascucci, Valerio.

IEEE Trans Vis Comput Graph ; 27(10): 3938-3952, 2021 Oct.

Article in English | MEDLINE | ID: mdl-32746251

ABSTRACT

The trend of rapid technology scaling is expected to make the hardware of high-performance computing (HPC) systems more susceptible to computational errors due to random bit flips. Some bit flips may cause a program to crash or have a minimal effect on the output, but others may lead to silent data corruption (SDC), i.e., undetected yet significant output errors. Classical fault injection analysis methods employ uniform sampling of random bit flips during program execution to derive a statistical resiliency profile. However, summarizing such fault injection results with sufficient detail is difficult, and understanding the behavior of the fault-corrupted program is still a challenge. In this article, we introduce SpotSDC, a visualization system to facilitate the analysis of a program's resilience to SDC. SpotSDC provides multiple perspectives at various levels of detail of the impact on the output relative to where in the source code the flipped bit occurs, which bit is flipped, and when during the execution it happens. SpotSDC also enables users to study the code protection and provides new insights into the behavior of a fault-injected program. Based on lessons learned, we demonstrate how our findings can improve the fault injection campaign method.
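
Why the position of the flipped bit matters so much can be seen in a minimal sketch that flips a single bit of an IEEE 754 double. The helper name is hypothetical, and the sketch only perturbs one stored value; it is not a fault injection framework and is unrelated to SpotSDC's implementation.

```python
import struct

def flip_bit(value, bit):
    """Return the float64 obtained by flipping one bit of value (bit 0 is least significant)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", value))
    (flipped,) = struct.unpack("<d", struct.pack("<Q", bits ^ (1 << bit)))
    return flipped

low_mantissa = flip_bit(1.0, 0)    # smallest possible change to the value
low_exponent = flip_bit(1.0, 52)   # lowest exponent bit: the value is halved
high_exponent = flip_bit(1.0, 62)  # highest exponent bit: the value becomes infinity
sign = flip_bit(1.0, 63)           # sign bit: the value is negated
```

A flip in the low mantissa bits is likely to be masked by later arithmetic, while a flip in the exponent or sign can change the result by orders of magnitude. This is why a resiliency profile depends on which bit is flipped as well as where and when.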

18.

Vector Field Decompositions Using Multiscale Poisson Kernel.

Bhatia, Harsh; Kirby, Robert M; Pascucci, Valerio; Bremer, Peer-Timo.

IEEE Trans Vis Comput Graph ; 27(9): 3781-3793, 2021 Sep.

Article in English | MEDLINE | ID: mdl-32248111

ABSTRACT

Extraction of multiscale features using scale-space is one of the fundamental approaches to analyze scalar fields. However, similar techniques for vector fields are much less common, even though it is well known that, for example, turbulent flows contain cascades of nested vortices at different scales. The challenge is that the ideas related to scale-space are based upon iteratively smoothing the data to extract features at progressively larger scale, making it difficult to extract overlapping features. Instead, we consider spatial regions of influence in vector fields as scale, and introduce a new approach for the multiscale analysis of vector fields. Rather than smoothing the flow, we use the natural Helmholtz-Hodge decomposition to split it into small-scale and large-scale components using progressively larger neighborhoods. Our approach creates a natural separation of features by extracting local flow behavior, for example, a small vortex, from large-scale effects, for example, a background flow. We demonstrate our technique on large-scale, turbulent flows, and show multiscale features that cannot be extracted using state-of-the-art techniques.

19.

Designing accurate emulators for scientific processes using calibration-driven deep models.

Thiagarajan, Jayaraman J; Venkatesh, Bindya; Anirudh, Rushil; Bremer, Peer-Timo; Gaffney, Jim; Anderson, Gemma; Spears, Brian.

Nat Commun ; 11(1): 5622, 2020 Nov 06.

Article in English | MEDLINE | ID: mdl-33159053

ABSTRACT

Predictive models that accurately emulate complex scientific processes can achieve speed-ups over numerical simulators or experiments and at the same time provide surrogates for improving the subsequent analysis. Consequently, there is a recent surge in utilizing modern machine learning methods to build data-driven emulators. In this work, we study an often overlooked, yet important, problem of choosing loss functions while designing such emulators. Popular choices such as the mean squared error or the mean absolute error are based on a symmetric noise assumption and can be unsuitable for heterogeneous data or asymmetric noise distributions. We propose Learn-by-Calibrating, a novel deep learning approach based on interval calibration for designing emulators that can effectively recover the inherent noise structure without any explicit priors. Using a large suite of use-cases, we demonstrate the efficacy of our approach in providing high-quality emulators, when compared to widely-adopted loss function choices, even in small-data regimes.

20.

Capturing Biologically Complex Tissue-Specific Membranes at Different Levels of Compositional Complexity.

Ingólfsson, Helgi I; Bhatia, Harsh; Zeppelin, Talia; Bennett, W F Drew; Carpenter, Kristy A; Hsu, Pin-Chia; Dharuman, Gautham; Bremer, Peer-Timo; Schiøtt, Birgit; Lightstone, Felice C; Carpenter, Timothy S.

J Phys Chem B ; 124(36): 7819-7829, 2020 Sep 10.

Article in English | MEDLINE | ID: mdl-32790367

ABSTRACT

Plasma membranes (PMs) contain hundreds of different lipid species that contribute differently to overall bilayer properties. By modulation of these properties, membrane protein function can be affected. Furthermore, inhomogeneous lipid mixing and domains of lipid enrichment/depletion can sort proteins and provide optimal local environments. Recent coarse-grained (CG) Martini molecular dynamics efforts have provided glimpses into lipid organization of different PMs: an "Average" and a "Brain" PM. Their high complexity and large size require long simulations (~80 µs) for proper sampling. Thus, these simulations are computationally taxing. This level of complexity is beyond the possibilities of all-atom simulations, raising the question: what complexity is needed for "realistic" bilayer properties? We constructed CG Martini PM models of varying complexity (63 down to 8 different lipids). Lipid tail saturations and headgroup combinations were kept as consistent as possible for the "tissues'" (Average/Brain) at three levels of compositional complexity. For each system, we analyzed membrane properties to evaluate which features can be retained at lower complexity and validate eight-component bilayers that can act as reliable mimetics for Average or Brain PMs. Systems of reduced complexity deliver a more robust and malleable tool for computational membrane studies and allow for equivalent all-atom simulations and experiments.

Subject(s)

Lipid Bilayers , Molecular Dynamics Simulation , Cell Membrane , Membranes , Proteins
