Results 1 - 20 of 25
1.
Proc Natl Acad Sci U S A ; 119(16): e2020242119, 2022 Apr 19.
Article in English | MEDLINE | ID: mdl-35412902

ABSTRACT

Assembly of biomolecules at solid-water interfaces requires molecules to traverse complex orientation-dependent energy landscapes through processes that are poorly understood, largely due to the dearth of in situ single-molecule measurements and statistical analyses of the rotational dynamics that define directional selection. Emerging capabilities in high-speed atomic force microscopy and machine learning have allowed us to directly determine the orientational energy landscape and observe and quantify the rotational dynamics for protein nanorods on the surface of muscovite mica under a variety of conditions. Comparisons with kinetic Monte Carlo simulations show that the transition rates between adjacent orientation-specific energetic minima can largely be understood through traditional models of in-plane Brownian rotation across a biased energy landscape, with resulting transition rates that are exponential in the energy barriers between states. However, transitions between more distant angular states are decoupled from barrier height, with jump-size distributions showing a power-law decay that is characteristic of a nonclassical Lévy-flight random walk, indicating that large jumps are enabled by alternative modes of motion via activated states. The findings provide insights into the dynamics of biomolecules at solid-liquid interfaces that lead to self-assembly, epitaxial matching, and other orientationally anisotropic outcomes, and define a general procedure for exploring such dynamics with implications for hybrid biomolecular-inorganic materials design.
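
In equation form, the two regimes the abstract contrasts can be summarized as follows; this is a hedged paraphrase, with the prefactor k_0 and tail exponent µ as illustrative symbols rather than values from the paper:

```latex
% Hops between adjacent orientational minima i and j: classical
% Brownian rotation over a barrier, so rates are exponential in the
% barrier height \Delta E_{ij} (Kramers/Arrhenius form).
k_{i \to j} = k_0 \, e^{-\Delta E_{ij}/k_B T}

% Jumps between distant angular states: a heavy-tailed jump-size
% distribution with power-law decay, the signature of a Levy-flight
% random walk, decoupled from the intervening barrier heights.
P(\Delta\theta = s) \propto s^{-\mu}
```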


Subject(s)
Nanotubes, Proteins, Rotation, Aluminum Silicates/chemistry, Diffusion, Machine Learning, Atomic Force Microscopy, Monte Carlo Method, Nanotubes/chemistry, Proteins/chemistry, Solutions, Surface Properties
2.
Cell ; 133(2): 364-74, 2008 Apr 18.
Article in English | MEDLINE | ID: mdl-18423206

ABSTRACT

To fully understand animal transcription networks, it is essential to accurately measure the spatial and temporal expression patterns of transcription factors and their targets. We describe a registration technique that takes image-based data from hundreds of Drosophila blastoderm embryos, each costained for a reference gene and one of a set of genes of interest, and builds a model VirtualEmbryo. This model captures in a common framework the average expression patterns for many genes in spite of significant variation in morphology and expression between individual embryos. We establish the method's accuracy by showing that relationships between a pair of genes' expression inferred from the model are nearly identical to those measured in embryos costained for the pair. We present a VirtualEmbryo containing data for 95 genes at six time cohorts. We show that known gene-regulatory interactions can be automatically recovered from this data set and predict hundreds of new interactions.
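
As a rough illustration of the validation logic (a minimal sketch with synthetic arrays; names and data are hypothetical, and the actual registration pipeline is far more involved), one can check that a gene-pair relationship inferred from the model matches the relationship measured in costained embryos:

```python
import numpy as np

def pair_relationship(expr_a, expr_b):
    """Pearson correlation between two genes' spatial expression profiles,
    each a 1-D array over a common set of registered positions."""
    return np.corrcoef(expr_a, expr_b)[0, 1]

# Synthetic stand-ins: expression of genes A and B over 500 registered
# positions, once as read out of the VirtualEmbryo model and once as
# measured directly in embryos costained for the pair.
rng = np.random.default_rng(0)
truth = rng.random(500)
model_a, costain_a = truth + 0.05 * rng.standard_normal((2, 500))
model_b = 1.0 - truth + 0.05 * rng.standard_normal(500)
costain_b = 1.0 - truth + 0.05 * rng.standard_normal(500)

# The model passes the check when the inferred relationship closely
# tracks the costained measurement.
print(pair_relationship(model_a, model_b))      # inferred from the model
print(pair_relationship(costain_a, costain_b))  # measured by costaining
```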


Subject(s)
Drosophila melanogaster/genetics, Gene Regulatory Networks, Genetic Models, Animals, Blastoderm, Drosophila melanogaster/metabolism, Nonmammalian Embryo/metabolism, Developmental Gene Expression Regulation, Insect Genes
3.
Anal Chem ; 89(11): 5818-5823, 2017 Jun 06.
Article in English | MEDLINE | ID: mdl-28467051

ABSTRACT

Mass spectrometry imaging (MSI) has primarily been applied to localizing biomolecules within biological matrices. Although well suited to the task, MSI has seen limited application in comparing thousands of spatially defined, spotted samples, in part because of a lack of suitable and accessible data processing tools for the analysis of large arrayed MSI sample sets. The OpenMSI Arrayed Analysis Toolkit (OMAAT) is a software package that addresses the challenges of analyzing spatially defined samples in MSI data sets. OMAAT is written in Python and is integrated with OpenMSI ( http://openmsi.nersc.gov ), a platform for storing, sharing, and analyzing MSI data. By using a web-based Python notebook (Jupyter), OMAAT is accessible to anyone without programming experience, yet allows experienced users to leverage all of its features. OMAAT was evaluated by analyzing an MSI data set from a high-throughput glycoside hydrolase activity screen comprising 384 samples arrayed onto a NIMS surface at 450 µm spacing, decreasing analysis time >100-fold while maintaining robust spot finding. The utility of OMAAT was demonstrated by screening the metabolic activities of different-sized soil particles, including hydrolysis of sugars, revealing a pattern of size-dependent activities. These results introduce OMAAT as an effective toolkit for analyzing spatially defined samples in MSI. OMAAT runs on all major operating systems, and the source code can be obtained from the following GitHub repository: https://github.com/biorack/omaat .
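
A minimal sketch of arrayed spot analysis in the spirit of OMAAT, assuming a regular grid in pixel coordinates; the function and grid parameters below are illustrative, not OMAAT's actual API:

```python
import numpy as np

def spot_intensities(img, centers, radius=3):
    """Sum ion-image intensity in a square window around each expected
    spot center; a toy stand-in for OMAAT's optimized spot finding."""
    totals = []
    for r, c in centers:
        win = img[max(r - radius, 0):r + radius + 1,
                  max(c - radius, 0):c + radius + 1]
        totals.append(win.sum())
    return np.array(totals)

# Hypothetical 384-spot grid (16 x 24) at a fixed pitch, applied to a
# toy single-ion image.
rows, cols, pitch, offset = 16, 24, 10, 5
centers = [(offset + i * pitch, offset + j * pitch)
           for i in range(rows) for j in range(cols)]
img = np.random.default_rng(1).random((rows * pitch + 10, cols * pitch + 10))
print(spot_intensities(img, centers).shape)  # (384,)
```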


Subject(s)
Data Analysis, Mass Spectrometry/methods, Software, Datasets as Topic, Glycoside Hydrolases, Particle Size, Soil/chemistry
4.
Anal Chem ; 87(9): 4658-66, 2015.
Article in English | MEDLINE | ID: mdl-25825055

ABSTRACT

Mass spectrometry imaging enables label-free, high-resolution spatial mapping of the chemical composition of complex biological samples. Typical experiments require selecting ions and/or positions from the images: ions for fragmentation studies to identify keystone compounds, and positions for follow-up validation measurements using microdissection or other orthogonal techniques. Unfortunately, with modern imaging machines, these must be selected from an overwhelming amount of raw data. Existing techniques to reduce the volume of data, the most popular of which are principal component analysis and non-negative matrix factorization, have the disadvantage that they return difficult-to-interpret linear combinations of actual data elements. In this work, we show that CX and CUR matrix decompositions can be used directly to address this selection need. CX and CUR matrix decompositions use empirical statistical leverage scores of the input data to provide provably good low-rank approximations of the measured data that are expressed in terms of actual ions and actual positions, as opposed to difficult-to-interpret eigenions and eigenpositions. We show that this leads to effective prioritization of information for both ions and positions. In particular, important ions can be found either by using the leverage scores as a ranking function with a deterministic greedy selection algorithm, or by using the leverage scores as an importance sampling distribution with a random sampling algorithm; however, selection of important positions from the original matrix performed significantly better when they were chosen with the random sampling algorithm. We also show that 20 ions or 40 locations can be used to reconstruct the original matrix to a tolerance of 17% error for a widely studied image of brain lipids, and we provide a scalable implementation of this method that is applicable to raw data with more than a million rows and/or columns, which is larger than SVD-based low-rank approximation methods can handle. These results introduce the concept of CX/CUR matrix factorizations to mass spectrometry imaging, describing their utility and illustrating principled algorithmic approaches to deal with the overwhelming amount of data generated by modern mass spectrometry imaging.
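
The central computation is compact enough to sketch. Below is a minimal NumPy rendering of rank-k column leverage scores and the two selection strategies described above, applied to a toy matrix rather than the brain-lipid image:

```python
import numpy as np

def column_leverage_scores(A, k):
    """Rank-k statistical leverage scores of the columns of A, computed
    from the top-k right singular vectors; nonnegative and summing to 1."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    return (Vt[:k] ** 2).sum(axis=0) / k

# Toy stand-in for an MSI matrix (rows = positions, columns = ions).
rng = np.random.default_rng(0)
A = rng.random((200, 50)) @ rng.random((50, 300))

k = 20
scores = column_leverage_scores(A, k)
greedy = np.argsort(scores)[::-1][:k]  # deterministic ranking-based selection
sampled = rng.choice(A.shape[1], size=k, replace=False, p=scores)  # sampling

# CX reconstruction from actual columns (ions), not eigenions.
C = A[:, sampled]
X = np.linalg.lstsq(C, A, rcond=None)[0]
err = np.linalg.norm(A - C @ X) / np.linalg.norm(A)
print(f"relative reconstruction error: {err:.2%}")
```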


Subject(s)
Lipids/analysis, Mass Spectrometry, Algorithms, Brain, Humans, Ions/analysis
5.
J Nat Prod ; 78(6): 1231-42, 2015 Jun 26.
Article in English | MEDLINE | ID: mdl-25981198

ABSTRACT

An integrated omics approach using genomics, transcriptomics, metabolomics (MALDI mass spectrometry imaging, MSI), and bioinformatics was employed to study the spatiotemporal formation and deposition of health-protecting polymeric lignans and plant-defense cyanogenic glucosides. Intact flax (Linum usitatissimum) capsules and seed tissues at different development stages were analyzed. Transcriptome analyses indicated distinct expression patterns of dirigent protein (DP) gene family members encoding (-)- and (+)-pinoresinol-forming DPs and their associated downstream metabolic processes, with the former expressed at early seed coat development stages. Genes encoding (+)-pinoresinol-forming DPs were, in contrast, expressed at later development stages. Recombinant DP expression and DP assays also unequivocally established their distinct stereoselective biochemical functions. Using MALDI MSI and ion mobility separation analyses, the pinoresinol downstream derivatives secoisolariciresinol diglucoside (SDG) and SDG hydroxymethylglutaryl ester were localized and detectable only at early seed coat development stages. SDG derivatives were then converted into higher molecular weight phenolics during seed coat maturation. By contrast, among the plant-defense cyanogenic glucosides, the monoglucosides linamarin and lotaustralin were detected throughout the flax capsule, whereas the diglucosides linustatin and neolinustatin accumulated only in endosperm and embryo tissues. A putative biosynthetic pathway to the cyanogens is proposed on the basis of transcriptome coexpression data. All metabolites were localized at ca. 20 µm resolution, with the web-based tool OpenMSI enabling not only resolution enhancement but also interactive, real-time searching for any ion in the tissue under analysis.


Subject(s)
Flax/chemistry, Furans/chemistry, Glycosides/chemistry, Lignans/chemistry, Seeds/chemistry, Butylene Glycols/analysis, Flax/genetics, Furans/analysis, Glucosides/analysis, Glycosides/analysis, Lignans/analysis, Molecular Structure, Nitriles/analysis, Matrix-Assisted Laser Desorption-Ionization Mass Spectrometry
6.
bioRxiv ; 2024 Jan 09.
Article in English | MEDLINE | ID: mdl-38260593

ABSTRACT

Understanding brain function necessitates linking neural activity with corresponding behavior. Structured behavioral experiments are crucial for probing the neural computations and dynamics underlying behavior; however, adequately representing their complex data is a significant challenge. Currently, a comprehensive data standard that fully encapsulates task-based experiments, integrating neural activity with the richness of behavioral context, is lacking. We designed a data model, as an extension to the NWB neurophysiology data standard, to represent structured behavioral neuroscience experiments, spanning stimulus delivery, timestamped events and responses, and simultaneous neural recordings. This data format is validated through its application to a variety of experimental designs, showcasing its potential to advance integrative analyses of neural circuits and complex behaviors. This work introduces a comprehensive data standard designed to capture and store a spectrum of behavioral data, encapsulating the multifaceted nature of modern neuroscience experiments.
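
For flavor, here is a minimal sketch using the core pynwb API; the extension described in the preprint adds richer task structures, and the column names and values below are made up:

```python
from datetime import datetime, timezone
from pynwb import NWBFile

# Core NWB already captures timestamped trial structure; the data model
# in the preprint extends this to full task descriptions.
nwb = NWBFile(
    session_description="structured behavioral task (toy example)",
    identifier="example-session-001",
    session_start_time=datetime(2024, 1, 9, tzinfo=timezone.utc),
)

# Task-specific columns: stimulus identity and the subject's response.
nwb.add_trial_column(name="stimulus", description="stimulus presented")
nwb.add_trial_column(name="response", description="subject response")
nwb.add_trial(start_time=0.0, stop_time=2.0, stimulus="tone_A", response="lever")
nwb.add_trial(start_time=3.0, stop_time=5.0, stimulus="tone_B", response="none")

print(nwb.trials.to_dataframe())
```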

7.
bioRxiv ; 2024 Apr 15.
Article in English | MEDLINE | ID: mdl-38328074

ABSTRACT

Scientific progress depends on reliable and reproducible results. Progress can also be accelerated when data are shared and re-analyzed to address new questions. Current approaches to storing and analyzing neural data typically involve bespoke formats and software that make replication, as well as the subsequent reuse of data, difficult if not impossible. To address these challenges, we created Spyglass, an open-source software framework that enables reproducible analyses and sharing of data and both intermediate and final results within and across labs. Spyglass uses the Neurodata Without Borders (NWB) standard and includes pipelines for several core analyses in neuroscience, including spectral filtering, spike sorting, pose tracking, and neural decoding. It can be easily extended to apply both existing and newly developed pipelines to datasets from multiple sources. We demonstrate these features in the context of a cross-laboratory replication by applying advanced state space decoding algorithms to publicly available data. New users can try out Spyglass on a Jupyter Hub hosted by HHMI and 2i2c: https://spyglass.hhmi.2i2c.cloud/.
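
As a taste of the kind of analysis such pipelines wrap, here is a self-contained sketch of a one-time-bin Bayesian position decoder under a Poisson firing model; it is a generic illustration, not Spyglass's API or the exact state-space decoder used in the replication:

```python
import numpy as np

def decode_position(spikes, tuning, dt):
    """Posterior over discrete position bins from one bin of spike counts,
    assuming independent Poisson firing and a flat prior."""
    # log P(spikes | position), up to an additive constant
    log_like = spikes @ np.log(tuning * dt) - (tuning * dt).sum(axis=0)
    post = np.exp(log_like - log_like.max())  # numerically stable
    return post / post.sum()

# Toy tuning curves: 5 cells x 40 position bins (rates in Hz), and one
# 25 ms bin of spike counts generated from position bin 10.
rng = np.random.default_rng(0)
tuning = rng.uniform(1.0, 20.0, size=(5, 40))
spikes = rng.poisson(tuning[:, 10] * 0.025)
posterior = decode_position(spikes, tuning, dt=0.025)
print(posterior.argmax())  # decoded position bin
```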

8.
Anal Chem ; 85(21): 10354-61, 2013 Nov 05.
Article in English | MEDLINE | ID: mdl-24087878

ABSTRACT

Mass spectrometry imaging (MSI) enables researchers to probe endogenous molecules directly within the architecture of the biological matrix. Unfortunately, efficient access, management, and analysis of the data generated by MSI approaches remain major challenges to this rapidly developing field. Despite the availability of numerous dedicated file formats and software packages, it is a widely held viewpoint that the biggest challenge is simply opening, sharing, and analyzing a file without loss of information. Here we present OpenMSI, a software framework and platform that addresses these challenges via an advanced, high-performance, extensible file format and Web API for remote data access (http://openmsi.nersc.gov). The OpenMSI file format supports storage of raw MSI data, metadata, and derived analyses in a single, self-describing format based on HDF5 and is supported by a wide range of analysis software (e.g., Matlab and R) and programming languages (e.g., C++, Fortran, and Python). Careful optimization of the storage layout of MSI data sets using chunking, compression, and data replication accelerates common, selective data access operations while minimizing data storage requirements, and is a critical enabler of rapid data I/O. The OpenMSI file format has been shown to provide a >2000-fold improvement for image access operations, enabling spectrum and image retrieval in less than 0.3 s across the Internet even for 50 GB MSI data sets. To make remote high-performance compute resources accessible for analysis and to facilitate data sharing and collaboration, we describe an easy-to-use yet powerful Web API that enables fast and convenient access to MSI data, metadata, and derived analysis results stored remotely, facilitating high-performance data analysis and enabling implementation of Web-based data sharing, visualization, and analysis.
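
The storage tactics are easy to sketch with plain h5py; the chunk shape and names below are illustrative choices, not the OpenMSI format specification:

```python
import h5py
import numpy as np

# Chunked, compressed HDF5 layout so that a single-ion image slice and a
# single-pixel spectrum can both be read without scanning the whole cube.
x, y, mz = 100, 100, 10000
cube = np.random.rand(x, y, mz).astype("float32")

with h5py.File("msi_example.h5", "w") as f:
    f.create_dataset(
        "msidata",
        data=cube,
        chunks=(4, 4, 2048),   # balances image-slice and spectrum reads
        compression="gzip",
        compression_opts=4,
    )

with h5py.File("msi_example.h5", "r") as f:
    image = f["msidata"][:, :, 5000]    # one ion image
    spectrum = f["msidata"][50, 50, :]  # one full spectrum
print(image.shape, spectrum.shape)
```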


Subject(s)
Internet, Mass Spectrometry/methods, Software
9.
IEEE Trans Vis Comput Graph ; 28(10): 3471-3485, 2022 Oct.
Article in English | MEDLINE | ID: mdl-33684039

ABSTRACT

Contour trees are used for topological data analysis in scientific visualization. While originally computed with serial algorithms, recent work has introduced a vector-parallel algorithm. However, this algorithm is relatively slow for fully augmented contour trees, which are needed for many practical data analysis tasks. We therefore introduce a representation called the hyperstructure that enables efficient searches through the contour tree, and use it to construct a fully augmented contour tree in data-parallel fashion, with performance on average 6 times faster than the state-of-the-art parallel algorithm in the TTK topological toolkit.


Subject(s)
Computer Graphics, Algorithms
10.
Elife ; 11, 2022 Oct 04.
Article in English | MEDLINE | ID: mdl-36193886

ABSTRACT

The neurophysiology of cells and tissues is monitored electrophysiologically and optically in diverse experiments and species, ranging from flies to humans. Understanding the brain requires integration of data across this diversity, and thus these data must be findable, accessible, interoperable, and reusable (FAIR). This requires a standard language for data and metadata that can coevolve with neuroscience. We describe design and implementation principles for a language for neurophysiology data. Our open-source software (Neurodata Without Borders, NWB) defines and modularizes the interdependent, yet separable, components of a data language. We demonstrate NWB's impact through unified description of neurophysiology data across diverse modalities and species. NWB exists in an ecosystem, which includes data management, analysis, visualization, and archive tools. Thus, the NWB data language enables reproduction, interchange, and reuse of diverse neurophysiology data. More broadly, the design principles of NWB are generally applicable to enhance discovery across biology through data FAIRness.


The brain is an immensely complex organ which regulates many of the behaviors that animals need to survive. To understand how the brain works, scientists monitor and record brain activity under different conditions using a variety of experimental techniques. These neurophysiological studies are often conducted on multiple types of cells in the brain as well as a variety of species, ranging from mice to flies, or even frogs and worms. Such a range of approaches provides us with highly informative, complementary 'views' of the brain. However, to form a complete, coherent picture of how the brain works, scientists need to be able to integrate all the data from these different experiments. For this to happen effectively, neurophysiology data need to meet certain criteria: namely, they must be findable, accessible, interoperable, and re-usable (or FAIR for short). However, the sheer diversity of neurophysiology experiments impedes the 'FAIR'-ness of the information obtained from them. To overcome this problem, researchers need a standardized way to communicate their experiments and share their results, in other words, a 'standard language' to describe neurophysiology data. Rübel, Tritt, Ly, Dichter, Ghosh et al. therefore set out to create such a language that was not only FAIR, but could also co-evolve with neurophysiology research. First, they produced a computer software program (called Neurodata Without Borders, or NWB for short) which generated and defined the different components of the new standard language. Then, other tools for data management were created to expand the NWB platform using the standardized language. This included data analysis and visualization methods, as well as an 'archive' to store and access data. Testing the new language and associated tools showed that they indeed allowed researchers to access, analyze, and share information from many different types of experiments, in organisms ranging from flies to humans. The NWB software is open-source, meaning that anyone can obtain a copy and make changes to it. Thus, NWB and its associated resources provide the basis for a collaborative, community-based system for sharing neurophysiology data. Rübel et al. hope that NWB will inspire similar developments across other fields of biology that share similar levels of complexity with neurophysiology.


Subject(s)
Data Science, Ecosystem, Humans, Metadata, Neurophysiology, Software
11.
IEEE Trans Vis Comput Graph ; 27(4): 2437-2454, 2021 Apr.
Article in English | MEDLINE | ID: mdl-31689193

ABSTRACT

As data sets grow to exascale, automated data analysis and visualization are increasingly important, both to intermediate human understanding and to reduce demands on disk storage via in situ analysis. Trends in the architecture of high-performance computing systems require analysis algorithms to make effective use of combinations of massively multicore and distributed systems. One of the principal analytic tools is the contour tree, which analyses relationships between contours to identify features of more than local importance. Unfortunately, the predominant algorithms for computing the contour tree are explicitly serial and founded on serial metaphors, which has limited the scalability of this form of analysis. While there is some work on distributed contour tree computation, and separately on hybrid GPU-CPU computation, there is no efficient algorithm with strong formal guarantees on performance allied with fast practical performance. We report the first shared-memory (SMP) algorithm for fully parallel contour tree computation, with formal guarantees of O(lg V lg t) parallel steps and O(V lg V) work for data with V samples and t contour tree supernodes, and implementations with more than 30× parallel speedup on both CPU (using TBB) and GPU (using Thrust), and up to 70× speedup compared to the serial sweep-and-merge algorithm.

12.
Proc IPDPS (Conf) ; 2020: 906-915, 2020 May.
Article in English | MEDLINE | ID: mdl-34632467

ABSTRACT

Many applications are increasingly becoming I/O-bound. To improve scalability, analytical models of parallel I/O performance are often consulted to determine possible I/O optimizations. However, I/O performance modeling has predominantly focused on applications that directly issue I/O requests to a parallel file system or a local storage device. These I/O models are not directly usable by applications that access data through standardized I/O libraries, such as HDF5, FITS, and NetCDF, because a single I/O request to an object can trigger a cascade of I/O operations to different storage blocks. The I/O performance characteristics of applications that rely on these libraries are a complex function of the underlying data storage model, user-configurable parameters, and object-level access patterns. As a consequence, I/O optimization is predominantly an ad hoc process performed by application developers, who are often domain scientists with limited desire to delve into the nuances of the storage hierarchy of modern computers. This paper presents an analytical cost model to predict the end-to-end execution time of applications that perform I/O through established array management libraries. The paper focuses on the HDF5 and Zarr array libraries as examples of I/O libraries with radically different storage models: HDF5 stores every object in one file, while Zarr creates multiple files to store different objects. We find that accessing array objects via these I/O libraries introduces new overheads and optimizations. Specifically, in addition to I/O time, it is crucial to model the cost of transforming data to a particular storage layout (memory copy cost), as well as the benefit of accessing a software cache. We evaluate the model on real applications that process observations (neuroscience) and simulation results (plasma physics). The evaluation on three HPC clusters reveals that I/O accounts for as little as 10% of the execution time in some cases; hence, models that only focus on I/O performance cannot accurately capture the performance of applications that use standard array storage libraries. In parallel experiments, our model correctly predicts the faster storage library between HDF5 and Zarr 94% of the time, in contrast with 70% of the time for a cutting-edge I/O model.
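
The contrast between the two storage models is visible in a few lines; this sketch uses the public h5py and zarr APIs with illustrative shapes and chunk sizes, not the paper's benchmark configuration (and zarr's API details vary between major versions):

```python
import h5py
import numpy as np
import zarr

a = np.random.rand(1024, 1024).astype("float32")

# HDF5: every object lives inside one container file.
with h5py.File("model_demo.h5", "w") as f:
    f.create_dataset("array", data=a, chunks=(256, 256))

# Zarr: each chunk becomes its own file under a directory store.
z = zarr.open("model_demo.zarr", mode="w", shape=a.shape,
              chunks=(256, 256), dtype=a.dtype)
z[:] = a

# The same logical read triggers very different physical I/O (seeks
# within one file vs. many small files), plus memory-copy and software-
# cache effects that pure I/O models miss; hence the end-to-end model.
with h5py.File("model_demo.h5", "r") as f:
    tile_h5 = f["array"][0:256, 0:256]
tile_zarr = zarr.open("model_demo.zarr", mode="r")[0:256, 0:256]
print(np.allclose(tile_h5, tile_zarr))
```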

13.
Proc IEEE Int Conf Big Data ; 2019: 165-179, 2019 Dec.
Article in English | MEDLINE | ID: mdl-34632466

ABSTRACT

A ubiquitous problem in aggregating data across different experimental and observational data sources is a lack of software infrastructure that enables flexible and extensible standardization of data and metadata. To address this challenge, we developed HDMF, a hierarchical data modeling framework for modern science data standards. With HDMF, we separate the process of data standardization into three main components: (1) data modeling and specification, (2) data I/O and storage, and (3) data interaction and data APIs. To enable standards to support the complex requirements and varying use cases throughout the data life cycle, HDMF provides object mapping infrastructure to insulate and integrate these various components. This approach supports the flexible development of data standards and extensions, optimized storage backends, and data APIs, while allowing the other components of the data standards ecosystem to remain stable. To meet the demands of modern, large-scale science data, HDMF provides advanced data I/O functionality for iterative data write, lazy data load, and parallel I/O. It also supports optimization of data storage via support for chunking, compression, linking, and modular data storage. We demonstrate the application of HDMF in practice to design NWB 2.0 [13], a modern data standard for collaborative science across the neurophysiology community.
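
A conceptual toy of the three-component separation; this is illustrative Python, not HDMF's actual classes or API:

```python
# (1) Specification: what fields a type has, independent of storage.
spec = {"data_type": "TimeSeries", "fields": ["data", "timestamps"]}

# (3) User-facing API object, independent of both spec and storage.
class TimeSeries:
    def __init__(self, data, timestamps):
        self.data, self.timestamps = data, timestamps

# (2) Storage backend; could be swapped for HDF5, Zarr, a database, ...
class DictBackend:
    def __init__(self):
        self.store = {}
    def write(self, key, record):
        self.store[key] = record

# The object mapper insulates the three components from one another:
# it translates API objects to backend records according to the spec.
class ObjectMapper:
    def __init__(self, spec, backend):
        self.spec, self.backend = spec, backend
    def write(self, key, obj):
        record = {f: getattr(obj, f) for f in self.spec["fields"]}
        self.backend.write(key, record)

backend = DictBackend()
ObjectMapper(spec, backend).write("ts1", TimeSeries([1, 2, 3], [0.0, 0.1, 0.2]))
print(backend.store["ts1"])
```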

14.
ACS Chem Biol ; 14(4): 704-714, 2019 Apr 19.
Article in English | MEDLINE | ID: mdl-30896917

ABSTRACT

Metabolomics is a widely used technology for obtaining direct measures of metabolic activities from diverse biological systems. However, ambiguous metabolite identifications are a common challenge and biochemical interpretation is often limited by incomplete and inaccurate genome-based predictions of enzyme activities (that is, gene annotations). Metabolite Annotation and Gene Integration (MAGI) generates a metabolite-gene association score using a biochemical reaction network. This is calculated by a method that emphasizes consensus between metabolites and genes via biochemical reactions. To demonstrate the potential of this method, we applied MAGI to integrate sequence data and metabolomics data collected from Streptomyces coelicolor A3(2), an extensively characterized bacterium that produces diverse secondary metabolites. Our findings suggest that coupling metabolomics and genomics data by scoring consensus between the two increases the quality of both metabolite identifications and gene annotations in this organism. MAGI also made biochemical predictions for poorly annotated genes that were consistent with the extensive literature on this important organism. This limited analysis suggests that using metabolomics data has the potential to improve annotations in sequenced organisms and also provides testable hypotheses for specific biochemical functions. MAGI is freely available for academic use both as an online tool at https://magi.nersc.gov and with source code available at https://github.com/biorack/magi .
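
The consensus idea can be caricatured in a few lines; the reaction table, names, and weights below are illustrative placeholders, not MAGI's actual database or scoring function:

```python
# Toy reaction network linking metabolites to genes via reactions.
reactions = {
    "rxn1": {"metabolites": {"actinorhodin"}, "genes": {"actVA"}},
    "rxn2": {"metabolites": {"undecylprodigiosin"}, "genes": {"redD"}},
}

def consensus_score(metabolite, gene, reactions, direct=2.0, partial=0.5):
    """Score a candidate (metabolite, gene) pair by reaction co-membership:
    a reaction connecting both sides reinforces both identifications,
    while one-sided evidence contributes much less."""
    score = 0.0
    for rxn in reactions.values():
        m_hit = metabolite in rxn["metabolites"]
        g_hit = gene in rxn["genes"]
        if m_hit and g_hit:
            score += direct    # metabolite and gene agree via this reaction
        elif m_hit or g_hit:
            score += partial   # one-sided evidence only
    return score

print(consensus_score("actinorhodin", "actVA", reactions))  # consensus: 2.0
print(consensus_score("actinorhodin", "redD", reactions))   # no consensus: 1.0
```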


Subject(s)
Bacterial Proteins/genetics, Bacterial Proteins/metabolism, Metabolomics, Streptomyces coelicolor, Genetic Databases, Bacterial Genome, Genomics, Molecular Sequence Annotation, Streptomyces coelicolor/genetics, Streptomyces coelicolor/metabolism
15.
IEEE Trans Vis Comput Graph ; 24(1): 1025-1035, 2018 Jan.
Article in English | MEDLINE | ID: mdl-28866551

ABSTRACT

Mass spectrometry imaging (MSI) is a transformative imaging method that supports the untargeted, quantitative measurement of the chemical composition and spatial heterogeneity of complex samples, with broad applications in life sciences, bioenergy, and health. While MSI data can be routinely collected, its broad application is currently limited by the lack of easily accessible analysis methods that can process data of the size, volume, diversity, and complexity generated by MSI experiments. The development and application of cutting-edge analytical methods is a core driver in MSI research for new scientific discoveries, medical diagnostics, and commercial innovation. However, the lack of means to share, apply, and reproduce analyses hinders the broad application, validation, and use of novel MSI analysis methods. To address this central challenge, we introduce the Berkeley Analysis and Storage Toolkit (BASTet), a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. Based on BASTet, we describe the extension of the OpenMSI mass spectrometry imaging science gateway to enable web-based sharing, reuse, analysis, and visualization of data analyses and derived data products. We demonstrate the application of BASTet and OpenMSI in practice to identify and compare characteristic substructures in the mouse brain based on their chemical composition measured via MSI.


Subject(s)
Database Management Systems, Factual Databases, Mass Spectrometry/methods, Molecular Imaging/methods, User-Computer Interface, Information Dissemination, Mass Spectrometry/standards, Molecular Imaging/standards
17.
Front Neuroinform ; 10: 48, 2016.
Article in English | MEDLINE | ID: mdl-27867355

ABSTRACT

Neuroscience continues to experience tremendous growth in data, in terms of the volume and variety of data, the velocity at which data are acquired, and, in turn, the veracity of data. These challenges are a serious impediment to sharing data, analyses, and tools within and across labs. Here, we introduce BRAINformat, a novel data standardization framework for the design and management of scientific data formats. The BRAINformat library defines application-independent design concepts and modules that together create a general framework for standardization of scientific data. We describe the formal specification of scientific data standards, which facilitates sharing and verification of data and formats. We introduce the concept of Managed Objects, enabling semantic components of data formats to be specified as self-contained units, supporting modular and reusable design of data format components and file storage. We also introduce the novel concept of Relationship Attributes for modeling and use of semantic relationships between data objects. Based on these concepts, we demonstrate the application of our framework to design and implement a standard format for electrophysiology data and show how data standardization and relationship modeling facilitate data analysis and sharing. The format uses HDF5, enabling portable, scalable, and self-describing data storage and integration with modern high-performance computing for data-driven discovery. The BRAINformat library is open source and easy to use, provides detailed user and developer documentation, and is freely available at: https://bitbucket.org/oruebel/brainformat.
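
The Relationship Attribute concept maps naturally onto HDF5 object references; the following is a minimal sketch in plain h5py (dataset names and attribute semantics are made up, and BRAINformat's own API differs):

```python
import h5py
import numpy as np

with h5py.File("relationship_demo.h5", "w") as f:
    voltage = f.create_dataset("ephys/voltage", data=np.zeros((4, 100)))
    times = f.create_dataset("ephys/timestamps", data=np.arange(100) / 1e3)
    # Store the relationship as a typed object reference, with a second
    # attribute describing what the relationship means.
    voltage.attrs.create("indexed_by", times.ref, dtype=h5py.ref_dtype)
    voltage.attrs["indexed_by_semantics"] = "shared time axis"

with h5py.File("relationship_demo.h5", "r") as f:
    target = f[f["ephys/voltage"].attrs["indexed_by"]]
    print(target.name)  # /ephys/timestamps
```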

18.
IEEE Comput Graph Appl ; 36(3): 22-35, 2016.
Article in English | MEDLINE | ID: mdl-28113157

ABSTRACT

The generation of short pulses of ion beams through the interaction of an intense laser with a plasma sheath offers the possibility of compact and cheaper ion sources for many applications, from fast ignition and radiography of dense targets to hadron therapy and injection into conventional accelerators. To enable the efficient analysis of large-scale, high-fidelity particle accelerator simulations using the Warp simulation suite, the authors introduce the Warp In situ Visualization Toolkit (WarpIV). WarpIV integrates state-of-the-art in situ visualization and analysis using VisIt with Warp, supports management and control of complex in situ visualization and analysis workflows, and implements integrated analytics to facilitate query- and feature-based data analytics and efficient large-scale data analysis. WarpIV enables, for the first time, distributed parallel, in situ visualization of the full simulation data using high-performance compute resources as the data is being generated by Warp. The authors describe the application of WarpIV in practice to study and compare large 2D and 3D ion accelerator simulations, demonstrating significant differences in the acceleration process between 2D and 3D simulations. WarpIV is available to the public via https://bitbucket.org/berkeleylab/warpiv. Supplemental material ( https://extras.computer.org/extra/mcg2016030022s1.pdf ) provides more details regarding the memory profiling and optimization and the Yee grid recentering optimization results discussed in the main article.

19.
J Mol Graph Model ; 53: 59-71, 2014 Sep.
Article in English | MEDLINE | ID: mdl-25068440

ABSTRACT

Molecular dynamics (MD) simulation is a crucial tool for understanding principles behind important biochemical processes such as protein folding and molecular interaction. With the rapidly increasing power of modern computers, large-scale MD simulation experiments can be performed regularly, generating huge amounts of MD data. An important question is how to analyze and interpret such massive and complex data. One of the (many) challenges involved in analyzing MD simulation data computationally is the high dimensionality of such data. Given a massive collection of molecular conformations, researchers typically need to rely on their expertise and prior domain knowledge in order to retrieve certain conformations of interest. It is not easy to make and test hypotheses, as the data set as a whole is somewhat "invisible" due to its high dimensionality. In other words, it is hard to directly access and examine individual conformations from a sea of molecular structures, and to further explore the entire data set. There is also no easy and convenient way to obtain a global view of the data or its various modalities of biochemical information. To this end, we present an interactive, collaborative visual analytics tool for exploring massive, high-dimensional molecular dynamics simulation data sets. The most important utility of our tool is to provide a platform where researchers can easily and effectively navigate through the otherwise "invisible" simulation data sets, exploring and examining molecular conformations both as a whole and at individual levels. The visualization is based on the concept of a topological landscape, a 2D terrain metaphor preserving certain topological and geometric properties of the high-dimensional protein energy landscape. In addition to facilitating easy exploration of conformations, this 2D terrain metaphor also provides a platform where researchers can visualize and analyze various properties (such as contact density) overlaid on top of the 2D terrain. Finally, the software provides a collaborative environment where multiple researchers can assemble observations and biochemical events into storyboards and share them in real time over the Internet via a client-server architecture. The software is written in Scala and runs on the cross-platform Java Virtual Machine. Binaries and source code are available at http://www.aylasoftware.org and have been released under the GNU General Public License.


Subject(s)
Molecular Dynamics Simulation, Software, Computer Graphics, Molecular Sequence Annotation, Protein Conformation, Protein Folding, Proteins/chemistry
20.
IEEE Trans Vis Comput Graph ; 20(2): 196-210, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24356363

ABSTRACT

Plasma-based particle accelerators can produce and sustain acceleration fields thousands of times stronger than conventional particle accelerators, providing a potential solution to the problem of the growing size and cost of conventional particle accelerators. To facilitate scientific knowledge discovery from the ever-growing collections of accelerator simulation data generated by accelerator physicists investigating next-generation plasma-based particle accelerator designs, we describe a novel approach for automatic detection and classification of particle beams and of beam substructures due to temporal differences in the acceleration process, here called acceleration features. The automatic feature detection, in combination with a novel visualization tool for fast, intuitive, query-based exploration of acceleration features, enables an effective top-down data exploration process, starting from a high-level, feature-based view down to the level of individual particles. We describe the application of our analysis in practice to analyze simulations of single-pulse and dual and triple colliding-pulse accelerator designs, to study the formation and evolution of particle beams, to compare substructures of a beam, and to investigate transverse particle loss.
