Results 1 - 20 of 20
1.
PLoS Comput Biol ; 16(11): e1008316, 2020 11.
Article in English | MEDLINE | ID: mdl-33170857

ABSTRACT

Computational science has been greatly improved by the use of containers for packaging software and data dependencies. In a scholarly context, the main drivers for using these containers are transparency and support of reproducibility; in turn, a workflow's reproducibility can be greatly affected by the choices that are made with respect to building containers. In many cases, the build process for the container's image is created from instructions provided in a Dockerfile format. In support of this approach, we present a set of rules to help researchers write understandable Dockerfiles for typical data science workflows. By following the rules in this article, researchers can create containers suitable for sharing with fellow scientists, for including in scholarly communication such as education or scientific papers, and for effective and sustainable personal workflows.
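The build-from-Dockerfile approach described above can be illustrated with a minimal, hypothetical Dockerfile for a data science workflow; the base image, file names, and labels are assumptions for illustration, not taken from the article:

```dockerfile
# Pin a specific base image tag so rebuilds are deterministic
FROM python:3.11-slim

# Document the image's maintainer and purpose
LABEL maintainer="researcher@example.org" \
      description="Environment for reproducing the analysis in analysis.py"

# Install pinned dependencies in one layer to keep the image small
COPY requirements.txt /tmp/requirements.txt
RUN pip install --no-cache-dir -r /tmp/requirements.txt

# Copy the analysis code last, so code edits do not invalidate the dependency layer
WORKDIR /work
COPY analysis.py /work/analysis.py

# Run the analysis non-interactively by default
CMD ["python", "analysis.py"]
```

Ordering instructions from least- to most-frequently changing, as here, is one of the common practices such rules address: it keeps image rebuilds fast and reproducible.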


Subject(s)
Data Science , Guidelines as Topic , Programming Languages , Software , Algorithms , Reproducibility of Results
2.
Prostate ; 77(3): 291-298, 2017 02.
Article in English | MEDLINE | ID: mdl-27775165

ABSTRACT

BACKGROUND: Prostate cancer prognosis is variable, and management decisions involve balancing patients' risks of recurrence and recurrence-free death. Moreover, the roles of body mass index (BMI) and race in risk of recurrence are controversial [1,2]. To address these issues, we developed and cross-validated RAPS (Risks After Prostate Surgery), a personal prediction model for biochemical recurrence (BCR) within 10 years of radical prostatectomy (RP) that includes BMI and race as possible predictors, and recurrence-free death as a competing risk. METHODS: RAPS uses a patient's risk factors at surgery to assign him a recurrence probability based on statistical learning methods applied to a cohort of 1,276 patients undergoing RP at the University of Pennsylvania. We compared the performance of RAPS to that of an existing model with respect to calibration (by comparing observed and predicted outcomes) and discrimination (using the area under the receiver operating characteristic curve (AUC)). RESULTS: RAPS' cross-validated BCR predictions provided better calibration than those of an existing model that underestimated patients' risks. Discrimination was similar for the two models, with BCR AUCs of 0.793, 95% confidence interval (0.766-0.820) for RAPS, and 0.780 (0.745-0.815) for the existing model. RAPS' most important BCR predictors were tumor grade, preoperative prostate-specific antigen (PSA) level, and BMI; race was less important [3]. RAPS' predictions can be obtained online at https://predict.shinyapps.io/raps. CONCLUSION: RAPS' cross-validated BCR predictions were better calibrated than those of an existing model, and BMI information contributed substantially to these predictions. RAPS predictions for recurrence-free death were limited by lack of comorbidity data; however, the model provides a simple framework for extension to include such data. Its use and extension should facilitate decision strategies for post-RP prostate cancer management.
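The AUC figures above measure discrimination: the probability that a randomly chosen patient who recurred is assigned a higher predicted risk than one who did not. A minimal pure-Python illustration of this rank interpretation (toy scores, not RAPS output):

```python
def auc(scores_pos, scores_neg):
    """Fraction of (positive, negative) pairs where the positive case
    outranks the negative one, counting ties as half; equals the ROC AUC."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Toy predicted recurrence risks: patients who recurred vs. those who did not
recurred = [0.9, 0.8, 0.55]
recurrence_free = [0.6, 0.3, 0.2, 0.1]
print(round(auc(recurred, recurrence_free), 3))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why the reported values near 0.79 indicate moderate discrimination.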


Subject(s)
Neoplasm Recurrence, Local/diagnosis , Prostatectomy/trends , Prostatic Neoplasms/diagnosis , Prostatic Neoplasms/surgery , Aged , Cohort Studies , Humans , Longitudinal Studies , Male , Middle Aged , Neoplasm Recurrence, Local/blood , Predictive Value of Tests , Prostate-Specific Antigen/blood , Prostatic Neoplasms/blood , ROC Curve
3.
Neuroimage ; 124(Pt B): 1242-1244, 2016 Jan 01.
Article in English | MEDLINE | ID: mdl-25869863

ABSTRACT

NeuroVault.org is dedicated to storing outputs of analyses in the form of statistical maps, parcellations, and atlases, a unique strategy that contrasts with most neuroimaging repositories, which store raw acquisition data or stereotaxic coordinates. Such maps are indispensable for performing meta-analyses, validating novel methodology, and deciding on precise outlines for regions of interest (ROIs). NeuroVault is open to maps derived from both healthy and clinical populations, as well as from various imaging modalities (sMRI, fMRI, EEG, MEG, PET, etc.). The repository uses modern web technologies such as interactive web-based visualization, cognitive decoding, and comparison with other maps to provide researchers with efficient, intuitive tools to improve the understanding of their results. Each dataset and map is assigned a permanent Uniform Resource Locator (URL), and all of the data is accessible through a REST Application Programming Interface (API). Additionally, the repository supports the NIDM-Results standard and has the ability to parse outputs from the popular FSL and SPM software packages to automatically extract relevant metadata. This ease of use, modern web integration, and pioneering functionality holds promise to improve the workflow for making inferences about and sharing whole-brain statistical maps.


Subject(s)
Brain Mapping/statistics & numerical data , Databases, Factual , Information Dissemination , Access to Information , Humans , Image Processing, Computer-Assisted , Magnetic Resonance Imaging , Neuroimaging
4.
F1000Res ; 13: 203, 2024.
Article in English | MEDLINE | ID: mdl-38868668

ABSTRACT

Converged computing is an emerging area of computing that brings together the best of both worlds for high performance computing (HPC) and cloud-native communities. The economic influence of cloud computing and the need for workflow portability, flexibility, and manageability are driving this emergence. Navigating the uncharted territory and building an effective space for both HPC and cloud require collaborative technological development and research. In this work, we focus on developing components for the converged workload manager, the central component of batch workflows running in any environment. From the cloud we base our work on Kubernetes, the de facto standard batch workload orchestrator. From HPC the orchestrator counterpart is Flux Framework, a fully hierarchical resource management and graph-based scheduler with a modular architecture that supports sophisticated scheduling and job management. Bringing these managers together consists of implementing Flux inside of Kubernetes, enabling hierarchical resource management and scheduling that scales without burdening the Kubernetes scheduler. This paper introduces the Flux Operator - an on-demand HPC workload manager deployed in Kubernetes. Our work describes design decisions, mapping components between environments, and experimental features. We perform experiments that compare application performance when deployed by the Flux Operator and the MPI Operator and present the results. Finally, we review remaining challenges and describe our vision of the future for improved technological innovation and collaboration through converged computing.
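As a sketch of the deployment model described above, a Flux Operator-style custom resource might look like the following. The API version, field names, and image are illustrative assumptions based on our understanding of Kubernetes custom resources; consult the Flux Operator documentation for the actual schema:

```yaml
apiVersion: flux-framework.org/v1alpha1   # assumed API group/version
kind: MiniCluster
metadata:
  name: demo-job
spec:
  # Number of pods forming the Flux instance
  size: 4
  containers:
    - image: ghcr.io/example/app:latest   # hypothetical application image
      command: app --input data.in        # job launched under Flux's scheduler
```

Submitting such a resource asks Kubernetes to create the pods, while Flux handles hierarchical scheduling of work within them, which is the division of labor the paper describes.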


Subject(s)
Cloud Computing , Workload , Workflow
5.
Front Immunol ; 15: 1378512, 2024.
Article in English | MEDLINE | ID: mdl-38629078

ABSTRACT

Python for Population Genomics (PyPop) is a software package that processes genotype and allele data and performs large-scale population genetic analyses on highly polymorphic multi-locus genotype data. In particular, PyPop tests data conformity to Hardy-Weinberg equilibrium expectations, performs Ewens-Watterson tests for selection, estimates haplotype frequencies, measures linkage disequilibrium, and tests significance. A standardized means of performing these tests is key for contemporary studies of evolutionary biology and population genetics, and these tests are central to genetic studies of disease association as well. Here, we present PyPop 1.0.0, a new major release of the package, which implements new features using the more robust infrastructure of GitHub, and is distributed via the industry-standard Python Package Index. New features include implementation of asymmetric linkage disequilibrium measures and, of particular interest to the immunogenetics research communities, support for modern nomenclature, including colon-delimited allele names, and improvements to meta-analysis features for aggregating outputs for multiple populations. Code available at: https://zenodo.org/records/10080668 and https://github.com/alexlancaster/pypop.


Subject(s)
Metagenomics , Software , Genetics, Population , Genotype , Haplotypes , Meta-Analysis as Topic
6.
Front Neurosci ; 17: 1233416, 2023.
Article in English | MEDLINE | ID: mdl-37694123

ABSTRACT

With the advent of multivariate pattern analysis (MVPA) as an important analytic approach to fMRI, new insights into the functional organization of the brain have emerged. Several software packages have been developed to perform MVPA analysis, but deploying them comes with the cost of adjusting data to individual idiosyncrasies associated with each package. Here we describe PyMVPA BIDS-App, a fast and robust pipeline based on the data organization of the BIDS standard that performs multivariate analyses using the powerful functionality of PyMVPA. The app runs flexibly with blocked and event-related fMRI experimental designs, is capable of performing classification as well as representational similarity analysis, and works both within regions of interest and on the whole brain through searchlights. In addition, the app accepts as input both volumetric and surface-based data. Inspections into the intermediate stages of the analyses are available, and the readability of final results is facilitated through visualizations. The PyMVPA BIDS-App is designed to be accessible to novice users, while also offering more control to experts through command-line arguments in a highly reproducible environment.

7.
Sci Adv ; 9(21): eadg5702, 2023 05 26.
Article in English | MEDLINE | ID: mdl-37235661

ABSTRACT

Genome-wide phenotypic screens in the budding yeast Saccharomyces cerevisiae, enabled by its knockout collection, have produced the largest, richest, and most systematic phenotypic description of any organism. However, integrative analyses of this rich data source have been virtually impossible because of the lack of a central data repository and consistent metadata annotations. Here, we describe the aggregation, harmonization, and analysis of ~14,500 yeast knockout screens, which we call Yeast Phenome. Using this unique dataset, we characterized two unknown genes (YHR045W and YGL117W) and showed that tryptophan starvation is a by-product of many chemical treatments. Furthermore, we uncovered an exponential relationship between phenotypic similarity and intergenic distance, which suggests that gene positions in both yeast and human genomes are optimized for function.


Subject(s)
Saccharomyces cerevisiae , Humans , Saccharomyces cerevisiae/genetics
8.
F1000Res ; 10: 33, 2021.
Article in English | MEDLINE | ID: mdl-34035898

ABSTRACT

Data analysis often entails a multitude of heterogeneous steps, from the application of various command line tools to the usage of scripting languages like R or Python for the generation of plots and tables. It is widely recognized that data analyses should ideally be conducted in a reproducible way. Reproducibility enables technical validation and regeneration of results on the original or even new data. However, reproducibility alone is by no means sufficient to deliver an analysis that is of lasting impact (i.e., sustainable) for the field, or even just one research group. We postulate that it is equally important to ensure adaptability and transparency. The former describes the ability to modify the analysis to answer extended or slightly different research questions. The latter describes the ability to understand the analysis in order to judge whether it is not only technically, but methodologically valid. Here, we analyze the properties needed for a data analysis to become reproducible, adaptable, and transparent. We show how the popular workflow management system Snakemake can be used to guarantee this, and how it enables an ergonomic, combined, unified representation of all steps involved in data analysis, ranging from raw data processing, to quality control and fine-grained, interactive exploration and plotting of final results.
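The unified representation of analysis steps that the abstract describes takes the form of rules in a Snakefile, where each rule declares its inputs, outputs, and the code that produces them. A minimal hypothetical example (file and script names are assumptions) chaining raw-data processing to plotting:

```
# Snakefile: the final target drives the whole workflow
rule all:
    input:
        "plots/summary.pdf"

# Process raw data into a tidy table
rule process:
    input:
        "raw/data.csv"
    output:
        "results/tidy.csv"
    script:
        "scripts/process.py"

# Plot the processed results
rule plot:
    input:
        "results/tidy.csv"
    output:
        "plots/summary.pdf"
    script:
        "scripts/plot.py"
```

Because dependencies are inferred from matching input and output files, Snakemake can rerun exactly the affected steps when data or code changes, which is what makes the analysis both reproducible and adaptable.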


Subject(s)
Data Analysis , Software , Reproducibility of Results , Workflow
9.
Gigascience ; 7(5)2018 05 01.
Article in English | MEDLINE | ID: mdl-29718213

ABSTRACT

Background: Here, we present the Scientific Filesystem (SCIF), an organizational format that supports exposure of executables and metadata for discoverability of scientific applications. The format includes a known filesystem structure, a definition for a set of environment variables describing it, and functions for generation of the variables and interaction with the libraries, metadata, and executables located within. SCIF makes it easy to expose metadata, multiple environments, installation steps, files, and entry points to render scientific applications consistent, modular, and discoverable. A SCIF can be installed on a traditional host or in a container technology such as Docker or Singularity. We start by reviewing the background and rationale for the SCIF, followed by an overview of the specification and the different levels of internal modules ("apps") that the organizational format affords. Finally, we demonstrate that SCIF is useful by implementing and discussing several use cases that improve user interaction and understanding of scientific applications. SCIF is released along with a client and integration in the Singularity 2.4 software to quickly install and interact with SCIF. When used inside of a reproducible container, a SCIF is a recipe for reproducibility and introspection of the functions and users that it serves. Results: We use SCIF to evaluate container software, provide metrics, serve scientific workflows, and execute a primary function under different contexts. To encourage collaboration and sharing of applications, we developed tools along with an open source, version-controlled, tested, and programmatically accessible web infrastructure. SCIF and associated resources are available at https://sci-f.github.io. The ease of using SCIF, especially in the context of containers, offers promise for scientists' work to be self-documenting and programmatically parseable for maximum reproducibility. SCIF provides an abstraction over underlying programming languages and packaging logic for working with scientific applications, opening new opportunities for scientific software development.


Subject(s)
Information Storage and Retrieval , Science , Software , Metadata , Programming Languages , Workflow
10.
PLoS One ; 12(11): e0188511, 2017.
Article in English | MEDLINE | ID: mdl-29186161

ABSTRACT

Here we present Singularity Hub, a framework to build and deploy Singularity containers for mobility of compute, and the singularity-python software with novel metrics for assessing reproducibility of such containers. Singularity containers make it possible for scientists and developers to package reproducible software, and Singularity Hub adds automation to this workflow by building, capturing metadata for, visualizing, and serving containers programmatically. Our novel metrics, based on custom filters of content hashes of container contents, allow for comparison of an entire container, including operating system, custom software, and metadata. First we review Singularity Hub's primary use cases and how the infrastructure has been designed to support modern, common workflows. Next, we conduct three analyses to demonstrate build consistency, reproducibility-metric performance and interpretability, and potential for discovery. This is the first effort to demonstrate a rigorous assessment of measurable similarity between containers and operating systems. We provide these capabilities within Singularity Hub, as well as the source software singularity-python that provides the underlying functionality. Singularity Hub is available at https://singularity-hub.org, and we are excited to provide it as an openly available platform for building and deploying scientific containers.
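The general idea behind such content-hash metrics, hashing container contents file by file and comparing the resulting sets, can be sketched in a few lines of Python. This is a simplified illustration of the approach, not the singularity-python implementation:

```python
import hashlib

def content_hashes(files):
    """Map each file path to a hash of its content (content given as bytes)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def similarity(container_a, container_b):
    """Jaccard similarity over (path, content-hash) pairs of two containers."""
    a = set(content_hashes(container_a).items())
    b = set(content_hashes(container_b).items())
    return len(a & b) / len(a | b)

# Two toy "containers": identical OS layer, one differing script
base = {"/etc/os-release": b"ubuntu 22.04", "/usr/bin/run": b"v1"}
variant = {"/etc/os-release": b"ubuntu 22.04", "/usr/bin/run": b"v2"}
print(similarity(base, base))
print(similarity(base, variant))
```

Filtering which paths enter the comparison (for example, only operating-system files, or only custom software) yields the kind of targeted similarity scores described above.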


Subject(s)
Computers , Software , Reproducibility of Results
11.
PLoS One ; 12(5): e0177459, 2017.
Article in English | MEDLINE | ID: mdl-28494014

ABSTRACT

Here we present Singularity, software developed to bring containers and reproducibility to scientific computing. Using Singularity containers, developers can work in reproducible environments of their choosing and design, and these complete environments can easily be copied and executed on other platforms. Singularity is an open source initiative that harnesses the expertise of system and software engineers and researchers alike, and integrates seamlessly into common workflows for both of these groups. As its primary use case, Singularity brings mobility of computing to both users and HPC centers, providing a secure means to capture and distribute software and compute environments. This ability to create and deploy reproducible environments across these centers, a previously unmet need, makes Singularity a game changing development for computational science.


Subject(s)
Software , Computers , User-Computer Interface
12.
Front Psychol ; 7: 610, 2016.
Article in English | MEDLINE | ID: mdl-27199843

ABSTRACT

The administration of behavioral and experimental paradigms for psychology research is hindered by lack of a coordinated effort to develop and deploy standardized paradigms. While several frameworks (Mason and Suri, 2011; McDonnell et al., 2012; de Leeuw, 2015; Lange et al., 2015) have provided infrastructure and methods for individual research groups to develop paradigms, missing is a coordinated effort to develop paradigms linked with a system to easily deploy them. This disorganization leads to redundancy in development, divergent implementations of conceptually identical tasks, disorganized and error-prone code lacking documentation, and difficulty in replication. The ongoing reproducibility crisis in psychology and neuroscience research (Baker, 2015; Open Science Collaboration, 2015) highlights the urgency of this challenge: reproducible research in behavioral psychology is conditional on deployment of equivalent experiments. A large, accessible repository of experiments for researchers to develop collaboratively is most efficiently accomplished through an open source framework. Here we present the Experiment Factory, an open source framework for the development and deployment of web-based experiments. The modular infrastructure includes experiments, virtual machines for local or cloud deployment, and an application to drive these components and provide developers with functions and tools for further extension. We release this infrastructure with a deployment (http://www.expfactory.org) that researchers are currently using to run a set of over 80 standardized web-based experiments on Amazon Mechanical Turk. By providing open source tools for both deployment and development, this novel infrastructure holds promise to bring reproducibility to the administration of experiments, and accelerate scientific progress by providing a shared community resource of psychological paradigms.

13.
Sci Data ; 3: 160102, 2016 12 06.
Article in English | MEDLINE | ID: mdl-27922621

ABSTRACT

Only a tiny fraction of the data and metadata produced by an fMRI study is finally conveyed to the community. This lack of transparency not only hinders the reproducibility of neuroimaging results but also impairs future meta-analyses. In this work we introduce NIDM-Results, a format specification providing a machine-readable description of neuroimaging statistical results along with key image data summarising the experiment. NIDM-Results provides a unified representation of mass univariate analyses including a level of detail consistent with available best practices. This standardized representation allows authors to relay methods and results in a platform-independent regularized format that is not tied to a particular neuroimaging software package. Tools are available to export NIDM-Results graphs and associated files from the widely used SPM and FSL software packages, and the NeuroVault repository can import NIDM-Results archives. The specification is publicly available at: http://nidm.nidash.org/specs/nidm-results.html.


Subject(s)
Brain Mapping/statistics & numerical data , Brain/physiology , Information Dissemination/methods , Magnetic Resonance Imaging/statistics & numerical data , Data Interpretation, Statistical , Humans , Information Storage and Retrieval , Linear Models , Meta-Analysis as Topic , Reproducibility of Results
14.
Sci Data ; 3: 160044, 2016 Jun 21.
Article in English | MEDLINE | ID: mdl-27326542

ABSTRACT

The development of magnetic resonance imaging (MRI) techniques has defined modern neuroimaging. Since its inception, tens of thousands of studies using techniques such as functional MRI and diffusion weighted imaging have allowed for the non-invasive study of the brain. Despite the fact that MRI is routinely used to obtain data for neuroscience research, there has been no widely adopted standard for organizing and describing the data collected in an imaging experiment. This renders sharing and reusing data (within or between labs) difficult if not impossible and unnecessarily complicates the application of automatic pipelines and quality assurance protocols. To solve this problem, we have developed the Brain Imaging Data Structure (BIDS), a standard for organizing and describing MRI datasets. The BIDS standard uses file formats compatible with existing software, unifies the majority of practices already common in the field, and captures the metadata necessary for most common data processing operations.
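BIDS encodes metadata directly in directory and file names, as key-value "entities" joined by underscores. A small helper sketches the pattern for a functional image; the entity order and folder layout follow the standard's common case, but the helper itself is illustrative, not part of BIDS tooling:

```python
def bids_func_path(sub, task, run=None, ses=None, suffix="bold", ext=".nii.gz"):
    """Build a BIDS-style path for a functional image, e.g.
    sub-01/ses-02/func/sub-01_ses-02_task-rest_run-01_bold.nii.gz"""
    entities = [f"sub-{sub}"]
    if ses is not None:
        entities.append(f"ses-{ses}")
    entities.append(f"task-{task}")
    if run is not None:
        entities.append(f"run-{run}")
    # Directory mirrors the subject/session entities, ending in the datatype folder
    folder = "/".join([f"sub-{sub}"] + ([f"ses-{ses}"] if ses else []) + ["func"])
    return f"{folder}/{'_'.join(entities)}_{suffix}{ext}"

print(bids_func_path("01", "rest", run="01", ses="02"))
```

Because every tool can parse the same names the same way, pipelines and quality-assurance protocols can run on any BIDS dataset without per-lab configuration, which is the interoperability benefit the abstract describes.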


Subject(s)
Datasets as Topic , Magnetic Resonance Imaging , Neuroimaging , Data Collection/methods , Data Collection/standards , Datasets as Topic/standards , Humans
15.
Front Neuroinform ; 9: 6, 2015.
Article in English | MEDLINE | ID: mdl-25859214

ABSTRACT

Targeted collaboration is becoming more challenging with the ever-increasing number of publications, conferences, and academic responsibilities that the modern-day researcher must synthesize. Specifically, the field of neuroimaging had roughly 10,000 new papers in PubMed for the year 2013, presenting tens of thousands of international authors, each a potential collaborator working on some sub-domain in the field. To remove the burden of synthesizing an entire corpus of publications, talks, and conference interactions to find and assess collaborations, we combine meta-analytical neuroimaging informatics methods with machine learning and network analysis toward this goal. We present "AuthorSynth," a novel application prototype that includes (1) a collaboration network to identify researchers with similar results reported in the literature; and (2) a 2D plot, the "brain lattice," to visually summarize a single author's contribution to the field and allow for searching of authors based on behavioral terms. This method capitalizes on intelligent synthesis of the neuroimaging literature, and demonstrates that data-driven approaches can be used to confirm existing collaborations, reveal potential ones, and identify gaps in published knowledge. We believe this tool exemplifies how methods from neuroimaging informatics can better inform researchers about progress and knowledge in the field, and enhance the modern workflow of finding collaborations.

16.
AMIA Annu Symp Proc ; 2015: 2073-82, 2015.
Article in English | MEDLINE | ID: mdl-26958307

ABSTRACT

The task of mapping neurological disorders in the human brain must be informed by multiple measurements of an individual's phenotype - neuroimaging, genomics, and behavior. We developed a novel meta-analytical approach to integrate disparate resources and generated transcriptional maps of neurological disorders in the human brain yielding a purely computational procedure to pinpoint the brain location of transcribed genes likely to be involved in either onset or maintenance of the neurological condition.


Subject(s)
Brain Mapping , Nervous System Diseases/diagnostic imaging , Neuroimaging , Pattern Recognition, Automated , Brain , Humans , Phenotype
17.
Front Neurosci ; 9: 418, 2015.
Article in English | MEDLINE | ID: mdl-26578875

ABSTRACT

The computation of image similarity is important for a wide range of analyses in neuroimaging, from decoding to meta-analysis. In many cases the images being compared have empty voxels, but the effects of such empty voxels on image similarity metrics are poorly understood. We present a detailed investigation of the influence of different degrees of image thresholding on the outcome of pairwise image comparison. Given a pair of brain maps for which one of the maps is thresholded, we show that an analysis using the intersection of non-zero voxels across images at a threshold of Z = ±1.0 maximizes accuracy for retrieval of a list of maps of the same contrast, and thresholding up to Z = ±2.0 can increase accuracy as compared to comparison using unthresholded maps. Finally, maps can be thresholded up to Z = ±3.0 (corresponding to 25% of voxels non-empty within a standard brain mask) and still maintain a lower bound of 90% accuracy. Our results suggest that a small degree of thresholding may improve the accuracy of image similarity computations, and that robust meta-analytic image similarity comparisons can be obtained using thresholded images.
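The intersection strategy evaluated above, comparing two maps only over voxels that survive thresholding in both, can be sketched in pure Python. The toy data and Pearson correlation are illustrative choices, not the study's actual maps or full protocol:

```python
def threshold(values, z):
    """Zero out voxels with |value| below z."""
    return [v if abs(v) >= z else 0.0 for v in values]

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def similarity_on_intersection(map_a, map_b, z):
    """Correlate two maps over voxels that are non-zero in both after thresholding."""
    ta, tb = threshold(map_a, z), threshold(map_b, z)
    pairs = [(a, b) for a, b in zip(ta, tb) if a != 0.0 and b != 0.0]
    xs, ys = zip(*pairs)
    return pearson(list(xs), list(ys))

# Two toy "maps" of six voxels each, in Z-score units
map_a = [0.2, 1.5, -2.0, 3.1, 0.1, -1.2]
map_b = [0.1, 1.4, -2.2, 2.9, 0.4, -1.0]
print(round(similarity_on_intersection(map_a, map_b, 1.0), 3))
```

Raising `z` shrinks the intersection: this sketch makes concrete why accuracy first improves (noise voxels drop out) and then degrades (too few voxels remain) as the threshold grows.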

18.
Front Neuroinform ; 9: 8, 2015.
Article in English | MEDLINE | ID: mdl-25914639

ABSTRACT

Here we present NeuroVault, a web-based repository that allows researchers to store, share, visualize, and decode statistical maps of the human brain. NeuroVault is easy to use and employs modern web technologies to provide informative visualization of data without the need to install additional software. In addition, it leverages the power of the Neurosynth database to provide cognitive decoding of deposited maps. The data are exposed through a public REST API, enabling other services and tools to take advantage of it. NeuroVault is a new resource for researchers interested in conducting meta- and coactivation analyses.

19.
Nat Commun ; 6: 8885, 2015 Dec 09.
Article in English | MEDLINE | ID: mdl-26648521

ABSTRACT

Psychiatric disorders are characterized by major fluctuations in psychological function over the course of weeks and months, but the dynamic characteristics of brain function over this timescale in healthy individuals are unknown. Here, as a proof of concept to address this question, we present the MyConnectome project. An intensive phenome-wide assessment of a single human was performed over a period of 18 months, including functional and structural brain connectivity using magnetic resonance imaging, psychological function and physical health, gene expression and metabolomics. A reproducible analysis workflow is provided, along with open access to the data and an online browser for results. We demonstrate dynamic changes in brain connectivity over the timescales of days to months, and relations between brain connectivity, gene expression and metabolites. This resource can serve as a testbed to study the joint dynamics of human brain and metabolic function over time, an approach that is critical for the development of precision medicine strategies for brain disorders.


Subject(s)
Brain/physiology , Neural Pathways , Brain/diagnostic imaging , Follow-Up Studies , Gene Expression , Gene Regulatory Networks , Humans , Magnetic Resonance Imaging , Male , Middle Aged , Phenotype , Radiography
20.
PLoS One ; 9(4): e95493, 2014.
Article in English | MEDLINE | ID: mdl-24748378

ABSTRACT

Analyzing Functional Magnetic Resonance Imaging (fMRI) of resting brains to determine the spatial location and activity of intrinsic brain networks--a novel and burgeoning research field--is limited by the lack of ground truth and the tendency of analyses to overfit the data. Independent Component Analysis (ICA) is commonly used to separate the data into signal and Gaussian noise components, and then map these components on to spatial networks. Identifying noise from this data, however, is a tedious process that has proven hard to automate, particularly when data from different institutions, subjects, and scanners is used. Here we present an automated method to delineate noisy independent components in ICA using a data-driven infrastructure that queries a database of 246 spatial and temporal features to discover a computational signature of different types of noise. We evaluated the performance of our method to detect noisy components from healthy control fMRI (sensitivity = 0.91, specificity = 0.82, cross validation accuracy (CVA) = 0.87, area under the curve (AUC) = 0.93), and demonstrate its generalizability by showing equivalent performance on (1) an age- and scanner-matched cohort of schizophrenia patients from the same institution (sensitivity = 0.89, specificity = 0.83, CVA = 0.86), (2) an age-matched cohort on an equivalent scanner from a different institution (sensitivity = 0.88, specificity = 0.88, CVA = 0.88), and (3) an age-matched cohort on a different scanner from a different institution (sensitivity = 0.72, specificity = 0.92, CVA = 0.79). We additionally compare our approach with a recently published method. Our results suggest that our method is robust to noise variations due to population as well as scanner differences, thereby making it well suited to the goal of automatically distinguishing noise from functional networks to enable investigation of human brain function.


Subject(s)
Brain Mapping , Brain/physiology , Magnetic Resonance Imaging , Algorithms , Datasets as Topic , Humans , Image Processing, Computer-Assisted/methods , Magnetic Resonance Imaging/methods , Magnetic Resonance Imaging/standards , Principal Component Analysis , ROC Curve , Regression Analysis , Reproducibility of Results