Search | VHL Regional Portal

Correction: Reproducible big data science: A case study in continuous FAIRness.

Madduri, Ravi; Chard, Kyle; D'Arcy, Mike; Jung, Segun C; Rodriguez, Alexis; Sulakhe, Dinanath; Deutsch, Eric; Funk, Cory; Heavner, Ben; Richards, Matthew; Shannon, Paul; Glusman, Gustavo; Price, Nathan; Kesselman, Carl; Foster, Ian.

PLoS One ; 18(11): e0294883, 2023.

Article in English | MEDLINE | ID: mdl-37988378

ABSTRACT

[This corrects the article DOI: 10.1371/journal.pone.0213013.].

Making Common Fund data more findable: catalyzing a data ecosystem.

Charbonneau, Amanda L; Brady, Arthur; Czajkowski, Karl; Aluvathingal, Jain; Canchi, Saranya; Carter, Robert; Chard, Kyle; Clarke, Daniel J B; Crabtree, Jonathan; Creasy, Heather H; D'Arcy, Mike; Felix, Victor; Giglio, Michelle; Gingrich, Alicia; Harris, Rayna M; Hodges, Theresa K; Ifeonu, Olukemi; Jeon, Minji; Kropiwnicki, Eryk; Lim, Marisa C W; Liming, R Lee; Lumian, Jessica; Mahurkar, Anup A; Mandal, Meisha; Munro, James B; Nadendla, Suvarna; Richter, Rudyard; Romano, Cia; Rocca-Serra, Philippe; Schor, Michael; Schuler, Robert E; Tangmunarunkit, Hongsuda; Waldrop, Alex; Williams, Cris; Word, Karen; Sansone, Susanna-Assunta; Ma'ayan, Avi; Wagner, Rick; Foster, Ian; Kesselman, Carl; Brown, C Titus; White, Owen.

Gigascience ; 11, 2022 11 21.

Article in English | MEDLINE | ID: mdl-36409836

ABSTRACT

The Common Fund Data Ecosystem (CFDE) has created a flexible system of data federation that enables researchers to discover datasets from across the US National Institutes of Health Common Fund without requiring that data owners move, reformat, or rehost those data. This system is centered on a catalog that integrates detailed descriptions of biomedical datasets from individual Common Fund Programs' Data Coordination Centers (DCCs) into a uniform metadata model that can then be indexed and searched from a centralized portal. This Crosscut Metadata Model (C2M2) supports the wide variety of data types and metadata terms used by individual DCCs and can readily describe nearly all forms of biomedical research data. We detail its use to ingest and index data from 11 DCCs.

Subject(s)

Ecosystem , Financial Management , Metadata

Towards Co-Evolution of Data-Centric Ecosystems.

Schuler, Robert; Czajkowski, Karl; D'Arcy, Mike; Tangmunarunkit, Hongsuda; Kesselman, Carl.

Sci Stat Database Manag ; 2020, 2020 Jul.

Article in English | MEDLINE | ID: mdl-37614739

ABSTRACT

Database evolution is a notoriously difficult task, and it is exacerbated by the necessity to evolve database-dependent applications. As science becomes increasingly dependent on sophisticated data management, the need to evolve an array of database-driven systems will only intensify. In this paper, we present an architecture for data-centric ecosystems that allows the components to seamlessly co-evolve by centralizing the models and mappings at the data service and pushing model-adaptive interactions to the database clients. Boundary objects fill the gap where applications are unable to adapt and need a stable interface to interact with the components of the ecosystem. Finally, evolution of the ecosystem is enabled via integrated schema modification and model management operations. We present use cases from actual experiences that demonstrate the utility of our approach.

Reproducible big data science: A case study in continuous FAIRness.

PLoS One ; 14(4): e0213013, 2019.

Article in English | MEDLINE | ID: mdl-30973881

ABSTRACT

Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility, thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes.

Subject(s)

Big Data , Data Science/statistics & numerical data , Databases, Factual/statistics & numerical data , Algorithms , Humans , Information Dissemination , Longitudinal Studies , Software

Neuroanatomical morphometric characterization of sex differences in youth using statistical learning.

Sepehrband, Farshid; Lynch, Kirsten M; Cabeen, Ryan P; Gonzalez-Zacarias, Clio; Zhao, Lu; D'Arcy, Mike; Kesselman, Carl; Herting, Megan M; Dinov, Ivo D; Toga, Arthur W; Clark, Kristi A.

Neuroimage ; 172: 217-227, 2018 05 15.

Article in English | MEDLINE | ID: mdl-29414494

ABSTRACT

Exploring neuroanatomical sex differences using a multivariate statistical learning approach can yield insights that cannot be derived with univariate analysis. While gross differences in total brain volume are well-established, uncovering the more subtle, regional sex-related differences in neuroanatomy requires a multivariate approach that can accurately model spatial complexity as well as the interactions between neuroanatomical features. Here, we developed a multivariate statistical learning model using a support vector machine (SVM) classifier to predict sex from MRI-derived regional neuroanatomical features from a single-site study of 967 healthy youth from the Philadelphia Neurodevelopmental Cohort (PNC). Then, we validated the multivariate model on an independent dataset of 682 healthy youth from the multi-site Pediatric Imaging, Neurocognition and Genetics (PING) cohort study. The trained model exhibited an 83% cross-validated prediction accuracy, and correctly predicted the sex of 77% of the subjects from the independent multi-site dataset. Results showed that cortical thickness of the middle occipital lobes and the angular gyri are major predictors of sex. Results also demonstrated the inferential benefits of going beyond classical regression approaches to capture the interactions among brain features in order to better characterize sex differences in male and female youths. We also identified specific cortical morphological measures and parcellation techniques, such as cortical thickness as derived from the Destrieux atlas, that are better able to discriminate between males and females in comparison to other brain atlases (Desikan-Killiany, Brodmann and subcortical atlases).

Subject(s)

Brain/anatomy & histology , Image Interpretation, Computer-Assisted/methods , Sex Characteristics , Support Vector Machine , Adolescent , Child , Female , Humans , Magnetic Resonance Imaging/methods , Male , Young Adult

A system architecture for sharing de-identified, research-ready brain scans and health information across clinical imaging centers.

Chervenak, Ann L; van Erp, Theo G M; Kesselman, Carl; D'Arcy, Mike; Sobell, Janet; Keator, David; Dahm, Lisa; Murry, Jim; Law, Meng; Hasso, Anton; Ames, Joseph; Macciardi, Fabio; Potkin, Steven G.

Stud Health Technol Inform ; 175: 19-28, 2012.

Article in English | MEDLINE | ID: mdl-22941984

ABSTRACT

Progress in our understanding of brain disorders increasingly relies on the costly collection of large standardized brain magnetic resonance imaging (MRI) data sets. Moreover, the clinical interpretation of brain scans benefits from compare and contrast analyses of scans from patients with similar, and sometimes rare, demographic, diagnostic, and treatment status. A solution to both needs is to acquire standardized, research-ready clinical brain scans and to build the information technology infrastructure to share such scans, along with other pertinent information, across hospitals. This paper describes the design, deployment, and operation of a federated imaging system that captures and shares standardized, de-identified clinical brain images in a federation across multiple institutions. In addition to describing innovative aspects of the system architecture and our initial testing of the deployed infrastructure, we also describe the Standardized Imaging Protocol (SIP) developed for the project and our interactions with the Institutional Review Board (IRB) regarding handling patient data in the federated environment.

Subject(s)

Brain Diseases/pathology , Brain/pathology , Information Dissemination/methods , Information Storage and Retrieval/methods , Internet , Medical Informatics/methods , Radiology Information Systems/organization & administration , Humans
