Towards Machine-FAIR: Representing software and datasets to facilitate reuse and scientific discovery by machines.
Wagner, Michael M; Hogan, William R; Levander, John D; Diller, Matthew.
Affiliation
  • Wagner MM; Department of Biomedical Informatics, University of Pittsburgh, 5607 Baum Boulevard, Pittsburgh, PA 15206-3701, USA. Electronic address: mmw1@pitt.edu.
  • Hogan WR; Data Science Institute, Medical College of Wisconsin, Milwaukee, WI, USA.
  • Levander JD; Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA, USA.
  • Diller M; Department of Health Outcomes and Biomedical Informatics, University of Florida, Gainesville, FL, USA.
J Biomed Inform ; 154: 104647, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38692465
ABSTRACT

OBJECTIVE:

To use software, datasets, and data formats in the domain of Infectious Disease Epidemiology as a test collection to evaluate a novel M1 use case, which we introduce in this paper. M1 is a machine that, upon receipt of a new digital object of research, exhaustively finds all valid compositions of it with existing objects.

METHOD:

We implemented a data-format-matching-only M1 using exhaustive search, which we refer to as M1DFM. We then ran M1DFM on the test collection and used error analysis to identify needed semantic constraints.
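
As an illustration only, data-format-only matching by exhaustive search might look like the sketch below. This is not the authors' M1DFM implementation: the `DigitalObject` class, the `find_compositions` function, and every object and format name are hypothetical, and the sketch pairs objects directly rather than building longer workflows.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DigitalObject:
    """Hypothetical record for a piece of software or a dataset."""
    name: str
    inputs: frozenset = frozenset()   # data formats the object consumes
    outputs: frozenset = frozenset()  # data formats the object produces or holds


def find_compositions(new_obj, repository):
    """Exhaustively pair new_obj with every existing object by data format alone.

    Returns (producer, consumer, format) triples. No semantic constraints
    are checked, so some returned pairings may be invalid in practice.
    """
    matches = []
    for existing in repository:
        # new_obj produces a format that the existing object consumes
        for fmt in sorted(new_obj.outputs & existing.inputs):
            matches.append((new_obj.name, existing.name, fmt))
        # the existing object produces a format that new_obj consumes
        for fmt in sorted(existing.outputs & new_obj.inputs):
            matches.append((existing.name, new_obj.name, fmt))
    return matches


# Hypothetical repository and newly received object
repository = [
    DigitalObject("case_counts_dataset", outputs=frozenset({"csv"})),
    DigitalObject("epidemic_simulator",
                  inputs=frozenset({"csv"}), outputs=frozenset({"json"})),
    DigitalObject("map_viewer", inputs=frozenset({"geojson"})),
]
new_tool = DigitalObject("forecast_plotter", inputs=frozenset({"json", "csv"}))

compositions = find_compositions(new_tool, repository)
for producer, consumer, fmt in compositions:
    print(f"{producer} -> {consumer} via {fmt}")
```

Because a matcher of this kind compares format labels only, it would accept any two objects that share a format regardless of what the data mean, which is the sort of error that semantic constraints are meant to rule out.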

RESULTS:

Precision of M1DFM search was 61.7%. Error analysis identified needed semantic constraints and needed changes in the handling of data services. Most semantic constraints were simple, but one data format was so complex that representing semantic constraints over it was practically impossible. From this limitation we conclude that software developers will have to meet the machines halfway by engineering software whose inputs are simple enough that their semantic constraints can be represented, akin to the simple APIs of services. We summarize these insights as M1-FAIR guiding principles for composability and suggest a roadmap for progressively capable devices in the service of reuse and accelerated scientific discovery.

CONCLUSION:

Algorithmic search of digital repositories for valid workflow compositions has potential to accelerate scientific discovery but requires a scalable solution to the problem of knowledge acquisition about semantic constraints on software inputs. Additionally, practical limitations on the logical complexity of semantic constraints must be respected, which has implications for the design of software.

Full text: 1 Database: MEDLINE Main subject: Software Limits: Humans Language: En Journal: J Biomed Inform Journal subject: MEDICAL INFORMATICS Year of publication: 2024 Document type: Article
