Results 1 - 9 of 9
1.
EMBO J ; 42(23): e115008, 2023 Dec 01.
Article in English | MEDLINE | ID: mdl-37964598

ABSTRACT

A central goal, and a major challenge, for the life science communities in the Open Science framework is to increase the reuse and sustainability of data resources, software tools, and workflows, especially in large-scale, data-driven research and computational analyses. Here, we present key findings, procedures, effective measures and recommendations for generating and establishing sustainable life science resources, based on the collaborative, cross-disciplinary work done within the EOSC-Life (European Open Science Cloud for Life Sciences) consortium. Bringing together 13 European life science research infrastructures, the consortium has laid the foundation for an open, digital space to support biological and medical research. Using lessons learned from 27 selected projects, we describe the organisational, technical, financial and legal/ethical challenges that represent the main barriers to sustainability in the life sciences. We show how EOSC-Life provides a model for sustainable data management according to the FAIR (findability, accessibility, interoperability, and reusability) principles, including solutions for sensitive and industry-related resources, through cross-disciplinary training and the sharing of best practices. Finally, we illustrate how data harmonisation and collaborative work facilitate the interoperability of tools, data, and solutions, and lead to a better understanding of concepts, semantics and functionalities in the life sciences.


Subjects
Biological Science Disciplines , Biomedical Research , Software , Workflow
2.
Entropy (Basel) ; 24(5), 2022 May 17.
Article in English | MEDLINE | ID: mdl-35626601

ABSTRACT

We present a novel method for interpolating univariate time series data. The proposed method combines multi-point fractional Brownian bridges, a genetic algorithm, and Takens' theorem for reconstructing a phase space from univariate time series data. The basic idea is first to generate a population of different stochastically interpolated time series, and second, to use a genetic algorithm to find the candidates in the population that produce the smoothest reconstructed phase space trajectory. A smooth trajectory is defined here as one with a low variance of second derivatives along the curve. For simplicity, we refer to the developed method as PhaSpaSto-interpolation, an abbreviation for phase-space-trajectory-smoothing stochastic interpolation. The proposed approach is tested and validated on a univariate time series from the Lorenz system and on five non-model data sets, and is compared to cubic spline interpolation and linear interpolation. We find that the smoothness criterion guarantees low errors on known model and non-model data. Finally, we interpolate the discussed non-model data sets and show the corresponding improved phase space portraits. The proposed method is useful for interpolating sparsely sampled time series data sets for, e.g., machine learning, regression analysis, or time series prediction approaches. Furthermore, the results suggest that the variance of second derivatives along a given phase space trajectory is a valuable tool for phase space analysis of non-model time series data, and we expect it to be useful in future research.
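The smoothness criterion at the heart of the method can be sketched in a few lines: delay-embed the series following Takens' theorem and score candidate interpolations by the variance of second differences along the reconstructed trajectory. The sketch below is a minimal illustration only; it replaces the fractional Brownian bridges and the genetic algorithm with noise-perturbed candidates and a simple best-of search, and all parameter values and names are invented for the example.

```python
import numpy as np

def delay_embed(x, dim=3, tau=1):
    """Reconstruct a phase-space trajectory from a 1-D series (Takens-style delay embedding)."""
    n = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n] for i in range(dim)])

def roughness(x, dim=3, tau=1):
    """Variance of second differences along the embedded trajectory (lower = smoother)."""
    traj = delay_embed(np.asarray(x, dtype=float), dim, tau)
    return np.diff(traj, n=2, axis=0).var()

def stochastic_interpolate(t_known, x_known, t_query, n_candidates=200, noise=0.05, seed=0):
    """Among noise-perturbed interpolations, keep the one with the smoothest trajectory."""
    rng = np.random.default_rng(seed)
    base = np.interp(t_query, t_known, x_known)   # deterministic backbone
    keep = np.isin(t_query, t_known)              # observed samples stay fixed
    best, best_score = base, roughness(base)
    for _ in range(n_candidates):
        cand = base + noise * rng.standard_normal(len(base))
        cand[keep] = base[keep]
        score = roughness(cand)
        if score < best_score:
            best, best_score = cand, score
    return best

if __name__ == "__main__":
    t_known = np.arange(0.0, 50.0, 5.0)           # sparse observations
    x_known = np.sin(0.3 * t_known)
    t_query = np.arange(0.0, 46.0, 1.0)           # dense query grid
    x_hat = stochastic_interpolate(t_known, x_known, t_query)
    print("roughness of selected candidate:", roughness(x_hat))
```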

3.
J Biomed Inform ; 64: 232-254, 2016 Dec.
Article in English | MEDLINE | ID: mdl-27789415

ABSTRACT

Complex, data-driven experiments form the basis of biomedical research. Recent findings warn that the context in which software is run, that is, the infrastructure and third-party dependencies, can have a crucial impact on the final results delivered by a computational experiment. This implies that in order to replicate the same result, not only must the same data be used, but the analysis must also be run on an equivalent software stack. In this paper we present the VFramework, which enables assessing the replicability of workflows. It identifies whether any differences in software dependencies exist between two executions of the same workflow and whether they have an impact on the produced results. We also conduct a case study in which we investigate the impact of software dependencies on the replicability of Taverna workflows used in biomedical research on Huntington's disease. We re-execute the analysed workflows in environments differing in operating system distribution and configuration. The results show that the VFramework can be used to identify the impact of software dependencies on the replicability of biomedical workflows. Furthermore, we observe that even though the workflows are executed in a controlled environment, they still depend on specific tools installed in that environment. The context model used by the VFramework addresses the deficiencies of provenance traces and also documents such tools. Based on our findings, we define guidelines that enable workflow owners to improve the replicability of their workflows.
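A minimal illustration of the kind of dependency comparison described, not the actual VFramework: capture the software context of two executions as {package: version} snapshots, diff them, and compare the produced outputs by checksum. All package names, versions, and file paths below are made up for the example.

```python
import hashlib

def diff_dependencies(env_a: dict, env_b: dict) -> dict:
    """Packages missing or differing in version between two executions of a workflow."""
    return {pkg: (env_a.get(pkg), env_b.get(pkg))
            for pkg in set(env_a) | set(env_b)
            if env_a.get(pkg) != env_b.get(pkg)}

def results_match(path_a: str, path_b: str) -> bool:
    """Compare two workflow output files by content hash."""
    def digest(path):
        with open(path, "rb") as fh:
            return hashlib.sha256(fh.read()).hexdigest()
    return digest(path_a) == digest(path_b)

if __name__ == "__main__":
    # Hypothetical dependency snapshots captured during two runs of the same workflow.
    run1 = {"python": "3.10.4", "numpy": "1.24.0", "samtools": "1.15"}
    run2 = {"python": "3.10.4", "numpy": "1.26.4"}
    for pkg, (v1, v2) in diff_dependencies(run1, run2).items():
        print(f"{pkg}: run1={v1!r} run2={v2!r}")
    # results_match("run1/output.csv", "run2/output.csv") would then reveal whether
    # the dependency differences actually changed the produced results.
```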


Subjects
Biomedical Research/statistics & numerical data , Software , Workflow , Computational Biology , Humans
4.
Article in English | MEDLINE | ID: mdl-37267137

ABSTRACT

The commercial use of machine learning (ML) is spreading; at the same time, ML models are becoming more complex and more expensive to train, which makes intellectual property protection (IPP) of trained models a pressing issue. Unlike other domains that can build on a solid understanding of the threats, attacks, and defenses available to protect their IP, ML-related research in this regard is still very fragmented. This is partly due to the lack of a unified view and of a common taxonomy of these aspects. In this article, we systematize our findings on IPP in ML, focusing on the threats and attacks identified and the defenses proposed at the time of writing. We develop a comprehensive threat model for IP in ML and categorize attacks and defenses within a unified and consolidated taxonomy, thus bridging research from the ML and security communities.

5.
Int J Multimed Inf Retr ; 7(3): 157-171, 2018.
Article in English | MEDLINE | ID: mdl-30956928

ABSTRACT

Nowadays, there is a proliferation of available information sources from different modalities: text, images, audio, video and more. Information objects are no longer isolated; they are frequently connected via metadata, semantic links, etc. This leads to various challenges in graph-based information retrieval. This paper is concerned with the reachability analysis of multimodal, graph-modelled collections. We use our framework to leverage the combination of features from different modalities through our formulation of faceted search. This study highlights the effect of different facets and link types in improving the reachability of relevant information objects. The experiments are performed on the ImageCLEF 2011 Wikipedia collection with about 400,000 documents and images. The results demonstrate that combining different facets is conducive to higher reachability. We obtain a 373% recall gain for very hard topics by using our graph model of the collection. Furthermore, by adding semantic links to the collection, we gain a 10% increase in overall recall.
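The reachability measurement itself can be sketched as a breadth-first traversal over a graph whose nodes carry a modality (facet) and whose edges carry a link type, counting how many objects become reachable when additional facets or link types are enabled. The following is a minimal sketch; the toy graph, node names and facet labels are invented and bear no relation to the actual ImageCLEF collection.

```python
from collections import defaultdict, deque

def reachable(edges, start, allowed_link_types, node_modality, allowed_modalities):
    """BFS over edges whose link type is allowed, visiting only nodes of allowed modalities."""
    adjacency = defaultdict(list)
    for src, dst, link_type in edges:
        adjacency[src].append((dst, link_type))
        adjacency[dst].append((src, link_type))   # treat links as undirected
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt, link_type in adjacency[node]:
            if nxt in seen or link_type not in allowed_link_types:
                continue
            if node_modality[nxt] not in allowed_modalities:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return seen

if __name__ == "__main__":
    modality = {"d1": "text", "i1": "image", "i2": "image", "d2": "text"}
    edges = [("d1", "i1", "metadata"), ("i1", "i2", "semantic"), ("i2", "d2", "metadata")]
    without_semantic = reachable(edges, "d1", {"metadata"}, modality, {"text", "image"})
    with_semantic = reachable(edges, "d1", {"metadata", "semantic"}, modality, {"text", "image"})
    print(len(without_semantic), "objects reachable without semantic links,",
          len(with_semantic), "with them")
```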

6.
Neural Netw ; 19(6-7): 911-22, 2006.
Article in English | MEDLINE | ID: mdl-16782304

ABSTRACT

Self-Organizing Maps (SOMs) have been applied in various industrial applications and have proven to be a valuable data mining tool. In order to fully benefit from their potential, advanced visualization techniques assist the user in analyzing and interpreting the maps. We propose two new vector-field-based methods for depicting the SOM, namely the Gradient Field and Borderline visualization techniques, which show the clustering structure at various levels of detail. We explain how these methods can be applied to aggregated parts of the SOM to show which factors contribute to the clustering structure, and how they can be used to find correlations and dependencies in the underlying data. We provide examples on several artificial and real-world data sets to point out the strengths of our techniques, specifically as a means of combining different types of visualizations for effective multidimensional information visualization of SOMs.
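A rough sketch of the gradient-field idea, not the published algorithm: for each map unit, accumulate a 2-D arrow that points away from dissimilar neighbouring units, so that arrows converge inside homogeneous regions and diverge at cluster borders. The random codebook, kernel radius, and similarity weighting below are placeholder choices assumed for the example.

```python
import numpy as np

def gradient_field(codebook, radius=2.0):
    """codebook: (rows, cols, dim) SOM weights -> (rows, cols, 2) arrow per map unit."""
    rows, cols, _ = codebook.shape
    field = np.zeros((rows, cols, 2))
    for i in range(rows):
        for j in range(cols):
            arrow = np.zeros(2)
            for u in range(rows):
                for v in range(cols):
                    if (u, v) == (i, j):
                        continue
                    offset = np.array([u - i, v - j], dtype=float)
                    grid_dist = np.linalg.norm(offset)
                    if grid_dist > radius:
                        continue
                    # Dissimilar neighbours push the arrow away from them.
                    similarity = -np.linalg.norm(codebook[u, v] - codebook[i, j])
                    arrow += similarity * offset / grid_dist
            field[i, j] = arrow
    return field

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    som_weights = rng.normal(size=(10, 10, 4))   # stand-in for a trained 10x10 SOM
    arrows = gradient_field(som_weights)
    print("arrow at unit (0, 0):", arrows[0, 0])
```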


Subjects
Algorithms , Cluster Analysis , Databases, Factual , Neural Networks, Computer , Pattern Recognition, Automated/methods , Artificial Intelligence , Computer Graphics , Computer Simulation
7.
Article in English | MEDLINE | ID: mdl-26167542

ABSTRACT

Reproducibility and reusability of research results are an important concern in scientific communication and science policy. A foundational element of reproducibility and reusability is the open and persistently available presentation of research data. However, many common approaches for primary data publication in use today do not achieve sufficient long-term robustness, openness, accessibility or uniformity, nor do they permit comprehensive exploitation by modern Web technologies. This has led to several authoritative studies recommending uniform direct citation of data archived in persistent repositories. Data are to be considered first-class scholarly objects and treated, in many ways, similarly to cited and archived scientific and scholarly literature. Here we briefly review the most current and widely agreed set of principle-based recommendations for scholarly data citation, the Joint Declaration of Data Citation Principles (JDDCP). We then present a framework for operationalizing the JDDCP, together with a set of initial recommendations on identifier schemes, identifier resolution behavior, required metadata elements, and best practices for realizing programmatic machine actionability of cited data. The main target audience for the common implementation guidelines in this article consists of publishers, scholarly organizations, and persistent data repositories, including their technical staff, although ordinary researchers can also benefit from these recommendations. The guidance provided here is intended to help achieve widespread, uniform human and machine accessibility of deposited data, in support of significantly improved verification, validation, reproducibility and re-use of scholarly and scientific data.
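One concrete form of the machine actionability discussed in such recommendations is resolving a persistent identifier via HTTP content negotiation to obtain structured metadata rather than a human-oriented landing page. The sketch below assumes a DOI-based identifier; the DOI itself is a placeholder, and the negotiated media type is the CSL JSON format supported by registration agencies such as Crossref and DataCite.

```python
import json
import urllib.request

def fetch_citation_metadata(doi: str) -> dict:
    """Ask the DOI resolver for machine-readable CSL JSON metadata instead of a landing page."""
    req = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

if __name__ == "__main__":
    # Placeholder DOI; substitute a real dataset DOI to run this end to end.
    meta = fetch_citation_metadata("10.5281/zenodo.0000000")
    print(meta.get("title"), meta.get("publisher"), meta.get("issued"))
```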

8.
IEEE Trans Vis Comput Graph ; 20(12): 1703-12, 2014 Dec.
Article in English | MEDLINE | ID: mdl-26356884

ABSTRACT

Multi-class classifiers often compute, for each classified sample, scores describing the probability of belonging to each class. In order to improve the performance of such classifiers, machine learning experts need to analyze classification results for a large number of labeled samples to find possible reasons for incorrect classification. Confusion matrices are widely used for this purpose, but they provide no information about the classification scores and features computed for the samples. We propose a set of integrated visual methods for analyzing the performance of probabilistic classifiers. Our methods provide insight into different aspects of the classification results for a large number of samples. One visualization emphasizes at which probabilities the samples were classified and how these probabilities correlate with classification error in terms of false positives and false negatives. Another view emphasizes the features of the samples and ranks them by their power to separate selected true and false classifications. We demonstrate the insight gained using our technique on a benchmark classification dataset, and show how it enables improving classification performance by interactively defining and evaluating post-classification rules.
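The first of these views can be approximated with a small amount of code: bin the samples by their predicted probability and count correct classifications, false positives, and false negatives per bin. The sketch below uses invented toy scores and labels and a fixed 0.5 decision threshold; it illustrates the underlying tabulation, not the authors' visualization itself.

```python
import numpy as np

def probability_error_histogram(prob_positive, y_true, threshold=0.5, bins=5):
    """Per probability bin: (correct classifications, false positives, false negatives)."""
    prob_positive = np.asarray(prob_positive, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = (prob_positive >= threshold).astype(int)
    edges = np.linspace(0.0, 1.0, bins + 1)
    table = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        upper = prob_positive <= hi if hi >= 1.0 else prob_positive < hi
        in_bin = (prob_positive >= lo) & upper
        correct = int(np.sum(in_bin & (y_pred == y_true)))
        fp = int(np.sum(in_bin & (y_pred == 1) & (y_true == 0)))
        fn = int(np.sum(in_bin & (y_pred == 0) & (y_true == 1)))
        table.append(((lo, hi), correct, fp, fn))
    return table

if __name__ == "__main__":
    scores = [0.05, 0.20, 0.35, 0.55, 0.60, 0.80, 0.90, 0.95]   # toy predicted probabilities
    labels = [0,    0,    1,    0,    1,    1,    0,    1]      # toy ground-truth labels
    for (lo, hi), correct, fp, fn in probability_error_histogram(scores, labels):
        print(f"p in [{lo:.1f}, {hi:.1f}]: correct={correct} FP={fp} FN={fn}")
```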

9.
IEEE Trans Neural Netw ; 19(9): 1518-30, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18779085

ABSTRACT

In this paper, we present a neural classifier algorithm that locally approximates the decision surface of labeled data by a patchwork of separating hyperplanes arranged under topological constraints similar to those of self-organizing maps (SOMs). We take advantage of the fact that these boundaries can often be represented by linear separators connected along a low-dimensional nonlinear manifold, which influences the placement of the separators. The resulting classifier allows for a voting scheme that averages over the classification results of neighboring hyperplanes. Our algorithm is computationally efficient in terms of both training and classification. Furthermore, we present a model selection method to estimate the topology of the classification boundary. We demonstrate the algorithm's usefulness on several artificial and real-world data sets and compare it to state-of-the-art supervised learning algorithms.
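As a loose illustration of the idea, and emphatically not the published algorithm, the sketch below arranges a few prototypes along a crude one-dimensional chain, fits a mean-difference hyperplane in each local patch, and labels a query by majority vote over the hyperplanes around its nearest prototype. Every modelling choice here (the chain ordering, the mean-difference separator, the voting neighbourhood) is a simplification assumed for the example.

```python
import numpy as np

class ChainedHyperplaneClassifier:
    """Toy local-linear classifier with a 1-D chain topology and neighbour voting."""

    def __init__(self, n_prototypes=5):
        self.n_prototypes = n_prototypes

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        order = np.argsort(X[:, 0])                       # crude chain: order by first feature
        self.prototypes_ = np.array([chunk.mean(axis=0)
                                     for chunk in np.array_split(X[order], self.n_prototypes)])
        assign = self._nearest_prototype(X)
        self.w_, self.b_ = [], []
        for k in range(self.n_prototypes):
            neighbours = range(max(0, k - 1), min(self.n_prototypes, k + 2))
            local = np.isin(assign, list(neighbours))
            pos, neg = X[local & (y == 1)], X[local & (y == 0)]
            if len(pos) == 0 or len(neg) == 0:            # degenerate patch: constant vote
                w, b = np.zeros(X.shape[1]), (1.0 if len(pos) else -1.0)
            else:
                w = pos.mean(axis=0) - neg.mean(axis=0)   # mean-difference separator
                b = -w @ (pos.mean(axis=0) + neg.mean(axis=0)) / 2
            self.w_.append(w)
            self.b_.append(b)
        return self

    def _nearest_prototype(self, X):
        return np.argmin(np.linalg.norm(X[:, None] - self.prototypes_, axis=2), axis=1)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        assign = self._nearest_prototype(X)
        preds = []
        for x, k in zip(X, assign):                       # vote over the chain neighbourhood
            votes = [int(self.w_[j] @ x + self.b_[j] > 0)
                     for j in range(max(0, k - 1), min(self.n_prototypes, k + 2))]
            preds.append(int(np.mean(votes) >= 0.5))
        return np.array(preds)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + 0.5 * np.sin(3 * X[:, 0]) > X[:, 1]).astype(int)   # nonlinear boundary
    clf = ChainedHyperplaneClassifier().fit(X, y)
    print("training accuracy:", (clf.predict(X) == y).mean())
```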


Subjects
Algorithms , Decision Support Techniques , Models, Theoretical , Neural Networks, Computer , Pattern Recognition, Automated/methods , Artificial Intelligence , Computer Simulation