Results 1 - 5 of 5
1.
Cluster Comput ; 13(3): 315-333, 2010 Sep.
Article in English | MEDLINE | ID: mdl-22623878

ABSTRACT

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multidimensional parameter space consisting of input performance parameters to the applications that are known to affect their execution times. While some performance parameters, such as the grouping of workflow components and their mapping to machines, do not affect the accuracy of the analysis, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework that supports performance optimizations along multiple such parameters. Using two real-world applications in the spatial, multidimensional data analysis domain, we present an experimental evaluation of the proposed framework.
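The optimization the abstract describes, searching a multidimensional parameter space for the fastest configuration that still meets a quality requirement, can be sketched as follows. The parameter names and the runtime/quality models are invented for illustration; the paper's actual framework is not shown here.

```python
from itertools import product

# Hypothetical performance model for a two-parameter workflow: runtime and
# output quality as functions of a chunk size and a sampling rate. Both
# functions are illustrative stand-ins, not models from the paper.
def runtime(chunk_size, sampling_rate):
    return 100.0 / chunk_size + 50.0 * sampling_rate

def quality(chunk_size, sampling_rate):
    # In this toy model, output quality tracks the sampling rate only.
    return sampling_rate

def best_config(chunk_sizes, sampling_rates, min_quality):
    """Exhaustively search the 2-D parameter space for the fastest
    configuration whose output quality meets a minimum threshold."""
    feasible = [
        (c, s)
        for c, s in product(chunk_sizes, sampling_rates)
        if quality(c, s) >= min_quality
    ]
    return min(feasible, key=lambda cs: runtime(*cs))

print(best_config([1, 2, 4, 8], [0.25, 0.5, 1.0], min_quality=0.5))
# -> (8, 0.5): largest chunks, lowest sampling rate that still meets quality
```

This captures the trade-off the abstract names: some parameters (here, chunk size) only affect speed, while others (sampling rate) trade output quality for performance.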

2.
Pac Symp Biocomput ; 24: 208-219, 2019.
Article in English | MEDLINE | ID: mdl-30864323

ABSTRACT

Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and the Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, challenges are posted, and then competitors perform a prediction based upon blinded test data. Challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers, a unit of software that packages up code and all of its dependencies, to be run on the cloud. Despite their value in providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the impact of benchmark challenges. Specifically, current approaches only evaluate end-to-end performance; it is nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, due to a lack of specifics, ambiguity in tools and parameters, and problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, as the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and utilization of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it then becomes possible to compare not just the results and performance of a challenger, but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools, but with only subtle changes in parameters (and radical differences in results).
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves especially critical in bioinformatics workflows, where default or incorrect parameter values can drastically alter results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed on a cloud-based setup, which stores data, dependencies and workflows for easy sharing and use. It can also scale workflow executions using distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.
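The semantic validation the abstract attributes to WINGS, rejecting a run before execution when a component's inputs violate declared constraints, can be illustrated with a minimal sketch. The component structure, rule functions, and messages below are hypothetical; this is not the WINGS API.

```python
# Illustrative sketch of semantic parameter validation: each workflow
# component declares constraints on its inputs, and a run is rejected
# before execution if any constraint fails. All names are invented.
def validate(component, inputs):
    """Return a list of human-readable errors; empty means the run may proceed."""
    errors = []
    for name, rule, message in component["constraints"]:
        if not rule(inputs.get(name)):
            errors.append(f"{component['name']}: {message}")
    return errors

aligner = {
    "name": "read_aligner",
    "constraints": [
        ("min_quality",
         lambda q: q is not None and 0 <= q <= 40,
         "min_quality must be a Phred score in [0, 40]"),
        ("reference",
         lambda r: r is not None and r.endswith(".fa"),
         "reference must be a FASTA file"),
    ],
}

# An out-of-range quality threshold is caught before the component runs.
print(validate(aligner, {"min_quality": 99, "reference": "hg38.fa"}))
```

Catching a bad parameter at this stage, rather than in the output, is the point the abstract makes about default or incorrect values drastically altering results.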


Subject(s)
Benchmarking/methods , Semantics , Workflow , Algorithms , Computational Biology/methods , Gene Expression Profiling/statistics & numerical data , Genomics , Metadata , Proteins/genetics , Proteins/metabolism , Reproducibility of Results , Sequence Analysis, RNA/statistics & numerical data
3.
CEUR Workshop Proc ; 1931: 63-70, 2017 Oct.
Article in English | MEDLINE | ID: mdl-30034319

ABSTRACT

Scientific collaborations involving multiple institutions are increasingly commonplace. It is not unusual for publications to have dozens or hundreds of authors, in some cases even a few thousand. Gathering the information for such papers can be very time-consuming, since the author list must include authors who made different kinds of contributions and whose affiliations are hard to track. Similarly, when datasets are contributed by multiple institutions, the collection and processing details may also be hard to assemble due to the many individuals involved. We present our work to date on automatically generating author lists and other portions of scientific papers for multi-institutional collaborations based on the metadata created to represent the people, data, and activities involved. Our initial focus is ENIGMA, a large international collaboration for neuroimaging genetics.
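Generating an author list from structured contributor metadata, as this abstract proposes, can be sketched in a few lines. The record fields, ordering rule (alphabetical), and superscript style below are assumptions for illustration, not the paper's actual conventions.

```python
# Hypothetical contributor metadata records; field names are invented.
contributors = [
    {"name": "A. Lee",  "role": "analysis", "affiliation": "Univ. X"},
    {"name": "C. Kim",  "role": "analysis", "affiliation": "Univ. X"},
    {"name": "B. Cruz", "role": "data",     "affiliation": "Inst. Y"},
]

def author_block(records):
    """Render an author list with numbered affiliations from metadata.
    Authors are ordered alphabetically; affiliations are numbered in
    order of first appearance."""
    affiliations = []
    entries = []
    for rec in sorted(records, key=lambda r: r["name"]):
        if rec["affiliation"] not in affiliations:
            affiliations.append(rec["affiliation"])
        idx = affiliations.index(rec["affiliation"]) + 1
        entries.append(f"{rec['name']}^{idx}")
    authors = ", ".join(entries)
    affil_lines = [f"{i + 1}. {a}" for i, a in enumerate(affiliations)]
    return authors + "\n" + "\n".join(affil_lines)

print(author_block(contributors))
```

The value for a thousand-author paper is that the same metadata can be re-rendered whenever a contributor's affiliation changes, instead of hand-editing the list.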

4.
Genome Med ; 7: 73, 2015 Jul 25.
Article in English | MEDLINE | ID: mdl-26289940

ABSTRACT

BACKGROUND: Recent highly publicized cases of premature patient assignment into clinical trials, resulting from non-reproducible omics analyses, have prompted many to call for a more thorough examination of translational omics and highlighted the critical need for transparency and reproducibility to ensure patient safety. The use of workflow platforms such as Galaxy and Taverna has greatly enhanced the use, transparency and reproducibility of omics analysis pipelines in the research domain and would be an invaluable tool in a clinical setting. However, the use of these workflow platforms requires deep domain expertise that, particularly within the multi-disciplinary fields of translational and clinical omics, may not always be present in a clinical setting. This lack of domain expertise may put patient safety at risk and make these workflow platforms difficult to operationalize in a clinical setting. In contrast, semantic workflows are a different class of workflow platform in which resultant workflow runs are transparent, reproducible, and semantically validated. Through semantic enforcement of all datasets, analyses and user-defined rules/constraints, users are guided through each workflow run, enhancing analytical validity and patient safety. METHODS: To evaluate the effectiveness of semantic workflows within translational and clinical omics, we have implemented a clinical omics pipeline for annotating DNA sequence variants identified through next-generation sequencing using the Workflow Instance Generation and Specialization (WINGS) semantic workflow platform. RESULTS: We found that the implementation and execution of our clinical omics pipeline in a semantic workflow helped us to meet the requirements for enhanced transparency, reproducibility and analytical validity recommended for clinical omics. We further found that many features of the WINGS platform were particularly primed to help support the critical needs of clinical omics analyses.
CONCLUSIONS: This is the first implementation and execution of a clinical omics pipeline using semantic workflows. Evaluation of this implementation provides guidance for their use in both translational and clinical settings.
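One concrete form the "user-defined rules/constraints" of this abstract can take is a consistency check between two inputs' semantic metadata, for example blocking a variant-annotation step when the variants and the annotation source use different genome builds. The metadata keys and the rule below are hypothetical, not taken from the WINGS platform.

```python
# Sketch of a user-defined semantic rule: refuse to run an annotation step
# when two inputs carry incompatible metadata (mismatched genome builds).
# The metadata schema is invented for illustration.
def check_build_consistency(variants_meta, annotation_meta):
    """Raise before execution if the genome builds disagree."""
    if variants_meta["genome_build"] != annotation_meta["genome_build"]:
        raise ValueError(
            "genome build mismatch: "
            f"{variants_meta['genome_build']} vs "
            f"{annotation_meta['genome_build']}"
        )
    return True

# Matching builds pass; a GRCh37/GRCh38 mix would raise before any
# annotation is produced, rather than silently mis-annotating variants.
print(check_build_consistency({"genome_build": "GRCh38"},
                              {"genome_build": "GRCh38"}))
```

Failing loudly before execution, rather than producing a plausible but wrong annotation, is the patient-safety argument the abstract makes for semantic enforcement.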


Subject(s)
Computational Biology/methods , Workflow , Humans , Neoplasms/genetics , Reproducibility of Results , Sequence Analysis, DNA
5.
Article in English | MEDLINE | ID: mdl-22068617

ABSTRACT

Data analysis processes in scientific applications can be expressed as coarse-grain workflows of complex data processing operations with data flow dependencies between them. Performance optimization of these workflows can be viewed as a search for a set of optimal values in a multi-dimensional parameter space. While some performance parameters such as grouping of workflow components and their mapping to machines do not affect the accuracy of the output, others may dictate trading the output quality of individual components (and of the whole workflow) for performance. This paper describes an integrated framework which is capable of supporting performance optimizations along multiple dimensions of the parameter space. Using two real-world applications in the spatial data analysis domain, we present an experimental evaluation of the proposed framework.
