Results 1 - 20 of 21
1.
J Synchrotron Radiat ; 2024 Sep 01.
Article in English | MEDLINE | ID: mdl-39007823

ABSTRACT

StreamSAXS is a Python-based small- and wide-angle X-ray scattering (SAXS/WAXS) data analysis workflow platform with a graphical user interface (GUI). It aims to provide an interactive and user-friendly tool for the analysis of both batch data files and real-time data streams. Users can easily create customizable workflows through the GUI to meet their specific needs. One characteristic of StreamSAXS is its plug-in framework, which enables developers to extend the built-in workflow tasks. Another is its support for both already-acquired and real-time data sources, allowing StreamSAXS to function as an offline analysis platform or to be integrated into large-scale acquisition systems for end-to-end data management. This paper presents the core design of StreamSAXS and provides use cases demonstrating its application to SAXS/WAXS data analysis in offline and online scenarios.
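
The plug-in idea described above can be illustrated with a minimal Python sketch. The class and method names below (WorkflowTask, Workflow, process) are hypothetical and are not StreamSAXS's actual API; they only show how user-defined tasks could be chained into a workflow.

```python
# Minimal sketch of a plug-in style workflow, loosely modelled on the idea
# described above. All names (WorkflowTask, Workflow, etc.) are hypothetical
# and do not reflect StreamSAXS's actual API.
from abc import ABC, abstractmethod


class WorkflowTask(ABC):
    """Base class that a plug-in task would implement."""

    @abstractmethod
    def process(self, data):
        """Consume the upstream result and return the downstream input."""


class SubtractBackground(WorkflowTask):
    def __init__(self, background):
        self.background = background

    def process(self, data):
        return [x - b for x, b in zip(data, self.background)]


class Integrate(WorkflowTask):
    def process(self, data):
        return sum(data)


class Workflow:
    """Runs a user-assembled chain of tasks on one frame (or data record)."""

    def __init__(self, tasks):
        self.tasks = tasks

    def run(self, frame):
        for task in self.tasks:
            frame = task.process(frame)
        return frame


if __name__ == "__main__":
    wf = Workflow([SubtractBackground([0.1, 0.1, 0.1]), Integrate()])
    print(wf.run([1.0, 2.0, 3.0]))  # approximately 5.7
```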

2.
Sensors (Basel) ; 19(20)2019 Oct 11.
Article in English | MEDLINE | ID: mdl-31614544

ABSTRACT

Discovering the Bayesian network (BN) structure from big datasets containing rich causal relationships is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas where big data are gathered from sensors at high volume and velocity. Most current BN structure learning algorithms have shortcomings when facing big data. First, learning a BN structure from an entire big dataset is an expensive task that often ends in failure due to memory constraints. Second, it is difficult to select a learner from the numerous BN structure learning algorithms that consistently achieves good learning accuracy. Lastly, there is a lack of an intelligent method for merging separately learned BN structures into a well-structured global network. To address these shortcomings, we introduce a novel parallel learning approach called PEnBayes (Parallel Ensemble-based Bayesian network learning). PEnBayes starts with an adaptive data preprocessing phase that calculates the Appropriate Learning Size and intelligently divides a big dataset for fast distributed local structure learning. Then, PEnBayes learns a collection of local BN structures in parallel using a two-layered, weighted adjacency-matrix-based structure ensemble method. Lastly, PEnBayes merges the local BN structures into a global network structure using the structure ensemble method at the global layer. For the experiments, we generate big datasets by simulating sensor data from the patient monitoring, transportation, and disease diagnosis domains. The experimental results show that PEnBayes achieves significantly improved execution performance with more consistent and stable results compared with three baseline learning algorithms.
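
As a hedged illustration of the ensemble step described above, the following Python sketch merges locally learned structures by weighted voting over their adjacency matrices; the weighting scheme and threshold are placeholders rather than PEnBayes's actual formulas.

```python
# Illustrative sketch of merging locally learned Bayesian-network structures
# by weighted voting over their adjacency matrices. The weighting scheme and
# threshold are placeholders, not the exact formulas used by PEnBayes.
import numpy as np


def ensemble_structure(adjacency_matrices, weights, threshold=0.5):
    """adjacency_matrices: list of (n x n) 0/1 arrays, one per local learner.
    weights: confidence assigned to each learner (e.g. its local score).
    Returns a consensus 0/1 adjacency matrix."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()                  # normalize
    stacked = np.stack(adjacency_matrices).astype(float)
    support = np.tensordot(weights, stacked, axes=1)   # weighted edge support
    return (support >= threshold).astype(int)


if __name__ == "__main__":
    a1 = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])
    a2 = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]])
    a3 = np.array([[0, 1, 1], [0, 0, 1], [0, 0, 0]])
    print(ensemble_structure([a1, a2, a3], weights=[1.0, 1.0, 1.0]))
```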

3.
BMC Bioinformatics ; 19(1): 235, 2018 06 22.
Article in English | MEDLINE | ID: mdl-29929475

ABSTRACT

BACKGROUND: In the rational drug design process, an ensemble of conformations obtained from a molecular dynamics simulation plays a crucial role in docking experiments. Some studies have found that Fully-Flexible Receptor (FFR) models predict realistic binding energies accurately and improve scoring, enhancing selectivity. At the same time, methods have been proposed to reduce the high computational costs involved in considering the explicit flexibility of proteins in receptor-ligand docking. This study introduces a novel method to optimize ensemble docking-based experiments by reducing the size of an InhA FFR model at docking runtime and scaling docking workflow invocations on cloud virtual machines. RESULTS: First, to find the most affordable cost-benefit pool of virtual machines, we evaluated the performance of the docking workflow invocations in different configurations of Azure instances. Second, we validated the gains obtained by the proposed method based on the quality of the Reduced Fully-Flexible Receptor (RFFR) models produced using AutoDock4.2. The analyses show that the proposed method reduced the model size by approximately 50% while covering at least 86% of the best docking results for the 74 ligands tested. Third, we tested our method using AutoDock Vina, a different docking software package, and showed the positive accuracy achieved in the resulting RFFR models. Finally, our results demonstrated that the proposed method optimizes ensemble docking experiments and is applicable to different docking software. In addition, it detected new binding modes that would be unreachable if only the rigid structure used to generate the InhA FFR model were employed. CONCLUSIONS: Our results showed that the selective method is a valuable strategy for optimizing ensemble docking-based experiments using different docking software. The RFFR models produced by discarding non-promising snapshots from the original model are accurately shaped for a larger number of ligands, and the elapsed time spent on the ensemble docking experiments is considerably reduced.
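
The snapshot-discarding idea can be sketched as follows; the selection rule (keep snapshots whose best docking energy lies within a fixed margin of the overall best) is a simplified placeholder, not the criterion used in the paper.

```python
# Simplified sketch of reducing an ensemble of receptor snapshots: keep only
# snapshots whose best docking energy so far is within a cutoff of the overall
# best. The selection rule is a placeholder, not the paper's actual criterion.
def reduce_ensemble(energies_by_snapshot, margin=1.5):
    """energies_by_snapshot: {snapshot_id: [docking energies (kcal/mol), ...]}.
    Returns the ids of snapshots judged promising enough to keep."""
    best_per_snapshot = {s: min(e) for s, e in energies_by_snapshot.items() if e}
    global_best = min(best_per_snapshot.values())
    return sorted(s for s, e in best_per_snapshot.items()
                  if e <= global_best + margin)


if __name__ == "__main__":
    scores = {"snap_001": [-9.1, -8.4], "snap_002": [-6.0, -5.7],
              "snap_003": [-8.2, -7.9]}
    print(reduce_ensemble(scores))  # ['snap_001', 'snap_003']
```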


Subjects
Drug Design , Molecular Docking Simulation/methods
4.
Zhongguo Zhong Yao Za Zhi ; 42(23): 4488-4493, 2017 Dec.
Article in Chinese | MEDLINE | ID: mdl-29376242

ABSTRACT

Whole-process quality control and management of traditional Chinese medicine (TCM) decoction pieces is a systems engineering task involving the production base environment, seeds and seedlings, harvesting, processing, and other steps, so accurately identifying the factors in TCM production that may introduce quality risks, and adopting reasonable quality control measures, are very important. At present, the concept of quality risk is discussed mainly in terms of management and regulations; there has been no comprehensive analysis of the possible risks in the quality control process of TCM decoction pieces, nor a summary of effective quality control schemes. This study proposes a whole-process quality control and management system for TCM decoction pieces based on a TCM quality tree. The system effectively combines the process analysis method of the TCM quality tree with quality risk management, and can help managers make real-time decisions while achieving whole-process quality control of TCM. By providing a personalized web interface, the system supports user-oriented information feedback and makes it convenient for users to predict, evaluate, and control the quality of TCM. In application, the system can identify relevant quality factors such as the production base environment, cultivation, and pieces processing; extend and modify the existing scientific workflow according to an enterprise's own production conditions; and provide different enterprises with their own quality systems, achieving personalized service. As a new quality management model, this system can serve as a reference for improving the quality and standardization of Chinese medicine production.


Subjects
Drugs, Chinese Herbal/standards , Medicine, Chinese Traditional/standards , Quality Control , Internet , Total Quality Management
5.
Behav Res Methods ; 48(2): 542-52, 2016 Jun.
Article in English | MEDLINE | ID: mdl-26170051

ABSTRACT

This article describes a new open source scientific workflow system, the TimeStudio Project, dedicated to the behavioral and brain sciences. The program is written in MATLAB and features a graphical user interface for the dynamic pipelining of computer algorithms developed as TimeStudio plugins. TimeStudio includes both a set of general plugins (for reading data files, modifying data structures, visualizing data structures, etc.) and a set of plugins specifically developed for the analysis of event-related eyetracking data as a proof of concept. It is possible to create custom plugins to integrate new or existing MATLAB code anywhere in a workflow, making TimeStudio a flexible workbench for organizing and performing a wide range of analyses. The system also features an integrated sharing and archiving tool for TimeStudio workflows, which can be used to share workflows both during the data analysis phase and after scientific publication. TimeStudio thus facilitates the reproduction and replication of scientific studies, increases the transparency of analyses, and reduces individual researchers' analysis workload. The project website ( http://timestudioproject.com ) contains the latest releases of TimeStudio, together with documentation and user forums.


Subjects
Algorithms , Behavioral Research/methods , Neurosciences/methods , Software , User-Computer Interface , Workflow , Behavior , Brain/physiology , Eye Movement Measurements , Humans , Statistics as Topic
6.
J Biomed Inform ; 56: 239-64, 2015 Aug.
Article in English | MEDLINE | ID: mdl-26079262

ABSTRACT

CONTEXT: Most specialized users (scientists) who use bioinformatics applications do not have suitable training in software development. A Software Product Line (SPL) applies the concept of reuse: it is defined as a set of systems developed from a common set of base artifacts. In some contexts, such as bioinformatics applications, it is advantageous to develop a collection of related software products using the SPL approach. If software products are similar enough, it is possible to identify their commonalities and differences and then reuse the common features to support the development of new applications in bioinformatics. OBJECTIVES: This paper presents the PL-Science approach, which combines SPL and ontologies to assist scientists in defining a scientific experiment and specifying a workflow that encompasses the bioinformatics applications of a given experiment. The paper also focuses on the use of ontologies to enable SPL in biological domains. METHOD: In this paper, a Scientific Software Product Line (SSPL) differs from a conventional SPL in that the SSPL uses an abstract scientific workflow model. This workflow is defined according to a scientific domain, and the products (scientific applications/algorithms) are instantiated from this abstract workflow model. RESULTS: By using an ontology as a knowledge representation model, we can impose domain restrictions as well as add semantic aspects that facilitate the selection and organization of bioinformatics workflows in a Scientific Software Product Line. Ontologies enable not only the expression of formal restrictions but also inference over these restrictions, given that a scientific domain needs a formal specification. CONCLUSIONS: This paper presents the development of the PL-Science approach, encompassing a methodology and an infrastructure, together with an evaluation based on bioinformatics case studies conducted at two renowned research institutions in Brazil.


Subjects
Computational Biology/instrumentation , Computational Biology/methods , Software , Algorithms , Brazil , Cloud Computing , Cluster Analysis , Databases, Factual , Internet , Observer Variation , Programming Languages , Sequence Alignment , Sequence Analysis, DNA
7.
J Biomed Inform ; 49: 119-33, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24462600

ABSTRACT

The coming deluge of genome data makes storing and processing large-scale genome data, easy access to biomedical analysis tools, and efficient data sharing and retrieval significant challenges. Variability in data volume leads to variable computing and storage requirements, so biomedical researchers are pursuing more reliable, dynamic, and convenient methods for conducting sequencing analyses. This paper proposes a cloud-based bioinformatics workflow platform for large-scale next-generation sequencing analyses, which enables reliable and highly scalable execution of sequencing analysis workflows in a fully automated manner. Our platform extends the existing Galaxy workflow system by adding data management capabilities for transferring large quantities of data efficiently and reliably (via Globus Transfer), domain-specific analysis tools preconfigured for immediate use by researchers (via user-specific tool integration), automatic deployment on the cloud for on-demand resource allocation and pay-as-you-go pricing (via Globus Provision), a cloud provisioning tool for auto-scaling (via the HTCondor scheduler), and support for validating the correctness of workflows (via semantic verification tools). Two bioinformatics workflow use cases and a performance evaluation are presented to validate the feasibility of the proposed approach.
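
As a generic illustration of scripted Galaxy workflow execution (not the specific platform described above), a workflow can be invoked through the BioBlend client roughly as follows; the server URL, API key, and dataset identifiers are placeholders.

```python
# Illustration only: invoking a Galaxy workflow through the BioBlend client.
# The URL, API key, workflow choice, and dataset id are placeholders; this is
# not the specific platform described in the abstract above.
from bioblend.galaxy import GalaxyInstance

gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")

workflow = gi.workflows.get_workflows()[0]          # pick an existing workflow
history = gi.histories.create_history(name="ngs-run")

# Map workflow input steps to datasets already uploaded to Galaxy.
inputs = {"0": {"src": "hda", "id": "PLACEHOLDER_DATASET_ID"}}

invocation = gi.workflows.invoke_workflow(
    workflow["id"], inputs=inputs, history_id=history["id"]
)
print(invocation["id"])
```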


Subjects
Computational Biology , Information Storage and Retrieval , Sequence Analysis/instrumentation
8.
Front Genet ; 13: 941996, 2022.
Article in English | MEDLINE | ID: mdl-36092917

ABSTRACT

Constructing a novel bioinformatics workflow by reusing and repurposing fragments across workflows is regarded as an error-avoiding and effort-saving strategy. Traditional techniques discover scientific workflow fragments by leveraging workflow profiles and the historical usage of their activities (or services). However, the social relations of workflows, including relations between services and their developers, have not been explored extensively. In fact, current techniques mostly describe invoking relations between services and can hardly reveal implicit relations between them. To address this challenge, we propose a social-aware scientific workflow knowledge graph (S2KG) to capture common types of entities and various types of relations by analyzing information about bioinformatics workflows and their developers recorded in repositories. Using attributes of entities such as credit and creation time, the joint impact of several positive and negative links in S2KG is identified to evaluate the feasibility of workflow fragment construction. To facilitate the discovery of single services, a service-invoking network is extracted from S2KG, and service communities are constructed accordingly. A bioinformatics workflow fragment discovery mechanism based on Yen's method is developed to discover appropriate fragments with respect to a user's requirements. Extensive experiments are conducted on bioinformatics workflows publicly accessible in the myExperiment repository. Evaluation results show that our technique outperforms state-of-the-art techniques in terms of precision, recall, and F1 score.
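
As a hedged illustration of Yen-style fragment enumeration, networkx's shortest_simple_paths (an implementation of Yen's algorithm) can list candidate service chains in a weighted service-invocation graph; the services and weights below are invented and are not taken from S2KG.

```python
# Hedged illustration of Yen-style enumeration of candidate service chains in
# a service-invocation graph, using networkx.shortest_simple_paths (an
# implementation of Yen's algorithm). The graph, services, and weights are
# invented for the example and are not taken from S2KG.
import itertools
import networkx as nx

G = nx.DiGraph()
# Edge weights could encode, e.g., how rarely two services co-occur
# (lower weight = stronger association).
G.add_weighted_edges_from([
    ("fetch_sequences", "align", 1.0),
    ("fetch_sequences", "filter", 2.0),
    ("filter", "align", 1.0),
    ("align", "build_tree", 1.0),
])

# Enumerate the k best fragments (paths) from an input service to a target one.
k = 2
paths = nx.shortest_simple_paths(G, "fetch_sequences", "build_tree", weight="weight")
for path in itertools.islice(paths, k):
    print(path)
```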

9.
PeerJ Comput Sci ; 7: e747, 2021.
Article in English | MEDLINE | ID: mdl-34805503

ABSTRACT

BACKGROUND: Recent technological developments have enabled more scientific solutions to be executed on cloud platforms. Cloud-based scientific workflows are subject to various risks, such as security breaches and unauthorized access to resources. By attacking side channels or virtual machines, attackers may bring down servers, causing interruptions, delays, or incorrect output. Because cloud-based scientific workflows are often used for vital computation-intensive tasks, their failure can come at a great cost. METHODOLOGY: To increase workflow reliability, we propose the Fault- and Intrusion-tolerant Workflow Scheduling algorithm (FITSW). The proposed workflow system uses task executors consisting of many virtual machines to carry out workflow tasks. FITSW duplicates each sub-task three times, uses an intermediate-data decision-making mechanism, and then employs a deadline partitioning method to determine sub-deadlines for each sub-task. In this way, task scheduling remains dynamic as resources flow through the system. The proposed technique generates or recycles task executors, keeps the workflow clean, and improves efficiency. Experiments were conducted on WorkflowSim to evaluate the effectiveness of FITSW using metrics such as task completion rate, success rate, and completion time. RESULTS: The results show that FITSW not only raises the success rate by about 12% but also improves the task completion rate by 6.2% and reduces the completion time by about 15.6% compared with the intrusion-tolerant scientific workflow (ITSW) system.
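
Two ingredients mentioned above, triplicating each sub-task and partitioning the workflow deadline across sub-tasks, can be sketched as follows; the proportional-to-runtime rule is a common heuristic and not necessarily FITSW's exact method.

```python
# Illustrative sketch of two ingredients mentioned above: replicating each
# sub-task three times and splitting the overall workflow deadline into
# sub-deadlines proportional to estimated runtimes. The proportional rule is a
# common heuristic and not necessarily FITSW's exact method.
def partition_deadline(estimated_runtimes, workflow_deadline):
    """Return a sub-deadline per task, proportional to its estimated runtime."""
    total = sum(estimated_runtimes.values())
    return {task: workflow_deadline * rt / total
            for task, rt in estimated_runtimes.items()}


def replicate(tasks, copies=3):
    """Duplicate each sub-task so a faulty or compromised copy can be voted out."""
    return {task: [f"{task}#r{i}" for i in range(1, copies + 1)] for task in tasks}


if __name__ == "__main__":
    runtimes = {"preprocess": 20.0, "simulate": 60.0, "aggregate": 20.0}
    print(partition_deadline(runtimes, workflow_deadline=200.0))
    print(replicate(runtimes))
```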

10.
Front Bioinform ; 1: 731345, 2021.
Article in English | MEDLINE | ID: mdl-36303787

ABSTRACT

Predicting physical or functional associations through protein-protein interactions (PPIs) is an integral approach for inferring novel protein functions and discovering new drug targets during repositioning analyses. Recent advances in high-throughput data generation and multi-omics techniques have enabled large-scale PPI prediction, promoting several computational methods based on different levels of biological evidence. However, integrating multiple results and strategies to optimize, automatically extract interaction features, and scale up the entire PPI prediction process is still challenging. Most procedures do not offer an in-silico validation process to evaluate the predicted PPIs. In this context, this paper presents the PredPrIn scientific workflow, which enables PPI prediction based on multiple lines of evidence, including structure, sequence, and functional annotation categories, by combining boosting and stacking machine learning techniques. We also present a pipeline (PPIVPro) for the validation process based on cellular co-localization filtering and a focused search for PPI evidence in scientific publications. Our combined approach thus provides the means to train on or predict new PPIs at scale, together with a strategy to evaluate prediction quality. PredPrIn and PPIVPro are publicly available at https://github.com/YasCoMa/predprin and https://github.com/YasCoMa/ppi_validation_process.
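
As a hedged sketch of combining boosting and stacking for a binary PPI-style classification task, the following uses scikit-learn with synthetic features standing in for the structure, sequence, and annotation evidence used by PredPrIn.

```python
# Hedged sketch of combining boosting and stacking for a binary PPI-style
# classification task, using scikit-learn. The synthetic features stand in for
# the structure/sequence/annotation evidence used by PredPrIn.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=12, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosted base learners (here simply configured differently), combined by a
# logistic-regression meta-learner.
base_learners = [
    ("gb_shallow", GradientBoostingClassifier(max_depth=2, random_state=0)),
    ("gb_deep", GradientBoostingClassifier(max_depth=4, random_state=0)),
]
stack = StackingClassifier(estimators=base_learners,
                           final_estimator=LogisticRegression())
stack.fit(X_train, y_train)
print(f"held-out accuracy: {stack.score(X_test, y_test):.2f}")
```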

11.
F1000Res ; 10: 320, 2021.
Article in English | MEDLINE | ID: mdl-34136134

ABSTRACT

Workflows are the keystone of bioimage analysis, and the NEUBIAS (Network of European BioImage AnalystS) community is trying to gather the actors of this field and organize the information around them. One of its most recent outputs is the opening of the F1000Research NEUBIAS gateway, whose main objective is to offer a channel of publication for bioimage analysis workflows and associated resources. In this paper we express some personal opinions and recommendations related to finding, handling, and developing bioimage analysis workflows. The emergence of "big data" in bioimaging and resource-intensive analysis algorithms make local data storage and computing solutions a limiting factor. At the same time, the need for data sharing with collaborators and a general shift towards remote work have created new challenges and avenues for the execution and sharing of bioimage analysis workflows. The challenges are to run workflows reproducibly in remote environments, in particular when their components come from different software packages, and also to document them and link their parameters and results by following the FAIR principles (Findable, Accessible, Interoperable, Reusable) to foster open and reproducible science. In this opinion paper, we focus on giving the reader some directions for tackling these challenges and navigating this complex ecosystem, in order to find and use workflows and to compare workflows addressing the same problem. We also discuss tools to run workflows in the cloud and on High Performance Computing resources, and suggest ways to make these workflows FAIR.


Subjects
Computational Biology , Ecosystem , Algorithms , Information Storage and Retrieval , Workflow
12.
PeerJ ; 8: e8214, 2020.
Article in English | MEDLINE | ID: mdl-31934500

ABSTRACT

Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases, including cancer. Despite advances in whole-genome sequencing, comprehensive and accurate detection of SVs in short-read data still poses practical and computational challenges. We present sv-callers, a highly portable workflow that enables parallel execution of multiple SV detection tools and provides users with example analyses of detected SV callsets in a Jupyter Notebook. This workflow supports easy deployment of software dependencies, configuration, and the addition of new analysis tools. Moreover, porting it to different computing systems requires minimal effort. Finally, we demonstrate the utility of the workflow by performing both somatic and germline SV analyses on different high-performance computing systems.
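
A generic way to run several SV callers on the same sample in parallel is sketched below; the wrapper commands are placeholders, and this is not sv-callers' own implementation.

```python
# Generic sketch of running several SV callers on the same sample in parallel.
# The wrapper scripts below are hypothetical placeholders; sv-callers itself
# is a dedicated workflow and does not use this script.
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor

CALLERS = {
    "caller_a": "run_caller_a.sh sample.bam ref.fa out_a",
    "caller_b": "run_caller_b.sh sample.bam ref.fa out_b",
    "caller_c": "run_caller_c.sh sample.bam ref.fa out_c",
}


def run_caller(name, cmd):
    """Launch one caller and report its exit status."""
    result = subprocess.run(shlex.split(cmd), capture_output=True, text=True)
    return name, result.returncode


with ThreadPoolExecutor(max_workers=len(CALLERS)) as pool:
    for name, rc in pool.map(lambda kv: run_caller(*kv), CALLERS.items()):
        print(f"{name}: exit code {rc}")
```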

13.
Gigascience ; 8(5)2019 05 01.
Article in English | MEDLINE | ID: mdl-31029061

ABSTRACT

BACKGROUND: The complex nature of biological data has driven the development of specialized software tools. Scientific workflow management systems simplify the assembly of such tools into pipelines, assist with job automation, and aid reproducibility of analyses. Many contemporary workflow tools are specialized or not designed for highly complex workflows, such as those with nested loops, dynamic scheduling, and parametrization, which are common in, e.g., machine learning. FINDINGS: SciPipe is a workflow programming library implemented in the programming language Go for managing complex and dynamic pipelines in bioinformatics, cheminformatics, and other fields. SciPipe helps in particular with workflow constructs common in machine learning, such as extensive branching, parameter sweeps, and dynamic scheduling and parametrization of downstream tasks. SciPipe builds on flow-based programming principles to support agile development of workflows based on a library of self-contained, reusable components. It supports running subsets of workflows for improved iterative development and provides a data-centric audit logging feature that saves a full audit trace for every output file of a workflow, which can be converted to other formats such as HTML, TeX, and PDF on demand. The utility of SciPipe is demonstrated with a machine learning pipeline, a genomics pipeline, and a transcriptomics pipeline. CONCLUSIONS: SciPipe provides a solution for agile development of complex and dynamic pipelines, especially in machine learning, through a flexible application programming interface suitable for scientists accustomed to programming or scripting.
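
SciPipe itself is a Go library; purely as a language-neutral illustration of the flow-based idea it builds on (self-contained components that consume and emit items over connections), here is a hypothetical Python sketch with invented names.

```python
# Language-neutral illustration only: self-contained components that consume
# and emit items, assembled into a pipeline. This is NOT SciPipe's Go API;
# all names are made up.
def parameter_sweep(params):
    """Source component: emit one 'task' per parameter value."""
    for p in params:
        yield {"param": p}


def train_model(tasks):
    """Processing component: consume tasks, emit results downstream."""
    for task in tasks:
        task["score"] = 1.0 / (1.0 + task["param"])   # stand-in for real work
        yield task


def audit_log(results, path="audit.log"):
    """Sink component: record what was produced for every output."""
    with open(path, "w") as fh:
        for r in results:
            fh.write(f"{r}\n")
            yield r


pipeline = audit_log(train_model(parameter_sweep([0.1, 1.0, 10.0])))
for item in pipeline:
    print(item)
```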


Subjects
Computational Biology , Genomics , Software , Gene Library , Machine Learning , Programming Languages , Workflow
14.
Asian Pac J Cancer Prev ; 19(1): 243-246, 2018 Jan 27.
Article in English | MEDLINE | ID: mdl-29374408

ABSTRACT

Objective: Epigenetic modifications involving DNA methylation and histone status are responsible for the stable maintenance of cellular phenotypes. Abnormalities may be causally involved in cancer development and therefore could have diagnostic potential. The field of epigenomics refers to all epigenetic modifications implicated in the control of gene expression, with a focus on better understanding human biology in both normal and pathological states. An epigenomics scientific workflow is essentially a data processing pipeline that automates the execution of various genome sequencing operations or tasks. The cloud is a popular platform for deploying large-scale epigenomics workflows; its dynamic environment provides various resources to scientific users on a pay-per-use billing model. Scheduling epigenomics workflow tasks is a complicated problem on cloud platforms. Here we focused on the application of an improved particle swarm optimization (IPSO) algorithm for this purpose. Methods: The IPSO algorithm was applied to find suitable resources and allocate epigenomics tasks so that the total cost was minimized, for the detection of epigenetic abnormalities of potential application in cancer diagnosis. Results: IPSO-based task-to-resource mapping reduced the total cost by 6.83 percent compared with the traditional PSO algorithm. Conclusion: The results for various cancer diagnosis tasks showed that IPSO-based task-to-resource mapping achieves lower costs than PSO-based mapping for epigenomics scientific application workflows.
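
A hedged sketch of mapping workflow tasks to VM types with a plain particle swarm optimization, minimizing a simple monetary cost, is shown below; the cost model and parameters are invented, and the specific improvements that define IPSO are not reproduced.

```python
# Hedged sketch of mapping workflow tasks to VM types with a plain particle
# swarm optimization, minimizing a simple monetary cost. The cost model and
# PSO parameters are invented; IPSO's specific improvements are not reproduced.
import random

random.seed(0)

TASK_RUNTIME = [4.0, 2.0, 6.0, 3.0]   # hours per task on a reference VM
VM_PRICE = [0.10, 0.25, 0.60]         # $/hour for each VM type
VM_SPEEDUP = [1.0, 2.0, 4.0]          # relative speed of each VM type
N_TASKS, N_VMS = len(TASK_RUNTIME), len(VM_PRICE)


def cost(position):
    """Total price of the mapping encoded by a continuous position vector."""
    total = 0.0
    for t, x in enumerate(position):
        vm = min(N_VMS - 1, max(0, int(round(x))))
        total += TASK_RUNTIME[t] / VM_SPEEDUP[vm] * VM_PRICE[vm]
    return total


def pso(n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    pos = [[random.uniform(0, N_VMS - 1) for _ in range(N_TASKS)]
           for _ in range(n_particles)]
    vel = [[0.0] * N_TASKS for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    gbest = min(pbest, key=cost)[:]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(N_TASKS):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(N_VMS - 1.0, max(0.0, pos[i][d] + vel[i][d]))
            if cost(pos[i]) < cost(pbest[i]):
                pbest[i] = pos[i][:]
        gbest = min(pbest, key=cost)[:]
    return gbest, cost(gbest)


if __name__ == "__main__":
    mapping, total = pso()
    print([min(N_VMS - 1, max(0, int(round(x)))) for x in mapping], f"${total:.2f}")
```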

15.
Front Neurosci ; 12: 236, 2018.
Article in English | MEDLINE | ID: mdl-29692705

ABSTRACT

Existing tools for the preprocessing of EEG data provide a large choice of methods to suitably prepare and analyse a given dataset. Yet it remains a challenge for the average user to integrate methods for batch processing of the increasingly large datasets of modern research, and compare methods to choose an optimal approach across the many possible parameter configurations. Additionally, many tools still require a high degree of manual decision making for, e.g., the classification of artifacts in channels, epochs or segments. This introduces extra subjectivity, is slow, and is not reproducible. Batching and well-designed automation can help to regularize EEG preprocessing, and thus reduce human effort, subjectivity, and consequent error. The Computational Testing for Automated Preprocessing (CTAP) toolbox facilitates: (i) batch processing that is easy for experts and novices alike; (ii) testing and comparison of preprocessing methods. Here we demonstrate the application of CTAP to high-resolution EEG data in three modes of use. First, a linear processing pipeline with mostly default parameters illustrates ease-of-use for naive users. Second, a branching pipeline illustrates CTAP's support for comparison of competing methods. Third, a pipeline with built-in parameter-sweeping illustrates CTAP's capability to support data-driven method parameterization. CTAP extends the existing functions and data structure from the well-known EEGLAB toolbox, based on Matlab, and produces extensive quality control outputs. CTAP is available under MIT open-source licence from https://github.com/bwrc/ctap.
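
CTAP is a MATLAB/EEGLAB toolbox; the following hypothetical Python sketch only illustrates the parameter-sweeping idea: run the same pipeline under every parameter combination and compare a quality metric.

```python
# Hypothetical illustration of a built-in parameter sweep: run the same
# preprocessing pipeline under every parameter combination and compare a
# quality metric. This is not CTAP's MATLAB API.
import itertools

GRID = {
    "highpass_hz": [0.5, 1.0],
    "artifact_z_threshold": [3.0, 4.0, 5.0],
}


def run_pipeline(highpass_hz, artifact_z_threshold):
    """Stand-in for a preprocessing run; returns a fake quality score."""
    return 1.0 / (highpass_hz + artifact_z_threshold)


results = []
for values in itertools.product(*GRID.values()):
    config = dict(zip(GRID.keys(), values))
    results.append((run_pipeline(**config), config))

best_score, best_config = max(results, key=lambda r: r[0])
print(best_config, best_score)
```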

16.
PeerJ ; 5: e3509, 2017.
Article in English | MEDLINE | ID: mdl-28695067

ABSTRACT

There are many steps in analyzing transcriptome data, from the acquisition of raw data to the selection of a subset of representative genes that explain a scientific hypothesis. The data produced can be represented as networks of interactions among genes, and these may additionally be integrated with other biological databases, such as protein-protein interactions, transcription factors, and gene annotation. However, the results of these analyses remain fragmented, imposing difficulties either for later inspection of results or for meta-analysis through the incorporation of new related data. Integrating databases and tools into scientific workflows, orchestrating their execution, and managing the resulting data and its metadata are challenging tasks. A great amount of effort is equally required to run in-silico experiments and to structure and compose the information as needed for analysis. Different programs may need to be applied, and different files are produced during the experiment cycle. In this context, the availability of a platform supporting experiment execution is paramount. We present GeNNet, an integrated transcriptome analysis platform that unifies scientific workflows with graph databases for selecting relevant genes according to the biological systems under evaluation. It includes GeNNet-Wf, a scientific workflow that preloads biological data, preprocesses raw microarray data, and conducts a series of analyses including normalization, differential expression inference, clustering, and gene set enrichment analysis. A user-friendly web interface, GeNNet-Web, allows setting parameters, executing, and visualizing the results of GeNNet-Wf executions. To demonstrate the features of GeNNet, we performed case studies with data retrieved from GEO, particularly using a single-factor experiment in different analysis scenarios. As a result, we obtained differentially expressed genes whose biological functions were analyzed. The results are integrated into GeNNet-DB, a database of genes, clusters, experiments, and their properties and relationships. The resulting graph database is explored with queries that demonstrate the expressiveness of this data model for reasoning about gene interaction networks. GeNNet is the first platform to integrate the analytical process of transcriptome data with graph databases. It provides a comprehensive set of tools that would otherwise be challenging for non-expert users to install and use. Developers can add new functionality to components of GeNNet. The derived data allow for testing previous hypotheses about an experiment and exploring new ones through the interactive graph database environment. GeNNet enables the analysis of data on humans, rhesus monkeys, mice, and rats from Affymetrix platforms. GeNNet is available as an open-source platform at https://github.com/raquele/GeNNet and can be retrieved as a software container with the command docker pull quelopes/gennet.
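
GeNNet stores derived data in a graph database; assuming a Neo4j-style backend (one common choice), a query over the derived data might look like the sketch below, where the connection details, node labels, and relationship types are hypothetical.

```python
# Hypothetical sketch of querying derived transcriptome results in a
# Neo4j-style graph database. The connection details, node labels, and
# relationship types are assumptions, not GeNNet's actual schema.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

QUERY = """
MATCH (g:Gene)-[:BELONGS_TO]->(c:Cluster)
WHERE g.log2fc > $min_fc
RETURN c.id AS cluster, count(g) AS n_de_genes
ORDER BY n_de_genes DESC
"""

with driver.session() as session:
    for record in session.run(QUERY, min_fc=1.0):
        print(record["cluster"], record["n_de_genes"])

driver.close()
```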

17.
J Cheminform ; 8: 8, 2016.
Article in English | MEDLINE | ID: mdl-26865863

ABSTRACT

BACKGROUND: The testing of theoretical models with experimental data is an integral part of the scientific method, and a logical place to search for new ways of stimulating scientific productivity. Often experiment/theory comparisons may be viewed as a workflow composed of well-defined, rote operations distributed over several distinct computers, as exemplified by the way in which predictions from electronic structure theories are evaluated against results from spectroscopic experiments. For workflows such as this, which may be laborious and time-consuming to perform manually, software that could orchestrate the operations and transfer results between computers in a seamless and automated fashion would offer major efficiency gains. Such tools also promise to alter how researchers interact with data outside their field of specialization by, e.g., making raw experimental results more accessible to theorists, and the outputs of theoretical calculations more readily comprehended by experimentalists. RESULTS: An automated workflow has been implemented for the integrated analysis of data from nuclear magnetic resonance (NMR) experiments and electronic structure calculations. The open-source Kepler software (Altintas et al. 2004) was used to coordinate the processing and transfer of data at each step of the workflow. The workflow incorporated several open-source software components, including electronic structure code to compute NMR parameters, a program to simulate NMR signals, and NMR data processing programs, among others. The Kepler software was found to be sufficiently flexible to address several minor implementation challenges without recourse to other software solutions. The automated workflow was demonstrated with data from a [Formula: see text] NMR study of uranyl salts described previously (Cho et al. in J Chem Phys 132:084501, 2010). CONCLUSIONS: The functional implementation of an automated process linking NMR data with electronic structure predictions demonstrates that modern software tools such as Kepler can be used to construct programs that comprehensively manage complex, multi-step scientific workflows spanning several different computers. Automation of the workflow can greatly accelerate the pace of discovery and allows researchers to focus on the fundamental scientific questions rather than mastery of specialized software and data processing techniques. Future developments that would expand the scope and power of this approach include tools to standardize data and associated metadata formats, and the creation of interactive user interfaces to allow real-time exploration of the effects of program inputs on calculated outputs.

18.
J Proteomics ; 129: 93-97, 2015 Nov 03.
Article in English | MEDLINE | ID: mdl-26232110

ABSTRACT

Selecting the most appropriate surrogate peptides to represent a target protein is a major component of experimental design in Multiple Reaction Monitoring (MRM). Our software PeptidePicker, with its v-score, remains distinctive in its approach of integrating the information about proteins, their tryptic peptides, and the suitability of these peptides for MRM that is available online in UniProtKB, NCBI's dbSNP, ExPASy, PeptideAtlas, PRIDE, and GPMDB. The scoring algorithm reflects our "best knowledge" for selecting candidate peptides for MRM, based on the uniqueness of the peptide in the targeted proteome, its physiochemical properties, and whether it has previously been observed. Here we present an updated approach in which we have precompiled a list of all possible surrogate peptides of the human proteome. Using our stringent selection criteria, the list includes 165k suitable MRM peptides covering 17k of the reviewed human proteins in UniProtKB. Whereas retrieving and integrating the information on the fly takes 2-4 min per protein on average, the precompiled list makes all peptides available instantly. This allows a more cohesive and faster design of a multiplexed MRM experiment and provides insights into the evidence for a protein's existence. We will keep this list up to date as proteomics data repositories continue to grow. This article is part of a Special Issue entitled: Computational Proteomics.
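
A hypothetical scoring function in the spirit of the criteria named above (uniqueness in the target proteome, physiochemical suitability, prior observation) is sketched below; the rules and weights are invented and are not PeptidePicker's v-score.

```python
# Hypothetical scoring function in the spirit of the criteria named above:
# uniqueness in the target proteome, physiochemical suitability, and prior
# observation. The rules and weights are invented and are NOT the v-score.
def score_peptide(sequence, proteome_occurrences, times_observed):
    score = 0.0
    if proteome_occurrences == 1:               # unique to the target protein
        score += 0.5
    if 7 <= len(sequence) <= 20:                # an MRM-friendly length range
        score += 0.2
    if not any(aa in sequence for aa in "MC"):  # avoid easily modified residues
        score += 0.1
    if times_observed > 0:                      # previously seen in repositories
        score += 0.2
    return score


if __name__ == "__main__":
    print(score_peptide("LSSPATLNSR", proteome_occurrences=1, times_observed=12))  # 1.0
    print(score_peptide("MCPK", proteome_occurrences=3, times_observed=0))          # 0.0
```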


Subjects
Databases, Protein , Information Storage and Retrieval/methods , Mass Spectrometry/methods , Peptides/chemistry , Proteome/chemistry , Sequence Analysis, Protein/methods , Amino Acid Sequence , Database Management Systems , Humans , Molecular Sequence Data , Sequence Alignment/methods , User-Computer Interface
19.
J Proteomics ; 106: 151-61, 2014 Jun 25.
Article in English | MEDLINE | ID: mdl-24769191

ABSTRACT

One challenge in Multiple Reaction Monitoring (MRM)-based proteomics is to select the most appropriate surrogate peptides to represent a target protein. We present here a software package that automatically generates the most appropriate surrogate peptides for an LC/MRM-MS analysis. Our method integrates the information about proteins, their tryptic peptides, and the suitability of these peptides for MRM that is available online in UniProtKB, NCBI's dbSNP, ExPASy, PeptideAtlas, PRIDE, and GPMDB. The scoring algorithm reflects our knowledge of how to choose the best candidate peptides for MRM, based on the uniqueness of the peptide in the targeted proteome, its physiochemical properties, and whether it has previously been observed. The modularity of the workflow allows further extension, and additional selection criteria can be incorporated. We have developed a simple Web interface where the researcher provides the protein accession number, the subject organism, and peptide-specific options. Currently, the software is designed for the human and mouse proteomes, but additional species can easily be added. Our software improved peptide selection by eliminating human error and by considering multiple data sources and all of the isoforms of the protein, and it made selection faster - approximately 50 proteins per hour compared with 8 per day. BIOLOGICAL SIGNIFICANCE: Compiling a list of optimal surrogate peptides for target proteins to be analyzed by LC/MRM-MS has been a cumbersome process in which expert researchers retrieved information from different online repositories and used their own reasoning to find the most appropriate peptides. Our scientific workflow automates this process by integrating information from different data sources, including UniProt, Global Proteome Machine, NCBI's dbSNP, and PeptideAtlas, simulating the researchers' reasoning, and incorporating their knowledge of how to select the best proteotypic peptides for an MRM analysis. The developed software can help to standardize the selection of peptides, eliminate human error, and increase productivity.


Subjects
Computational Biology/methods , Peptides/chemistry , Proteomics/methods , Algorithms , Animals , Databases, Protein , Humans , Mass Spectrometry , Mice , Models, Statistical , Programming Languages , Proteome , Reproducibility of Results , Software , Trypsin/chemistry , User-Computer Interface , Workflow
20.
Front Neuroinform ; 3: 35, 2009.
Article in English | MEDLINE | ID: mdl-19847314

ABSTRACT

A streamlined scientific workflow system that can track the details of the data processing history is critical for the efficient handling of fundamental routines used in scientific research. In the scientific workflow research community, the information that describes the details of data processing history is referred to as "provenance", which plays an important role in most existing workflow management systems. Despite its importance, however, provenance modeling and management is still a relatively new area in the scientific workflow research community. The proper scope, representation, granularity, and implementation of a provenance model can vary from domain to domain and pose a number of challenges for an efficient pipeline design. This paper provides a case study on structured provenance modeling and management problems in the neuroimaging domain by introducing the Bio-Swarm-Pipeline. This new model, which is evaluated in the paper through real-world scenarios, systematically addresses the provenance scope, representation, granularity, and implementation issues related to the neuroimaging domain. Although the model stems from applications in neuroimaging, the system can potentially be adapted to a wide range of biomedical application scenarios.
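
A minimal sketch of a provenance record capturing the kind of processing-history details discussed above is shown below; the field names are hypothetical and do not reproduce the Bio-Swarm-Pipeline's provenance model.

```python
# Minimal sketch of a provenance record capturing processing-history details.
# Field names are hypothetical and do not reproduce the Bio-Swarm-Pipeline's
# provenance model.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone


@dataclass
class ProvenanceRecord:
    step_name: str
    tool_version: str
    inputs: list[str]
    outputs: list[str]
    parameters: dict[str, object] = field(default_factory=dict)
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


if __name__ == "__main__":
    rec = ProvenanceRecord(
        step_name="skull_strip",
        tool_version="1.4.2",
        inputs=["sub-01_T1w.nii.gz"],
        outputs=["sub-01_T1w_brain.nii.gz"],
        parameters={"frac": 0.5},
    )
    print(asdict(rec))
```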
