Search | BVS IEC

SPRINT: a new parallel framework for R.

Hill, Jon; Hambley, Matthew; Forster, Thorsten; Mewissen, Muriel; Sloan, Terence M; Scharinger, Florian; Trew, Arthur; Ghazal, Peter.

BMC Bioinformatics ; 9: 558, 2008 Dec 29.

Article in English | MEDLINE | ID: mdl-19114001

ABSTRACT

BACKGROUND: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems from R without having to master parallel programming paradigms is therefore necessary to analyse genomic data to its fullest.

RESULTS: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors, this computation takes less than a third of the time R needs to complete it on one processor.

CONCLUSION: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and, with further development, will become more useful as more functions are added to the framework.
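The pairwise correlation example mentioned in the abstract can be illustrated with a minimal Python sketch. This is not SPRINT's code or its R interface: the function names (`pearson`, `correlation_rows`, `parallel_correlation`) and the round-robin row split are assumptions chosen for illustration, and a thread pool stands in for the multiple processors of an HPC system, so it shows how the work can be partitioned rather than any real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_rows(data, row_indices):
    """Compute only the requested rows of the full correlation matrix."""
    return [(i, [pearson(data[i], other) for other in data]) for i in row_indices]


def parallel_correlation(data, workers=4):
    """Hypothetical helper: deal rows out to workers, then reassemble the matrix."""
    n = len(data)
    # Round-robin assignment: worker w handles rows w, w + workers, ...
    chunks = [list(range(w, n, workers)) for w in range(workers)]
    result = [None] * n
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(lambda idx: correlation_rows(data, idx), chunks):
            for i, row in part:
                result[i] = row
    return result


if __name__ == "__main__":
    genes = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
    for row in parallel_correlation(genes, workers=2):
        print([round(value, 3) for value in row])
```

Each worker needs read access to the whole data set but writes only its own rows, which is why this kind of computation divides cleanly across processors.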

Subjects

Computational Biology/methods; Computing Methodologies; Oligonucleotide Array Sequence Analysis/methods; Software; Algorithms; Animals; Computer Graphics; Databases, Genetic; Gene Expression Profiling/methods; Genomics; Humans; Pattern Recognition, Automated; Programming Languages; User-Computer Interface

GPX-Macrophage Expression Atlas: a database for expression profiles of macrophages challenged with a variety of pro-inflammatory, anti-inflammatory, benign and pathogen insults.

Grimes, Graeme R; Moodie, Stuart; Beattie, John S; Craigon, Marie; Dickinson, Paul; Forster, Thorsten; Livingston, Andrew D; Mewissen, Muriel; Robertson, Kevin A; Ross, Alan J; Sing, Garwin; Ghazal, Peter.

BMC Genomics ; 6: 178, 2005 Dec 12.

Article in English | MEDLINE | ID: mdl-16343346

ABSTRACT

BACKGROUND: Macrophages play an integral role in the host immune system, bridging innate and adaptive immunity. As such, they are finely attuned to extracellular and intracellular stimuli and respond by rapidly initiating multiple signalling cascades with diverse effector functions. The macrophage cell is therefore an experimentally and clinically amenable biological system for the mapping of biological pathways. The goal of the macrophage expression atlas is to systematically investigate the pathway biology and interaction network of macrophages challenged with a variety of insults, in particular via infection and activation with key inflammatory mediators. As an important first step towards this we present a single searchable database resource containing high-throughput macrophage gene expression studies.

DESCRIPTION: The GPX Macrophage Expression Atlas (GPX-MEA) is an online resource for gene expression based studies of a range of macrophage cell types following treatment with pathogens and immune modulators. GPX-MEA follows the MIAME standard and includes an objective quality score with each experiment. It places special emphasis on rigorously capturing the experimental design and enables the searching of expression data from different microarray experiments. Studies may be queried on the basis of experimental parameters, sample information and quality assessment score. The ability to compare the expression values of individual genes across multiple experiments is provided. In addition, the database offers access to experimental annotation and analysis files and includes experiments and raw data previously unavailable to the research community.

CONCLUSION: GPX-MEA is the first example of a quality scored gene expression database focussed on a macrophage cellular system that allows efficient identification of transcriptional patterns. The resource will provide novel insights into the phenotypic response of macrophages to a variety of benign, inflammatory, and pathogen insults. GPX-MEA is available through the GPX website at http://www.gti.ed.ac.uk/GPX.

Subjects

Computational Biology; Databases, Genetic; Gene Expression Profiling; Macrophages/chemistry; Animals; Data Collection; Humans; Microarray Analysis/methods; Quality Control; Research Design; Software Design

PEDRo: a database for storing, searching and disseminating experimental proteomics data.

Garwood, Kevin; McLaughlin, Thomas; Garwood, Chris; Joens, Scott; Morrison, Norman; Taylor, Christopher F; Carroll, Kathleen; Evans, Caroline; Whetton, Anthony D; Hart, Sarah; Stead, David; Yin, Zhikang; Brown, Alistair J P; Hesketh, Andrew; Chater, Keith; Hansson, Lena; Mewissen, Muriel; Ghazal, Peter; Howard, Julie; Lilley, Kathryn S; Gaskell, Simon J; Brass, Andy; Hubbard, Simon J; Oliver, Stephen G; Paton, Norman W.

BMC Genomics ; 5: 68, 2004 Sep 17.

Article in English | MEDLINE | ID: mdl-15377392

ABSTRACT

BACKGROUND: Proteomics is rapidly evolving into a high-throughput technology, in which substantial and systematic studies are conducted on samples from a wide range of physiological, developmental, or pathological conditions. Reference maps from 2D gels are widely circulated. However, there is, as yet, no formally accepted standard representation to support the sharing of proteomics data, and little systematic dissemination of comprehensive proteomic data sets.

RESULTS: This paper describes the design, implementation and use of a Proteome Experimental Data Repository (PEDRo), which makes comprehensive proteomics data sets available for browsing, searching and downloading. It also serves to extend the debate on the level of detail at which proteomics data should be captured, the sorts of facilities that should be provided by proteome data management systems, and the techniques by which such facilities can be made available.

CONCLUSIONS: The PEDRo database provides access to a collection of comprehensive descriptions of experimental data sets in proteomics. Not only are these data sets interesting in and of themselves, they also provide a useful early validation of the PEDRo data model, which has served as a starting point for the ongoing standardisation activity through the Proteome Standards Initiative of the Human Proteome Organisation.

Subjects

Database Management Systems/trends; Databases, Protein/trends; Proteomics; Animals; Arabidopsis Proteins/chemistry; Bacterial Proteins/chemistry; Candida albicans/chemistry; Candida albicans/pathogenicity; Candida glabrata/chemistry; Candida glabrata/pathogenicity; Computational Biology/methods; Fungal Proteins/chemistry; Helminth Proteins/chemistry; Jejunal Diseases/parasitology; Mice; Proteins/chemistry; Saccharomyces cerevisiae Proteins/chemistry; Software Design; Streptomyces coelicolor/chemistry; Trichinella spiralis/pathogenicity; Trichinellosis/pathology

Parallel classification and feature selection in microarray data using SPRINT.

Mitchell, Lawrence; Sloan, Terence M; Mewissen, Muriel; Ghazal, Peter; Forster, Thorsten; Piotrowski, Michal; Trew, Arthur.

Concurr Comput ; 26(4): 854-865, 2014 Mar 25.

Article in English | MEDLINE | ID: mdl-24883047

ABSTRACT

The statistical language R is favoured by many biostatisticians for processing microarray data. In recent times, the quantity of data that can be obtained in experiments has risen significantly, making previously fast analyses time-consuming or even impossible with the existing software infrastructure. High performance computing (HPC) systems offer a solution to these problems but at the expense of increased complexity for the end user. The Simple Parallel R Interface is a library for R that aims to reduce the complexity of using HPC systems by providing biostatisticians with drop-in parallelised replacements of existing R functions. In this paper we describe parallel implementations of two popular techniques: exploratory clustering analyses using the random forest classifier and feature selection through identification of differentially expressed genes using the rank product method.
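The rank product statistic named in the abstract can be sketched briefly in Python. This is a serial illustration of the statistic only, not the parallel SPRINT implementation: the function name `rank_products`, the input layout (one row of fold changes per gene, one column per replicate) and the toy values are assumptions, and ties and significance estimation are left out.

```python
from math import prod


def rank_products(fold_changes):
    """Geometric mean of each gene's rank across replicates.

    Rank 1 is the largest fold change within a replicate, so genes that are
    consistently up-regulated receive a small rank product.
    """
    n_genes = len(fold_changes)
    n_reps = len(fold_changes[0])
    ranks = [[0] * n_reps for _ in range(n_genes)]
    for r in range(n_reps):
        # Order genes by fold change within this replicate, largest first.
        order = sorted(range(n_genes), key=lambda g: fold_changes[g][r], reverse=True)
        for rank, gene in enumerate(order, start=1):
            ranks[gene][r] = rank
    return [prod(row) ** (1.0 / n_reps) for row in ranks]


if __name__ == "__main__":
    # Toy fold changes for three genes measured in two replicates (made-up values).
    toy = [[3.0, 2.5], [1.0, 1.2], [0.5, 0.4]]
    print(rank_products(toy))
```

In practice the significance of a rank product is usually assessed by recomputing it on permuted data many times, and that repeated step is the natural target for parallelisation.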

Optimization of a parallel permutation testing function for the SPRINT R package.

Petrou, Savvas; Sloan, Terence M; Mewissen, Muriel; Forster, Thorsten; Piotrowski, Michal; Dobrzelecki, Bartosz; Ghazal, Peter; Trew, Arthur; Hill, Jon.

Concurr Comput ; 23(17): 2258-2268, 2011 Dec 10.

Article in English | MEDLINE | ID: mdl-23335858

ABSTRACT

The statistical language R and its Bioconductor package are favoured by many biostatisticians for processing microarray data. The amount of data produced by some analyses has reached the limits of many common bioinformatics computing infrastructures. High Performance Computing systems offer a solution to this issue. The Simple Parallel R Interface (SPRINT) is a package that provides biostatisticians with easy access to High Performance Computing systems and allows the addition of parallelized functions to R. Previous work has established that the SPRINT implementation of an R permutation testing function has close to optimal scaling on up to 512 processors on a supercomputer. Access to supercomputers, however, is not always possible, and so the work presented here compares the performance of the SPRINT implementation on a supercomputer with benchmarks on a range of platforms including cloud resources and a common desktop machine with multiprocessing capabilities. Copyright © 2011 John Wiley & Sons, Ltd.
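Permutation testing parallelises well because each permutation is independent of the others. The following Python sketch shows one way to split the permutation count across workers. It is a generic two-sample test on the difference of means, not the SPRINT function or the R function it replaces: all names, the per-worker seeding scheme and the use of threads in place of separate processors are assumptions for illustration.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def mean_difference(a, b):
    """Absolute difference between the means of two samples."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def count_extreme(a, b, n_perm, seed):
    """Count permutations whose statistic is at least as large as the observed one."""
    rng = random.Random(seed)  # each worker gets its own generator
    observed = mean_difference(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        permuted = mean_difference(pooled[:len(a)], pooled[len(a):])
        if permuted >= observed - 1e-12:  # small tolerance for rounding
            hits += 1
    return hits


def permutation_p_value(a, b, n_perm=1000, workers=4, seed=0):
    """Hypothetical helper: divide n_perm permutations among the workers."""
    share, extra = divmod(n_perm, workers)
    counts = [share + (1 if w < extra else 0) for w in range(workers)]
    seeds = range(seed, seed + workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        hits = sum(pool.map(count_extreme, [a] * workers, [b] * workers, counts, seeds))
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    print(permutation_p_value([10, 11, 12, 13, 14], [0, 1, 2, 3, 4]))
```

Only the per-worker hit counts need to be combined at the end, so communication between workers is minimal, which is consistent with the near-optimal scaling the abstract reports for this kind of function.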

Multi-factorial analysis of class prediction error: estimating optimal number of biomarkers for various classification rules.

Khondoker, Mizanur R; Bachmann, Till T; Mewissen, Muriel; Dickinson, Paul; Dobrzelecki, Bartosz; Campbell, Colin J; Mount, Andrew R; Walton, Anthony J; Crain, Jason; Schulze, Holger; Giraud, Gerard; Ross, Alan J; Ciani, Ilenia; Ember, Stuart W J; Tlili, Chaker; Terry, Jonathan G; Grant, Eilidh; McDonnell, Nicola; Ghazal, Peter.

J Bioinform Comput Biol ; 8(6): 945-65, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21121020

ABSTRACT

Machine learning and statistical model based classifiers have increasingly been used with more complex and high dimensional biological data obtained from high-throughput technologies. Understanding the impact of various factors associated with large and complex microarray datasets on the predictive performance of classifiers is computationally intensive and under-investigated, yet vital in determining the optimal number of biomarkers for various classification purposes aimed towards improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray based data characteristics on the predictive performance for various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking account of the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. The database of generalization errors can be used for estimating the optimal number of biomarkers for given levels of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those of simulated data with corresponding levels of data characteristics. An R package optBiomarker implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).

Subjects

Biomarkers; Computational Biology; Artificial Intelligence; Biomarkers/blood; Classification/methods; Databases, Factual; Gene Expression Profiling/statistics & numerical data; Humans; Microarray Analysis/statistics & numerical data; Models, Statistical; Oligonucleotide Array Sequence Analysis/statistics & numerical data
