Results 1 - 2 of 2
1.
J Bioinform Comput Biol ; 8(6): 945-65, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21121020

ABSTRACT

Classifiers based on machine learning and statistical models are increasingly used with complex, high-dimensional biological data obtained from high-throughput technologies. Understanding how the various factors associated with large and complex microarray datasets affect the predictive performance of classifiers is computationally intensive and under-investigated, yet vital for determining the optimal number of biomarkers for classification purposes aimed at improved detection, diagnosis, and therapeutic monitoring of diseases. We investigate the impact of microarray-based data characteristics on the predictive performance of various classification rules using simulation studies. Our investigation using Random Forest, Support Vector Machines, Linear Discriminant Analysis and k-Nearest Neighbour shows that the predictive performance of classifiers is strongly influenced by training set size, biological and technical variability, replication, fold change and correlation between biomarkers. The optimal number of biomarkers for a classification problem should therefore be estimated taking into account the impact of all these factors. A database of average generalization errors is built for various combinations of these factors. This database can be used to estimate the optimal number of biomarkers for a given level of predictive accuracy as a function of these factors. Examples show that curves from actual biological data resemble those from simulated data with corresponding levels of data characteristics. An R package, optBiomarker, implementing the method is freely available for academic use from the Comprehensive R Archive Network (http://www.cran.r-project.org/web/packages/optBiomarker/).
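
The simulation idea in this abstract can be illustrated with a minimal sketch. It is not the optBiomarker implementation or its API: all function names and parameters below are hypothetical, biomarkers are assumed independent (the abstract notes that correlation matters), a single standard deviation stands in for combined biological and technical variability, and replication is ignored. It simulates two classes that differ by a fixed log fold change, estimates the average test error of a k-nearest-neighbour classifier for several biomarker counts, and picks the smallest count that meets a target error.

```python
import random
from statistics import mean


def simulate_class(rng, n_samples, n_biomarkers, shift, sd):
    """Draw samples whose biomarkers are independent Gaussians centred at `shift`."""
    return [[rng.gauss(shift, sd) for _ in range(n_biomarkers)]
            for _ in range(n_samples)]


def knn_predict(train_x, train_y, x, k):
    """Majority vote among the k training samples nearest to x (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_x, train_y)
    )
    votes = sum(label for _, label in dists[:k])
    return 1 if 2 * votes > k else 0


def average_error(n_biomarkers, n_train, log_fold_change, sd,
                  k=3, n_test=50, n_reps=20, seed=0):
    """Average test error over repeated simulated training/test sets.

    n_train and n_test are per-class sample sizes; k should be odd.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    errors = []
    for _ in range(n_reps):
        train_x = (simulate_class(rng, n_train, n_biomarkers, 0.0, sd)
                   + simulate_class(rng, n_train, n_biomarkers, log_fold_change, sd))
        train_y = [0] * n_train + [1] * n_train
        test_x = (simulate_class(rng, n_test, n_biomarkers, 0.0, sd)
                  + simulate_class(rng, n_test, n_biomarkers, log_fold_change, sd))
        test_y = [0] * n_test + [1] * n_test
        wrong = sum(knn_predict(train_x, train_y, x, k) != y
                    for x, y in zip(test_x, test_y))
        errors.append(wrong / len(test_y))
    return mean(errors)


def error_table(biomarker_counts, **settings):
    """Map each biomarker count to its average simulated error."""
    return {m: average_error(m, **settings) for m in biomarker_counts}


def smallest_panel(table, target_error):
    """Smallest biomarker count whose average error meets the target, or None."""
    meeting = [m for m, err in sorted(table.items()) if err <= target_error]
    return meeting[0] if meeting else None


if __name__ == "__main__":
    table = error_table([1, 5, 20], n_train=20, log_fold_change=1.0, sd=1.0)
    for m, err in table.items():
        print(m, round(err, 3))
    print("smallest panel for 10% error:", smallest_panel(table, 0.10))
```

Repeating the call to `error_table` over a grid of training set sizes, fold changes and variability levels would give a small lookup table in the spirit of the database of generalization errors the abstract describes.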


Subjects
Biomarkers; Computational Biology; Artificial Intelligence; Biomarkers/blood; Classification/methods; Databases, Factual; Gene Expression Profiling/statistics & numerical data; Humans; Microarray Analysis/statistics & numerical data; Models, Statistical; Oligonucleotide Array Sequence Analysis/statistics & numerical data
2.
Philos Trans A Math Phys Eng Sci ; 367(1897): 2471-81, 2009 Jun 28.
Article in English | MEDLINE | ID: mdl-19451103

ABSTRACT

We provide an insight into the challenge of building and supporting a scientific data infrastructure, with reference to our experience working with scientists from computational particle physics and molecular biology. We illustrate how, with modern high-performance computing resources, even small scientific groups can generate huge volumes (petabytes) of valuable scientific data, and we explain how grid technology can be used to manage, publish, share and curate these data. We describe the DiGS software application, which we have developed to meet the needs of smaller communities, and highlight the key elements of its functionality.


Subjects
Computer Communication Networks; Computer Security; Computer Systems; Information Storage and Retrieval; Molecular Biology/statistics & numerical data; Physics/statistics & numerical data; United Kingdom; User-Computer Interface