Results 1 - 4 of 4
1.
BMC Bioinformatics ; 10: 449, 2009 Dec 29.
Article in English | MEDLINE | ID: mdl-20040098

ABSTRACT

BACKGROUND: Microarrays depend on appropriate probe design to deliver the promise of accurate genome-wide measurement. Probe design, ideally, produces a unique probe-target match with homogeneous duplex stability over the complete set of probes. Much of microarray pre-processing is concerned with adjusting for non-ideal probes that do not report target concentration accurately. Cross-hybridizing (non-unique) probes, probe composition and structure, as well as platform effects such as instrument limitations, have been shown to affect the interpretation of signal. Data cleansing pipelines seldom filter specifically for these constraints, relying instead on general statistical tests to remove the most variable probes from the samples in a study. This adjusts probes contributing to ProbeSet (gene) values in a study-specific manner. We refer to the complete set of factors as biologically applied filter levels (BaFL) and have assembled an analysis pipeline for managing them consistently. The pipeline and associated experiments reported here examine the outcome of comprehensively excluding probes affected by known factors on inter-experiment target behavior consistency. RESULTS: We present here a 'white box' probe filtering and intensity transformation protocol that incorporates currently understood factors affecting probe and target interactions; the method has been tested on data from the Affymetrix human GeneChip HG-U95Av2, using two independent datasets from studies of a complex lung adenocarcinoma phenotype. The protocol incorporates probe-specific effects from SNPs, cross-hybridization and low heteroduplex affinity, as well as effects from scanner sensitivity and sample batches, and includes simple statistical tests for identifying unresolved biological factors leading to sample variability. Subsequent to filtering for these factors, the consistency and reliability of the remaining measurements are shown to be markedly improved.
CONCLUSIONS: The data cleansing protocol yields reproducible estimates of a given probe or ProbeSet's (gene's) relative expression that translate across datasets, allowing for credible cross-experiment comparisons. We provide supporting evidence for the validity of removing several large classes of probes, and for our approaches to removing outlying samples. The resulting expression profiles demonstrate consistency across the two independent datasets. Finally, we demonstrate that, given an appropriate sampling pool, the method enhances the t-test's statistical power to discriminate significantly different means over sample classes.
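The filtering idea in the abstract above can be illustrated with a minimal sketch. It assumes each probe carries boolean flags for the probe-specific factors named in the abstract (SNP, cross-hybridization, low duplex affinity) and that the scanner has a usable intensity range. The field names, the intensity bounds, the sample data, and the use of a plain mean per ProbeSet are illustrative assumptions, not the published BaFL pipeline.

```python
from statistics import mean

# Hypothetical probe records; flag names and values are illustrative only.
PROBES = [
    {"probeset": "GENE_A", "intensity": 850.0, "has_snp": False, "cross_hyb": False, "low_affinity": False},
    {"probeset": "GENE_A", "intensity": 950.0, "has_snp": False, "cross_hyb": False, "low_affinity": False},
    {"probeset": "GENE_A", "intensity": 1200.0, "has_snp": True, "cross_hyb": False, "low_affinity": False},
    {"probeset": "GENE_A", "intensity": 50.0, "has_snp": False, "cross_hyb": False, "low_affinity": False},
    {"probeset": "GENE_B", "intensity": 3000.0, "has_snp": False, "cross_hyb": True, "low_affinity": False},
    {"probeset": "GENE_B", "intensity": 2500.0, "has_snp": False, "cross_hyb": False, "low_affinity": True},
    {"probeset": "GENE_C", "intensity": 400.0, "has_snp": False, "cross_hyb": False, "low_affinity": False},
    {"probeset": "GENE_C", "intensity": 25000.0, "has_snp": False, "cross_hyb": False, "low_affinity": False},
]

# Assumed usable scanner range; real bounds depend on the instrument.
SCANNER_MIN, SCANNER_MAX = 200.0, 20000.0


def passes_filters(probe):
    """Return True if the probe has no known confounding factor."""
    if probe["has_snp"] or probe["cross_hyb"] or probe["low_affinity"]:
        return False
    return SCANNER_MIN <= probe["intensity"] <= SCANNER_MAX


def probeset_means(probes):
    """Average the surviving probe intensities per ProbeSet.

    ProbeSets whose probes are all filtered out are omitted entirely.
    """
    kept = {}
    for probe in probes:
        if passes_filters(probe):
            kept.setdefault(probe["probeset"], []).append(probe["intensity"])
    return {name: mean(values) for name, values in kept.items()}


print(probeset_means(PROBES))
```

In this toy input, GENE_B disappears because every one of its probes is flagged. A 'white box' filter makes that kind of loss explicit instead of hiding it inside a study-specific statistical adjustment.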


Subjects
Gene Expression Profiling/methods; Oligonucleotide Array Sequence Analysis/methods; Software
2.
BMC Bioinformatics ; 7: 74, 2006 Feb 16.
Article in English | MEDLINE | ID: mdl-16483359

ABSTRACT

BACKGROUND: Accurate methods for extraction of meaningful patterns in high dimensional data have become increasingly important with the recent generation of data types containing measurements across thousands of variables. Principal components analysis (PCA) is a linear dimensionality reduction (DR) method that is unsupervised in that it relies only on the data; projections are calculated in Euclidean or a similar linear space and do not use tuning parameters for optimizing the fit to the data. However, relationships within sets of nonlinear data types, such as biological networks or images, are frequently mis-rendered into a low dimensional space by linear methods. Nonlinear methods, in contrast, attempt to model important aspects of the underlying data structure, often requiring parameter fitting to the data type of interest. In many cases, the optimal parameter values vary when different classification algorithms are applied to the same rendered subspace, making the results of such methods highly dependent upon the type of classifier implemented. RESULTS: We present the results of applying the spectral method of Lafon, a nonlinear DR method based on the weighted graph Laplacian, that minimizes the requirements for such parameter optimization for two biological data types. We demonstrate that it is successful in determining implicit ordering of brain slice image data and in classifying separate species in microarray data, as compared to two conventional linear methods and three nonlinear methods (one of which is an alternative spectral method). This spectral implementation is shown to provide more meaningful information, by preserving important relationships, than the methods of DR presented for comparison. Tuning parameter fitting is simple and general, rather than specific to a data type or experiment, for the two datasets analyzed here. Tuning parameter optimization is minimized in the DR step to each subsequent classification method, enabling the possibility of valid cross-experiment comparisons.
CONCLUSION: Results from the spectral method presented here exhibit the desirable properties of preserving meaningful nonlinear relationships in lower dimensional space and requiring minimal parameter fitting, providing a useful algorithm for purposes of visualization and classification across diverse datasets, a common challenge in systems biology.
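To make the idea of recovering an implicit ordering concrete, here is a minimal sketch of a graph-based spectral embedding. It is a simplified stand-in and not the authors' implementation of Lafon's method. It builds Gaussian kernel weights, row-normalises them into a Markov matrix, and extracts the second eigenvector by power iteration. The function names, the kernel width `epsilon`, and the iteration count are assumptions chosen for this toy example.

```python
import math


def second_eigenvector(points, epsilon=0.1, iterations=1000):
    """Approximate the first non-trivial eigenvector of the Markov matrix
    P = D^-1 W built from a Gaussian kernel W (simplified, assumed variant)."""
    n = len(points)
    # Gaussian kernel weights between all pairs, including self-weights.
    weights = [[math.exp(-math.dist(p, q) ** 2 / epsilon) for q in points]
               for p in points]
    degrees = [sum(row) for row in weights]
    markov = [[w / d for w in row] for row, d in zip(weights, degrees)]
    total = sum(degrees)
    # Left eigenvector of P for eigenvalue 1 (the stationary distribution).
    stationary = [d / total for d in degrees]

    vec = [float(i + 1) for i in range(n)]  # arbitrary non-constant start
    for _ in range(iterations):
        vec = [sum(p_ij * v_j for p_ij, v_j in zip(row, vec)) for row in markov]
        # Remove the component along the trivial constant eigenvector.
        offset = sum(s * v for s, v in zip(stationary, vec))
        vec = [v - offset for v in vec]
        norm = math.sqrt(sum(v * v for v in vec))
        vec = [v / norm for v in vec]
    return vec


def recover_order(points, epsilon=0.1):
    """Sort point indices by their one-dimensional spectral coordinate."""
    coords = second_eigenvector(points, epsilon=epsilon)
    return sorted(range(len(points)), key=lambda i: coords[i])


# Toy data: points sampled along an arc, then presented in shuffled order.
angles = [0.25 * i for i in range(12)]
shuffle = [7, 2, 11, 0, 5, 9, 3, 1, 10, 4, 8, 6]
shuffled_points = [(math.cos(angles[k]), math.sin(angles[k])) for k in shuffle]
print([shuffle[i] for i in recover_order(shuffled_points)])
```

The sign of an eigenvector is arbitrary, so the recovered sequence may run along the arc in either direction. The only parameter to fit is the kernel width, which is in the spirit of the abstract's point about minimal tuning.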


Subjects
Brain/anatomy & histology; Computational Biology/methods; Data Interpretation, Statistical; Image Processing, Computer-Assisted/methods; Oligonucleotide Array Sequence Analysis/methods; Algorithms; Artificial Intelligence; Brain/pathology; Cluster Analysis; Computer Graphics; Computing Methodologies; Fibroblasts/metabolism; Humans; Information Storage and Retrieval; Models, Biological; Models, Statistical; Normal Distribution; Pattern Recognition, Automated; Principal Component Analysis; Regression Analysis; Sequence Alignment
3.
BMC Res Notes ; 6: 511, 2013 Dec 05.
Article in English | MEDLINE | ID: mdl-24308566

ABSTRACT

BACKGROUND: The availability of genetic data has increased dramatically in recent years. The greatest value of this data is its potential for personalized medicine. Many new associations are reported every day from Genome Wide Association Studies (GWAS). However, robust, reproducible associations are elusive for some complex diseases. Ontologies present a potential way to distinguish between spurious associations and those with a potential influence on the phenotype. Such an approach would be based on finding associations of the same genetic variant with closely related, but distinct, phenotypes. This approach can be accomplished with a phenotype ontology that also holds genetic association data. RESULTS: Here, we report a structured knowledge application to navigate and to facilitate the discovery of relationships between different phenotypes and their genetic associations. CONCLUSIONS: OGA allows users to (1) find the intersecting set of genes for phenotypes of interest and (2) find empirical p values for such observations. OGA also outperforms similar applications in the total number of concepts and genes mapped.
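The two user-facing operations named in the conclusions, intersecting gene sets across phenotypes and attaching an empirical p value to the overlap, can be sketched as follows. The abstract does not say how OGA computes its empirical p values, so the random-draw procedure here is an assumption. The phenotype names, gene labels, and function names are also hypothetical.

```python
import random


def shared_genes(associations, phenotypes):
    """Return the genes associated with every listed phenotype."""
    return set.intersection(*(associations[p] for p in phenotypes))


def empirical_p(associations, phenotypes, universe, n_draws=2000, seed=0):
    """Estimate how often random gene sets of the same sizes overlap at
    least as much as the observed sets (assumed null model)."""
    rng = random.Random(seed)
    observed = len(shared_genes(associations, phenotypes))
    sizes = [len(associations[p]) for p in phenotypes]
    pool = sorted(universe)  # sorted so sampling is reproducible
    hits = 0
    for _ in range(n_draws):
        drawn = [set(rng.sample(pool, k)) for k in sizes]
        if len(set.intersection(*drawn)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_draws + 1)


# Hypothetical data: 1000 background genes, two related phenotypes.
UNIVERSE = {f"G{i}" for i in range(1000)}
ASSOCIATIONS = {
    "phenotype_x": {f"G{i}" for i in range(0, 10)},
    "phenotype_y": {f"G{i}" for i in range(5, 15)},
}

overlap = shared_genes(ASSOCIATIONS, ["phenotype_x", "phenotype_y"])
p_value = empirical_p(ASSOCIATIONS, ["phenotype_x", "phenotype_y"], UNIVERSE)
print(sorted(overlap), p_value)
```

A uniform draw from the whole gene universe is the simplest possible null. A real tool would likely need to account for gene length, annotation bias, or the ontology structure.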


Subjects
Genome-Wide Association Study; Genotype; Phenotype; Software; Genetic Predisposition to Disease; Genome, Human; Humans; Molecular Sequence Annotation
4.
PLoS One ; 6(9): e24220, 2011.
Article in English | MEDLINE | ID: mdl-21915301

ABSTRACT

Genome-wide association studies (GWAS) are a valuable approach to understanding the genetic basis of complex traits. One of the challenges of GWAS is the translation of genetic association results into biological hypotheses suitable for further investigation in the laboratory. To address this challenge, we introduce Network Interface Miner for Multigenic Interactions (NIMMI), a network-based method that combines GWAS data with human protein-protein interaction data (PPI). NIMMI builds biological networks weighted by connectivity, which is estimated by use of a modification of the Google PageRank algorithm. These weights are then combined with genetic association p-values derived from GWAS, producing what we call 'trait prioritized sub-networks.' As a proof of principle, NIMMI was tested on three GWAS datasets previously analyzed for height, a classical polygenic trait. Despite differences in sample size and ancestry, NIMMI captured 95% of the known height associated genes within the top 20% of ranked sub-networks, far better than what could be achieved by a single-locus approach. The top 2% of NIMMI height-prioritized sub-networks were significantly enriched for genes involved in transcription, signal transduction, transport, and gene expression, as well as nucleic acid, phosphate, protein, and zinc metabolism. All of these sub-networks were ranked near the top across all three height GWAS datasets we tested. We also tested NIMMI on a categorical phenotype, Crohn's disease. NIMMI prioritized sub-networks involved in B- and T-cell receptor, chemokine, interleukin, and other pathways consistent with the known autoimmune nature of Crohn's disease. NIMMI is a simple, user-friendly, open-source software tool that efficiently combines genetic association data with biological networks, translating GWAS findings into biological hypotheses.
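The weighting scheme described above can be sketched in a few lines. The abstract mentions a modification of PageRank but does not describe it, so this example uses the standard algorithm on an undirected toy network. The rule for combining connectivity with association evidence (rank multiplied by -log10 of the p-value) is an assumption made for illustration, not NIMMI's published formula. Node labels and p-values are hypothetical.

```python
import math


def pagerank(adjacency, damping=0.85, iterations=100):
    """Standard PageRank by power iteration on an undirected graph.

    Assumes every node has at least one neighbour (no dangling nodes).
    """
    nodes = sorted(adjacency)
    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        updated = {}
        for node in nodes:
            # Each neighbour shares its rank equally among its own neighbours.
            incoming = sum(rank[m] / len(adjacency[m]) for m in adjacency[node])
            updated[node] = (1 - damping) / len(nodes) + damping * incoming
        rank = updated
    return rank


def prioritize(adjacency, p_values):
    """Combine connectivity with association evidence (assumed rule)."""
    rank = pagerank(adjacency)
    scores = {n: rank[n] * -math.log10(p_values[n]) for n in adjacency}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical protein-protein interaction network: H is a hub.
NETWORK = {
    "H": ["A", "B", "C"],
    "A": ["H"],
    "B": ["H"],
    "C": ["H", "D"],
    "D": ["C"],
}
# Hypothetical GWAS p-values for the corresponding genes.
P_VALUES = {"H": 0.5, "A": 1e-8, "B": 0.2, "C": 0.01, "D": 1e-3}

print(prioritize(NETWORK, P_VALUES))
```

In this toy network the hub H has the highest connectivity weight but weak association evidence, so it ranks below the strongly associated node A. That interplay is what a combined score is meant to capture.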


Subjects
Genome-Wide Association Study/methods; Chemokines/metabolism; Crohn Disease/metabolism; Humans; Interleukins/metabolism; Protein Binding; Receptors, Antigen, T-Cell/metabolism