RESUMEN
Systems biology requires mathematical tools not only to analyse large genomic datasets, but also to explore large experimental spaces in a systematic yet economical way. We demonstrate that two-factor combinatorial design (CD), shown to be useful in software testing, can be used to design a small set of experiments that would allow biologists to explore larger experimental spaces. Further, the results of an initial set of experiments can be used to seed further 'Adaptive' CD experimental designs. As a proof of principle, we demonstrate the usefulness of this Adaptive CD approach by analysing data from the effects of six binary inputs on the regulation of genes in the N-assimilation pathway of Arabidopsis. This CD approach identified the more important regulatory signals previously discovered by traditional experiments using far fewer experiments, and also identified examples of input interactions previously unknown. Tests using simulated data show that Adaptive CD suffers from fewer false positives than traditional experimental designs in determining decisive inputs, and succeeds far more often than traditional or random experimental designs in determining when genes are regulated by input interactions. We conclude that Adaptive CD offers an economical framework for discovering dominant inputs and interactions that affect different aspects of genomic outputs and organismal responses.
Asunto(s)
Algoritmos , Proteínas de Arabidopsis/metabolismo , Arabidopsis/metabolismo , Modelos Biológicos , Nitrógeno/metabolismo , Transducción de Señal/fisiología , Adaptación Fisiológica/fisiología , Adaptación Fisiológica/efectos de la radiación , Arabidopsis/efectos de la radiación , Técnicas Químicas Combinatorias , Simulación por Computador , Luz , Modelos Logísticos , Sensibilidad y Especificidad , Transducción de Señal/efectos de la radiaciónRESUMEN
We report a simple new algorithm, cis/TF, that uses genomewide expression data and the full genomic sequence to match transcription factors to their binding sites. Most previous computational methods discovered binding sites by clustering genes having similar expression patterns and then identifying over-represented subsequences in the promoter regions of those genes. By contrast, cis/TF asserts that B is a likely binding site of a transcription factor T if the expression pattern of T is correlated to the composite expression patterns of all genes containing B, even when those genes are not mutually correlated. Thus, our method focuses on binding sites rather than genes. The algorithm has successfully identified experimentally-supported transcription factor binding relationships in tests on several data sets from Saccharomyces cerevisiae.