1.
Nat Rev Genet ; 11(10): 733-9, 2010 Oct.
Article in English | MEDLINE | ID: mdl-20838408

ABSTRACT

High-throughput technologies are widely used, for example to assay genetic variants, gene and protein expression, and epigenetic modifications. One often overlooked complication with such studies is batch effects, which occur because measurements are affected by laboratory conditions, reagent lots and personnel differences. This becomes a major problem when batch effects are correlated with an outcome of interest and lead to incorrect conclusions. Using both published studies and our own analyses, we argue that batch effects (as well as other technical and biological artefacts) are widespread and critical to address. We review experimental and computational approaches for doing so.
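The confounding scenario this abstract warns about, where batch membership tracks the outcome of interest, can be checked before any analysis. The following is a minimal sketch, not a method from the article: it assumes each sample carries a categorical batch label and a categorical outcome label, and it uses Cramér's V (a standard association measure) as the check. The function name `cramers_v` and the toy labels are illustrative assumptions.

```python
from collections import Counter


def cramers_v(batches, outcomes):
    """Cramér's V between two categorical label lists.

    0.0 means batch and outcome look independent in this sample;
    1.0 means batch fully determines outcome (complete confounding).
    """
    n = len(batches)
    joint = Counter(zip(batches, outcomes))
    batch_totals = Counter(batches)
    outcome_totals = Counter(outcomes)

    # Pearson chi-square statistic over the batch-by-outcome table.
    chi2 = 0.0
    for b, b_count in batch_totals.items():
        for o, o_count in outcome_totals.items():
            expected = b_count * o_count / n
            observed = joint.get((b, o), 0)
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(batch_totals), len(outcome_totals)) - 1
    return (chi2 / (n * k)) ** 0.5 if k > 0 else 0.0


# Hypothetical designs: eight samples, four cases and four controls.
outcomes = ["case"] * 4 + ["control"] * 4
confounded = ["A"] * 4 + ["B"] * 4                  # all cases run in batch A
balanced = ["A", "A", "B", "B", "A", "A", "B", "B"]  # each batch has both groups

v_confounded = cramers_v(confounded, outcomes)
v_balanced = cramers_v(balanced, outcomes)
```

In the confounded design any batch-driven shift is indistinguishable from a case/control difference, which is why a balanced or randomized assignment of samples to batches is the safer experimental choice.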


Subjects
Biotechnology/methods , Genomics/methods , Oligonucleotide Array Sequence Analysis/methods , Sequence Analysis, DNA/methods , Biotechnology/standards , Biotechnology/statistics & numerical data , Computational Biology/methods , Genomics/standards , Genomics/statistics & numerical data , Oligonucleotide Array Sequence Analysis/standards , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Periodicals as Topic/standards , Research Design/standards , Research Design/statistics & numerical data , Sequence Analysis, DNA/standards , Sequence Analysis, DNA/statistics & numerical data
2.
Nucleic Acids Res ; 39(16): 7020-33, 2011 Sep 01.
Article in English | MEDLINE | ID: mdl-21622658

ABSTRACT

DNA polymerase I (pol I) processes RNA primers during lagging-strand synthesis and fills small gaps during DNA repair reactions. However, it is unclear how pol I and pol III work together during replication and repair or how extensive pol I processing of Okazaki fragments is in vivo. Here, we address these questions by analyzing pol I mutations generated through error-prone replication of ColE1 plasmids. The data were obtained by direct sequencing, allowing an accurate determination of the mutation spectrum and distribution. Pol I's mutational footprint suggests: (i) during leading-strand replication pol I is gradually replaced by pol III over at least 1.3 kb; (ii) pol I processing of Okazaki fragments is limited to ∼20 nt and (iii) the size of Okazaki fragments is short (∼250 nt). While based on ColE1 plasmid replication, our findings are likely relevant to other pol I replicative processes such as chromosomal replication and DNA repair, which differ from ColE1 replication mostly at the recruitment steps. This mutation footprinting approach should help establish the role of other prokaryotic or eukaryotic polymerases in vivo, and provides a tool to investigate how sequence topology, DNA damage, or interactions with protein partners may affect the function of individual DNA polymerases.


Subjects
DNA Polymerase I/metabolism , DNA Replication , Mutation , Plasmids/biosynthesis , Base Sequence , DNA/metabolism , DNA Footprinting , DNA Polymerase I/genetics , DNA Polymerase I/physiology , Databases, Nucleic Acid , Plasmids/chemistry
3.
BMC Syst Biol ; 7: 118, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24182195

ABSTRACT

BACKGROUND: Reverse-engineering gene regulatory networks from expression data is difficult, especially without temporal measurements or interventional experiments. In particular, the causal direction of an edge is generally not statistically identifiable, i.e., cannot be inferred as a statistical parameter, even from an unlimited amount of non-time series observational mRNA expression data. Some additional evidence is required, and high-throughput methylation data can be viewed as a natural multifactorial gene perturbation experiment. RESULTS: We introduce IDEM (Identifying Direction from Expression and Methylation), a method for identifying the causal direction of edges by combining DNA methylation and mRNA transcription data. We describe the circumstances under which edge directions become identifiable, and experiments with both real and synthetic data demonstrate that the accuracy of IDEM for inferring both edge placement and edge direction in gene regulatory networks is significantly improved relative to other methods. CONCLUSION: Reverse-engineering directed gene regulatory networks from static observational data becomes feasible by exploiting the context provided by high-throughput DNA methylation data. An implementation of the algorithm described is available at http://code.google.com/p/idem/.


Subjects
Computational Biology/methods , DNA Methylation , Gene Expression Profiling , Gene Regulatory Networks , Bayes Theorem , Gene Knockdown Techniques , Likelihood Functions , Markov Chains , RNA, Messenger/genetics , RNA, Messenger/metabolism , Reproducibility of Results
4.
PLoS One ; 7(11): e47836, 2012.
Article in English | MEDLINE | ID: mdl-23144830

ABSTRACT

A major challenge in molecular biology is reverse-engineering the cis-regulatory logic that plays a major role in the control of gene expression. This program includes searching through DNA sequences to identify "motifs" that serve as the binding sites for transcription factors or, more generally, are predictive of gene expression across cellular conditions. Several approaches have been proposed for de novo motif discovery, that is, searching sequences without prior knowledge of binding sites or nucleotide patterns. However, unbiased validation is not straightforward. We consider two approaches to unbiased validation of discovered motifs: testing the statistical significance of a motif using a DNA "background" sequence model to represent the null hypothesis and measuring performance in predicting membership in gene clusters. We demonstrate that the background models typically used are "too null," resulting in overly optimistic assessments of significance, and argue that performance in predicting TF binding or expression patterns from DNA motifs should be assessed by held-out data, as in predictive learning. Applying this criterion to common motif discovery methods resulted in universally poor performance, although there is a marked improvement when motifs are statistically significant against real background sequences. Moreover, on synthetic data where "ground truth" is known, discriminative performance of all algorithms is far below the theoretical upper bound, with pronounced "over-fitting" in training. A key conclusion from this work is that the failure of de novo discovery approaches to accurately identify motifs is basically due to statistical intractability resulting from the fixed size of co-regulated gene clusters, and thus such failures do not necessarily provide evidence that unfound motifs are not active biologically. Consequently, the use of prior knowledge to enhance motif discovery is not just advantageous but necessary.
An implementation of the LR and ALR algorithms is available at http://code.google.com/p/likelihood-ratio-motifs/.
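To make concrete what scoring a motif against a background model involves, here is a minimal sketch of generic log-odds scoring with a position weight matrix. It is not the LR or ALR implementation referenced above. The toy matrix, the uniform zero-order background, and the function names `log_odds_score` and `best_site` are assumptions introduced only for illustration; sequences are assumed to contain only A, C, G and T, and all probabilities are assumed to be nonzero.

```python
import math


def log_odds_score(window, pwm, background):
    """Sum over positions of log(P(base | motif position) / P(base | background))."""
    return sum(
        math.log(pwm[i][base] / background[base])
        for i, base in enumerate(window)
    )


def best_site(sequence, pwm, background):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    return max(
        (log_odds_score(sequence[i:i + width], pwm, background), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical 3-column motif that prefers "TGA".
toy_pwm = [
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]
# Zero-order background with equal base frequencies (an assumption).
uniform_background = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

score, start = best_site("CCTGACC", toy_pwm, uniform_background)
```

The `background` argument is the part the abstract calls into question: the same window scores differently under different null models, so a significance claim depends on how realistic that model is.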


Subjects
Models, Genetic , Regulatory Sequences, Nucleic Acid , Sequence Analysis, DNA/methods , Algorithms , Area Under Curve , Bayes Theorem , Gene Expression Profiling , Gene Expression Regulation , Humans , Logistic Models , Markov Chains , Monte Carlo Method , Position-Specific Scoring Matrices , ROC Curve , Saccharomyces cerevisiae/genetics , Transcriptome
5.
Ann Biomed Eng ; 35(6): 1053-67, 2007 Jun.
Article in English | MEDLINE | ID: mdl-17377845

ABSTRACT

One of the goals of systems biology is the identification of regulatory mechanisms that govern an organism's response to external stimuli. Transcription factors have been hypothesized to be major contributors to an organism's response to various outside stimuli, and a great deal of work has been done to predict the set of transcription factors that regulate a given gene. Most current methods seek to identify possible binding sites from genomic sequence. Initial attempts at predicting transcription factors from genomic sequences suffered from the problem of false positives. Making the problem more difficult, it has also been shown that while predicted binding sites might be false positives, they can be shown to bind to their corresponding sequences in vitro. One method for rectifying this is phylogenetic analysis, in which only regions that show high evolutionary conservation are analyzed. However, such an approach may be too stringent because of the level of degeneracy shown in transcription factor binding site position weight matrices. Due to the degeneracy, there may be only a few bases that need to be conserved across species. Therefore, while a sequence may not show a high level of evolutionary conservation, it may still show high affinity for the same transcription factor. In predicting transcription factor binding we explore the notion that "Co-expression implies co-regulation" [Allocco et al. BMC Bioinformatics 5:18, 2004]. With multiple genes requiring similar transcription factor binding sites, there exists a basis for eliminating false positives. This method allows for the selection of transcription factor binding sites that are active under a given experimental paradigm, thereby allowing us to indirectly incorporate the effects of chromosome and recognition site presentation upon transcription factor binding prediction.
Rather than having to rationalize that a few transcription factor binding sites are over-represented in a cluster of genes, one can show that a few transcription factors are active in the cluster of genes that have been grouped together. Although the method focuses on predicting experiment-specific transcription factor binding sites, it is possible that if such a methodology were used in an iterative process in which different experiments were analyzed, one could obtain a comprehensive set of transcription factor binding sites that regulate the various dynamic responses shown by biological systems under a variety of conditions, hence building a more comprehensive model of transcriptional regulation.
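The idea that co-expressed genes should share regulators can be illustrated with a simple filter. This is a rough sketch under stated assumptions, not the procedure used in the article: it assumes some upstream predictor has already produced, for each gene in one co-expression cluster, a set of transcription factors with a predicted site, and it keeps only the factors found in at least a chosen fraction of the genes. The name `shared_factors`, the threshold `min_fraction`, and the gene and factor labels are hypothetical.

```python
from collections import Counter


def shared_factors(predicted_sites, min_fraction=0.75):
    """Keep factors with a predicted site in at least min_fraction of the genes.

    predicted_sites maps each gene in a co-expression cluster to the set of
    transcription factors predicted to bind near it.
    """
    n_genes = len(predicted_sites)
    counts = Counter(
        tf for factors in predicted_sites.values() for tf in set(factors)
    )
    return sorted(
        tf for tf, count in counts.items() if count / n_genes >= min_fraction
    )


# Hypothetical cluster of four co-expressed genes.
cluster = {
    "geneA": {"TF1", "TF2"},
    "geneB": {"TF1", "TF3"},
    "geneC": {"TF1", "TF2"},
    "geneD": {"TF1"},
}

strict = shared_factors(cluster, min_fraction=0.75)
lenient = shared_factors(cluster, min_fraction=0.5)
```

Here `TF3` is predicted for a single gene and is dropped under both thresholds, which is the sense in which agreement across a cluster gives a basis for discarding likely false positives.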


Subjects
Algorithms , DNA/chemistry , DNA/genetics , Evolution, Molecular , Models, Genetic , Regulatory Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA/methods , Transcription Factors/chemistry , Transcription Factors/genetics , Base Sequence , Binding Sites , Computer Simulation , Molecular Sequence Data , Phylogeny , Protein Binding , Sequence Alignment/methods