Search | VHL (BVS) Search Portal

Novel ChIP-seq simulating program with superior versatility: isChIP.

Subkhankulova, Tatiana; Naumenko, Fedor; Tolmachov, Oleg E; Orlov, Yuriy L.

Brief Bioinform ; 22(4), 2021 07 20.

Article in English | MEDLINE | ID: mdl-33320934

ABSTRACT

Chromatin immunoprecipitation followed by next-generation sequencing (ChIP-seq) is recognized as an extremely powerful tool to study the interaction of numerous transcription factors and other chromatin-associated proteins with DNA. The core problem in optimizing the ChIP-seq protocol and the subsequent computational data analysis is that the 'true' pattern of binding events for a given protein factor is unknown. Computer simulation of the ChIP-seq process based on an 'a priori known binding template' can drastically reduce the number of wet-lab experiments and ultimately help achieve radical optimization of the entire processing pipeline. We present a newly developed ChIP-sequencing simulation algorithm implemented in the novel software, in silico ChIP-seq (isChIP). We demonstrate that isChIP closely approximates real ChIP-seq protocols and is able to model data similar to those obtained from experimental sequencing. We validated isChIP using publicly available datasets generated for the well-characterized transcription factors Oct4 and Sox2. Although the novel software is compatible with the Illumina protocols by default, it can also successfully perform simulations for a number of alternative sequencing platforms such as Roche 454, Ion Torrent and SOLiD, as well as model ChIP-exo. The versatility of isChIP was demonstrated by modelling a wide range of binding events, including those of transcription factors and chromatin modifiers. We also performed a comparative analysis against a few existing ChIP-seq simulators and showed the fundamental superiority of our model. Owing to its ability to utilize known binding templates, isChIP can potentially be employed to help investigators choose the most appropriate analytical software through benchmarking of available ChIP-seq programs and to optimize the experimental parameters of the ChIP-seq protocol. The isChIP software is freely available at https://github.com/fnaumenko/isChIP.

Subjects

Algorithms , Chromatin Immunoprecipitation Sequencing , Computer Simulation , Software

Novel read density distribution score shows possible aligner artefacts, when mapping a single chromosome.

Naumenko, Fedor M; Abnizova, Irina I; Beka, Nathan; Genaev, Mikhail A; Orlov, Yuriy L.

BMC Genomics ; 19(Suppl 3): 92, 2018 02 09.

Article in English | MEDLINE | ID: mdl-29504893

ABSTRACT

BACKGROUND: The use of artificial data to evaluate the performance of aligners and peak callers not only improves the accuracy and reliability of the evaluation, but also makes it possible to reduce the computational time. One natural way to achieve such a time reduction is to map a single chromosome. RESULTS: We investigated whether single-chromosome mapping causes any artefacts in the aligners' performance. In this paper, we compared the accuracy of seven aligners on well-controlled simulated benchmark data sampled from a single chromosome and from a whole genome. We found that commonly used statistical methods are insufficient to evaluate aligner performance, and applied a novel measure of read density distribution similarity, which revealed artefacts in the aligners' performance. We also calculated mismatch statistics and constructed mismatch frequency distributions along the read. CONCLUSIONS: Generating artificial data by mapping reads generated from a single chromosome to a reference chromosome is justified from the point of view of reducing the benchmarking time. The proposed quality assessment method makes it possible to identify inherent shortcomings of aligners that are not detected by conventional statistical methods and that can affect the quality of alignment of real data.
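The abstract does not define its read density distribution score, so the following is only an illustrative sketch of the general idea of comparing read density profiles, not the measure from the paper. It assumes 0-based read start positions on one chromosome, bins them into a normalized density profile, and compares the simulated ("true") profile with the aligned one using the total variation distance. The function names `binned_density` and `density_distance` are hypothetical.

```python
def binned_density(positions, chrom_length, bin_size):
    """Return the fraction of reads starting in each fixed-size bin.

    Assumes 0-based positions with 0 <= pos < chrom_length.
    """
    n_bins = (chrom_length + bin_size - 1) // bin_size  # ceiling division
    counts = [0] * n_bins
    for pos in positions:
        counts[pos // bin_size] += 1
    total = len(positions)
    return [c / total for c in counts] if total else counts


def density_distance(profile_a, profile_b):
    """Total variation distance between two normalized profiles.

    0 means identical density profiles, 1 means no overlap at all.
    This is a stand-in metric, not the score proposed in the paper.
    """
    return 0.5 * sum(abs(a - b) for a, b in zip(profile_a, profile_b))


# Hypothetical example: simulated read starts versus where an aligner placed them.
true_starts = [0, 5, 10, 15]
aligned_starts = [0, 5, 3, 8]  # two reads misplaced into the first bin

true_profile = binned_density(true_starts, chrom_length=20, bin_size=10)
aligned_profile = binned_density(aligned_starts, chrom_length=20, bin_size=10)
print(density_distance(true_profile, aligned_profile))
```

A summary statistic such as the overall fraction of mapped reads would be identical for both read sets here, while the profile comparison shows that the reads were redistributed along the chromosome.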

Subjects

Artifacts , Chromosome Mapping/methods , Genomics

Statistical comparison of methods to estimate the error probability in short-read Illumina sequencing.

Abnizova, Irina; Skelly, Tom; Naumenko, Fedor; Whiteford, Nava; Brown, Clive; Cox, Tony.

J Bioinform Comput Biol ; 8(3): 579-91, 2010 Jun.

Article in English | MEDLINE | ID: mdl-20556863

ABSTRACT

As was the case at the beginning of the sequencing era, the new generation of short-read sequencing technologies still requires both accurate data processing methods and reliable measures of that accuracy. Inspired by the classic of the genre, the Phred method, we generalized those findings in the area of base quality value calibration. We introduce a simple, straightforward, statistically established way to measure the performance of a calibrator and to find an optimal way to assess its reliability. We illustrate the method by assessing the performance of several calibrators/predictors for Illumina Genome Analyser 2 (GA2) data. The choice of the best predictor is based on optimization of validity, discriminative ability and discrimination power for several candidate predictors. We applied the method to data from one experimental run for the genome of phage φX, and found the best predictor out of ten candidates to be 'Purity', a statistic derived from corrected cluster intensities. The source code for the comparison of the predictors is available from the authors on request.
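For context on the Phred-style quality values mentioned above: a Phred quality Q and a base-call error probability P are related by Q = -10 log10(P). The sketch below shows that conversion and a simple calibration check that groups bases by predicted quality and computes the quality actually observed in each group. It is a minimal illustration under stated assumptions, not the authors' comparison code; the helper `empirical_quality` and its inputs are hypothetical.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality value to an error probability."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality value."""
    return -10 * math.log10(p)


def empirical_quality(predicted_q, is_error):
    """Group bases by predicted quality and compute the observed quality.

    predicted_q: predicted Phred value for each base (assumed input).
    is_error:    True where the base call disagrees with the reference.
    Returns {predicted Q: observed Q}, or None for groups with no errors,
    where the observed error rate gives no finite quality estimate.
    """
    counts = {}
    for q, err in zip(predicted_q, is_error):
        total, errors = counts.get(q, (0, 0))
        counts[q] = (total + 1, errors + int(err))
    observed = {}
    for q, (total, errors) in counts.items():
        observed[q] = error_prob_to_phred(errors / total) if errors else None
    return observed


# Hypothetical data: 100 bases predicted at Q20, one of them miscalled.
predicted = [20] * 100
errors = [True] + [False] * 99
print(empirical_quality(predicted, errors))
```

A well-calibrated predictor gives observed values close to the predicted ones in every group; large gaps indicate over- or under-confident quality values.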

Subjects

Algorithms , Artifacts , Chromosome Mapping/methods , Statistical Data Interpretation , DNA Sequence Analysis/methods , Software , Base Sequence , Molecular Sequence Data
