Search | BVS CLAP/SMR-OPS/OMS

Integrative DNA copy number detection and genotyping from sequencing and array-based platforms.

Zhou, Zilu; Wang, Weixin; Wang, Li-San; Zhang, Nancy Ruonan.

Bioinformatics ; 34(14): 2349-2355, 2018 07 15.

Article in English | MEDLINE | ID: mdl-29992253

ABSTRACT

Motivation: Copy number variations (CNVs) are gains and losses of DNA segments and have been associated with disease. Many large-scale genetic association studies are performing CNV analysis using whole exome sequencing (WES) and whole genome sequencing (WGS). In many of these studies, previous single-nucleotide polymorphism (SNP)-array data are available. An integrated cross-platform analysis is expected to improve resolution and accuracy, yet there is no tool for effectively combining data from sequencing and array platforms. The detection of CNVs using sequencing data alone can also be further improved by using allele-specific reads. Results: We propose a statistical framework, the integrated CNV (iCNV) detection algorithm, which can be applied to multiple study designs: WES only, WGS only, SNP array only, or any combination of SNP and sequencing data. iCNV applies platform-specific normalization, uses allele-specific reads from sequencing and integrates matched next-generation sequencing (NGS) and SNP-array data with a hidden Markov model. We compare integrated two-platform CNV detection using iCNV to naïve intersection or union of platforms and show that iCNV increases sensitivity and robustness. We also assess the accuracy of iCNV on WGS data only and show that using allele-specific reads improves CNV detection accuracy compared to existing methods. Availability and implementation: https://github.com/zhouzilu/iCNV. Supplementary information: Supplementary data are available at Bioinformatics online.
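To make the idea of integrating two platforms in a hidden Markov model concrete, the following is a minimal sketch and not the iCNV implementation. It assumes three copy-number states, Gaussian emissions on log2 ratios, and two observation tracks already aligned to the same bins; the names `MEANS`, `viterbi`, `seq_obs`, and `array_obs`, as well as all parameter values, are hypothetical. Allele-specific information, which the abstract says iCNV also uses, is left out.

```python
import math

STATES = ("del", "normal", "dup")
# Hypothetical mean log2 ratios per state (illustrative values only).
MEANS = {"del": -1.0, "normal": 0.0, "dup": 0.58}

def log_gauss(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))

def viterbi(seq_obs, array_obs, sd_seq=0.3, sd_array=0.3, stay=0.98):
    """Most likely state path given two aligned observation tracks.

    A missing value (None) on a platform contributes no evidence at that bin.
    """
    log_stay = math.log(stay)
    log_move = math.log((1 - stay) / (len(STATES) - 1))

    def emit(state, i):
        # Platforms are treated as conditionally independent given the state,
        # so their log-likelihoods add.
        total = 0.0
        for obs, sd in ((seq_obs[i], sd_seq), (array_obs[i], sd_array)):
            if obs is not None:
                total += log_gauss(obs, MEANS[state], sd)
        return total

    def trans(prev, cur):
        return log_stay if prev == cur else log_move

    score = {s: math.log(1 / len(STATES)) + emit(s, 0) for s in STATES}
    back = []
    for i in range(1, len(seq_obs)):
        new_score, pointers = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda r: score[r] + trans(r, s))
            new_score[s] = score[best_prev] + trans(best_prev, s) + emit(s, i)
            pointers[s] = best_prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=score.get)
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Toy data: a deletion spanning bins 2-5, with one gap on each platform.
seq_obs = [0.05, -0.1, -0.9, -1.1, None, -0.95, 0.1, 0.0]
array_obs = [0.0, 0.1, -1.0, None, -1.05, -0.9, -0.05, 0.02]
path = viterbi(seq_obs, array_obs)
print(path)
```

Because the per-platform log-likelihoods are summed inside one model, a bin covered by only one platform still contributes evidence, which a plain intersection of per-platform calls would discard.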

Subject(s)

DNA Copy Number Variations, Software, Whole Genome Sequencing/methods, Algorithms, Alleles, Genomics/methods, Humans, Polymorphism, Single Nucleotide

Importance sampling of word patterns in DNA and protein sequences.

Chan, Hock Peng; Zhang, Nancy Ruonan; Chen, Louis H Y.

J Comput Biol ; 17(12): 1697-709, 2010 Dec.

Article in English | MEDLINE | ID: mdl-21128856

ABSTRACT

Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs.
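The contrast between naive Monte Carlo and importance sampling can be illustrated with a deliberately simplified sketch. It does not implement the controlled-insertion scheme described in the abstract; instead it assumes the "word" is a single letter in an i.i.d. sequence, samples from a proposal with an inflated letter frequency `q`, and reweights by the likelihood ratio. All function names and parameter values are hypothetical, and the exact binomial tail is computed only as a reference.

```python
import math
import random

def exact_tail(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), used only as a reference."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))

def naive_mc(n, k, p, trials, rng):
    """Direct Monte Carlo: fraction of sequences with at least k occurrences."""
    hits = 0
    for _ in range(trials):
        count = sum(rng.random() < p for _ in range(n))
        hits += count >= k
    return hits / trials

def importance_mc(n, k, p, q, trials, rng):
    """Sample with letter frequency q, then reweight by the likelihood ratio."""
    total = 0.0
    for _ in range(trials):
        count = sum(rng.random() < q for _ in range(n))
        if count >= k:
            # Ratio of target (p) to proposal (q) probability for this sequence.
            total += (p / q) ** count * ((1 - p) / (1 - q)) ** (n - count)
    return total / trials

rng = random.Random(0)
n, k, p = 100, 50, 0.25  # at least 50 copies of one letter in 100 positions
exact = exact_tail(n, k, p)
naive = naive_mc(n, k, p, trials=2000, rng=rng)
tilted = importance_mc(n, k, p, q=0.5, trials=2000, rng=rng)
print(f"exact={exact:.3e} naive={naive:.3e} importance={tilted:.3e}")
```

With the event this rare, the naive estimator almost never observes a hit in 2000 trials, while the reweighted estimator draws the event in roughly half of its samples and so produces a usable estimate from the same budget.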

Subject(s)

Amino Acid Motifs, Pattern Recognition, Automated, Regulatory Sequences, Nucleic Acid, Sequence Analysis/methods, Amino Acid Sequence, Base Sequence, Inverted Repeat Sequences, Monte Carlo Method, Position-Specific Scoring Matrices
