ABSTRACT
We report HIVID (high-throughput Viral Integration Detection), a novel experimental and computational method for locating Hepatitis B Virus (HBV) integration breakpoints in the Hepatocellular Carcinoma (HCC) genome. In this method, fragments containing HBV sequence are enriched with a set of HBV probes and then subjected to high-throughput sequencing. To evaluate the performance of HIVID, we compared its results with those of whole genome sequencing (WGS) in 28 HCC tumors. We detected a total of 246 HBV integration breakpoints in the HCC genome, 113 of which were within 400 bp upstream or downstream of 125 breakpoints identified by WGS, covering 89.3% (125/140) of all WGS breakpoints. Integrations were located in the genes TERT, MLL4, and CCNE1. In addition, we discovered 133 novel breakpoints missed by WGS, with a validation rate of 66.7% (10/15). Our study shows that HIVID is a cost-effective method with high specificity and sensitivity for identifying viral integration in the human genome.
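The 400 bp comparison between the two call sets can be illustrated with a minimal sketch. This is one possible reading of the criterion, not the authors' pipeline: a breakpoint is counted as concordant if the other method reports a breakpoint on the same chromosome within the window. The function name `count_concordant` and all coordinates are hypothetical toy values.

```python
from collections import defaultdict

WINDOW_BP = 400  # window size stated in the abstract


def count_concordant(hivid, wgs, window=WINDOW_BP):
    """Count breakpoints in each set that lie within `window` bp of a
    same-chromosome breakpoint in the other set.

    Breakpoints are (chromosome, position) tuples.
    Returns (hivid_near_wgs, wgs_near_hivid).
    """

    def index(breakpoints):
        # Group positions by chromosome for lookup.
        by_chrom = defaultdict(list)
        for chrom, pos in breakpoints:
            by_chrom[chrom].append(pos)
        return by_chrom

    def near(query, ref_index):
        # A query breakpoint counts once, however many reference hits it has.
        return sum(
            any(abs(pos - ref) <= window for ref in ref_index.get(chrom, []))
            for chrom, pos in query
        )

    return near(hivid, index(wgs)), near(wgs, index(hivid))


# Toy coordinates, invented for illustration only.
hivid_calls = [("chr5", 1295000), ("chr5", 1295300), ("chr19", 36210000), ("chr1", 500)]
wgs_calls = [("chr5", 1295100), ("chr19", 36215000)]
print(count_concordant(hivid_calls, wgs_calls))
```

Because several breakpoints from one method can fall near a single breakpoint from the other, the two counts need not be equal, which is consistent with the abstract reporting 113 HIVID breakpoints against 125 WGS breakpoints.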
Subjects
Hepatocellular Carcinoma/genetics , Hepatocellular Carcinoma/virology , Hepatitis B Virus/genetics , High-Throughput Nucleotide Sequencing/methods , Liver Neoplasms/genetics , Liver Neoplasms/virology , Virus Integration , China , Cyclin E/genetics , DNA Breaks , DNA-Binding Proteins/genetics , Human Genome , Viral Genome , High-Throughput Nucleotide Sequencing/economics , Histone-Lysine N-Methyltransferase , Humans , Oncogene Proteins/genetics , Telomerase/genetics
ABSTRACT
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the reads they produce are very short, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving N50 contig sizes of 7.4 and 5.9 kilobases (kb) and N50 scaffold sizes of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
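For readers unfamiliar with the N50 statistic quoted above, the following minimal sketch computes it under the common definition: the length L such that contigs (or scaffolds) of length at least L together cover at least half of the total assembled length. The function name `n50` and the toy lengths are illustrative and are not taken from the assembler described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths.

    Sort lengths from longest to shortest and accumulate them; the N50 is
    the length at which the running sum first reaches half of the total.
    """
    if not lengths:
        raise ValueError("n50() requires at least one length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare doubled running sum to avoid fractional halves.
        if 2 * running >= total:
            return length


# Toy lengths in bp, invented for illustration only.
toy_contigs = [2, 3, 4, 5, 6]
print(n50(toy_contigs))
```

A higher N50 indicates that more of the assembly sits in long contiguous pieces, which is why the abstract reports it separately for contigs and for scaffolds.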