Search | BVS Enfermagem (Virtual Health Library, Nursing) Research Portal

HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies.

Edge, Peter; Bafna, Vineet; Bansal, Vikas.

Genome Res ; 27(5): 801-812, 2017 05.

Article in English | MEDLINE | ID: mdl-27940952

ABSTRACT

Many tools have been developed for haplotype assembly: the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types (dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing), we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (~98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.

Subjects

Contig Mapping/methods; Genomics/methods; Haplotypes; Sequence Analysis, DNA/methods; Software; Contig Mapping/standards; Genome, Human; Genomics/standards; Humans; Sequence Analysis, DNA/standards

Ultraaccurate genome sequencing and haplotyping of single human cells.

Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun.

Proc Natl Acad Sci U S A ; 114(47): 12512-12517, 2017 11 21.

Article in English | MEDLINE | ID: mdl-29078313

ABSTRACT

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.

Subjects

Contig Mapping/methods; Genome, Human; Haplotypes; Microfluidic Analytical Techniques/methods; Single-Cell Analysis/methods; Whole Genome Sequencing/methods; Alleles; Cell Line; Contig Mapping/statistics & numerical data; Fibroblasts/cytology; Fibroblasts/metabolism; HLA Antigens/genetics; HLA Antigens/metabolism; Humans; Microfluidic Analytical Techniques/instrumentation; Mutation; Polymorphism, Single Nucleotide; Single-Cell Analysis/instrumentation; Whole Genome Sequencing/instrumentation
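
The abstract summarizes haplotype contiguity as an N50 above 7 Mb. Under the usual definition, N50 is the largest length L such that contigs of length L or longer together cover at least half of the total assembled length. The minimal sketch below implements that definition; the input lengths are hypothetical and not SISSOR data.

```python
def n50(lengths: list) -> int:
    """Largest contig length L such that contigs of length >= L
    together cover at least half of the total assembled length."""
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    covered = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # integer comparison avoids float rounding
            return length
    raise AssertionError("unreachable: the loop always reaches the total")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In this example the total is 54, and the three longest contigs (10 + 9 + 8 = 27) are the first to reach half of it, so N50 is 8.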

InPhaDel: integrative shotgun and proximity-ligation sequencing to phase deletions with single nucleotide polymorphisms.

Patel, Anand; Edge, Peter; Selvaraj, Siddarth; Bansal, Vikas; Bafna, Vineet.

Nucleic Acids Res ; 44(12): e111, 2016 07 08.

Article in English | MEDLINE | ID: mdl-27105843

ABSTRACT

Phasing of single nucleotide variants (SNVs) and structural variants into chromosome-wide haplotypes in humans has been challenging and has required either trio sequencing or restricting phasing to population-based haplotypes. Selvaraj et al. demonstrated that single-individual SNV phasing is possible with proximity ligation (HiC) sequencing. Here, we demonstrate that HiC can phase structural variants into phased scaffolds of SNVs. Since HiC data are noisy and structural variant (SV) calling is challenging, we applied a range of supervised classification techniques, including Support Vector Machines and Random Forest, to phase deletions. Our approach was demonstrated on deletion calls and phasings on the NA12878 human genome. We used three NA12878 chromosomes and simulated chromosomes to train model parameters. The remaining NA12878 chromosomes, withheld from training, were used to evaluate phasing accuracy. Random Forest had the highest accuracy and correctly phased 86% of the deletions with allele-specific read evidence. Allele-specific read evidence was found for 76% of the deletions. HiC provides significant read evidence for accurately phasing 33% of the deletions. Also, eight of eight top-ranked deletions phased by HiC alone were validated using long-range polymerase chain reaction and Sanger sequencing. Thus, deletions from a single individual can be accurately phased using a combination of shotgun and proximity ligation sequencing. InPhaDel software is available at: http://l337x911.github.io/inphadel/.

Subjects

Genomic Structural Variation; High-Throughput Nucleotide Sequencing/methods; Polymorphism, Single Nucleotide/genetics; Sequence Analysis, DNA/methods; Algorithms; Alleles; Genome, Human; Haplotypes/genetics; Humans; Sequence Deletion/genetics; Software
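
The evaluation described above holds out whole chromosomes from training and reports two figures: the fraction of deletions with allele-specific read evidence, and phasing accuracy among those deletions. The minimal sketch below shows only that split and scoring; the record type and its fields are hypothetical, and predictions are supplied as inputs because the classifier itself (Random Forest or SVM in the paper) is not implemented here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DeletionCall:
    """Hypothetical record for one deletion call."""
    chrom: str
    true_hap: int                 # 1 or 2, taken from a trusted truth set
    predicted_hap: Optional[int]  # None when there is no allele-specific read evidence


def split_by_chromosome(calls, train_chroms):
    """Hold out whole chromosomes, so no test deletion shares a chromosome with training data."""
    train = [c for c in calls if c.chrom in train_chroms]
    test = [c for c in calls if c.chrom not in train_chroms]
    return train, test


def evaluate(test):
    """Return (fraction of calls with evidence, accuracy among calls with evidence)."""
    if not test:
        raise ValueError("empty test set")
    called = [c for c in test if c.predicted_hap is not None]
    evidence_rate = len(called) / len(test)
    if not called:
        return evidence_rate, float("nan")
    correct = sum(c.predicted_hap == c.true_hap for c in called)
    return evidence_rate, correct / len(called)


calls = [
    DeletionCall("chr1", 1, 1),
    DeletionCall("chr2", 1, 1),
    DeletionCall("chr2", 2, 1),
    DeletionCall("chr3", 2, 2),
    DeletionCall("chr3", 1, None),
]
train, test = split_by_chromosome(calls, {"chr1"})
print(evaluate(test))  # (0.75, 0.666...)
```

Splitting by chromosome rather than by individual deletion keeps nearby, correlated variants from appearing on both sides of the split.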

Widespread contribution of transposable elements to the innovation of gene regulatory networks.

Sundaram, Vasavi; Cheng, Yong; Ma, Zhihai; Li, Daofeng; Xing, Xiaoyun; Edge, Peter; Snyder, Michael P; Wang, Ting.

Genome Res ; 24(12): 1963-76, 2014 Dec.

Article in English | MEDLINE | ID: mdl-25319995

ABSTRACT

Transposable elements (TEs) have been shown to contain functional binding sites for certain transcription factors (TFs). However, the extent to which TEs contribute to the evolution of TF binding sites is not well known. We comprehensively mapped binding sites for 26 pairs of orthologous TFs in two pairs of human and mouse cell lines (representing two cell lineages), along with epigenomic profiles, including DNA methylation and six histone modifications. Overall, we found that 20% of binding sites were embedded within TEs. This number varied across different TFs, ranging from 2% to 40%. We further identified 710 TF-TE relationships in which genomic copies of a TE subfamily contributed a significant number of binding peaks for a TF, and we found that LTR elements dominated these relationships in human. Importantly, TE-derived binding peaks were strongly associated with open and active chromatin signatures, including reduced DNA methylation and increased enhancer-associated histone marks. On average, 66% of TE-derived binding events were cell type-specific with a cell type-specific epigenetic landscape. Most of the binding sites contributed by TEs were species-specific, but we also identified binding sites conserved between human and mouse, the functional relevance of which was supported by a signature of purifying selection on DNA sequences of these TEs. Interestingly, several TFs had significantly expanded binding site landscapes only in one species, which were linked to species-specific gene functions, suggesting that TEs are an important driving force for regulatory innovation. Taken together, our data suggest that TEs have significantly and continuously shaped gene regulatory networks during mammalian evolution.

Subjects

DNA Transposable Elements; Gene Regulatory Networks; Animals; Binding Sites; Cell Line; Chromatin/genetics; Chromatin/metabolism; Chromatin Immunoprecipitation; Epigenomics; Evolution, Molecular; High-Throughput Nucleotide Sequencing; Humans; Mice; Multigene Family; Nucleotide Motifs; Organ Specificity/genetics; Position-Specific Scoring Matrices; Protein Binding; Transcription Factors/genetics; Transcription Factors/metabolism
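
A figure such as "20% of binding sites were embedded within TEs" comes down to an interval-overlap count. The minimal sketch below assumes that "embedded" means a peak summit falls inside a TE annotation and that intervals are 0-based half-open (as in BED files); both are assumptions, since the abstract does not state the overlap criterion. Overlapping TE intervals are merged first so that a binary search over interval starts is sufficient.

```python
import bisect


def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def fraction_in_tes(summits, te_intervals):
    """summits: list of (chrom, position); te_intervals: dict chrom -> list of (start, end).
    Returns the fraction of summits lying inside any TE interval."""
    index = {}
    for chrom, ivs in te_intervals.items():
        merged = merge_intervals(ivs)
        index[chrom] = ([s for s, _ in merged], merged)
    hits = 0
    for chrom, pos in summits:
        starts, merged = index.get(chrom, ([], []))
        i = bisect.bisect_right(starts, pos) - 1  # last interval starting at or before pos
        if i >= 0 and pos < merged[i][1]:
            hits += 1
    return hits / len(summits)


# Hypothetical annotations and peak summits.
tes = {"chr1": [(100, 200), (150, 300), (500, 600)]}
summits = [("chr1", 120), ("chr1", 299), ("chr1", 300), ("chr1", 550), ("chr2", 120)]
print(fraction_in_tes(summits, tes))  # 0.6
```

Position 300 is not counted because the merged interval (100, 300) is half-open.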

Longshot enables accurate variant calling in diploid genomes from single-molecule long read sequencing.

Edge, Peter; Bansal, Vikas.

Nat Commun ; 10(1): 4660, 2019 10 11.

Article in English | MEDLINE | ID: mdl-31604920

ABSTRACT

Whole-genome sequencing using sequencing technologies such as Illumina enables the accurate detection of small-scale variants but provides limited information about haplotypes and variants in repetitive regions of the human genome. Single-molecule sequencing (SMS) technologies such as Pacific Biosciences and Oxford Nanopore generate long reads that can potentially address the limitations of short-read sequencing. However, the high error rate of SMS reads makes it challenging to detect small-scale variants in diploid genomes. We introduce a variant calling method, Longshot, which leverages the haplotype information present in SMS reads to accurately detect and phase single-nucleotide variants (SNVs) in diploid genomes. We demonstrate that Longshot achieves very high accuracy for SNV detection using whole-genome Pacific Biosciences data, outperforms existing variant calling methods, and enables variant detection in duplicated regions of the genome that cannot be mapped using short reads.

Subjects

Polymorphism, Single Nucleotide; Software; Whole Genome Sequencing/methods; Algorithms; Diploidy; Genome, Human; Haplotypes; High-Throughput Nucleotide Sequencing/methods; Humans; Repetitive Sequences, Nucleic Acid; Sequence Analysis, DNA/methods
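
To make the difficulty posed by high SMS error rates concrete, the minimal sketch below computes per-site diploid genotype log-likelihoods under a simple model in which each read independently reports the wrong allele with probability `error_rate`. This is a generic textbook baseline assumed for illustration; it uses no haplotype information and is not Longshot's model. The error rate and read alleles shown are hypothetical.

```python
import math

# Expected fraction of chromosomes carrying the alternate allele per genotype.
GENOTYPES = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_likelihoods(alleles, error_rate):
    """alleles: observed allele per read at one site (0 = ref, 1 = alt).
    Returns natural-log likelihoods keyed by genotype."""
    if not 0.0 < error_rate < 1.0:
        raise ValueError("error_rate must be strictly between 0 and 1")
    result = {}
    for gt, alt_frac in GENOTYPES.items():
        # P(read shows alt) = P(read comes from an alt chromosome and is correct)
        #                   + P(read comes from a ref chromosome and is wrong)
        p_alt = alt_frac * (1 - error_rate) + (1 - alt_frac) * error_rate
        result[gt] = sum(math.log(p_alt if a == 1 else 1 - p_alt) for a in alleles)
    return result


ll = genotype_log_likelihoods([1] * 6 + [0] * 5, error_rate=0.1)
print(max(ll, key=ll.get))  # 0/1
```

Because each read is treated as independent evidence, a high per-read error rate blurs the three genotype hypotheses at low coverage; the abstract's approach instead draws on haplotype information shared across reads.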
