1.

Phasing Diploid Genome Assembly Graphs with Single-Cell Strand Sequencing.

Henglin, Mir; Ghareghani, Maryam; Harvey, William; Porubsky, David; Koren, Sergey; Eichler, Evan E; Ebert, Peter; Marschall, Tobias.

bioRxiv ; 2024 Feb 16.

Article in English | MEDLINE | ID: mdl-38529499

ABSTRACT

Haplotype information is crucial for biomedical and population genetics research. However, current strategies to produce de-novo haplotype-resolved assemblies often require either difficult-to-acquire parental data or an intermediate haplotype-collapsed assembly. Here, we present Graphasing, a workflow which synthesizes the global phase signal of Strand-seq with assembly graph topology to produce chromosome-scale de-novo haplotypes for diploid genomes. Graphasing readily integrates with any assembly workflow that both outputs an assembly graph and has a haplotype assembly mode. Graphasing performs comparably to trio-phasing in contiguity, phasing accuracy, and assembly quality, outperforms Hi-C in phasing accuracy, and generates human assemblies with over 18 chromosome-spanning haplotypes.

2.

Systematic discovery of neoepitope-HLA pairs for neoantigens shared among patients and tumor types.

Gurung, Hem R; Heidersbach, Amy J; Darwish, Martine; Chan, Pamela Pui Fung; Li, Jenny; Beresini, Maureen; Zill, Oliver A; Wallace, Andrew; Tong, Ann-Jay; Hascall, Dan; Torres, Eric; Chang, Andy; Lou, Kenny 'Hei-Wai'; Abdolazimi, Yassan; Hammer, Christian; Xavier-Magalhães, Ana; Marcu, Ana; Vaidya, Samir; Le, Daniel D; Akhmetzyanova, Ilseyar; Oh, Soyoung A; Moore, Amanda J; Uche, Uzodinma N; Laur, Melanie B; Notturno, Richard J; Ebert, Peter J R; Blanchette, Craig; Haley, Benjamin; Rose, Christopher M.

Nat Biotechnol ; 2023 Oct 19.

Article in English | MEDLINE | ID: mdl-37857725

ABSTRACT

The broad application of precision cancer immunotherapies is limited by the number of validated neoepitopes that are common among patients or tumor types. To expand the known repertoire of shared neoantigen-human leukocyte antigen (HLA) complexes, we developed a high-throughput platform that coupled an in vitro peptide-HLA binding assay with engineered cellular models expressing individual HLA alleles in combination with a concatenated transgene harboring 47 common cancer neoantigens. From more than 24,000 possible neoepitope-HLA combinations, biochemical and computational assessment yielded 844 unique candidates, of which 86 were verified after immunoprecipitation mass spectrometry analyses of engineered, monoallelic cell lines. To evaluate the potential for immunogenicity, we identified T cell receptors that recognized select neoepitope-HLA pairs and elicited a response after introduction into human T cells. These cellular systems and our data on therapeutically relevant neoepitopes in their HLA contexts will aid researchers studying antigen processing as well as neoepitope targeting therapies.

3.

Assembly of 43 human Y chromosomes reveals extensive complexity and variation.

Hallast, Pille; Ebert, Peter; Loftus, Mark; Yilmaz, Feyza; Audano, Peter A; Logsdon, Glennis A; Bonder, Marc Jan; Zhou, Weichen; Höps, Wolfram; Kim, Kwondo; Li, Chong; Hoyt, Savannah J; Dishuck, Philip C; Porubsky, David; Tsetsos, Fotios; Kwon, Jee Young; Zhu, Qihui; Munson, Katherine M; Hasenfeld, Patrick; Harvey, William T; Lewis, Alexandra P; Kordosky, Jennifer; Hoekzema, Kendra; O'Neill, Rachel J; Korbel, Jan O; Tyler-Smith, Chris; Eichler, Evan E; Shi, Xinghua; Beck, Christine R; Marschall, Tobias; Konkel, Miriam K; Lee, Charles.

Nature ; 621(7978): 355-364, 2023 Sep.

Article in English | MEDLINE | ID: mdl-37612510

ABSTRACT

The prevalence of highly repetitive sequences within the human Y chromosome has prevented its complete assembly to date¹ and led to its systematic omission from genomic analyses. Here we present de novo assemblies of 43 Y chromosomes spanning 182,900 years of human evolution and report considerable diversity in size and structure. Half of the male-specific euchromatic region is subject to large inversions with a greater than twofold higher recurrence rate compared with all other chromosomes². Ampliconic sequences associated with these inversions show differing mutation rates that are sequence context dependent, and some ampliconic genes exhibit evidence for concerted evolution with the acquisition and purging of lineage-specific pseudogenes. The largest heterochromatic region in the human genome, Yq12, is composed of alternating repeat arrays that show extensive variation in the number, size and distribution, but retain a 1:1 copy-number ratio. Finally, our data suggest that the boundary between the recombining pseudoautosomal region 1 and the non-recombining portions of the X and Y chromosomes lies 500 kb away from the currently established¹ boundary. The availability of fully sequence-resolved Y chromosomes from multiple individuals provides a unique opportunity for identifying new associations of traits with specific Y-chromosomal variants and garnering insights into the evolution and function of complex regions of the human genome.

Subject(s)

Chromosomes, Human, Y , Evolution, Molecular , Humans , Male , Chromosomes, Human, Y/genetics , Genome, Human/genetics , Genomics , Mutation Rate , Phenotype , Euchromatin/genetics , Pseudogenes , Genetic Variation/genetics , Chromosomes, Human, X/genetics , Pseudoautosomal Regions/genetics

4.

Multimodal, broadly neutralizing antibodies against SARS-CoV-2 identified by high-throughput native pairing of BCRs from bulk B cells.

Keitany, Gladys J; Rubin, Benjamin E R; Garrett, Meghan E; Musa, Andrea; Tracy, Jeff; Liang, Yu; Ebert, Peter; Moore, Amanda J; Guan, Jonathan; Eggers, Erica; Lescano, Ninnia; Brown, Ryan; Carbo, Adria; Al-Asadi, Hussein; Ching, Travers; Day, Austin; Harris, Rebecca; Linkem, Charles; Popov, Dimitry; Wilkins, Courtney; Li, Lianqu; Wang, Jiao; Liu, Chuanxin; Chen, Li; Dines, Jennifer N; Atyeo, Caroline; Alter, Galit; Baldo, Lance; Sherwood, Anna; Howie, Bryan; Klinger, Mark; Yusko, Erik; Robins, Harlan S; Benzeno, Sharon; Gilbert, Amy E.

Cell Chem Biol ; 30(11): 1377-1389.e8, 2023 Nov 16.

Article in English | MEDLINE | ID: mdl-37586370

ABSTRACT

TruAB Discovery is an approach that integrates cellular immunology, high-throughput immunosequencing, bioinformatics, and computational biology in order to discover naturally occurring human antibodies for prophylactic or therapeutic use. We adapted our previously described pairSEQ technology to pair B cell receptor heavy and light chains of SARS-CoV-2 spike protein-binding antibodies derived from enriched antigen-specific memory B cells and bulk antibody-secreting cells. We identified approximately 60,000 productive, in-frame, paired antibody sequences, from which 2,093 antibodies were selected for functional evaluation based on abundance, isotype and patterns of somatic hypermutation. The exceptionally diverse antibodies included RBD-binders with broad neutralizing activity against SARS-CoV-2 variants, and S2-binders with broad specificity against betacoronaviruses and the ability to block membrane fusion. A subset of these RBD- and S2-binding antibodies demonstrated robust protection against challenge in hamster and mouse models. This high-throughput approach can accelerate discovery of diverse, multifunctional antibodies against any target of interest.

Subject(s)

COVID-19 , SARS-CoV-2 , Animals , Mice , Humans , Antibodies, Neutralizing , Broadly Neutralizing Antibodies , Antibodies, Viral

5.

Alterations in the hepatocyte epigenetic landscape in steatosis.

Maji, Ranjan Kumar; Czepukojc, Beate; Scherer, Michael; Tierling, Sascha; Cadenas, Cristina; Gianmoena, Kathrin; Gasparoni, Nina; Nordström, Karl; Gasparoni, Gilles; Laggai, Stephan; Yang, Xinyi; Sinha, Anupam; Ebert, Peter; Falk-Paulsen, Maren; Kinkley, Sarah; Hoppstädter, Jessica; Chung, Ho-Ryun; Rosenstiel, Philip; Hengstler, Jan G; Walter, Jörn; Schulz, Marcel H; Kessler, Sonja M; Kiemer, Alexandra K.

Epigenetics Chromatin ; 16(1): 30, 2023 Jul 06.

Article in English | MEDLINE | ID: mdl-37415213

ABSTRACT

Fatty liver disease, or the accumulation of fat in the liver, has been reported to affect the global population. This comes with an increased risk for the development of fibrosis, cirrhosis, and hepatocellular carcinoma. Yet, little is known about the effects of a diet containing high fat and alcohol on epigenetic aging, with respect to changes in transcriptional and epigenomic profiles. In this study, we took a multi-omics approach and integrated gene expression, methylation signals, and chromatin signals to study the epigenomic effects of a high-fat and alcohol-containing diet on mouse hepatocytes. We identified four relevant gene network clusters that were associated with relevant pathways that promote steatosis. Using a machine learning approach, we predict specific transcription factors that might be responsible for modulating the functionally relevant clusters. Finally, we discover four additional CpG loci and validate aging-related differential CpG methylation. Differential CpG methylation linked to aging showed minimal overlap with altered methylation in steatosis.

Subject(s)

Epigenomics , Hepatocytes , Mice , Animals , Hepatocytes/metabolism , Liver/metabolism , Ethanol , Epigenesis, Genetic , DNA Methylation

6.

Gaps and complex structurally variant loci in phased genome assemblies.

Porubsky, David; Vollger, Mitchell R; Harvey, William T; Rozanski, Allison N; Ebert, Peter; Hickey, Glenn; Hasenfeld, Patrick; Sanders, Ashley D; Stober, Catherine; Korbel, Jan O; Paten, Benedict; Marschall, Tobias; Eichler, Evan E.

Genome Res ; 33(4): 496-510, 2023 Apr.

Article in English | MEDLINE | ID: mdl-37164484

ABSTRACT

There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.

Subject(s)

DNA, Satellite , Polymorphism, Genetic , Humans , DNA, Satellite/genetics , Haplotypes , Segmental Duplications, Genomic , Sequence Analysis, DNA

7.

Whole-genome long-read sequencing downsampling and its effect on variant calling precision and recall.

Harvey, William T; Ebert, Peter; Ebler, Jana; Audano, Peter A; Munson, Katherine M; Hoekzema, Kendra; Porubsky, David; Beck, Christine R; Marschall, Tobias; Garimella, Kiran; Eichler, Evan E.

bioRxiv ; 2023 May 04.

Article in English | MEDLINE | ID: mdl-37205567

ABSTRACT

Advances in long-read sequencing (LRS) technology continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant calling precision and recall of Oxford Nanopore Technologies (ONT) and PacBio HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant calling precision and recall of SVs and indels in HiFi datasets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant callsets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.
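The abstract above reports accuracy as precision, recall, and the F1 score. As a reminder of how those three quantities relate, here is a minimal sketch. The function name and the example counts are hypothetical and are not taken from the paper; it assumes that variant calls have already been classified as true positives, false positives, and false negatives against a truth set.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from call counts.

    tp: calls that match the truth set (true positives)
    fp: calls absent from the truth set (false positives)
    fn: truth-set variants that were not called (false negatives)
    """
    # Precision: fraction of reported calls that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall (sensitivity): fraction of true variants that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
    print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

Because F1 is a harmonic mean, a call set scores above 0.5 only when neither precision nor recall is very low.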

8.

Read-Based Phasing and Analysis of Phased Variants with WhatsHap.

Martin, Marcel; Ebert, Peter; Marschall, Tobias.

Methods Mol Biol ; 2590: 127-138, 2023.

Article in English | MEDLINE | ID: mdl-36335496

ABSTRACT

WhatsHap is a command-line tool for phasing and phasing-related tasks. It infers haplotypes in diploid and polyploid samples from (preferably long) reads that cover at least two heterozygous variants. It offers additional tools for working with phased variant calls, such as computing statistics, comparing different phasings, and assigning reads in alignment files to their haplotype.

Subject(s)

Diploidy , Polyploidy , Humans , Sequence Analysis, DNA , Haplotypes/genetics , Heterozygote , Algorithms , Polymorphism, Single Nucleotide

9.

Whole-genome long-read sequencing downsampling and its effect on variant-calling precision and recall.

Harvey, William T; Ebert, Peter; Ebler, Jana; Audano, Peter A; Munson, Katherine M; Hoekzema, Kendra; Porubsky, David; Beck, Christine R; Marschall, Tobias; Garimella, Kiran; Eichler, Evan E.

Genome Res ; 33(12): 2029-2040, 2023 Dec 27.

Article in English | MEDLINE | ID: mdl-38190646

ABSTRACT

Advances in long-read sequencing (LRS) technologies continue to make whole-genome sequencing more complete, affordable, and accurate. LRS provides significant advantages over short-read sequencing approaches, including phased de novo genome assembly, access to previously excluded genomic regions, and discovery of more complex structural variants (SVs) associated with disease. Limitations remain with respect to cost, scalability, and platform-dependent read accuracy and the tradeoffs between sequence coverage and sensitivity of variant discovery are important experimental considerations for the application of LRS. We compare the genetic variant-calling precision and recall of Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio) HiFi platforms over a range of sequence coverages. For read-based applications, LRS sensitivity begins to plateau around 12-fold coverage with a majority of variants called with reasonable accuracy (F1 score above 0.5), and both platforms perform well for SV detection. Genome assembly increases variant-calling precision and recall of SVs and indels in HiFi data sets with HiFi outperforming ONT in quality as measured by the F1 score of assembly-based variant call sets. While both technologies continue to evolve, our work offers guidance to design cost-effective experimental strategies that do not compromise on discovering novel biology.

Subject(s)

Genomics , Nanopores , INDEL Mutation , Whole Genome Sequencing

10.

Benchmarking challenging small variants with linked and long reads.

Wagner, Justin; Olson, Nathan D; Harris, Lindsay; Khan, Ziad; Farek, Jesse; Mahmoud, Medhat; Stankovic, Ana; Kovacevic, Vladimir; Yoo, Byunggil; Miller, Neil; Rosenfeld, Jeffrey A; Ni, Bohan; Zarate, Samantha; Kirsche, Melanie; Aganezov, Sergey; Schatz, Michael C; Narzisi, Giuseppe; Byrska-Bishop, Marta; Clarke, Wayne; Evani, Uday S; Markello, Charles; Shafin, Kishwar; Zhou, Xin; Sidow, Arend; Bansal, Vikas; Ebert, Peter; Marschall, Tobias; Lansdorp, Peter; Hanlon, Vincent; Mattsson, Carl-Adam; Barrio, Alvaro Martinez; Fiddes, Ian T; Xiao, Chunlin; Fungtammasan, Arkarachai; Chin, Chen-Shan; Wenger, Aaron M; Rowell, William J; Sedlazeck, Fritz J; Carroll, Andrew; Salit, Marc; Zook, Justin M.

Cell Genom ; 2(5), 2022 May.

Article in English | MEDLINE | ID: mdl-36452119

ABSTRACT

Genome in a Bottle benchmarks are widely used to help validate clinical sequencing pipelines and develop variant calling and sequencing methods. Here we use accurate linked and long reads to expand benchmarks in 7 samples to include difficult-to-map regions and segmental duplications that are challenging for short reads. These benchmarks add more than 300,000 SNVs and 50,000 insertions or deletions (indels) and include 16% more exonic variants, many in challenging, clinically relevant genes not covered previously, such as PMS2. For HG002, we include 92% of the autosomal GRCh38 assembly while excluding regions problematic for benchmarking small variants, such as copy number variants, that should not have been in the previous version, which included 85% of GRCh38. It identifies eight times more false negatives in a short read variant call set relative to our previous benchmark. We demonstrate that this benchmark reliably identifies false positives and false negatives across technologies, enabling ongoing methods development.

11.

Semi-automated assembly of high-quality diploid human reference genomes.

Jarvis, Erich D; Formenti, Giulio; Rhie, Arang; Guarracino, Andrea; Yang, Chentao; Wood, Jonathan; Tracey, Alan; Thibaud-Nissen, Francoise; Vollger, Mitchell R; Porubsky, David; Cheng, Haoyu; Asri, Mobin; Logsdon, Glennis A; Carnevali, Paolo; Chaisson, Mark J P; Chin, Chen-Shan; Cody, Sarah; Collins, Joanna; Ebert, Peter; Escalona, Merly; Fedrigo, Olivier; Fulton, Robert S; Fulton, Lucinda L; Garg, Shilpa; Gerton, Jennifer L; Ghurye, Jay; Granat, Anastasiya; Green, Richard E; Harvey, William; Hasenfeld, Patrick; Hastie, Alex; Haukness, Marina; Jaeger, Erich B; Jain, Miten; Kirsche, Melanie; Kolmogorov, Mikhail; Korbel, Jan O; Koren, Sergey; Korlach, Jonas; Lee, Joyce; Li, Daofeng; Lindsay, Tina; Lucas, Julian; Luo, Feng; Marschall, Tobias; Mitchell, Matthew W; McDaniel, Jennifer; Nie, Fan; Olsen, Hugh E; Olson, Nathan D.

Nature ; 611(7936): 519-531, 2022 Nov.

Article in English | MEDLINE | ID: mdl-36261518

ABSTRACT

The current human reference genome, GRCh38, represents over 20 years of effort to generate a high-quality assembly, which has benefitted society¹,². However, it still has many gaps and errors, and does not represent a biological genome as it is a blend of multiple individuals³,⁴. Recently, a high-quality telomere-to-telomere reference, CHM13, was generated with the latest long-read technologies, but it was derived from a hydatidiform mole cell line with a nearly homozygous genome⁵. To address these limitations, the Human Pangenome Reference Consortium formed with the goal of creating high-quality, cost-effective, diploid genome assemblies for a pangenome reference that represents human genetic diversity⁶. Here, in our first scientific report, we determined which combination of current genome sequencing and assembly approaches yield the most complete and accurate diploid genome assembly with minimal manual curation. Approaches that used highly accurate long reads and parent-child data with graph-based haplotype phasing during assembly outperformed those that did not. Developing a combination of the top-performing methods, we generated our first high-quality diploid reference assembly, containing only approximately four gaps per chromosome on average, with most chromosomes within ±1% of the length of CHM13. Nearly 48% of protein-coding genes have non-synonymous amino acid changes between haplotypes, and centromeric regions showed the highest diversity. Our findings serve as a foundation for assembling near-complete diploid human genomes at scale for a pangenome reference to capture global genetic variation from single nucleotides to structural rearrangements.

Subject(s)

Chromosome Mapping , Diploidy , Genome, Human , Genomics , Humans , Chromosome Mapping/standards , Genome, Human/genetics , Haplotypes/genetics , High-Throughput Nucleotide Sequencing/methods , High-Throughput Nucleotide Sequencing/standards , Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Reference Standards , Genomics/methods , Genomics/standards , Chromosomes, Human/genetics , Genetic Variation/genetics

12.

Recurrent inversion polymorphisms in humans associate with genetic instability and genomic disorders.

Porubsky, David; Höps, Wolfram; Ashraf, Hufsah; Hsieh, PingHsun; Rodriguez-Martin, Bernardo; Yilmaz, Feyza; Ebler, Jana; Hallast, Pille; Maria Maggiolini, Flavia Angela; Harvey, William T; Henning, Barbara; Audano, Peter A; Gordon, David S; Ebert, Peter; Hasenfeld, Patrick; Benito, Eva; Zhu, Qihui; Lee, Charles; Antonacci, Francesca; Steinrücken, Matthias; Beck, Christine R; Sanders, Ashley D; Marschall, Tobias; Eichler, Evan E; Korbel, Jan O.

Cell ; 185(11): 1986-2005.e26, 2022 May 26.

Article in English | MEDLINE | ID: mdl-35525246

ABSTRACT

Unlike copy number variants (CNVs), inversions remain an underexplored genetic variation class. By integrating multiple genomic technologies, we discover 729 inversions in 41 human genomes. Approximately 85% of inversions <2 kbp form by twin-priming during L1 retrotransposition; 80% of the larger inversions are balanced and affect twice as many nucleotides as CNVs. Balanced inversions show an excess of common variants, and 72% are flanked by segmental duplications (SDs) or retrotransposons. Since flanking repeats promote non-allelic homologous recombination, we developed complementary approaches to identify recurrent inversion formation. We describe 40 recurrent inversions encompassing 0.6% of the genome, showing inversion rates up to 2.7 × 10⁻⁴ per locus per generation. Recurrent inversions exhibit a sex-chromosomal bias and co-localize with genomic disorder critical regions. We propose that inversion recurrence results in an elevated number of heterozygous carriers and structural SD diversity, which increases mutability in the population and predisposes specific haplotypes to disease-causing CNVs.

Subject(s)

Chromosome Inversion , Segmental Duplications, Genomic , Chromosome Inversion/genetics , DNA Copy Number Variations/genetics , Genome, Human , Genomics , Humans

13.

Pangenome-based genome inference allows efficient and accurate genotyping across a wide spectrum of variant classes.

Ebler, Jana; Ebert, Peter; Clarke, Wayne E; Rausch, Tobias; Audano, Peter A; Houwaart, Torsten; Mao, Yafei; Korbel, Jan O; Eichler, Evan E; Zody, Michael C; Dilthey, Alexander T; Marschall, Tobias.

Nat Genet ; 54(4): 518-525, 2022 Apr.

Article in English | MEDLINE | ID: mdl-35410384

ABSTRACT

Typical genotyping workflows map reads to a reference genome before identifying genetic variants. Generating such alignments introduces reference biases and comes with substantial computational burden. Furthermore, short-read lengths limit the ability to characterize repetitive genomic regions, which are particularly challenging for fast k-mer-based genotypers. In the present study, we propose a new algorithm, PanGenie, that leverages a haplotype-resolved pangenome reference together with k-mer counts from short-read sequencing data to genotype a wide spectrum of genetic variation, a process we refer to as genome inference. Compared with mapping-based approaches, PanGenie is more than 4 times faster at 30-fold coverage and achieves better genotype concordances for almost all variant types and coverages tested. Improvements are especially pronounced for large insertions (≥50 bp) and variants in repetitive regions, enabling the inclusion of these classes of variants in genome-wide association studies. PanGenie efficiently leverages the increasing amount of haplotype-resolved assemblies to unravel the functional impact of previously inaccessible variants while being faster compared with alignment-based workflows.
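The abstract above relies on k-mer counts from short reads. For readers unfamiliar with the term, the following is a minimal sketch of k-mer counting only. It is not PanGenie's algorithm or implementation; the function name and the toy reads are hypothetical, and real tools typically also handle reverse complements and use far more memory-efficient data structures.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring (k-mer) across a collection of reads."""
    if k <= 0:
        raise ValueError("k must be positive")
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        # Slide a window of width k over the read; reads shorter than k
        # contribute nothing because the range is empty.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


if __name__ == "__main__":
    # Toy reads, for illustration only.
    toy_reads = ["ACGTACG", "CGTAC"]
    for kmer, n in sorted(count_kmers(toy_reads, k=3).items()):
        print(kmer, n)
```

A k-mer-based genotyper compares counts like these against the k-mers expected from each candidate allele, which avoids aligning the reads to a reference.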

Subject(s)

Genetic Variation , Genome, Human , Genomics , Algorithms , Genome, Human/genetics , Genome-Wide Association Study , Genomics/methods , Genotype , High-Throughput Nucleotide Sequencing , Humans , Sequence Analysis, DNA

14.

ASHLEYS: automated quality control for single-cell Strand-seq data.

Gros, Christina; Sanders, Ashley D; Korbel, Jan O; Marschall, Tobias; Ebert, Peter.

Bioinformatics ; 37(19): 3356-3357, 2021 Oct 11.

Article in English | MEDLINE | ID: mdl-33792647

ABSTRACT

SUMMARY: Single-cell DNA template strand sequencing (Strand-seq) enables chromosome length haplotype phasing, construction of phased assemblies, mapping sister-chromatid exchange events and structural variant discovery. The initial quality control of potentially thousands of single-cell libraries is still done manually by domain experts. ASHLEYS automates this tedious task, delivers near-expert performance and labels even large datasets in seconds. AVAILABILITY AND IMPLEMENTATION: github.com/friendsofstrandseq/ashleys-qc, MIT license. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

15.

Haplotype-resolved diverse human genomes and integrated analysis of structural variation.

Ebert, Peter; Audano, Peter A; Zhu, Qihui; Rodriguez-Martin, Bernardo; Porubsky, David; Bonder, Marc Jan; Sulovari, Arvis; Ebler, Jana; Zhou, Weichen; Serra Mari, Rebecca; Yilmaz, Feyza; Zhao, Xuefang; Hsieh, PingHsun; Lee, Joyce; Kumar, Sushant; Lin, Jiadong; Rausch, Tobias; Chen, Yu; Ren, Jingwen; Santamarina, Martin; Höps, Wolfram; Ashraf, Hufsah; Chuang, Nelson T; Yang, Xiaofei; Munson, Katherine M; Lewis, Alexandra P; Fairley, Susan; Tallon, Luke J; Clarke, Wayne E; Basile, Anna O; Byrska-Bishop, Marta; Corvelo, André; Evani, Uday S; Lu, Tsung-Yu; Chaisson, Mark J P; Chen, Junjie; Li, Chong; Brand, Harrison; Wenger, Aaron M; Ghareghani, Maryam; Harvey, William T; Raeder, Benjamin; Hasenfeld, Patrick; Regier, Allison A; Abel, Haley J; Hall, Ira M; Flicek, Paul; Stegle, Oliver; Gerstein, Mark B; Tubio, Jose M C.

Science ; 372(6537), 2021 Apr 02.

Article in English | MEDLINE | ID: mdl-33632895

ABSTRACT

Long-read and strand-specific sequencing technologies together facilitate the de novo assembly of high-quality haplotype-resolved human genomes without parent-child trio data. We present 64 assembled haplotypes from 32 diverse human genomes. These highly contiguous haplotype assemblies (average minimum contig length needed to cover 50% of the genome: 26 million base pairs) integrate all forms of genetic variation, even across complex loci. We identified 107,590 structural variants (SVs), of which 68% were not discovered with short-read sequencing, and 278 SV hotspots (spanning megabases of gene-rich sequence). We characterized 130 of the most active mobile element source elements and found that 63% of all SVs arise through homology-mediated mechanisms. This resource enables reliable graph-based genotyping from short reads of up to 50,340 SVs, resulting in the identification of 1526 expression quantitative trait loci as well as SV candidates for adaptive selection within the human population.
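The abstract above describes contiguity as the "minimum contig length needed to cover 50% of the genome", a statistic usually called N50. A minimal sketch of that calculation follows. It assumes the total assembly length stands in for the genome size, and the function name and contig lengths are hypothetical rather than taken from the paper.

```python
from typing import Sequence


def contig_n50(lengths: Sequence[int]) -> int:
    """Return the N50 of a set of contig lengths.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    # Unreachable for non-empty input with positive lengths.
    raise ValueError("contig lengths must be positive")


if __name__ == "__main__":
    # Hypothetical contig lengths, for illustration only.
    print(contig_n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

When the true genome size is used as the denominator instead of the assembly length, the statistic is usually reported as NG50.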

Subject(s)

Genetic Variation , Genome, Human , Haplotypes , Female , Genotype , High-Throughput Nucleotide Sequencing , Humans , INDEL Mutation , Interspersed Repetitive Sequences , Male , Population Groups/genetics , Quantitative Trait Loci , Retroelements , Sequence Analysis, DNA , Sequence Inversion , Whole Genome Sequencing

16.

Fast detection of differential chromatin domains with SCIDDO.

Ebert, Peter; Schulz, Marcel H.

Bioinformatics ; 37(9): 1198-1205, 2021 Jun 09.

Article in English | MEDLINE | ID: mdl-33232443

ABSTRACT

MOTIVATION: The generation of genome-wide maps of histone modifications using chromatin immunoprecipitation sequencing is a standard approach to dissect the complexity of the epigenome. Interpretation and differential analysis of histone datasets remains challenging due to regulatory meaningful co-occurrences of histone marks and their difference in genomic spread. To ease interpretation, chromatin state segmentation maps are a commonly employed abstraction combining individual histone marks. We developed the tool SCIDDO as a fast, flexible and statistically sound method for the differential analysis of chromatin state segmentation maps. RESULTS: We demonstrate the utility of SCIDDO in a comparative analysis that identifies differential chromatin domains (DCD) in various regulatory contexts and with only moderate computational resources. We show that the identified DCDs correlate well with observed changes in gene expression and can recover a substantial number of differentially expressed genes (DEGs). We showcase SCIDDO's ability to directly interrogate chromatin dynamics, such as enhancer switches in downstream analysis, which simplifies exploring specific questions about regulatory changes in chromatin. By comparing SCIDDO to competing methods, we provide evidence that SCIDDO's performance in identifying DEGs via differential chromatin marking is more stable across a range of cell-type comparisons and parameter cut-offs. AVAILABILITY AND IMPLEMENTATION: The SCIDDO source code is openly available under github.com/ptrebert/sciddo. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Chromatin , Chromosomes , Chromatin Immunoprecipitation , Genome , Histone Code

17.

Fully phased human genome assembly without parental data using single-cell strand sequencing and long reads.

Porubsky, David; Ebert, Peter; Audano, Peter A; Vollger, Mitchell R; Harvey, William T; Marijon, Pierre; Ebler, Jana; Munson, Katherine M; Sorensen, Melanie; Sulovari, Arvis; Haukness, Marina; Ghareghani, Maryam; Lansdorp, Peter M; Paten, Benedict; Devine, Scott E; Sanders, Ashley D; Lee, Charles; Chaisson, Mark J P; Korbel, Jan O; Eichler, Evan E; Marschall, Tobias.

Nat Biotechnol ; 39(3): 302-308, 2021 Mar.

Article in English | MEDLINE | ID: mdl-33288906

ABSTRACT

Human genomes are typically assembled as consensus sequences that lack information on parental haplotypes. Here we describe a reference-free workflow for diploid de novo genome assembly that combines the chromosome-wide phasing and scaffolding capabilities of single-cell strand sequencing¹,² with continuous long-read or high-fidelity³ sequencing data. Employing this strategy, we produced a completely phased de novo genome assembly for each haplotype of an individual of Puerto Rican descent (HG00733) in the absence of parental data. The assemblies are accurate (quality value > 40) and highly contiguous (contig N50 > 23 Mbp) with low switch error rates (0.17%), providing fully phased single-nucleotide variants, indels and structural variants. A comparison of Oxford Nanopore Technologies and Pacific Biosciences phased assemblies identified 154 regions that are preferential sites of contig breaks, irrespective of sequencing technology or phasing algorithms.

Subject(s)

Genome, Human , High-Throughput Nucleotide Sequencing/methods , Parents , Sequence Analysis, DNA/methods , Single-Cell Analysis/methods , Algorithms , Haplotypes , Humans , Puerto Rico/ethnology

18.

Magnitude and Dynamics of the T-Cell Response to SARS-CoV-2 Infection at Both Individual and Population Levels.

Snyder, Thomas M; Gittelman, Rachel M; Klinger, Mark; May, Damon H; Osborne, Edward J; Taniguchi, Ruth; Zahid, H Jabran; Kaplan, Ian M; Dines, Jennifer N; Noakes, Matthew T; Pandya, Ravi; Chen, Xiaoyu; Elasady, Summer; Svejnoha, Emily; Ebert, Peter; Pesesky, Mitchell W; De Almeida, Patricia; O'Donnell, Hope; DeGottardi, Quinn; Keitany, Gladys; Lu, Jennifer; Vong, Allen; Elyanow, Rebecca; Fields, Paul; Greissl, Julia; Baldo, Lance; Semprini, Simona; Cerchione, Claudio; Nicolini, Fabio; Mazza, Massimiliano; Delmonte, Ottavia M; Dobbs, Kerry; Laguna-Goya, Rocio; Carreño-Tarragona, Gonzalo; Barrio, Santiago; Imberti, Luisa; Sottini, Alessandra; Quiros-Roldan, Eugenia; Rossi, Camillo; Biondi, Andrea; Bettini, Laura Rachele; D'Angio, Mariella; Bonfanti, Paolo; Tompkins, Miranda F; Alba, Camille; Dalgard, Clifton; Sambri, Vittorio; Martinelli, Giovanni; Goldman, Jason D; Heath, James R.

medRxiv ; 2020 Sep 17.

Article in English | MEDLINE | ID: mdl-32793919

ABSTRACT

T cells are involved in the early identification and clearance of viral infections and also support the development of antibodies by B cells. This central role for T cells makes them a desirable target for assessing the immune response to SARS-CoV-2 infection. Here, we combined two high-throughput immune profiling methods to create a quantitative picture of the T-cell response to SARS-CoV-2. First, at the individual level, we deeply characterized 3 acutely infected and 58 recovered COVID-19 subjects by experimentally mapping their CD8 T-cell response through antigen stimulation to 545 Human Leukocyte Antigen (HLA) class I presented viral peptides (class II data in a forthcoming study). Then, at the population level, we performed T-cell repertoire sequencing on 1,815 samples (from 1,521 COVID-19 subjects) as well as 3,500 controls to identify shared "public" T-cell receptors (TCRs) associated with SARS-CoV-2 infection from both CD8 and CD4 T cells. Collectively, our data reveal that CD8 T-cell responses are often driven by a few immunodominant, HLA-restricted epitopes. As expected, the T-cell response to SARS-CoV-2 peaks about one to two weeks after infection and is detectable for at least several months after recovery. As an application of these data, we trained a classifier to diagnose SARS-CoV-2 infection based solely on TCR sequencing from blood samples, and observed, at 99.8% specificity, high early sensitivity soon after diagnosis (Day 3-7 = 85.1% [95% CI = 79.9-89.7]; Day 8-14 = 94.8% [90.7-98.4]) as well as lasting sensitivity after recovery (Day 29+/convalescent = 95.4% [92.1-98.3]). These results demonstrate an approach to reliably assess the adaptive immune response both soon after viral antigenic exposure (before antibodies are typically detectable) as well as at later time points. This blood-based molecular approach to characterizing the cellular immune response has applications in clinical diagnostics as well as in vaccine development and monitoring.

19.

An environment for sustainable research software in Germany and beyond: current state, open challenges, and call for action.

Anzt, Hartwig; Bach, Felix; Druskat, Stephan; Löffler, Frank; Loewe, Axel; Renard, Bernhard Y; Seemann, Gunnar; Struck, Alexander; Achhammer, Elke; Aggarwal, Piush; Appel, Franziska; Bader, Michael; Brusch, Lutz; Busse, Christian; Chourdakis, Gerasimos; Dabrowski, Piotr Wojciech; Ebert, Peter; Flemisch, Bernd; Friedl, Sven; Fritzsch, Bernadette; Funk, Maximilian D; Gast, Volker; Goth, Florian; Grad, Jean-Noël; Hegewald, Jan; Hermann, Sibylle; Hohmann, Florian; Janosch, Stephan; Kutra, Dominik; Linxweiler, Jan; Muth, Thilo; Peters-Kottig, Wolfgang; Rack, Fabian; Raters, Fabian H C; Rave, Stephan; Reina, Guido; Reißig, Malte; Ropinski, Timo; Schaarschmidt, Joerg; Seibold, Heidi; Thiele, Jan P; Uekermann, Benjamin; Unger, Stefan; Weeber, Rudolf.

F1000Res ; 9: 295, 2020.

Article in English | MEDLINE | ID: mdl-33552475

ABSTRACT

Research software has become a central asset in academic research. It optimizes existing and enables new research methods, implements and embeds research knowledge, and constitutes an essential research product in itself. Research software must be sustainable in order to understand, replicate, reproduce, and build upon existing research or conduct new research effectively. In other words, software must be available, discoverable, usable, and adaptable to new needs, both now and in the future. Research software therefore requires an environment that supports sustainability. Hence, a change is needed in the way research software development and maintenance are currently motivated, incentivized, funded, structurally and infrastructurally supported, and legally treated. Failing to do so will threaten the quality and validity of research. In this paper, we identify challenges for research software sustainability in Germany and beyond, in terms of motivation, selection, research software engineering personnel, funding, infrastructure, and legal aspects. Besides researchers, we specifically address political and academic decision-makers to increase awareness of the importance and needs of sustainable research software practices. In particular, we recommend strategies and measures to create an environment for sustainable research software, with the ultimate goal to ensure that software-driven research is valid, reproducible and sustainable, and that software is recognized as a first class citizen in research. This paper is the outcome of two workshops run in Germany in 2019, at deRSE19 - the first International Conference of Research Software Engineers in Germany - and a dedicated DFG-supported follow-up workshop in Berlin.

Subject(s)

Knowledge , Research Personnel , Software , Forecasting , Germany , Humans

20.

Unique and assay specific features of NOMe-, ATAC- and DNase I-seq data.

Nordström, Karl J V; Schmidt, Florian; Gasparoni, Nina; Salhab, Abdulrahman; Gasparoni, Gilles; Kattler, Kathrin; Müller, Fabian; Ebert, Peter; Costa, Ivan G; Pfeifer, Nico; Lengauer, Thomas; Schulz, Marcel H; Walter, Jörn.

Nucleic Acids Res ; 47(20): 10580-10596, 2019 11 18.

Article in English | MEDLINE | ID: mdl-31584093

ABSTRACT

Chromatin accessibility maps are important for the functional interpretation of the genome. Here, we systematically analysed assay-specific differences between DNase I-seq, ATAC-seq and NOMe-seq in a side-by-side experimental and bioinformatic setup. We observe that the most prominent nucleosome-depleted regions (NDRs, e.g. in promoters) are robustly called by all three or at least two assays. However, we also find a high proportion of assay-specific NDRs that are often 'called' by only one of the assays. We show evidence that these assay-specific NDRs are indeed genuine open chromatin sites and contribute important information for accurate gene expression prediction. While technically ATAC-seq and DNase I-seq provide a high NDR calling rate at relatively low sequencing cost in comparison to NOMe-seq, NOMe-seq stands out for its genome-wide coverage, which allows it to detect not only NDRs but also endogenous DNA methylation and, as we show here, genome-wide segmentation into heterochromatic B domains and local phasing of nucleosomes outside of NDRs. In summary, our comparisons strongly suggest that assay-specific differences should be considered in experimental design and in generalized and comparative functional interpretations.

Subject(s)

Chromatin Immunoprecipitation Sequencing/methods , Chromatin Immunoprecipitation Sequencing/standards , Hep G2 Cells , Humans , Nucleosomes/chemistry , Nucleosomes/metabolism , Promoter Regions, Genetic
