1.

Distinct Classes of Complex Structural Variation Uncovered across Thousands of Cancer Genome Graphs.

Hadi, Kevin; Yao, Xiaotong; Behr, Julie M; Deshpande, Aditya; Xanthopoulakis, Charalampos; Tian, Huasong; Kudman, Sarah; Rosiene, Joel; Darmofal, Madison; DeRose, Joseph; Mortensen, Rick; Adney, Emily M; Shaiber, Alon; Gajic, Zoran; Sigouros, Michael; Eng, Kenneth; Wala, Jeremiah A; Wrzeszczynski, Kazimierz O; Arora, Kanika; Shah, Minita; Emde, Anne-Katrin; Felice, Vanessa; Frank, Mayu O; Darnell, Robert B; Ghandi, Mahmoud; Huang, Franklin; Dewhurst, Sally; Maciejowski, John; de Lange, Titia; Setton, Jeremy; Riaz, Nadeem; Reis-Filho, Jorge S; Powell, Simon; Knowles, David A; Reznik, Ed; Mishra, Bud; Beroukhim, Rameen; Zody, Michael C; Robine, Nicolas; Oman, Kenji M; Sanchez, Carissa A; Kuhner, Mary K; Smith, Lucian P; Galipeau, Patricia C; Paulson, Thomas G; Reid, Brian J; Li, Xiaohong; Wilkes, David; Sboner, Andrea; Mosquera, Juan Miguel.

Cell ; 183(1): 197-210.e32, 2020 10 01.

Article in English | MEDLINE | ID: mdl-33007263

ABSTRACT

Cancer genomes often harbor hundreds of somatic DNA rearrangement junctions, many of which cannot be easily classified into simple (e.g., deletion) or complex (e.g., chromothripsis) structural variant classes. Applying a novel genome graph computational paradigm to analyze the topology of junction copy number (JCN) across 2,778 tumor whole-genome sequences, we uncovered three novel complex rearrangement phenomena: pyrgo, rigma, and tyfonas. Pyrgo are "towers" of low-JCN duplications associated with early-replicating regions, superenhancers, and breast or ovarian cancers. Rigma comprise "chasms" of low-JCN deletions enriched in late-replicating fragile sites and gastrointestinal carcinomas. Tyfonas are "typhoons" of high-JCN junctions and fold-back inversions associated with expressed protein-coding fusions, breakend hypermutation, and acral, but not cutaneous, melanomas. Clustering of tumors according to genome graph-derived features identified subgroups associated with DNA repair defects and poor prognosis.

Subject(s)

Genomic Structural Variation/genetics , Genomics/methods , Neoplasms/genetics , Chromosome Inversion/genetics , Chromothripsis , DNA Copy Number Variations/genetics , Gene Rearrangement/genetics , Genome, Human/genetics , Humans , Mutation/genetics , Whole Genome Sequencing/methods

2.

Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program.

Taliun, Daniel; Harris, Daniel N; Kessler, Michael D; Carlson, Jedidiah; Szpiech, Zachary A; Torres, Raul; Taliun, Sarah A Gagliano; Corvelo, André; Gogarten, Stephanie M; Kang, Hyun Min; Pitsillides, Achilleas N; LeFaive, Jonathon; Lee, Seung-Been; Tian, Xiaowen; Browning, Brian L; Das, Sayantan; Emde, Anne-Katrin; Clarke, Wayne E; Loesch, Douglas P; Shetty, Amol C; Blackwell, Thomas W; Smith, Albert V; Wong, Quenna; Liu, Xiaoming; Conomos, Matthew P; Bobo, Dean M; Aguet, François; Albert, Christine; Alonso, Alvaro; Ardlie, Kristin G; Arking, Dan E; Aslibekyan, Stella; Auer, Paul L; Barnard, John; Barr, R Graham; Barwick, Lucas; Becker, Lewis C; Beer, Rebecca L; Benjamin, Emelia J; Bielak, Lawrence F; Blangero, John; Boehnke, Michael; Bowden, Donald W; Brody, Jennifer A; Burchard, Esteban G; Cade, Brian E; Casella, James F; Chalazan, Brandon; Chasman, Daniel I; Chen, Yii-Der Ida.

Nature ; 590(7845): 290-299, 2021 02.

Article in English | MEDLINE | ID: mdl-33568819

ABSTRACT

The Trans-Omics for Precision Medicine (TOPMed) programme seeks to elucidate the genetic architecture and biology of heart, lung, blood and sleep disorders, with the ultimate goal of improving diagnosis, treatment and prevention of these diseases. The initial phases of the programme focused on whole-genome sequencing of individuals with rich phenotypic data and diverse backgrounds. Here we describe the TOPMed goals and design as well as the available resources and early insights obtained from the sequence data. The resources include a variant browser, a genotype imputation server, and genomic and phenotypic data that are available through dbGaP (Database of Genotypes and Phenotypes). In the first 53,831 TOPMed samples, we detected more than 400 million single-nucleotide and insertion or deletion variants after alignment with the reference genome. Additional previously undescribed variants were detected through assembly of unmapped reads and customized analysis in highly variable loci. Among the more than 400 million detected variants, 97% have frequencies of less than 1% and 46% are singletons that are present in only one individual (53% among unrelated individuals). These rare variants provide insights into mutational processes and recent human evolutionary history. The extensive catalogue of genetic variation in TOPMed studies provides unique opportunities for exploring the contributions of rare and noncoding sequence variants to phenotypic variation. Furthermore, combining TOPMed haplotypes with modern imputation methods improves the power and reach of genome-wide association studies to include variants down to a frequency of approximately 0.01%.

Subject(s)

Genetic Variation/genetics , Genome, Human/genetics , Genomics , National Heart, Lung, and Blood Institute (U.S.) , Precision Medicine , Cytochrome P-450 CYP2D6/genetics , Haplotypes/genetics , Heterozygote , Humans , INDEL Mutation , Loss of Function Mutation , Mutagenesis , Phenotype , Polymorphism, Single Nucleotide , Population Density , Precision Medicine/standards , Quality Control , Sample Size , United States , Whole Genome Sequencing/standards

3.

Diverse tumorigenic consequences of human papillomavirus integration in primary oropharyngeal cancers.

Symer, David E; Akagi, Keiko; Geiger, Heather M; Song, Yang; Li, Gaiyun; Emde, Anne-Katrin; Xiao, Weihong; Jiang, Bo; Corvelo, André; Toussaint, Nora C; Li, Jingfeng; Agrawal, Amit; Ozer, Enver; El-Naggar, Adel K; Du, Zoe; Shewale, Jitesh B; Stache-Crain, Birgit; Zucker, Mark; Robine, Nicolas; Coombes, Kevin R; Gillison, Maura L.

Genome Res ; 32(1): 55-70, 2022 01.

Article in English | MEDLINE | ID: mdl-34903527

ABSTRACT

Human papillomavirus (HPV) causes 5% of all cancers and frequently integrates into host chromosomes. The HPV oncoproteins E6 and E7 are necessary but insufficient for cancer formation, indicating that additional secondary genetic events are required. Here, we investigate potential oncogenic impacts of virus integration. Analysis of 105 HPV-positive oropharyngeal cancers by whole-genome sequencing detects virus integration in 77%, revealing five statistically significant sites of recurrent integration near genes that regulate epithelial stem cell maintenance (i.e., SOX2, TP63, FGFR, MYC) and immune evasion (i.e., CD274). Genomic copy number hyperamplification is enriched 16-fold near HPV integrants, and the extent of focal host genomic instability increases with their local density. The frequency of genes expressed at extreme outlier levels is increased 86-fold within ±150 kb of integrants. Across 95% of tumors with integration, host gene transcription is disrupted via intragenic integrants, chimeric transcription, outlier expression, gene breaking, and/or de novo expression of noncoding or imprinted genes. We conclude that virus integration can contribute to carcinogenesis in a large majority of HPV-positive oropharyngeal cancers by inducing extensive disruption of host genome structure and gene expression.

Subject(s)

Alphapapillomavirus , Oncogene Proteins, Viral , Oropharyngeal Neoplasms , Alphapapillomavirus/metabolism , Carcinogenesis , Humans , Oncogene Proteins, Viral/genetics , Oropharyngeal Neoplasms/genetics , Papillomaviridae/genetics , Papillomaviridae/metabolism , Papillomavirus E7 Proteins/genetics , Papillomavirus E7 Proteins/metabolism , Virus Integration/genetics

4.

Human papillomavirus and the landscape of secondary genetic alterations in oral cancers.

Gillison, Maura L; Akagi, Keiko; Xiao, Weihong; Jiang, Bo; Pickard, Robert K L; Li, Jingfeng; Swanson, Benjamin J; Agrawal, Amit D; Zucker, Mark; Stache-Crain, Birgit; Emde, Anne-Katrin; Geiger, Heather M; Robine, Nicolas; Coombes, Kevin R; Symer, David E.

Genome Res ; 29(1): 1-17, 2019 01.

Article in English | MEDLINE | ID: mdl-30563911

ABSTRACT

Human papillomavirus (HPV) is a necessary but insufficient cause of a subset of oral squamous cell carcinomas (OSCCs) that is increasing markedly in frequency. To identify contributory, secondary genetic alterations in these cancers, we used comprehensive genomics methods to compare 149 HPV-positive and 335 HPV-negative OSCC tumor/normal pairs. Different behavioral risk factors underlying the two OSCC types were reflected in distinctive genomic mutational signatures. In HPV-positive OSCCs, the signatures of APOBEC cytosine deaminase editing, associated with anti-viral immunity, were strongly linked to overall mutational burden. In contrast, in HPV-negative OSCCs, T>C substitutions in the sequence context 5'-ATN-3' correlated with tobacco exposure. Universal expression of HPV E6*1 and E7 oncogenes was a sine qua non of HPV-positive OSCCs. Significant enrichment of somatic mutations was confirmed or newly identified in PIK3CA, KMT2D, FGFR3, FBXW7, DDX3X, PTEN, TRAF3, RB1, CYLD, RIPK4, ZNF750, EP300, CASZ1, TAF5, RBL1, IFNGR1, and NFKBIA. Of these, many affect host pathways already targeted by HPV oncoproteins, including the p53 and pRB pathways, or disrupt host defenses against viral infections, including interferon (IFN) and nuclear factor kappa B signaling. Frequent copy number changes were associated with concordant changes in gene expression. Chr 11q (including CCND1) and 14q (including DICER1 and AKT1) were recurrently lost in HPV-positive OSCCs, in contrast to their gains in HPV-negative OSCCs. High-ranking variant allele fractions implicated ZNF750, PIK3CA, and EP300 mutations as candidate driver events in HPV-positive cancers. We conclude that virus-host interactions cooperatively shape the unique genetic features of these cancers, distinguishing them from their HPV-negative counterparts.

Subject(s)

Carcinoma, Squamous Cell , Mouth Neoplasms , Neoplasm Proteins , Oncogene Proteins, Viral , Papillomavirus Infections , Carcinoma, Squamous Cell/genetics , Carcinoma, Squamous Cell/metabolism , Carcinoma, Squamous Cell/pathology , Carcinoma, Squamous Cell/virology , Female , Humans , Male , Mouth Neoplasms/genetics , Mouth Neoplasms/metabolism , Mouth Neoplasms/pathology , Mouth Neoplasms/virology , Mutation , Neoplasm Proteins/biosynthesis , Neoplasm Proteins/genetics , Oncogene Proteins, Viral/biosynthesis , Oncogene Proteins, Viral/genetics , Papillomaviridae/genetics , Papillomaviridae/metabolism

5.

Mid-pass whole genome sequencing enables biomedical genetic studies of diverse populations.

Emde, Anne-Katrin; Phipps-Green, Amanda; Cadzow, Murray; Gallagher, C Scott; Major, Tanya J; Merriman, Marilyn E; Topless, Ruth K; Takei, Riku; Dalbeth, Nicola; Murphy, Rinki; Stamp, Lisa K; de Zoysa, Janak; Wilcox, Philip L; Fox, Keolu; Wasik, Kaja A; Merriman, Tony R; Castel, Stephane E.

BMC Genomics ; 22(1): 666, 2021 Nov 01.

Article in English | MEDLINE | ID: mdl-34719381

ABSTRACT

BACKGROUND: Historically, geneticists have relied on genotyping arrays and imputation to study human genetic variation. However, an underrepresentation of diverse populations has resulted in arrays that poorly capture global genetic variation, and a lack of reference panels. This has contributed to deepening global health disparities. Whole genome sequencing (WGS) better captures genetic variation but remains prohibitively expensive. Thus, we explored WGS at "mid-pass" 1-7x coverage. RESULTS: Here, we developed and benchmarked methods for mid-pass sequencing. When applied to a population without an existing genomic reference panel, 4x mid-pass performed consistently well across ethnicities, with high recall (98%) and precision (97.5%). CONCLUSION: Compared to array data imputed into 1000 Genomes, mid-pass performed better across all metrics and identified novel population-specific variants with potential disease relevance. We hope our work will reduce financial barriers for geneticists from underrepresented populations to characterize their genomes prior to biomedical genetic applications.

Subject(s)

Genome-Wide Association Study , Polymorphism, Single Nucleotide , Genome , Genome, Human , Genomics , Genotype , Humans , Whole Genome Sequencing

6.

Gustaf: Detecting and correctly classifying SVs in the NGS twilight zone.

Trappe, Kathrin; Emde, Anne-Katrin; Ehrlich, Hans-Christian; Reinert, Knut.

Bioinformatics ; 30(24): 3484-90, 2014 Dec 15.

Article in English | MEDLINE | ID: mdl-25028727

ABSTRACT

MOTIVATION: The landscape of structural variation (SV) including complex duplication and translocation patterns is far from resolved. SV detection tools usually exhibit low agreement, are often geared toward certain types or size ranges of variation and struggle to correctly classify the type and exact size of SVs. RESULTS: We present Gustaf (Generic mUlti-SpliT Alignment Finder), a sound generic multi-split SV detection tool that detects and classifies deletions, inversions, dispersed duplications and translocations of ≥ 30 bp. Our approach is based on a generic multi-split alignment strategy that can identify SV breakpoints with base pair resolution. We show that Gustaf correctly identifies SVs, especially in the range from 30 to 100 bp, which we call the next-generation sequencing (NGS) twilight zone of SVs, as well as larger SVs >500 bp. Gustaf performs better than similar tools in our benchmark and is furthermore able to correctly identify size and location of dispersed duplications and translocations, which otherwise might be wrongly classified, for example, as large deletions.

Subject(s)

Genomic Structural Variation , High-Throughput Nucleotide Sequencing/methods , Sequence Analysis, DNA/methods , Humans , Sequence Alignment , Sequence Deletion , Software , Translocation, Genetic

7.

The gout epidemic in French Polynesia: a modelling study of data from the Ma'i u'u epidemiological survey.

Pascart, Tristan; Wasik, Kaja A; Preda, Cristian; Chune, Valérie; Torterat, Jérémie; Prud'homme, Nicolas; Nassih, Maryline; Martin, Agathe; Le Masson, Julien; Rodière, Vahinetua; Frogier, Sylvain; Canova, Georges; Pescheux, Jean-Paul; Shan Sei Fan, Charles; Jauffret, Charlotte; Claeys, Patrick; von Baeyer, Sarah LeBaron; Castel, Stephane E; Emde, Anne-Katrin; Yerges-Armstrong, Laura; Fox, Keolu; Leask, Megan; Vitagliano, Jean-Jacques; Graf, Sahara; Norberciak, Laurène; Raynal, Jacques; Dalbeth, Nicola; Merriman, Tony; Bardin, Thomas; Oehler, Erwan.

Lancet Glob Health ; 12(4): e685-e696, 2024 Apr.

Article in English | MEDLINE | ID: mdl-38485432

ABSTRACT

BACKGROUND: Gout is the most common cause of inflammatory arthritis worldwide, particularly in Pacific regions. We aimed to establish the prevalence of gout and hyperuricaemia in French Polynesia, their associations with dietary habits, their comorbidities, the prevalence of the HLA-B*58:01 allele, and current management of the disease. METHODS: The Ma'i u'u survey was epidemiological, prospective, cross-sectional, and gout-focused and included a random sample of adults from the general adult population of French Polynesia. It was conducted and data were collected between April 13 and Aug 16, 2021. Participants were randomly selected to represent the general adult population of French Polynesia on the basis of housing data collected during the 2017 territorial census. Each selected household was visited by a research nurse from the Ma'i u'u survey who collected data via guided, 1-h interviews with participants. In each household, the participant was the individual older than 18 years with the closest upcoming birthday. To estimate the frequency of HLA-B*58:01, we estimated HLA-B haplotypes on individuals who had whole-genome sequencing to approximately 5× average coverage (mid-pass sequencing). A subset of individuals who self-reported Polynesian ancestry and not European, Chinese, or other ancestry were used to estimate Polynesian-ancestry specific allele frequencies. Bivariate associations were reported for weighted participants; effect sizes were estimated through the odds ratio (OR) of the association calculated on the basis of a logistic model fitted with weighted observations. FINDINGS: Among the random sample of 2000 households, 896 participants were included, 140 individuals declined, and 964 households could not be contacted. 22 participants could not be weighted due to missing data, so the final weighted analysis included 874 participants (449 [51·4%] were female and 425 [48·6%] were male) representing the 196 630 adults living in French Polynesia. The estimated prevalence of gout was 14·5% (95% CI 9·9-19·2), representing 28 561 French Polynesian adults, that is 25·5% (18·2-32·8) of male individuals and 3·5% (1·0-6·0) of female individuals. The prevalence of hyperuricaemia was estimated at 71·6% (66·7-76·6), representing 128 687 French Polynesian adults. In multivariable analysis, age (OR 1·5, 95% CI 1·2-1·8 per year), male sex (10·3, 1·8-60·7), serum urate (1·6, 1·3-2·0 per 1 mg/dL), uraturia (0·8, 0·8-0·8 per 100 mg/L), type 2 diabetes (2·1, 1·4-3·1), BMI more than 30 kg/m² (1·1, 1·0-1·2 per unit), and percentage of visceral fat (1·7, 1·1-2·7 per 1% increase) were associated with gout. There were seven heterozygous HLA-B*58:01 carriers in the full cohort of 833 individuals (seven [0·4%] of 1666 total alleles) and two heterozygous carriers in a subset of 696 individuals of Polynesian ancestry (two [0·1%]). INTERPRETATION: French Polynesia has an estimated high prevalence of gout and hyperuricaemia, with gout affecting almost 15% of adults. Territorial measures that focus on increasing access to effective urate-lowering therapies are warranted to control this major public health problem. FUNDING: Variant Bio, the French Polynesian Health Administration, Lille Catholic University Hospitals, French Society of Rheumatology, and Novartis.
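
The allele-frequency figures reported for HLA-B*58:01 follow from simple counting: each diploid individual contributes two alleles, and a heterozygous carrier contributes one copy of the allele of interest. A minimal sketch of that arithmetic is given below; the function name and its `hom_carriers` parameter are illustrative assumptions and are not taken from the study.

```python
def allele_frequency(het_carriers: int, individuals: int, hom_carriers: int = 0) -> float:
    """Frequency of an allele in a diploid cohort (hypothetical helper).

    Each heterozygous carrier contributes one copy, each homozygous
    carrier two copies, out of 2 * individuals total alleles.
    """
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    copies = het_carriers + 2 * hom_carriers
    return copies / (2 * individuals)


# Counts quoted in the abstract: 7 heterozygous carriers among 833 individuals,
# and 2 heterozygous carriers among 696 individuals of Polynesian ancestry.
full_cohort = allele_frequency(7, 833)
polynesian_subset = allele_frequency(2, 696)
print(f"{100 * full_cohort:.1f}%  {100 * polynesian_subset:.1f}%")
```

With the counts from the abstract this gives 7 of 1666 alleles and 2 of 1392 alleles, which round to the reported 0·4% and 0·1%.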

Subject(s)

Diabetes Mellitus, Type 2 , Gout , Hyperuricemia , Adult , Humans , Male , Female , Hyperuricemia/epidemiology , Hyperuricemia/genetics , Uric Acid , Cross-Sectional Studies , Prospective Studies , Gout/epidemiology , Gout/genetics , Polynesia/epidemiology , HLA-B Antigens

8.

Breakpointer: using local mapping artifacts to support sequence breakpoint discovery from single-end reads.

Sun, Ruping; Love, Michael I; Zemojtel, Tomasz; Emde, Anne-Katrin; Chung, Ho-Ryun; Vingron, Martin; Haas, Stefan A.

Bioinformatics ; 28(7): 1024-5, 2012 Apr 01.

Article in English | MEDLINE | ID: mdl-22302574

ABSTRACT

SUMMARY: We developed Breakpointer, a fast algorithm to locate breakpoints of structural variants (SVs) from single-end reads produced by next-generation sequencing. By taking advantage of local non-uniform read distribution and misalignments created by SVs, Breakpointer scans the alignment of single-end reads to identify regions containing potential breakpoints. The detection of such breakpoints can indicate insertions longer than the read length and SVs located in repetitive regions which might be missed by other methods. Thus, Breakpointer complements existing methods to locate SVs from single-end reads. AVAILABILITY: https://github.com/ruping/Breakpointer CONTACT: ruping@molgen.mpg.de SUPPLEMENTARY INFORMATION: Supplementary material is available at Bioinformatics online.

Subject(s)

Algorithms , Computational Biology/methods , Genomic Structural Variation , Sequence Analysis, DNA/methods , Artifacts , Humans

9.

Detecting genomic indel variants with exact breakpoints in single- and paired-end sequencing data using SplazerS.

Emde, Anne-Katrin; Schulz, Marcel H; Weese, David; Sun, Ruping; Vingron, Martin; Kalscheuer, Vera M; Haas, Stefan A; Reinert, Knut.

Bioinformatics ; 28(5): 619-27, 2012 Mar 01.

Article in English | MEDLINE | ID: mdl-22238266

ABSTRACT

MOTIVATION: The reliable detection of genomic variation in resequencing data is still a major challenge, especially for variants larger than a few base pairs. Sequencing reads crossing boundaries of structural variation carry the potential for their identification, but are difficult to map. RESULTS: Here we present a method for 'split' read mapping, where prefix and suffix match of a read may be interrupted by a longer gap in the read-to-reference alignment. We use this method to accurately detect medium-sized insertions and long deletions with precise breakpoints in genomic resequencing data. Compared with alternative split mapping methods, SplazerS significantly improves sensitivity for detecting large indel events, especially in variant-rich regions. Our method is robust in the presence of sequencing errors as well as alignment errors due to genomic mutations/divergence, and can be used on reads of variable lengths. Our analysis shows that SplazerS is a versatile tool applicable to unanchored or single-end as well as anchored paired-end reads. In addition, application of SplazerS to targeted resequencing data led to the interesting discovery of a complete, possibly functional gene retrocopy variant. AVAILABILITY: SplazerS is available from http://www.seqan.de/projects/splazers. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
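
The idea of a split read, where a prefix and a suffix of the read each match the reference but are separated by a gap, can be illustrated with a toy example. The sketch below is not the SplazerS algorithm: it assumes exact matching, considers only the first occurrence of each prefix, and handles deletions only. The function name `split_map_deletion` and the `min_anchor` parameter are hypothetical.

```python
from typing import Optional, Tuple


def split_map_deletion(read: str, ref: str, min_anchor: int = 4) -> Optional[Tuple[int, int, int]]:
    """Toy split mapping for a single deletion (exact matches only).

    Tries every split point k, places the prefix read[:k] at its first
    occurrence in ref, then looks for the suffix read[k:] further right.
    Returns (prefix_start, deletion_start, deletion_length) for the
    smallest gap found, or None if no split with a positive gap exists.
    """
    best = None
    for k in range(min_anchor, len(read) - min_anchor + 1):
        prefix, suffix = read[:k], read[k:]
        p = ref.find(prefix)            # first exact hit of the prefix
        if p < 0:
            continue
        q = ref.find(suffix, p + k)     # suffix must start at or after the prefix end
        if q < 0:
            continue
        gap = q - (p + k)               # reference bases skipped by the read
        if gap > 0 and (best is None or gap < best[2]):
            best = (p, p + k, gap)
    return best


# Reference with a 5-bp segment ("AGGCT") that is absent from the read.
ref = "ACGTTGCA" + "AGGCT" + "TAGCCATG" + "A"
read = "GTTGCA" + "TAGCCA"
print(split_map_deletion(read, ref))
```

A practical mapper would additionally tolerate mismatches and indels in each anchor and consider all candidate positions, as the abstract's emphasis on robustness to sequencing errors suggests.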

Subject(s)

Genomics/methods , INDEL Mutation , Sequence Analysis, DNA , Algorithms , Humans

10.

RazerS--fast read mapping with sensitivity control.

Weese, David; Emde, Anne-Katrin; Rausch, Tobias; Döring, Andreas; Reinert, Knut.

Genome Res ; 19(9): 1646-54, 2009 Sep.

Article in English | MEDLINE | ID: mdl-19592482

ABSTRACT

Second-generation sequencing technologies deliver DNA sequence data at unprecedented high throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. Due to the large amounts of data, efficient algorithms and implementations are crucial for this task. We present an efficient read mapping tool called RazerS. It allows the user to align sequencing reads of arbitrary length using either the Hamming distance or the edit distance. Our tool can work either lossless or with a user-defined loss rate at higher speeds. Given the loss rate, we present an approach that guarantees not to lose more reads than specified. This enables the user to adapt to the problem at hand and provides a seamless tradeoff between sensitivity and running time.
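
The two distance measures named in the abstract differ in what they count: the Hamming distance counts mismatching positions between equal-length strings, while the edit distance also allows insertions and deletions. The sketch below shows textbook implementations of both; it illustrates the definitions only and makes no claim about how RazerS computes or filters them.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via row-by-row dynamic programming."""
    prev = list(range(len(b) + 1))      # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                      # deleting the first i characters of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,            # delete x from a
                curr[j - 1] + 1,        # insert y into a
                prev[j - 1] + (x != y), # match or substitute
            ))
        prev = curr
    return prev[-1]


print(hamming_distance("ACGT", "ACCT"))  # one mismatch
print(edit_distance("ACGT", "AGT"))      # one deleted base
```

A read that differs from the reference by a single deleted base has edit distance 1, but cannot be compared under the Hamming distance without padding, which is why a mapper may offer both modes.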

Subject(s)

Chromosome Mapping/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Animals , Drosophila melanogaster/genetics , Genome, Human/genetics , Genome, Insect/genetics , Humans , Sensitivity and Specificity , Sequence Alignment , Time Factors , User-Computer Interface

11.

Somatic whole genome dynamics of precancer in Barrett's esophagus reveals features associated with disease progression.

Paulson, Thomas G; Galipeau, Patricia C; Oman, Kenji M; Sanchez, Carissa A; Kuhner, Mary K; Smith, Lucian P; Hadi, Kevin; Shah, Minita; Arora, Kanika; Shelton, Jennifer; Johnson, Molly; Corvelo, Andre; Maley, Carlo C; Yao, Xiaotong; Sanghvi, Rashesh; Venturini, Elisa; Emde, Anne-Katrin; Hubert, Benjamin; Imielinski, Marcin; Robine, Nicolas; Reid, Brian J; Li, Xiaohong.

Nat Commun ; 13(1): 2300, 2022 04 28.

Article in English | MEDLINE | ID: mdl-35484108

ABSTRACT

While the genomes of normal tissues undergo dynamic changes over time, little is understood about the temporal-spatial dynamics of genomes in premalignant tissues that progress to cancer compared to those that remain cancer-free. Here we use whole genome sequencing to contrast genomic alterations in 427 longitudinal samples from 40 patients with stable Barrett's esophagus compared to 40 Barrett's patients who progressed to esophageal adenocarcinoma (ESAD). We show the same somatic mutational processes are active in Barrett's tissue regardless of outcome, with high levels of mutation, ESAD gene and focal chromosomal alterations, and similar mutational signatures. The critical distinction between stable Barrett's versus those who progress to cancer is acquisition and expansion of TP53-/- cell populations having complex structural variants and high-level amplifications, which are detectable up to six years prior to a cancer diagnosis. These findings reveal the timing of common somatic genome dynamics in stable Barrett's esophagus and define key genomic features specific to progression to esophageal adenocarcinoma, both of which are critical for cancer prevention and early detection strategies.

Subject(s)

Adenocarcinoma , Barrett Esophagus , Esophageal Neoplasms , Adenocarcinoma/pathology , Barrett Esophagus/genetics , Barrett Esophagus/pathology , Disease Progression , Esophageal Neoplasms/pathology , Humans

12.

A novel and well-defined benchmarking method for second generation read mapping.

Holtgrewe, Manuel; Emde, Anne-Katrin; Weese, David; Reinert, Knut.

BMC Bioinformatics ; 12: 210, 2011 May 26.

Article in English | MEDLINE | ID: mdl-21615913

ABSTRACT

BACKGROUND: Second generation sequencing technologies yield DNA sequence data at ultra high-throughput. Common to most biological applications is a mapping of the reads to an almost identical or highly similar reference genome. The assessment of the quality of read mapping results is not straightforward and has not been formalized so far. Hence, it has not been easy to compare different read mapping approaches in a unified way and to determine which program is the best for what task. RESULTS: We present a new benchmark method, called Rabema (Read Alignment BEnchMArk), for read mappers. It consists of a strict definition of the read mapping problem and of tools to evaluate the result of arbitrary read mappers supporting the SAM output format. CONCLUSIONS: We show the usefulness of the benchmark program by performing a comparison of popular read mappers. The tools supporting the benchmark are licensed under the GPL and available from http://www.seqan.de/projects/rabema.html.

Subject(s)

Sequence Analysis, DNA/methods , Sequence Analysis, DNA/standards , Algorithms , Animals , High-Throughput Nucleotide Sequencing , Humans

13.

MicroRazerS: rapid alignment of small RNA reads.

Emde, Anne-Katrin; Grunert, Marcel; Weese, David; Reinert, Knut; Sperling, Silke R.

Bioinformatics ; 26(1): 123-4, 2010 Jan 01.

Article in English | MEDLINE | ID: mdl-19880369

ABSTRACT

MOTIVATION: Deep sequencing has become the method of choice for determining the small RNA content of a cell. Mapping the sequenced reads onto their reference genome serves as the basis for all further analyses, namely for identification and quantification. A method frequently used is Mega BLAST followed by several filtering steps, even though it is slow and inefficient for this task. Also, none of the currently available short read aligners has established itself for the particular task of small RNA mapping. RESULTS: We present MicroRazerS, a tool optimized for mapping small RNAs onto a reference genome. It is an order of magnitude faster than Mega BLAST and comparable in speed with other short read mapping tools. In addition, it is more sensitive and easy to handle and adjust. AVAILABILITY: MicroRazerS is part of the SeqAn C++ library and can be downloaded from http://www.seqan.de/projects/MicroRazerS.html.

Subject(s)

Algorithms , MicroRNAs/genetics , Sequence Alignment/methods , Sequence Analysis, RNA/methods , Software , Base Sequence , Molecular Sequence Data

14.

A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads.

Rausch, Tobias; Koren, Sergey; Denisov, Gennady; Weese, David; Emde, Anne-Katrin; Döring, Andreas; Reinert, Knut.

Bioinformatics ; 25(9): 1118-24, 2009 May 01.

Article in English | MEDLINE | ID: mdl-19269990

ABSTRACT

MOTIVATION: Novel high-throughput sequencing technologies pose new algorithmic challenges in handling massive amounts of short-read, high-coverage data. A robust and versatile consensus tool is of particular interest for such data since a sound multi-read alignment is a prerequisite for variation analyses, accurate genome assemblies and insert sequencing. RESULTS: A multi-read alignment algorithm for de novo or reference-guided genome assembly is presented. The program identifies segments shared by multiple reads and then aligns these segments using a consistency-enhanced alignment graph. On real de novo sequencing data obtained from the newly established NCBI Short Read Archive, the program performs similarly in quality to other comparable programs. On more challenging simulated datasets for insert sequencing and variation analyses, our program outperforms the other tools. AVAILABILITY: The consensus program can be downloaded from http://www.seqan.de/projects/consensus.html. It can be used stand-alone or in conjunction with the Celera Assembler. Both application scenarios as well as the usage of the tool are described in the documentation.

Subject(s)

Algorithms , Sequence Alignment/methods , Base Sequence , Computational Biology/methods , Internet , Molecular Sequence Data , Sequence Analysis, DNA/methods

15.

Author Correction: PGBD5 promotes site-specific oncogenic mutations in human tumors.

Henssen, Anton G; Koche, Richard; Zhuang, Jiali; Jiang, Eileen; Reed, Casie; Eisenberg, Amy; Still, Eric; MacArthur, Ian C; Rodríguez-Fos, Elias; Gonzalez, Santiago; Puiggròs, Montserrat; Blackford, Andrew N; Mason, Christopher E; de Stanchina, Elisa; Gönen, Mithat; Emde, Anne-Katrin; Shah, Minita; Arora, Kanika; Reeves, Catherine; Socci, Nicholas D; Perlman, Elizabeth; Antonescu, Cristina R; Roberts, Charles W M; Steen, Hanno; Mullen, Elizabeth; Jackson, Stephen P; Torrents, David; Weng, Zhiping; Armstrong, Scott A; Kentsis, Alex.

Nat Genet ; 52(11): 1265, 2020 Nov.

Article in English | MEDLINE | ID: mdl-32918070

ABSTRACT

An amendment to this paper has been published and can be accessed via a link at the top of the paper.

16.

Segment-based multiple sequence alignment.

Rausch, Tobias; Emde, Anne-Katrin; Weese, David; Döring, Andreas; Notredame, Cedric; Reinert, Knut.

Bioinformatics ; 24(16): i187-92, 2008 Aug 15.

Article in English | MEDLINE | ID: mdl-18689823

ABSTRACT

MOTIVATION: Many multiple sequence alignment tools have been developed in the past, progressing either in speed or alignment accuracy. Given the importance and wide-spread use of alignment tools, progress in both categories is a contribution to the community and has driven research in the field so far. RESULTS: We introduce a graph-based extension to the consistency-based, progressive alignment strategy. We apply the consistency notion to segments instead of single characters. The main problem we solve in this context is to define segments of the sequences in such a way that a graph-based alignment is possible. We implemented the algorithm using the SeqAn library and report results on amino acid and DNA sequences. The benefit of our approach is threefold: (1) sequences with conserved blocks can be rapidly aligned, (2) the implementation is conceptually easy, generic and fast and (3) the consistency idea can be extended to align multiple genomic sequences. AVAILABILITY: The segment-based multiple sequence alignment tool can be downloaded from http://www.seqan.de/projects/msa.html. A novel version of T-Coffee interfaced with the tool is available from http://www.tcoffee.org. The usage of the tool is described in both documentations.

Subject(s)

Algorithms , Sequence Alignment/methods , Sequence Analysis/methods , Software

17.

Genetic mechanisms of primary chemotherapy resistance in pediatric acute myeloid leukemia.

McNeer, Nicole A; Philip, John; Geiger, Heather; Ries, Rhonda E; Lavallée, Vincent-Philippe; Walsh, Michael; Shah, Minita; Arora, Kanika; Emde, Anne-Katrin; Robine, Nicolas; Alonzo, Todd A; Kolb, E Anders; Gamis, Alan S; Smith, Malcolm; Gerhard, Daniela Se; Guidry-Auvil, Jaime; Meshinchi, Soheil; Kentsis, Alex.

Leukemia ; 33(8): 1934-1943, 2019 08.

Article in English | MEDLINE | ID: mdl-30760869

ABSTRACT

Acute myeloid leukemias (AML) are characterized by mutations of tumor suppressor and oncogenes, involving distinct genes in adults and children. While certain mutations have been associated with the increased risk of AML relapse, the genomic landscape of primary chemotherapy-resistant AML is not well defined. As part of the TARGET initiative, we performed whole-genome DNA and transcriptome RNA and miRNA sequencing analysis of pediatric AML with failure of induction chemotherapy. We identified at least three genetic groups of patients with induction failure, including those with NUP98 rearrangements, somatic mutations of WT1 in the absence of apparent NUP98 mutations, and additional recurrent variants including those in KMT2C and MLLT10. Comparison of specimens before and after chemotherapy revealed distinct and invariant gene expression programs. While exhibiting overt therapy resistance, these leukemias nonetheless showed diverse forms of clonal evolution upon chemotherapy exposure. This included selection for mutant alleles of FRMD8, DHX32, PIK3R1, SHANK3, MKLN1, as well as persistence of WT1 and TP53 mutant clones, and elimination of FLT3, PTPN11, and NRAS mutant clones. These findings delineate genetic mechanisms of primary chemotherapy resistance in pediatric AML, which should inform improved approaches for its diagnosis and therapy.

Subject(s)

Leukemia, Myeloid, Acute/drug therapy , Mutation , Child , Drug Resistance, Neoplasm , Genes, Wilms Tumor , Genes, p53 , Humans , Leukemia, Myeloid, Acute/genetics

18.

Correction to: Sequencing and curation strategies for identifying candidate glioblastoma treatments.

Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Geiger, Heather; Felice, Vanessa; Dikoglu, Esra; Rahman, Sadia; Fang, Xiaolan; Vacic, Vladimir; Bergmann, Ewa A; Moore Vogel, Julia L; Reeves, Catherine; Khaira, Depinder; Calabro, Anthony; Kim, Duyang; Lamendola-Essel, Michelle F; Esteves, Cecilia; Agius, Phaedra; Stolte, Christian; Boockvar, John; Demopoulos, Alexis; Placantonakis, Dimitris G; Golfinos, John G; Brennan, Cameron; Bruce, Jeffrey; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Daras, Mariza; Diamond, Eli; Omuro, Antonio; Pentsova, Elena; Orange, Dana E; Harvey, Stephen J; Posner, Jerome B; Michelini, Vanessa V; Jobanputra, Vaidehi; Zody, Michael C; Kelly, John; Parida, Laxmi; Wrzeszczynski, Kazimierz O; Royyuru, Ajay K; Darnell, Robert B.

BMC Med Genomics ; 12(1): 114, 2019 Aug 02.

Article in English | MEDLINE | ID: mdl-31375115

ABSTRACT

Following publication of the original article [1], it was reported that the given name of the fourteenth author was incorrectly published. The incorrect and the correct names are given below.

19.

Sequencing and curation strategies for identifying candidate glioblastoma treatments.

Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Geiger, Heather; Felice, Vanessa; Dikoglu, Esra; Rahman, Sadia; Fang, Alice; Vacic, Vladimir; Bergmann, Ewa A; Vogel, Julia L Moore; Reeves, Catherine; Khaira, Depinder; Calabro, Anthony; Kim, Duyang; Lamendola-Essel, Michelle F; Esteves, Cecilia; Agius, Phaedra; Stolte, Christian; Boockvar, John; Demopoulos, Alexis; Placantonakis, Dimitris G; Golfinos, John G; Brennan, Cameron; Bruce, Jeffrey; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Daras, Mariza; Diamond, Eli; Omuro, Antonio; Pentsova, Elena; Orange, Dana E; Harvey, Stephen J; Posner, Jerome B; Michelini, Vanessa V; Jobanputra, Vaidehi; Zody, Michael C; Kelly, John; Parida, Laxmi; Wrzeszczynski, Kazimierz O; Royyuru, Ajay K; Darnell, Robert B.

BMC Med Genomics ; 12(1): 56, 2019 04 25.

Article in English | MEDLINE | ID: mdl-31023376

ABSTRACT

BACKGROUND: Prompted by the revolution in high-throughput sequencing and its potential impact for treating cancer patients, we initiated a clinical research study to compare the ability of different sequencing assays and analysis methods to analyze glioblastoma tumors and generate real-time potential treatment options for physicians. METHODS: A consortium of seven institutions in New York City enrolled 30 patients with glioblastoma and performed tumor whole genome sequencing (WGS) and RNA sequencing (RNA-seq; collectively WGS/RNA-seq); 20 of these patients were also analyzed with independent targeted panel sequencing. We also compared results of expert manual annotations with those from an automated annotation system, Watson Genomic Analysis (WGA), to assess the reliability and time required to identify potentially relevant pharmacologic interventions. RESULTS: WGS/RNA-seq identified more potentially actionable clinical results than targeted panels in 90% of cases, with an average of 16-fold more unique potentially actionable variants identified per individual; 84 clinically actionable calls were made using WGS/RNA-seq that were not identified by panels. Expert annotation and WGA had good agreement on identifying variants [mean sensitivity = 0.71, SD = 0.18 and positive predictive value (PPV) = 0.80, SD = 0.20] and drug targets when the same variants were called (mean sensitivity = 0.74, SD = 0.34 and PPV = 0.79, SD = 0.23) across patients. Clinicians used the information to modify their treatment plan 10% of the time. CONCLUSION: These results present the first comprehensive comparison of technical and machine-augmented analysis of targeted panel and WGS/RNA-seq to identify potential cancer treatments.
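The agreement figures in this abstract are sensitivity and positive predictive value (PPV). As a rough illustration of how such per-patient values could be computed from two sets of calls, here is a minimal sketch. It assumes the expert annotation is treated as the reference set, which the abstract does not state; the function name and the variant labels are hypothetical and are not taken from the study.

```python
def sensitivity_and_ppv(reference_calls, test_calls):
    """Return (sensitivity, PPV) of test_calls measured against reference_calls.

    sensitivity = shared calls / reference calls
    PPV         = shared calls / test calls
    """
    reference, test = set(reference_calls), set(test_calls)
    shared = len(reference & test)
    sensitivity = shared / len(reference) if reference else float("nan")
    ppv = shared / len(test) if test else float("nan")
    return sensitivity, ppv


# Hypothetical calls for one patient (illustrative labels only).
expert = {"variant_A", "variant_B", "variant_C", "variant_D"}
automated = {"variant_A", "variant_B", "variant_C", "variant_E", "variant_F"}

sens, ppv = sensitivity_and_ppv(expert, automated)
```

In this toy case three of the four reference calls are recovered and three of the five automated calls are in the reference set. Averaging such per-patient values across a cohort would give summary figures of the kind reported above, although the study's exact procedure may differ.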

Subject(s)

Glioblastoma/drug therapy , Glioblastoma/genetics , Whole Genome Sequencing , Adult , Aged , Aged, 80 and over , Female , High-Throughput Nucleotide Sequencing , Humans , Male , Middle Aged , Molecular Targeted Therapy , Ploidies , Reproducibility of Results

20.

Genome-wide somatic variant calling using localized colored de Bruijn graphs.

Narzisi, Giuseppe; Corvelo, André; Arora, Kanika; Bergmann, Ewa A; Shah, Minita; Musunuri, Rajeeva; Emde, Anne-Katrin; Robine, Nicolas; Vacic, Vladimir; Zody, Michael C.

Commun Biol ; 1: 20, 2018.

Article in English | MEDLINE | ID: mdl-30271907

ABSTRACT

Reliable detection of somatic variations is of critical importance in cancer research. Here we present Lancet, an accurate and sensitive somatic variant caller, which detects SNVs and indels by jointly analyzing reads from tumor and matched normal samples using colored de Bruijn graphs. We demonstrate, through extensive experimental comparison on synthetic and real whole-genome sequencing datasets, that Lancet has better accuracy, especially for indel detection, than widely used somatic callers, such as MuTect, MuTect2, LoFreq, Strelka, and Strelka2. Lancet features a reliable variant scoring system, which is essential for variant prioritization, and detects low-frequency mutations without sacrificing the sensitivity to call longer insertions and deletions empowered by the local-assembly engine. In addition to genome-wide analysis, Lancet allows inspection of somatic variants in graph space, which augments the traditional read alignment visualization to help confirm a variant of interest. Lancet is available as an open-source program at https://github.com/nygenome/lancet.
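To make the idea of a colored de Bruijn graph concrete, the following is a minimal sketch and not Lancet's implementation. It assumes error-free reads, a fixed k, and two colors named "tumor" and "normal"; all function names are hypothetical. Each k-mer contributes an edge between its (k-1)-mer prefix and suffix, and each edge records which samples contain it, so edges seen only in the tumor mark candidate somatic differences.

```python
from collections import defaultdict


def kmers(seq, k):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_colored_graph(reads_by_sample, k):
    """Map each edge (prefix, suffix) to the set of samples ("colors") containing it."""
    edge_colors = defaultdict(set)
    for color, reads in reads_by_sample.items():
        for read in reads:
            for kmer in kmers(read, k):
                edge_colors[(kmer[:-1], kmer[1:])].add(color)
    return edge_colors


def tumor_only_edges(edge_colors):
    """Return, in sorted order, the edges supported only by the tumor sample."""
    return sorted(e for e, colors in edge_colors.items() if colors == {"tumor"})


# Toy reads: the second tumor read differs from the normal read by one base.
reads = {
    "normal": ["ACGTACGGA"],
    "tumor": ["ACGTACGGA", "ACGTTCGGA"],
}
graph = build_colored_graph(reads, k=4)
candidates = tumor_only_edges(graph)
```

Here the single-base difference produces four tumor-only edges that form a short branch leaving and rejoining the shared path. A real caller would additionally need error handling, coverage thresholds, and variant scoring, none of which this sketch attempts.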
