Results 1 - 20 of 163
1.
Cell ; 166(3): 755-765, 2016 Jul 28.
Article in English | MEDLINE | ID: mdl-27372738

ABSTRACT

To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass-spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSCs). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and associations between protein and post-translational modification levels and clinical outcomes in HGSC.


Subject(s)
Neoplasm Proteins/genetics , Neoplasms, Cystic, Mucinous, and Serous/genetics , Ovarian Neoplasms/genetics , Proteome , Acetylation , Chromosomal Instability , DNA Repair , DNA, Neoplasm , Female , Gene Dosage , Humans , Mass Spectrometry , Phosphoproteins/genetics , Protein Processing, Post-Translational , Survival Analysis
2.
Nature ; 616(7958): 798-805, 2023 04.
Article in English | MEDLINE | ID: mdl-37046089

ABSTRACT

Oncogene amplification on extrachromosomal DNA (ecDNA) drives the evolution of tumours and their resistance to treatment, and is associated with poor outcomes for patients with cancer [1-6]. At present, it is unclear whether ecDNA is a later manifestation of genomic instability, or whether it can be an early event in the transition from dysplasia to cancer. Here, to better understand the development of ecDNA, we analysed whole-genome sequencing (WGS) data from patients with oesophageal adenocarcinoma (EAC) or Barrett's oesophagus. These data included 206 biopsies in Barrett's oesophagus surveillance and EAC cohorts from Cambridge University. We also analysed WGS and histology data from biopsies that were collected across multiple regions at 2 time points from 80 patients in a case-control study at the Fred Hutchinson Cancer Center. In the Cambridge cohorts, the frequency of ecDNA increased between Barrett's-oesophagus-associated early-stage (24%) and late-stage (43%) EAC, suggesting that ecDNA is formed during cancer progression. In the cohort from the Fred Hutchinson Cancer Center, 33% of patients who developed EAC had at least one oesophageal biopsy with ecDNA before or at the diagnosis of EAC. In biopsies that were collected before cancer diagnosis, higher levels of ecDNA were present in samples from patients who later developed EAC than in samples from those who did not. We found that ecDNAs contained diverse collections of oncogenes and immunomodulatory genes. Furthermore, ecDNAs showed increases in copy number and structural complexity at more advanced stages of disease. Our findings show that ecDNA can develop early in the transition from high-grade dysplasia to cancer, and that ecDNAs progressively form and evolve under positive selection.


Subject(s)
Adenocarcinoma , Barrett Esophagus , Carcinogenesis , DNA , Disease Progression , Early Detection of Cancer , Esophageal Neoplasms , Humans , Adenocarcinoma/genetics , Adenocarcinoma/pathology , Barrett Esophagus/genetics , Barrett Esophagus/pathology , Case-Control Studies , DNA/genetics , Esophageal Neoplasms/genetics , Esophageal Neoplasms/pathology , Carcinogenesis/genetics , Whole Genome Sequencing , Cohort Studies , Biopsy , Oncogenes , Immunomodulation , DNA Copy Number Variations , Gene Amplification , Early Detection of Cancer/methods
3.
Nature ; 619(7968): 176-183, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37286593

ABSTRACT

Chromosomal instability (CIN) and epigenetic alterations are characteristics of advanced and metastatic cancers [1-4], but whether they are mechanistically linked is unknown. Here we show that missegregation of mitotic chromosomes, their sequestration in micronuclei [5,6] and subsequent rupture of the micronuclear envelope [7] profoundly disrupt normal histone post-translational modifications (PTMs), a phenomenon conserved across humans and mice, as well as in cancer and non-transformed cells. Some of the changes in histone PTMs occur because of the rupture of the micronuclear envelope, whereas others are inherited from mitotic abnormalities before the micronucleus is formed. Using orthogonal approaches, we demonstrate that micronuclei exhibit extensive differences in chromatin accessibility, with a strong positional bias between promoters and distal or intergenic regions, in line with observed redistributions of histone PTMs. Inducing CIN causes widespread epigenetic dysregulation, and chromosomes that transit in micronuclei experience heritable abnormalities in their accessibility long after they have been reincorporated into the primary nucleus. Thus, as well as altering genomic copy number, CIN promotes epigenetic reprogramming and heterogeneity in cancer.


Subject(s)
Chromosomal Instability , Chromosome Segregation , Chromosomes , Epigenesis, Genetic , Micronuclei, Chromosome-Defective , Neoplasms , Animals , Humans , Mice , Chromatin/genetics , Chromosomal Instability/genetics , Chromosomes/genetics , Chromosomes/metabolism , Histones/chemistry , Histones/metabolism , Neoplasms/genetics , Neoplasms/pathology , Mitosis , DNA Copy Number Variations , Protein Processing, Post-Translational
4.
Nature ; 602(7897): 510-517, 2022 02.
Article in English | MEDLINE | ID: mdl-35140399

ABSTRACT

Clustered somatic mutations are common in cancer genomes and previous analyses reveal several types of clustered single-base substitutions, which include doublet- and multi-base substitutions [1-5], diffuse hypermutation termed omikli [6], and longer strand-coordinated events termed kataegis [3,7-9]. Here we provide a comprehensive characterization of clustered substitutions and clustered small insertions and deletions (indels) across 2,583 whole-genome-sequenced cancers from 30 types of cancer [10]. Clustered mutations were highly enriched in driver genes and associated with differential gene expression and changes in overall survival. Several distinct mutational processes gave rise to clustered indels, including signatures that were enriched in tobacco smokers and homologous-recombination-deficient cancers. Doublet-base substitutions were caused by at least 12 mutational processes, whereas most multi-base substitutions were generated by either tobacco smoking or exposure to ultraviolet light. Omikli events, which have previously been attributed to APOBEC3 activity [6], accounted for a large proportion of clustered substitutions; however, only 16.2% of omikli matched APOBEC3 patterns. Kataegis was generated by multiple mutational processes, and 76.1% of all kataegic events exhibited mutational patterns that are associated with the activation-induced deaminase (AID) and APOBEC3 family of deaminases. Co-occurrence of APOBEC3 kataegis and extrachromosomal DNA (ecDNA), termed kyklonas (Greek for cyclone), was found in 31% of samples with ecDNA. Multiple distinct kyklonic events were observed on most mutated ecDNA. ecDNA containing known cancer genes exhibited both positive selection and kyklonic hypermutation. Our results reveal the diversity of clustered mutational processes in human cancer and the role of APOBEC3 in recurrently mutating and fuelling the evolution of ecDNA.


Subject(s)
Neoplasms , APOBEC Deaminases/genetics , Genome , Humans , INDEL Mutation , Mutagenesis/genetics , Mutation , Neoplasms/genetics
5.
Nature ; 600(7890): 731-736, 2021 12.
Article in English | MEDLINE | ID: mdl-34819668

ABSTRACT

Extrachromosomal DNA (ecDNA) is prevalent in human cancers and mediates high expression of oncogenes through gene amplification and altered gene regulation [1]. Gene induction typically involves cis-regulatory elements that contact and activate genes on the same chromosome [2,3]. Here we show that ecDNA hubs (clusters of around 10-100 ecDNAs within the nucleus) enable intermolecular enhancer-gene interactions to promote oncogene overexpression. ecDNAs that encode multiple distinct oncogenes form hubs in diverse cancer cell types and primary tumours. Each ecDNA is more likely to transcribe the oncogene when spatially clustered with additional ecDNAs. ecDNA hubs are tethered by the bromodomain and extraterminal domain (BET) protein BRD4 in a MYC-amplified colorectal cancer cell line. The BET inhibitor JQ1 disperses ecDNA hubs and preferentially inhibits ecDNA-derived-oncogene transcription. The BRD4-bound PVT1 promoter is ectopically fused to MYC and duplicated in ecDNA, receiving promiscuous enhancer input to drive potent expression of MYC. Furthermore, the PVT1 promoter on an exogenous episome suffices to mediate gene activation in trans by ecDNA hubs in a JQ1-sensitive manner. Systematic silencing of ecDNA enhancers by CRISPR interference reveals intermolecular enhancer-gene activation among multiple oncogene loci that are amplified on distinct ecDNAs. Thus, protein-tethered ecDNA hubs enable intermolecular transcriptional regulation and may serve as units of oncogene function and cooperative evolution and as potential targets for cancer therapy.


Subject(s)
Neoplasms , Nuclear Proteins , Azepines/pharmacology , Cell Cycle Proteins/genetics , Cell Line, Tumor , Gene Amplification , Gene Expression Regulation, Neoplastic , Humans , Neoplasms/genetics , Nuclear Proteins/genetics , Oncogenes/genetics , Transcription Factors/genetics
6.
Proc Natl Acad Sci U S A ; 120(20): e2210991120, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37155843

ABSTRACT

In 2021, the World Health Organization reclassified glioblastoma, the most common form of adult brain cancer, into isocitrate dehydrogenase (IDH)-wild-type glioblastomas and grade IV IDH mutant (G4 IDHm) astrocytomas. For both tumor types, intratumoral heterogeneity is a key contributor to therapeutic failure. To better define this heterogeneity, genome-wide chromatin accessibility and transcription profiles of clinical samples of glioblastomas and G4 IDHm astrocytomas were analyzed at single-cell resolution. These profiles afforded resolution of intratumoral genetic heterogeneity, including delineation of cell-to-cell variations in distinct cell states, focal gene amplifications, and extrachromosomal circular DNAs. Despite differences in IDH mutation status and significant intratumoral heterogeneity, the profiled tumor cells shared a common chromatin structure defined by open regions enriched for nuclear factor 1 transcription factors (NFIA and NFIB). Silencing of NFIA or NFIB suppressed in vitro and in vivo growth of patient-derived glioblastoma and G4 IDHm astrocytoma models. These findings suggest that despite distinct genotypes and cell states, glioblastoma/G4 astrocytoma cells share dependency on core transcriptional programs, yielding an attractive platform for addressing therapeutic challenges associated with intratumoral heterogeneity.


Subject(s)
Astrocytoma , Brain Neoplasms , Glioblastoma , Adult , Humans , Glioblastoma/genetics , Glioblastoma/pathology , Chromatin/genetics , Transcriptome , Astrocytoma/genetics , Astrocytoma/pathology , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Mutation , Isocitrate Dehydrogenase/genetics , Isocitrate Dehydrogenase/metabolism
7.
Annu Rev Genomics Hum Genet ; 23: 29-52, 2022 08 31.
Article in English | MEDLINE | ID: mdl-35609926

ABSTRACT

In cancer, complex genome rearrangements and other structural alterations, including the amplification of oncogenes on circular extrachromosomal DNA (ecDNA) elements, drive the formation and progression of tumors. ecDNA is a particularly challenging structural alteration. By untethering oncogenes from chromosomal constraints, it elevates oncogene copy number, drives intratumoral genetic heterogeneity, promotes rapid tumor evolution, and results in treatment resistance. The profound changes in DNA shape and nuclear architecture generated by ecDNA alter the transcriptional landscape of tumors by catalyzing new types of regulatory interactions that do not occur on chromosomes. The current suite of tools for interrogating cancer genomes is well suited for deciphering sequence but has limited ability to resolve the complex changes in DNA structure and dynamics that ecDNA generates. Here, we review the challenges of resolving ecDNA form and function and discuss the emerging tool kit for deciphering ecDNA architecture and spatial organization, including what has been learned to date about how this dramatic change in shape alters tumor development, progression, and drug resistance.


Subject(s)
Neoplasms , Oncogenes , Chromosomes , DNA/genetics , Humans , Neoplasms/genetics , Neoplasms/pathology
8.
Nature ; 569(7757): 570-575, 2019 05.
Article in English | MEDLINE | ID: mdl-31019297

ABSTRACT

Precision oncology hinges on linking tumour genotype with molecularly targeted drugs [1]; however, targeting the frequently dysregulated metabolic landscape of cancer has proven to be a major challenge [2]. Here we show that tissue context is the major determinant of dependence on the nicotinamide adenine dinucleotide (NAD) metabolic pathway in cancer. By analysing more than 7,000 tumours and 2,600 matched normal samples of 19 tissue types, coupled with mathematical modelling and extensive in vitro and in vivo analyses, we identify a simple and actionable set of 'rules'. If the rate-limiting enzyme of de novo NAD synthesis, NAPRT, is highly expressed in a normal tissue type, cancers that arise from that tissue will have a high frequency of NAPRT amplification and be completely and irreversibly dependent on NAPRT for survival. By contrast, tumours that arise from normal tissues that do not express NAPRT highly are entirely dependent on the NAD salvage pathway for survival. We identify the previously unknown enhancer that underlies this dependence. Amplification of NAPRT is shown to generate a pharmacologically actionable tumour cell dependence for survival. Dependence on another rate-limiting enzyme of the NAD synthesis pathway, NAMPT, as a result of enhancer remodelling is subject to resistance by NMRK1-dependent synthesis of NAD. These results identify a central role for tissue context in determining the choice of NAD biosynthetic pathway, explain the failure of NAMPT inhibitors, and pave the way for more effective treatments.


Subject(s)
Enhancer Elements, Genetic/genetics , Gene Amplification , NAD/metabolism , Neoplasms/genetics , Neoplasms/metabolism , Animals , Carbon-Nitrogen Ligases with Glutamine as Amide-N-Donor/metabolism , Cell Death , Cell Line, Tumor , Cytokines/antagonists & inhibitors , Cytokines/genetics , Cytokines/metabolism , Epigenesis, Genetic , Female , Gene Expression Regulation, Neoplastic , Humans , Mice , Neoplasms/enzymology , Nicotinamide Phosphoribosyltransferase/antagonists & inhibitors , Nicotinamide Phosphoribosyltransferase/genetics , Nicotinamide Phosphoribosyltransferase/metabolism , Pentosyltransferases/genetics , Pentosyltransferases/metabolism , Phosphotransferases (Alcohol Group Acceptor)/metabolism
9.
Nature ; 575(7784): 699-703, 2019 11.
Article in English | MEDLINE | ID: mdl-31748743

ABSTRACT

Oncogenes are commonly amplified on particles of extrachromosomal DNA (ecDNA) in cancer [1,2], but our understanding of the structure of ecDNA and its effect on gene regulation is limited. Here, by integrating ultrastructural imaging, long-range optical mapping and computational analysis of whole-genome sequencing, we demonstrate the structure of circular ecDNA. Pan-cancer analyses reveal that oncogenes encoded on ecDNA are among the most highly expressed genes in the transcriptome of the tumours, linking increased copy number with high transcription levels. Quantitative assessment of the chromatin state reveals that although ecDNA is packaged into chromatin with intact domain structure, it lacks higher-order compaction that is typical of chromosomes and displays significantly enhanced chromatin accessibility. Furthermore, ecDNA is shown to have a significantly greater number of ultra-long-range interactions with active chromatin, which provides insight into how the structure of circular ecDNA affects oncogene function, and connects ecDNA biology with modern cancer genomics and epigenetics.


Subject(s)
Chromatin/genetics , DNA, Circular/metabolism , Gene Expression Regulation, Neoplastic/genetics , Neoplasms/genetics , Oncogenes/genetics , Cell Line, Tumor , Chromatin/chemistry , DNA, Circular/genetics , Humans , Microscopy, Electron, Scanning , Neoplasms/physiopathology
10.
Hum Mol Genet ; 31(7): 1130-1140, 2022 03 31.
Article in English | MEDLINE | ID: mdl-34718575

ABSTRACT

The molecular mechanisms leading to high-altitude pulmonary hypertension (HAPH) remain poorly understood. We previously analyzed the whole genome sequence of a Kyrgyz highland population and identified eight genomic intervals with a potential role in HAPH. The tropomodulin 3 gene (TMOD3), which encodes a protein that binds and caps the pointed ends of actin filaments and inhibits cell migration, was one of the top candidates. Here we systematically sought additional evidence to validate the functional role of TMOD3. In silico analysis reveals that some of the SNPs in HAPH-associated genomic intervals are positioned in a regulatory region that could result in alternative splicing of TMOD3. To functionally validate the role of TMOD3 in HAPH, we exposed Tmod3-/+ mice to 4 weeks of constant hypoxia (10% O2) and analyzed both functional (hemodynamic measurements) and structural (angiography) parameters related to HAPH. Hypoxia-induced increases in hemodynamic measurements, such as right ventricular systolic pressure (a surrogate measure for pulmonary arterial systolic pressure) and right ventricular contractility (RV ± dP/dt), did not differ between Tmod3-/+ and control mice. Remarkably, there was a significant increase in the number of lung vascular branches and the total length of pulmonary vascular branches (P < 0.001) in Tmod3-/+ mice after 4 weeks of constant hypoxia as compared with controls. Notably, migration of Tmod3-/+ endothelial cells was also significantly higher than that of cells from wild-type littermates. Our results indicate that, under chronic hypoxia, lower levels of Tmod3 play an important role in the maintenance or neo-vascularization of pulmonary arteries.


Subject(s)
Endothelial Cells , Tropomodulin/metabolism , Actin Cytoskeleton/metabolism , Animals , Endothelial Cells/metabolism , Hypoxia/genetics , Hypoxia/metabolism , Lung/metabolism , Mice , Tropomodulin/chemistry , Tropomodulin/genetics
11.
Nature ; 543(7643): 122-125, 2017 03 02.
Article in English | MEDLINE | ID: mdl-28178237

ABSTRACT

Human cells have twenty-three pairs of chromosomes. In cancer, however, genes can be amplified in chromosomes or in circular extrachromosomal DNA (ecDNA), although the frequency and functional importance of ecDNA are not understood. We performed whole-genome sequencing, structural modelling and cytogenetic analyses of 17 different cancer types, including analysis of the structure and function of chromosomes during metaphase of 2,572 dividing cells, and developed a software package called ECdetect to conduct unbiased, integrated ecDNA detection and analysis. Here we show that ecDNA was found in nearly half of human cancers; its frequency varied by tumour type, but it was almost never found in normal cells. Driver oncogenes were amplified most commonly in ecDNA, thereby increasing transcript level. Mathematical modelling predicted that ecDNA amplification would increase oncogene copy number and intratumoural heterogeneity more effectively than chromosomal amplification. We validated these predictions by quantitative analyses of cancer samples. The results presented here suggest that ecDNA contributes to accelerated evolution in cancer.


Subject(s)
DNA Copy Number Variations/genetics , Evolution, Molecular , Gene Amplification/genetics , Genetic Heterogeneity , Models, Genetic , Neoplasms/genetics , Oncogenes/genetics , Chromosomes, Human/genetics , Cytogenetic Analysis , DNA Mutational Analysis , Genome, Human/genetics , Humans , Metaphase/genetics , Neoplasms/classification , RNA, Messenger/analysis , RNA, Neoplasm/genetics , Reproducibility of Results , Software
12.
PLoS Comput Biol ; 17(11): e1009449, 2021 11.
Article in English | MEDLINE | ID: mdl-34780468

ABSTRACT

The cost of sequencing the genome is dropping at a much faster rate than the cost of assembling and finishing it. The use of lightly sampled genomes (genome-skims) could be transformative for genomic ecology, and results using k-mers have shown the advantage of this approach in identification and phylogenetic placement of eukaryotic species. Here, we revisit the basic question of estimating genomic parameters such as genome length, coverage, and repeat structure, focusing specifically on estimating the k-mer repeat spectrum. We show using a mix of theoretical and empirical analysis that there are fundamental limitations to estimating the k-mer spectra due to ill-conditioned systems, which has implications for other genomic parameters. We get around this problem using a novel constrained optimization approach (Spline Linear Programming), where the constraints are learned empirically. On reads simulated at 1X coverage from 66 genomes, our method, REPeat SPECTra Estimation (RESPECT), had 2.2% error in length estimation compared to 27% error previously achieved. In shotgun sequenced read samples with contaminants, RESPECT length estimates had median error 4%, in contrast to other methods that had median error 80%. Together, the results suggest that low-pass genomic sequencing can yield reliable estimates of the length and repeat content of the genome. The RESPECT software will be publicly available at https://github.com/shahab-sarmashghi/RESPECT.git.
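
As a rough illustration of the quantity being estimated, the k-mer repeat spectrum of a fully known sequence can be counted directly. The sketch below is not the RESPECT method, which estimates the spectrum from low-coverage reads; the function name `kmer_repeat_spectrum` and the toy sequence are assumptions made purely for illustration.

```python
from collections import Counter


def kmer_repeat_spectrum(sequence: str, k: int) -> dict[int, int]:
    """Return {j: number of distinct k-mers occurring exactly j times}.

    Illustrative only: counts k-mers on one strand of a known sequence,
    with no handling of reverse complements or sequencing errors.
    """
    seq = sequence.upper()
    # Count every overlapping k-mer in the sequence.
    kmer_counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Histogram of multiplicities: how many k-mers were seen j times.
    spectrum = Counter(kmer_counts.values())
    return dict(sorted(spectrum.items()))


# Hypothetical toy "genome"; ACG and CGT each occur twice.
toy_genome = "ACGTACGTTT"
print(kmer_repeat_spectrum(toy_genome, k=3))  # {1: 4, 2: 2}
```

The weighted sum of the spectrum (here 1×4 + 2×2 = 8) equals the number of k-mer positions in the sequence, which is why the spectrum is tied to genome length.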


Subject(s)
Algorithms , Genome , Genomics/statistics & numerical data , Repetitive Sequences, Nucleic Acid , Software , Animals , Computational Biology , Computer Simulation , Databases, Genetic/statistics & numerical data , Humans , Invertebrates/classification , Invertebrates/genetics , Least-Squares Analysis , Linear Models , Mammals/classification , Mammals/genetics , Models, Genetic , Phylogeny , Plants/classification , Plants/genetics , Vertebrates/classification , Vertebrates/genetics
13.
Genome Res ; 28(11): 1709-1719, 2018 11.
Article in English | MEDLINE | ID: mdl-30352806

ABSTRACT

Whole-genome sequencing is increasingly used to identify Mendelian variants in clinical pipelines. These pipelines focus on single-nucleotide variants (SNVs) and structural variants, while ignoring more complex repeat sequence variants. Here, we consider the problem of genotyping Variable Number Tandem Repeats (VNTRs), composed of inexact tandem duplications of short (6-100 bp) repeating units. VNTRs span 3% of the human genome, are frequently present in coding regions, and have been implicated in multiple Mendelian disorders. Although existing tools recognize VNTR-carrying sequence, genotyping VNTRs (determining repeat unit count and sequence variation) from whole-genome sequencing reads remains challenging. We describe a method, adVNTR, that uses hidden Markov models to model each VNTR, count repeat units, and detect sequence variation. adVNTR models can be developed for short-read (Illumina) and single-molecule (Pacific Biosciences [PacBio]) whole-genome and whole-exome sequencing, and show good results on multiple simulated and real data sets.


Subject(s)
Genotyping Techniques/methods , Minisatellite Repeats , Genome, Human , Humans , Markov Chains , Polymorphism, Genetic
14.
Nat Methods ; 15(4): 279-282, 2018 04.
Article in English | MEDLINE | ID: mdl-29457793

ABSTRACT

Most approaches that capture signatures of selective sweeps in population genomics data do not identify the specific mutation favored by selection. We present iSAFE (for "integrated selection of allele favored by evolution"), a method that enables researchers to accurately pinpoint the favored mutation in a large region (∼5 Mbp) by using a statistic derived solely from population genetics signals. iSAFE does not require knowledge of demography, the phenotype under selection, or functional annotations of mutations.


Subject(s)
Genomics , Nucleic Acid Amplification Techniques , Nucleic Acid Hybridization/methods , Alleles , Biological Evolution , Haplotypes , Humans , Mutation
15.
Genome Res ; 27(5): 801-812, 2017 05.
Article in English | MEDLINE | ID: mdl-27940952

ABSTRACT

Many tools have been developed for haplotype assembly, the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types (dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing), we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (∼98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.


Subject(s)
Contig Mapping/methods , Genomics/methods , Haplotypes , Sequence Analysis, DNA/methods , Software , Contig Mapping/standards , Genome, Human , Genomics/standards , Humans , Sequence Analysis, DNA/standards
16.
Mol Ecol ; 29(14): 2521-2534, 2020 07.
Article in English | MEDLINE | ID: mdl-32542933

ABSTRACT

Genetic tools are increasingly used to identify and discriminate between species. One key transition in this process was the recognition of the potential of the ca. 658 bp fragment of the organelle cytochrome c oxidase I (COI) as a barcode region, which revolutionized animal bioidentification and led, among other things, to the instigation of the Barcode of Life Database (BOLD), which currently contains barcodes from >7.9 million specimens. Following this discovery, suggestions for other organellar regions and markers, and the primers with which to amplify them, have been continuously proposed. Most recently, the field has taken the leap from PCR-based generation of DNA references into shotgun sequencing-based "genome skimming" alternatives, with the ultimate goal of assembling organellar reference genomes. Unfortunately, in genome skimming approaches, much of the nuclear genome (as much as 99% of the sequence data) is discarded, which is not only wasteful, but can also limit the power of discrimination at, or below, the species level. Here, we advocate that the full shotgun sequence data can be used to assign an identity (that we term for convenience its "DNA-mark") for both voucher and query samples, without requiring any computationally intensive pretreatment (e.g. assembly) of reads. We argue that if reference databases are populated with such "DNA-marks," it will enable future DNA-based taxonomic identification to complement, or even replace, PCR of barcodes with genome skimming, and we discuss how such methodology ultimately could enable identification to population, or even individual, level.


Subject(s)
DNA Barcoding, Taxonomic , DNA , Genomics/methods , Animals , DNA Primers , Databases, Genetic , Polymerase Chain Reaction
17.
Nucleic Acids Res ; 46(7): 3309-3325, 2018 04 20.
Article in English | MEDLINE | ID: mdl-29579309

ABSTRACT

The integration of viral sequences into the host genome is an important driver of tumorigenesis in many virus-mediated cancers, notably cervical cancer and hepatocellular carcinoma. We present ViFi, a computational method that combines phylogenetic methods with reference-based read mapping to detect viral integrations. In contrast with read-based reference mapping approaches, ViFi is faster, and shows high precision and sensitivity on both simulated and biological data, even when the integrated virus is a novel strain or highly mutated. We applied ViFi to matched genomic and mRNA data from 68 cervical cancer samples from TCGA and found high concordance between the two. Surprisingly, viral integration resulted in a dramatic transcriptional upregulation in all proximal elements, including LINEs and LTRs that are not normally transcribed. This upregulation is highly correlated with the presence of a viral gene fused with a downstream human element. Moreover, genomic rearrangements suggest the formation of apparent circular extrachromosomal (ecDNA) human-viral structures. Our results suggest the presence of apparent small circular fusion viral/human ecDNA, which correlates with indiscriminate and unregulated expression of proximal genomic elements, potentially contributing to the pathogenesis of HPV-associated cervical cancers. ViFi is available at https://github.com/namphuon/ViFi.


Subject(s)
DNA, Circular/chemistry , Papillomaviridae/genetics , Uterine Cervical Neoplasms/genetics , Virus Integration/genetics , Computational Biology/instrumentation , DNA, Circular/genetics , DNA, Viral/chemistry , DNA, Viral/genetics , Female , Gene Expression Regulation, Neoplastic , Genome, Human/genetics , Humans , Long Interspersed Nucleotide Elements/genetics , Papillomaviridae/pathogenicity , RNA, Messenger/chemistry , RNA, Messenger/genetics , Terminal Repeat Sequences/genetics , Transcription, Genetic , Uterine Cervical Neoplasms/pathology , Uterine Cervical Neoplasms/virology
18.
Proc Natl Acad Sci U S A ; 114(47): 12512-12517, 2017 11 21.
Article in English | MEDLINE | ID: mdl-29078313

ABSTRACT

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10^-8 and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs.
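
The N50 statistic quoted in this abstract can be sketched as follows, assuming the common definition: the largest length L such that contigs of length at least L together cover at least half of the total assembled length. The function name `n50` and the example lengths are hypothetical and are not taken from the SISSOR study.

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a collection of contig (or haplotype block) lengths."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down, accumulating covered length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first contig where at least half the total is covered.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty positive lengths")


# Hypothetical contig lengths in megabases.
print(n50([2, 3, 4, 5, 6]))  # 5
```

An N50 greater than 7 Mb therefore means that at least half of the phased sequence lies in haplotype contigs longer than 7 Mb.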


Subject(s)
Contig Mapping/methods , Human Genome , Haplotypes , Microfluidic Analytical Techniques/methods , Single-Cell Analysis/methods , Whole Genome Sequencing/methods , Alleles , Cell Line , Contig Mapping/statistics & numerical data , Fibroblasts/cytology , Fibroblasts/metabolism , HLA Antigens/genetics , HLA Antigens/metabolism , Humans , Microfluidic Analytical Techniques/instrumentation , Mutation , Single Nucleotide Polymorphism , Single-Cell Analysis/instrumentation , Whole Genome Sequencing/instrumentation
19.
J Proteome Res ; 18(6): 2433-2445, 2019 06 07.
Article in English | MEDLINE | ID: mdl-31020842

ABSTRACT

A high-quality genome annotation greatly facilitates successful cell line engineering. Standard draft genome annotation pipelines are based largely on de novo gene prediction, homology, and RNA-Seq data. However, draft annotations can suffer from incorrect predictions of translated sequence, inaccurate splice isoforms, and missing genes. Here, we generated a draft annotation for the newly assembled Chinese hamster genome and used RNA-Seq, proteomics, and Ribo-Seq to experimentally annotate the genome. We identified 3529 new proteins compared to the hamster RefSeq protein annotation and 2256 novel translational events (e.g., alternative splices, mutations, and novel splices). Finally, we used this pipeline to identify the source of translated retroviruses contaminating recombinant products from Chinese hamster ovary (CHO) cell lines, including 119 type-C retroviruses, thus enabling future efforts to eliminate retroviruses to reduce the costs incurred with retroviral particle clearance. In summary, the improved annotation provides a more accurate resource for CHO cell line engineering, by facilitating the interpretation of omics data, defining of cellular pathways, and engineering of complex phenotypes.


Subject(s)
Cricetulus/genetics , Genome/genetics , Proteogenomics , Proteomics/methods , Animals , CHO Cells , Cricetinae , Molecular Sequence Annotation/methods , RNA-Seq/methods , RNA Sequence Analysis/methods
20.
Mol Cell Proteomics ; 16(12): 2111-2124, 2017 Dec.
Article in English | MEDLINE | ID: mdl-29046389

ABSTRACT

Immunotherapy is becoming increasingly important in the fight against cancers, using and manipulating the body's immune response to treat tumors. Understanding the immune repertoire (the collection of immunological proteins) of treated and untreated cells is possible at the genomic level but technically difficult at the protein level. Standard protein databases do not include the highly divergent sequences of somatically rearranged immunoglobulin genes, and may lead to missed identifications in a mass spectrometry search. We introduce a novel proteogenomic approach, AbScan, to identify these highly variable antibody peptides, by developing a customized antibody database construction method using RNA-seq reads aligned to immunoglobulin (Ig) genes. AbScan starts by filtering transcript (RNA-seq) reads that match the template for Ig genes. The retained reads are used to construct a repertoire graph using the "split" de Bruijn graph: a graph structure that improves on the standard de Bruijn graph to capture the high diversity of Ig genes in a compact manner. AbScan corrects for sequencing errors, and converts the graph to a format suitable for searching with MS/MS search tools. We used AbScan to create an antibody database from 90 RNA-seq colorectal tumor samples. Next, we used proteogenomic analysis to search MS/MS spectra of matched colorectal samples from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) against the AbScan-generated database. AbScan identified 1,940 distinct antibody peptides. Correlating with previously identified Single Amino-Acid Variants (SAAVs) in the tumor samples, we identified 163 pairs (antibody peptide, SAAV) with a significant co-occurrence pattern in the 90 samples. The presence of coexpressed antibody and mutated peptides was correlated with survival time of the individuals. Our results suggest that AbScan (https://github.com/csw407/AbScan.git) is an effective tool for a proteomic exploration of the immune response in cancers.
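
For readers unfamiliar with the graph structure mentioned above, the following is a minimal sketch of a standard de Bruijn graph built from short reads. It is not AbScan's "split" de Bruijn graph, whose construction the abstract does not describe, and it is not taken from the AbScan repository. The function name `de_bruijn`, the value of `k`, and the toy reads are all assumptions made for illustration.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a standard de Bruijn graph from reads.

    Nodes are (k-1)-mers; each k-mer in a read contributes one edge
    from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    # Sort successor lists so the output is deterministic.
    return {node: sorted(successors) for node, successors in graph.items()}


# Two hypothetical reads that share a prefix and differ at the last base.
toy_reads = ["ACGTC", "ACGTA"]
graph = de_bruijn(toy_reads, k=3)
print(graph)  # {'AC': ['CG'], 'CG': ['GT'], 'GT': ['TA', 'TC']}
```

The shared prefix of the two reads is stored once, and the variation appears as a branch at node `GT`. This compaction of shared sequence is the general property that makes de Bruijn graphs a natural starting point for representing many closely related sequences.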


Subject(s)
Colorectal Neoplasms/immunology , Genomics/methods , Immunoglobulins/chemistry , Peptides/genetics , Proteomics/methods , Algorithms , Tumor Cell Line , Colorectal Neoplasms/genetics , Genetic Databases , Protein Databases , Humans , Immunoglobulins/genetics , Peptides/chemistry , RNA Sequence Analysis , Tandem Mass Spectrometry