ABSTRACT
Large panels of comprehensively characterized human cancer models, including the Cancer Cell Line Encyclopedia (CCLE), have provided a rigorous framework with which to study genetic variants, candidate targets, and small-molecule and biological therapeutics and to identify new marker-driven cancer dependencies. To improve our understanding of the molecular features that contribute to cancer phenotypes, including drug responses, here we have expanded the characterizations of cancer cell lines to include genetic, RNA splicing, DNA methylation, histone H3 modification, microRNA expression and reverse-phase protein array data for 1,072 cell lines from individuals of various lineages and ethnicities. Integration of these data with functional characterizations such as drug-sensitivity, short hairpin RNA knockdown and CRISPR-Cas9 knockout data reveals potential targets for cancer drugs and associated biomarkers. Together, this dataset and an accompanying public data portal provide a resource for the acceleration of cancer research using model cancer cell lines.
Subject(s)
Cell Line, Tumor , Neoplasms/genetics , Neoplasms/pathology , Antineoplastic Agents/pharmacology , Biomarkers, Tumor , DNA Methylation , Drug Resistance, Neoplasm , Ethnicity/genetics , Gene Editing , Histones/metabolism , Humans , MicroRNAs/genetics , Molecular Targeted Therapy , Neoplasms/metabolism , Protein Array Analysis , RNA Splicing
ABSTRACT
Type 2 innate lymphoid cells (ILC2s) both contribute to mucosal homeostasis and initiate pathologic inflammation in allergic asthma. However, the signals that direct ILC2s to promote homeostasis versus inflammation are unclear. To identify such molecular cues, we profiled mouse lung-resident ILCs using single-cell RNA sequencing at steady state and after in vivo stimulation with the alarmin cytokines IL-25 and IL-33. ILC2s were transcriptionally heterogeneous after activation, with subpopulations distinguished by expression of proliferative, homeostatic and effector genes. The neuropeptide receptor Nmur1 was preferentially expressed by ILC2s at steady state and after IL-25 stimulation. Neuromedin U (NMU), the ligand of NMUR1, activated ILC2s in vitro, and in vivo co-administration of NMU with IL-25 strongly amplified allergic inflammation. Loss of NMU-NMUR1 signalling reduced ILC2 frequency and effector function, and altered transcriptional programs following allergen challenge in vivo. Thus, NMUR1 signalling promotes inflammatory ILC2 responses, highlighting the importance of neuro-immune crosstalk in allergic inflammation at mucosal surfaces.
Subject(s)
Hypersensitivity/immunology , Hypersensitivity/pathology , Inflammation/immunology , Inflammation/pathology , Lung/pathology , Lymphocytes/immunology , Neuropeptides/metabolism , Animals , Female , Gene Expression Regulation , Immunity, Innate/immunology , Interleukin-17/immunology , Interleukin-33/immunology , Ligands , Lung/immunology , Male , Mice , Mice, Inbred C57BL , Receptors, Neurotransmitter/biosynthesis , Receptors, Neurotransmitter/genetics , Receptors, Neurotransmitter/metabolism , Respiratory Mucosa/immunology , Respiratory Mucosa/pathology , Signal Transduction , Transcription, Genetic
ABSTRACT
This corrects the article DOI: 10.1038/nature24029.
ABSTRACT
A fundamental goal of genomics is to identify the complete set of expressed proteins. Automated annotation strategies rely on assumptions about protein-coding sequences (CDSs), e.g., that they are conserved, do not overlap, and exceed a minimum length. However, an increasing number of newly discovered proteins violate these rules. Here we present an experimental and analytical framework, based on ribosome profiling and linear regression, for the systematic identification and quantification of translation. Applying this approach to lipopolysaccharide-stimulated mouse dendritic cells and HCMV-infected human fibroblasts identifies thousands of novel CDSs, including micropeptides and variants of known proteins, that bear the hallmarks of canonical translation and exhibit translation levels and dynamics comparable to those of annotated CDSs. Remarkably, many translation events are identified in both mouse and human cells even when the peptide sequence is not conserved. Our work thus reveals an unexpected complexity to mammalian translation, suited to provide either conserved regulatory or protein-based functions.
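The following is a rough illustration of how linear regression can quantify translation from ribosome-profiling data. An observed footprint profile is fitted as a non-negative weighted sum of expected profiles for overlapping candidate CDSs, and each fitted weight then estimates that CDS's translation level. This is a toy under stated assumptions and not the authors' pipeline. The template shape (a start-codon peak followed by uniform coverage), the function name `nnls_coordinate_descent`, and the coordinate-descent solver are all hypothetical choices made for illustration.

```python
"""Toy non-negative regression of a footprint profile on candidate ORF templates."""


def nnls_coordinate_descent(columns, observed, sweeps=500):
    """Fit observed ~= sum_j w_j * columns[j] subject to every w_j >= 0.

    columns: one expected footprint profile per candidate ORF (equal-length lists).
    observed: observed footprint counts at the same positions.
    Returns one non-negative weight per candidate ORF.
    """
    n = len(observed)
    weights = [0.0] * len(columns)
    residual = [float(v) for v in observed]  # observed minus current fit (fit starts at 0)
    norms = [sum(v * v for v in col) for col in columns]
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            if norms[j] == 0:
                continue  # an all-zero template carries no information
            # Best unconstrained w_j with the other weights held fixed, clipped at zero.
            step = sum(col[i] * residual[i] for i in range(n)) / norms[j]
            new_w = max(0.0, weights[j] + step)
            delta = new_w - weights[j]
            if delta:
                for i in range(n):
                    residual[i] -= delta * col[i]
                weights[j] = new_w
    return weights


if __name__ == "__main__":
    # Two hypothetical overlapping ORFs: a start-codon peak, then uniform coverage.
    orf_a = [3, 1, 1, 1, 1, 1, 0, 0, 0, 0]
    orf_b = [0, 0, 0, 3, 1, 1, 1, 1, 1, 0]
    observed = [2 * a + 0.5 * b for a, b in zip(orf_a, orf_b)]
    print([round(w, 3) for w in nnls_coordinate_descent([orf_a, orf_b], observed)])  # [2.0, 0.5]
```

The non-negativity constraint reflects the idea that a candidate cannot be translated at a negative rate. Overlapping templates let the fit apportion shared footprints between the two reading frames.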
Subject(s)
Proteome/metabolism , Proteomics/methods , Ribosomes/metabolism , Amino Acid Sequence , Animals , Cells, Cultured , Conserved Sequence , Dendritic Cells/drug effects , Humans , Lipopolysaccharides/pharmacology , Mice , Open Reading Frames , Regression Analysis
ABSTRACT
Regenerative ability varies tremendously across species. A common feature of the regeneration of appendages such as limbs, fins, antlers, and tails is the formation of a blastema, a transient structure that houses a pool of progenitor cells that can regenerate the missing tissue. We have identified the expression of von Willebrand factor D and EGF domains (vwde) as a common feature of blastemas capable of regenerating limbs and fins in a variety of highly regenerative species, including axolotl (Ambystoma mexicanum), lungfish (Lepidosiren paradoxa), and Polypterus (Polypterus senegalus). Further, vwde expression is tightly linked to the ability to regenerate appendages in Xenopus laevis. Functional experiments demonstrate a requirement for vwde in regeneration and indicate that Vwde is a potent growth factor in the blastema. These data identify a key role for vwde in regenerating blastemas and underscore the power of an evolutionarily informed approach for identifying conserved genetic components of regeneration.
Subject(s)
Ambystoma mexicanum/physiology , Animal Fins/physiology , Extremities/physiology , Fishes/physiology , Regeneration , von Willebrand Factor/metabolism , Animals , Biological Evolution , Complement Factor D/metabolism , Epidermal Growth Factor/metabolism , Evolution, Molecular , Female , Male , Regeneration/genetics
ABSTRACT
Humans and other mammals are limited in their natural abilities to regenerate lost body parts. By contrast, many salamanders are highly regenerative and can spontaneously replace lost limbs even as adults. Because salamander limbs are anatomically similar to human limbs, knowing how they regenerate should provide important clues for regenerative medicine. Although interest in understanding the mechanics of this process has never wavered, until recently researchers were vexed by the seemingly impenetrable logistics of working with these creatures at the molecular level. Chief among the problems has been the very large size of salamander genomes, and not a single salamander genome has been fully sequenced to date. Recently the enormous gap in sequence information has been bridged by approaches that leverage mRNA as the starting point. Together with functional experimentation, these data are rapidly enabling researchers to finally uncover the molecular mechanisms underpinning the astonishing biological process of limb regeneration.
Subject(s)
Ambystoma mexicanum/physiology , Extremities/physiology , Regeneration/genetics , Ambystoma mexicanum/genetics , Animals , Genome , RNA, Messenger/genetics
ABSTRACT
Both intrinsic cell state changes and variations in the composition of stem cell populations have been implicated as contributors to aging. We used single-cell RNA-seq to dissect variability in hematopoietic stem cell (HSC) and hematopoietic progenitor cell populations from young and old mice from two strains. We found that the cell cycle dominates the variability within each population and that old long-term HSCs contain a lower frequency of cells in G1 phase than young long-term HSCs, suggesting that they traverse G1 faster. Moreover, transcriptional changes in HSCs during aging are inversely related to those upon HSC differentiation, such that old short-term HSCs (ST-HSCs) resemble young long-term HSCs (LT-HSCs), suggesting that they exist in a less differentiated state. Our results indicate both compositional changes and intrinsic, population-wide changes with age and are consistent with a model in which aging affects the relationship between cell cycle progression and self-renewal versus differentiation of HSCs, which may contribute to the functional decline of old HSCs.
Subject(s)
Cell Cycle/genetics , Cell Differentiation/genetics , Cellular Senescence/genetics , Gene Expression Regulation , Hematopoietic Stem Cells/cytology , Hematopoietic Stem Cells/metabolism , Age Factors , Animals , Biomarkers , Cluster Analysis , Computational Biology/methods , Female , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Mice , Models, Biological , Multipotent Stem Cells/cytology , Multipotent Stem Cells/metabolism , Organ Specificity/genetics , Phenotype , Sequence Analysis, RNA , Single-Cell Analysis , Transcription, Genetic , Transcriptome
ABSTRACT
Oviparous reptile embryos are expected to breach their critical thermal maxima if temperatures reach those predicted under current climate change models, because they lack maternal buffering processes and parental care. Heat-shock proteins (HSPs) are integral to the molecular response to thermal stress, and their expression is heritable, but the roles of other candidate families such as the heat-shock factors (HSFs) have not been determined in reptiles. Here, we subject embryonic sea turtles (Caretta caretta) to a biologically realistic thermal stress and employ de novo transcriptomic profiling of brain tissue to investigate the underlying molecular response. From a reference transcriptome of 302,293 transcripts, 179 were identified as differentially expressed between treatments. As anticipated, genes enriched in the heat-shock treatment were primarily associated with the Hsp families, or were genes whose products play similar protein-editing and chaperone roles (e.g. bag3, MYOC and serpinh1). Unexpectedly, genes encoding the HSFs were not significantly upregulated under thermal stress, indicating that the HSFs are present in an inactive state in unstressed cells. Genes that were downregulated under thermal stress were less well functionally defined but were associated with stress response, development and cellular organization, suggesting that developmental processes may be compromised at realistically high temperatures. These results confirm that genes from the Hsp families play vital roles in the thermal tolerance of developing reptile embryos and, together with a number of other genes, should be targets for evaluating the capacity of oviparous reptiles to respond adaptively to the effects of climate change.
Subject(s)
Gene Expression Regulation, Developmental , Heat-Shock Proteins/genetics , Heat-Shock Response/genetics , Turtles/embryology , Turtles/genetics , Animals , Climate Change , Genes, Developmental , Hot Temperature
ABSTRACT
Oomycetes in the class Saprolegniomycetidae of the eukaryotic kingdom Stramenopila have evolved as severe pathogens of amphibians, crustaceans, fish and insects, resulting in major losses in aquaculture and damage to aquatic ecosystems. We have sequenced the 63 Mb genome of the freshwater fish pathogen Saprolegnia parasitica. Approximately one-third of the assembled genome exhibits loss of heterozygosity, indicating an efficient mechanism for revealing new variation. Comparison of S. parasitica with plant pathogenic oomycetes suggests that during evolution the host cellular environment has driven distinct patterns of gene expansion and loss in the genomes of plant and animal pathogens. S. parasitica possesses one of the largest repertoires of proteases (270) among eukaryotes, and RNA-Seq data show that they are deployed in waves at different points during infection. In contrast, although S. parasitica is capable of living saprotrophically, parasitism has led to the loss of its inorganic nitrogen and sulfur assimilation pathways, strikingly similar to losses in obligate plant pathogenic oomycetes and fungi. The large gene families that are hallmarks of plant pathogenic oomycetes such as Phytophthora appear to be lacking in S. parasitica, including those encoding RXLR effectors, Crinklers, and Necrosis Inducing-Like Proteins (NLP). S. parasitica also has a very large kinome of 543 kinases, 10% of which are induced upon infection. Moreover, S. parasitica encodes several genes that are typical of animals or animal pathogens and lacking from other oomycetes, including disintegrins and galactose-binding lectins, whose expression and evolutionary origins implicate horizontal gene transfer in the evolution of animal pathogenesis in S. parasitica.
Subject(s)
Gene Transfer, Horizontal , Host-Parasite Interactions/genetics , Oomycetes/genetics , Saprolegnia/genetics , Virulence/genetics , Amino Acid Sequence , Animals , Base Sequence , Evolution, Molecular , Fishes/genetics , Fishes/parasitology , Genome , Oomycetes/classification , Oomycetes/pathogenicity , Phylogeny , Plants/parasitology , Saprolegnia/classification , Saprolegnia/pathogenicity
ABSTRACT
Schistosoma mansoni is responsible for the neglected tropical disease schistosomiasis, which affects 210 million people in 76 countries. Here we present an analysis of the 363-megabase nuclear genome of the blood fluke. It encodes at least 11,809 genes, with an unusual intron size distribution, and new families of micro-exon genes that undergo frequent alternative splicing. As the first sequenced flatworm, and a representative of the Lophotrochozoa, it offers insights into early events in the evolution of the animals, including the development of a body pattern with bilateral symmetry and the development of tissues into organs. Our analysis has been informed by the need to find new drug targets. The deficits in lipid metabolism that make schistosomes dependent on the host are revealed, and the identification of membrane receptors, ion channels and more than 300 proteases provides new insights into the biology of the life cycle and new targets. Bioinformatics approaches have identified metabolic chokepoints, and a chemogenomic screen has pinpointed schistosome proteins for which existing drugs may be active. The information generated provides an invaluable resource for the research community to develop much-needed new control tools for the treatment and eradication of this important and neglected disease.
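The abstract does not define "metabolic chokepoint". A common definition in the literature is a reaction that is the only one consuming some metabolite, or the only one producing some metabolite. Inhibiting such a reaction leaves no alternative route for that metabolite. The minimal sketch below assumes that definition. The reaction network is a made-up toy, not schistosome data, and real analyses usually also exclude ubiquitous currency metabolites such as water and ATP, which this sketch does not do.

```python
from collections import defaultdict


def find_chokepoints(reactions):
    """Return reactions that are the sole consumer or the sole producer of some metabolite.

    reactions: dict mapping reaction name -> (set of substrates, set of products).
    """
    consumers = defaultdict(set)
    producers = defaultdict(set)
    for name, (substrates, products) in reactions.items():
        for metabolite in substrates:
            consumers[metabolite].add(name)
        for metabolite in products:
            producers[metabolite].add(name)
    chokepoints = set()
    for table in (consumers, producers):
        for reaction_names in table.values():
            if len(reaction_names) == 1:  # no alternative route for this metabolite
                chokepoints |= reaction_names
    return chokepoints


if __name__ == "__main__":
    toy_network = {  # hypothetical reactions for illustration only
        "R1": ({"A"}, {"B"}),
        "R2": ({"A"}, {"B"}),  # redundant with R1, so neither is a chokepoint
        "R3": ({"B"}, {"C"}),
        "R4": ({"C"}, {"D"}),
        "R5": ({"B"}, {"D"}),
    }
    print(sorted(find_chokepoints(toy_network)))  # ['R3', 'R4']
```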
Subject(s)
Genome, Helminth/genetics , Schistosoma mansoni/genetics , Animals , Biological Evolution , Exons/genetics , Genes, Helminth/genetics , Host-Parasite Interactions/genetics , Introns/genetics , Molecular Sequence Data , Physical Chromosome Mapping , Schistosoma mansoni/drug effects , Schistosoma mansoni/embryology , Schistosoma mansoni/physiology , Schistosomiasis mansoni/drug therapy , Schistosomiasis mansoni/parasitology
ABSTRACT
Phytophthora infestans is the most destructive pathogen of potato and a model organism for the oomycetes, a distinct lineage of fungus-like eukaryotes that are related to organisms such as brown algae and diatoms. As the agent of the Irish potato famine in the mid-nineteenth century, P. infestans has had a tremendous effect on human history, resulting in famine and population displacement. To this day, it affects world agriculture by causing the most destructive disease of potato, the fourth largest food crop and a critical alternative to the major cereal crops for feeding the world's population. Current annual worldwide potato crop losses due to late blight are conservatively estimated at $6.7 billion. Management of this devastating pathogen is challenged by its remarkable speed of adaptation to control strategies such as genetically resistant cultivars. Here we report the sequence of the P. infestans genome, which at approximately 240 megabases (Mb) is by far the largest and most complex genome sequenced so far in the chromalveolates. Its expansion results from a proliferation of repetitive DNA accounting for approximately 74% of the genome. Comparison with two other Phytophthora genomes showed rapid turnover and extensive expansion of specific families of secreted disease effector proteins, including many genes that are induced during infection or are predicted to have activities that alter host physiology. These fast-evolving effector genes are localized to highly dynamic and expanded regions of the P. infestans genome. This probably plays a crucial part in the rapid adaptability of the pathogen to host plants and underpins its evolutionary potential.
Subject(s)
Genome/genetics , Phytophthora infestans/genetics , Plant Diseases/microbiology , Solanum tuberosum/microbiology , Algal Proteins/genetics , DNA Transposable Elements/genetics , DNA, Intergenic/genetics , Evolution, Molecular , Host-Pathogen Interactions/genetics , Humans , Ireland , Molecular Sequence Data , Necrosis , Phenotype , Phytophthora infestans/pathogenicity , Plant Diseases/immunology , Solanum tuberosum/immunology , Starvation
ABSTRACT
The degree to which molecular epidemiology reveals information about the sources and transmission patterns of an outbreak depends on the resolution of the technology used and the samples studied. Isolates of Escherichia coli O104:H4 from the outbreak centered in Germany in May-July 2011, and the much smaller outbreak in southwest France in June 2011, were indistinguishable by standard tests. We report a molecular epidemiological analysis using multiplatform whole-genome sequencing and analysis of multiple isolates from the German and French outbreaks. Isolates from the German outbreak showed remarkably little diversity, with only two single nucleotide polymorphisms (SNPs) found in isolates from four individuals. Surprisingly, we found much greater diversity (19 SNPs) in isolates from seven individuals infected in the French outbreak. The German isolates form a clade within the more diverse French outbreak strains. Moreover, five isolates derived from a single infected individual from the French outbreak had extremely limited diversity. The striking difference in diversity between the German and French outbreak samples is consistent with several hypotheses, including a bottleneck that purged diversity in the German isolates, variation in mutation rates in the two E. coli outbreak populations, or uneven distribution of diversity in the seed populations that led to each outbreak.
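The diversity comparison above rests on counting single-nucleotide differences among isolates. The minimal sketch below counts the segregating sites in a set of isolates and the SNP distance for each pair. It assumes the genomes are already aligned to equal length with no gaps or ambiguous bases; real outbreak analyses must handle both, plus sequencing-error filtering. The sequences are invented toy data, and treating "SNPs found in isolates" as segregating sites is an assumption, because the abstract does not define the count.

```python
from itertools import combinations


def segregating_sites(aligned):
    """Count alignment columns at which not all isolates carry the same base."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def pairwise_snp_distances(aligned):
    """Map each pair of isolate indices (i, j) to the number of positions where they differ."""
    return {
        (i, j): sum(a != b for a, b in zip(aligned[i], aligned[j]))
        for i, j in combinations(range(len(aligned)), 2)
    }


if __name__ == "__main__":
    isolates = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "TCGTACGT"]  # invented toy alignment
    print(segregating_sites(isolates))               # 2
    print(pairwise_snp_distances(isolates)[(1, 3)])  # 2
```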
Subject(s)
Disease Outbreaks/statistics & numerical data , Escherichia coli Infections/epidemiology , Escherichia coli Infections/microbiology , Escherichia coli/genetics , Escherichia coli/isolation & purification , Escherichia coli Infections/genetics , Europe/epidemiology , Humans , Models, Genetic , Phylogeny , Polymorphism, Single Nucleotide/genetics
ABSTRACT
BACKGROUND: Genomic duplications constitute major events in the evolution of species, allowing paralogous copies of genes to take on fine-tuned biological roles. The orthology relationships between copies across multiple genomes can be resolved unambiguously by synteny, i.e. the conserved order of genomic sequences. However, a comprehensive analysis of duplication events and their contributions to evolution would require all-to-all genome alignments, whose number grows as N^2 with the number of available genomes, N. RESULTS: Here, we introduce Kraken, software that omits the all-to-all requirement by recursively traversing a graph of pairwise alignments and dynamically re-computing orthology. Kraken scales linearly with the number of targeted genomes, N, which allows large numbers of genomes to be included in analyses. We first evaluated the method on the set of 12 Drosophila genomes and found that computing orthologous correspondence indirectly through a graph of multiple synteny maps comes at minimal cost in sensitivity but reduces overall computational runtime by an order of magnitude. We then applied the method to three well-annotated mammalian genomes (human, mouse, and rat) and showed that up to 93% of protein-coding transcripts have unambiguous pairwise orthologous relationships across the genomes. At the nucleotide level, 70 to 83% of exons match exactly at both splice junctions, and up to 97% match on at least one junction. Finally, we applied Kraken to an RNA-sequencing dataset from multiple vertebrates and diverse tissues, where we confirmed that brain-specific gene family members, i.e. one-to-many or many-to-many homologs, are more highly correlated across species than single-copy (i.e. one-to-one homologous) genes.
Kraken is not limited to protein-coding genes. It also identifies thousands of novel transcribed loci, likely non-coding RNAs, that are consistently transcribed in human, chimpanzee and gorilla and maintain significant correlation of expression levels across species. CONCLUSIONS: Kraken is a computational genome coordinate translator that facilitates cross-species comparisons, distinguishes orthologs from paralogs, and does not require costly all-to-all whole-genome mappings. Kraken is freely available under the LGPL from http://github.com/nedaz/kraken.
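The scaling argument can be made concrete. Aligning every pair of N genomes needs N(N-1)/2 alignments, whereas a spanning tree of pairwise maps that connects all N genomes needs only N-1. For 12 genomes that is 66 alignments versus 11. The sketch below also carries a coordinate across genomes by composing pairwise maps along a path found by breadth-first search. It is a simplified illustration built on assumed data structures (exact per-position lookup tables and hypothetical coordinates). It is not Kraken's implementation, which works on synteny maps and re-computes orthology during traversal.

```python
from collections import deque


def pairwise_alignments_needed(n_genomes, all_to_all=True):
    """All-to-all needs n(n-1)/2 alignments; a spanning tree of pairwise maps needs n-1."""
    if all_to_all:
        return n_genomes * (n_genomes - 1) // 2
    return n_genomes - 1


def translate(coord, source, target, maps):
    """Carry a coordinate from source to target through a graph of pairwise maps.

    maps: dict (genome_a, genome_b) -> dict from a coordinate in genome_a to one in genome_b.
    Returns the translated coordinate, or None if no chain of maps covers it.
    """
    neighbours = {}
    for a, b in maps:
        neighbours.setdefault(a, []).append(b)
    queue = deque([(source, coord)])
    seen = {source}
    while queue:  # breadth-first search, translating the coordinate edge by edge
        genome, current = queue.popleft()
        if genome == target:
            return current
        for nxt in neighbours.get(genome, []):
            mapped = maps[(genome, nxt)].get(current)
            if mapped is not None and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, mapped))
    return None


if __name__ == "__main__":
    print(pairwise_alignments_needed(12), pairwise_alignments_needed(12, all_to_all=False))  # 66 11
    toy_maps = {  # hypothetical coordinates; no direct human -> rat map is supplied
        ("human", "mouse"): {("chr1", 100): ("chr4", 2000)},
        ("mouse", "rat"): {("chr4", 2000): ("chr5", 3100)},
    }
    print(translate(("chr1", 100), "human", "rat", toy_maps))  # ('chr5', 3100)
```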
Subject(s)
Genomics/methods , Software , Animals , Chromosome Mapping , Drosophila melanogaster/genetics , Evolution, Molecular , Genome/genetics , Humans , Mice , Molecular Sequence Annotation , Rats , Synteny/genetics , Transcription, Genetic
ABSTRACT
Bacterial diversity among environmental samples is commonly assessed with PCR-amplified 16S rRNA gene (16S) sequences. Perceived diversity, however, can be influenced by sample preparation, primer selection, and formation of chimeric 16S amplification products. Chimeras are hybrid products between multiple parent sequences that can be falsely interpreted as novel organisms, thus inflating apparent diversity. We developed a new chimera detection tool called Chimera Slayer (CS). CS detects chimeras with greater sensitivity than previous methods, performs well on short sequences such as those produced by the 454 Life Sciences (Roche) Genome Sequencer, and can scale to large data sets. By benchmarking CS performance against sequences derived from a controlled DNA mixture of known organisms and a simulated chimera set, we provide insights into the factors that affect chimera formation such as sequence abundance, the extent of similarity between 16S genes, and PCR conditions. Chimeras were found to reproducibly form among independent amplifications and contributed to false perceptions of sample diversity and the false identification of novel taxa, with less-abundant species exhibiting chimera rates exceeding 70%. Shotgun metagenomic sequences of our mock community appear to be devoid of 16S chimeras, supporting a role for shotgun metagenomics in validating novel organisms discovered in targeted sequence surveys.
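To make the notion of a chimera concrete, the sketch below flags a query as chimeric when joining the left part of one reference to the right part of another explains the query better than any single reference does, by at least a set margin. It is a deliberately naive illustration on pre-aligned, equal-length toy sequences with a hypothetical `min_gain` threshold. It is not Chimera Slayer's scoring scheme.

```python
def identity(a, b):
    """Number of matching positions between two aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b))


def best_two_parent_split(query, refs):
    """Best (left parent, right parent, breakpoint, matches) composite for the query."""
    best = None
    for k in range(1, len(query)):
        for left_name, left in refs.items():
            left_matches = identity(query[:k], left[:k])
            for right_name, right in refs.items():
                if right_name == left_name:
                    continue
                matches = left_matches + identity(query[k:], right[k:])
                if best is None or matches > best[3]:
                    best = (left_name, right_name, k, matches)
    return best


def looks_chimeric(query, refs, min_gain=2):
    """Flag the query if a two-parent composite beats every single reference by >= min_gain."""
    single = max(identity(query, ref) for ref in refs.values())
    composite = best_two_parent_split(query, refs)
    return composite is not None and composite[3] - single >= min_gain


if __name__ == "__main__":
    refs = {"parentA": "ACGTACGTAC", "parentB": "TGCATGCATG"}  # invented references
    print(looks_chimeric("ACGTAGCATG", refs))  # True: parentA[:5] + parentB[5:]
    print(looks_chimeric("ACGTACGTAC", refs))  # False: identical to parentA
```

Requiring a margin over the best single parent reflects the abstract's concern: a true novel organism should not be forced into a two-parent explanation just because one happens to fit slightly better.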
Subject(s)
Artifacts , Bacteria/genetics , RNA, Ribosomal, 16S/analysis , Bacteria/classification , Base Sequence , Chimera/genetics , DNA, Bacterial/analysis , DNA, Bacterial/genetics , DNA, Ribosomal/genetics , Genomics , Molecular Sequence Data , Nucleic Acid Amplification Techniques/methods , Polymerase Chain Reaction/methods , RNA, Bacterial/genetics , Sequence Analysis, DNA/methods
ABSTRACT
MOTIVATION: Kinases of the eukaryotic protein kinase superfamily are key regulators of most aspects of eukaryotic cellular behavior and have provided several drug targets, including kinases dysregulated in cancers. The rapid increase in the number of genomic sequences has created an acute need to identify and classify members of this important class of enzymes efficiently and accurately. RESULTS: Kinannote produces a draft kinome and comparative analyses for a predicted proteome from a single command line, and it is currently the only tool that automatically classifies protein kinases using the controlled vocabulary of Hanks and Hunter [Hanks and Hunter (1995)]. Kinannote identifies kinases with a hidden Markov model combined with a position-specific scoring matrix and then classifies them using a BLAST comparison with a local version of KinBase, the curated protein kinase dataset from www.kinase.com. Kinannote was tested on the predicted proteomes from four divergent species. The average sensitivity and precision for kinome retrieval from the test species are 94.4% and 96.8%, respectively. The ability of Kinannote to classify identified kinases was also evaluated, and the average sensitivity and precision for full classification of conserved kinases are 71.5% and 82.5%, respectively. Kinannote has had a significant impact on eukaryotic genome annotation, providing protein kinase annotations for 36 genomes made public by the Broad Institute from 2009 to the present. AVAILABILITY: Kinannote is freely available at http://sourceforge.net/projects/kinannote.
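The retrieval metrics quoted above follow the standard definitions. Sensitivity is TP/(TP + FN), the fraction of reference kinases recovered. Precision is TP/(TP + FP), the fraction of predictions that are real kinases. The sketch computes both from sets of protein identifiers and then averages them with equal weight per species. That equal-weight averaging is an assumption, because the abstract does not say how the averages were formed, and all identifiers are invented.

```python
def sensitivity_precision(predicted, reference):
    """Return (sensitivity, precision) for predicted vs. reference sets of protein IDs."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    sensitivity = true_positives / len(reference) if reference else 0.0  # TP / (TP + FN)
    precision = true_positives / len(predicted) if predicted else 0.0    # TP / (TP + FP)
    return sensitivity, precision


def macro_average(per_species):
    """Average (sensitivity, precision) pairs, weighting every species equally."""
    n = len(per_species)
    return (sum(s for s, _ in per_species) / n, sum(p for _, p in per_species) / n)


if __name__ == "__main__":
    reference = {f"kin{i}" for i in range(10)}              # hypothetical curated kinome
    predicted = {f"kin{i}" for i in range(8)} | {"orf42"}   # 8 true hits, 1 false positive
    print(sensitivity_precision(predicted, reference))      # sensitivity 0.8, precision 8/9
```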
Subject(s)
Eukaryotic Cells/enzymology , Protein Kinases/classification , Algorithms , Genome , Internet , Position-Specific Scoring Matrices , Protein Kinases/genetics , Protein Kinases/metabolism , Proteome/genetics , Software Design
ABSTRACT
Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasiliensis (Pb03 and Pb18) and one strain of Paracoccidioides lutzii (Pb01). These genomes range in size from 29.1 Mb to 32.9 Mb and encode 7,610 to 8,130 genes. To enable genetic studies, we mapped 94% of the P. brasiliensis Pb18 assembly onto five chromosomes. We characterized gene family content across Onygenales and related fungi, and within Paracoccidioides we found expansions of the fungal-specific kinase family FunK1. Additionally, the Onygenales have lost many genes involved in carbohydrate metabolism and fewer genes involved in protein metabolism, resulting in a higher ratio of proteases to carbohydrate active enzymes in the Onygenales than their relatives. To determine if gene content correlated with growth on different substrates, we screened the non-pathogenic onygenale Uncinocarpus reesii, which has orthologs for 91% of Paracoccidioides metabolic genes, for growth on 190 carbon sources. U. reesii showed growth on a limited range of carbohydrates, primarily basic plant sugars and cell wall components; this suggests that Onygenales, including dimorphic fungi, can degrade cellulosic plant material in the soil. In addition, U. reesii grew on gelatin and a wide range of dipeptides and amino acids, indicating a preference for proteinaceous growth substrates over carbohydrates, which may enable these fungi to also degrade animal biomass. 
These capabilities for degrading plant and animal substrates suggest a duality in lifestyle that could enable pathogenic species of Onygenales to transfer from soil to animal hosts.
Subject(s)
Onygenales/genetics , Paracoccidioides/genetics , Paracoccidioidomycosis/microbiology , Protein Kinases/genetics , Carbohydrate Metabolism/genetics , Drug Delivery Systems , Evolution, Molecular , Genome, Fungal , Genome, Mitochondrial/genetics , Humans , Multigene Family/genetics , Onygenales/enzymology , Paracoccidioides/enzymology , Phylogeny , Proteolysis , Repetitive Sequences, Nucleic Acid/genetics , Sequence Analysis, DNA
ABSTRACT
In cancer, genetic and transcriptomic variations generate clonal heterogeneity, possibly leading to treatment resistance. Long-read single-cell RNA sequencing (LR scRNA-seq) has the potential to detect genetic and transcriptomic variations simultaneously. Here, we present LongSom, a computational workflow that leverages LR scRNA-seq data to call de novo somatic single-nucleotide variants (SNVs), copy-number alterations (CNAs), and gene fusions and to reconstruct tumor clonal heterogeneity. For SNV calling, LongSom distinguishes somatic SNVs from germline polymorphisms by using the called variants to reannotate cell types initially assigned from marker gene expression, and by applying strict filters. Applying LongSom to ovarian cancer samples, we detected clinically relevant somatic SNVs that were validated against single-cell and bulk panel DNA-seq data and could not be detected with short-read (SR) scRNA-seq. Leveraging somatic SNVs and fusions, LongSom found subclones with different predicted treatment outcomes. In summary, LongSom enables de novo detection of SNVs, CNAs, and fusions, thereby supporting the study of cancer evolution, clonal heterogeneity, and treatment resistance.
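The core idea behind separating somatic from germline variants with cell-type labels can be sketched simply. A germline polymorphism should appear in non-cancer cells as well as in cancer cells, whereas a somatic SNV should be confined to cancer cells. The sketch below encodes that logic on per-cell read counts. The thresholds, labels, and the function name `classify_snv` are hypothetical, and LongSom's actual filters, including its variant-based reannotation of cell types, are more elaborate than this.

```python
def classify_snv(counts, cell_type, min_cancer_cells=3, min_normal_covered=5):
    """Label one candidate SNV from per-cell read counts (illustrative thresholds).

    counts: dict cell_id -> (reads supporting the reference allele, reads supporting the alt).
    cell_type: dict cell_id -> "cancer" or "normal" (e.g. from marker-gene expression).
    """
    cancer_with_alt = normal_with_alt = normal_covered = 0
    for cell, (ref_reads, alt_reads) in counts.items():
        if cell_type[cell] == "cancer":
            if alt_reads > 0:
                cancer_with_alt += 1
        else:
            if ref_reads + alt_reads > 0:
                normal_covered += 1
            if alt_reads > 0:
                normal_with_alt += 1
    if normal_with_alt > 0:
        # Seen in non-cancer cells: germline, or a cell whose type label is wrong.
        return "germline_or_misannotated"
    if cancer_with_alt >= min_cancer_cells and normal_covered >= min_normal_covered:
        # Confined to cancer cells, and enough normal cells were covered to notice otherwise.
        return "somatic_candidate"
    return "insufficient_evidence"


if __name__ == "__main__":
    cells = {**{f"t{i}": "cancer" for i in range(4)}, **{f"n{i}": "normal" for i in range(5)}}
    counts = {**{f"t{i}": (2, 3) for i in range(4)}, **{f"n{i}": (4, 0) for i in range(5)}}
    print(classify_snv(counts, cells))  # somatic_candidate
```

Requiring coverage in normal cells matters: the absence of the alternative allele in normal cells only counts as evidence if those cells were actually sequenced at the site.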
ABSTRACT
Gene fusions act as cancer drivers in diverse adult and pediatric cancers. Accurate detection of fusion transcripts is essential for cancer clinical diagnostics and prognostics and for guiding therapeutic development. Most currently available methods for fusion transcript detection are designed for Illumina RNA-seq, which produces highly accurate short reads. Recent advances in long-read isoform sequencing enable the detection of fusion transcripts at unprecedented resolution in bulk and single-cell samples. Here we developed CTAT-LR-fusion, a new computational tool that detects fusion transcripts from long-read RNA-seq with or without companion short reads, with applications to bulk or single-cell transcriptomes. We demonstrate that CTAT-LR-fusion exceeds the fusion detection accuracy of alternative methods, as benchmarked with simulated and real long-read RNA-seq. Using short- and long-read RNA-seq, we further apply CTAT-LR-fusion to bulk transcriptomes of nine tumor cell lines and to tumor single cells derived from a melanoma sample and three metastatic high-grade serous ovarian carcinoma samples. In both bulk and single-cell RNA-seq, long isoform reads yielded higher sensitivity for fusion detection than short reads, with notable exceptions. By combining short and long reads in CTAT-LR-fusion, we are able to further maximize detection of fusion splicing isoforms and fusion-expressing tumor cells. CTAT-LR-fusion is available at https://github.com/TrinityCTAT/CTAT-LR-fusion/wiki.
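A long read spanning a fusion junction typically aligns in pieces to two different genes. The sketch below shows only that basic signal. Given a read's alignment segments, it reports neighbouring segments along the read that come from different genes and nearly abut. The `Segment` record and the `max_gap` tolerance are hypothetical, strand and transcript orientation are ignored, and CTAT-LR-fusion's actual detection and filtering logic is not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One aligned piece of a long read (hypothetical minimal record)."""
    gene: str
    read_start: int  # 0-based start position on the read
    read_end: int    # exclusive end position on the read


def fusion_partners(segments, max_gap=20):
    """Return (upstream gene, downstream gene) pairs, in read order, supported by one read.

    Segments are sorted along the read; two neighbours count as a fusion junction only
    if they come from different genes and leave at most max_gap unaligned read bases.
    """
    ordered = sorted(segments, key=lambda s: s.read_start)
    partners = []
    for left, right in zip(ordered, ordered[1:]):
        if left.gene != right.gene and right.read_start - left.read_end <= max_gap:
            partners.append((left.gene, right.gene))
    return partners


if __name__ == "__main__":
    read = [Segment("ERG", 855, 2400), Segment("TMPRSS2", 0, 850)]  # made-up coordinates
    print(fusion_partners(read))  # [('TMPRSS2', 'ERG')]
```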
ABSTRACT
HPV infections are associated with a fraction of vulvar cancers. Through hybridization capture and DNA sequencing, HPV DNA was detected in five of thirteen vulvar cancers. HPV16 DNA was integrated into human DNA in three of the five. The insertions were in introns of human NCKAP1, C5orf67, and LRP1B. Integrations in NCKAP1 and C5orf67 were flanked by short direct repeats in the human DNA, consistent with HPV DNA insertions at sites of abortive, staggered, endonucleolytic incisions. The insertion in C5orf67 was present as a 36 kbp human-HPV-hetero-catemeric DNA, either as an extrachromosomal circle or as a tandem repeat within the human genome. The human circularization/repeat junction was defined at single-nucleotide resolution. The integrated viral DNA segments all retained an intact upstream regulatory region and the adjacent viral E6 and E7 oncogenes. RNA sequencing revealed that the only HPV genes consistently transcribed from the integrated viral DNAs were E7 and E6*I. The other two HPV DNA-positive tumors had coinfections but no evidence of integration. HPV-positive and HPV-negative vulvar cancers exhibited contrasting global human gene expression patterns, which partially overlapped with previously observed differences between HPV-positive and HPV-negative cervical and oropharyngeal cancers. A substantial fraction of the differentially expressed genes involved immune system function. Thus, transcription and HPV DNA integration in vulvar cancers resemble those in other HPV-positive cancers. This study emphasizes the power of hybridization capture coupled with DNA and RNA sequencing to identify a broad spectrum of HPV types, determine the human genome integration status of viral DNAs, and elucidate their structures.
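The short direct repeats mentioned above are what a staggered cut leaves behind: the human bases between the two nicks end up copied on both sides of the inserted HPV DNA. The minimal sketch below assumes the human sequences immediately left and right of an insertion are already known. It finds the longest sequence that both ends the left flank and starts the right flank. The function name and the `max_len` cap are hypothetical. Very short matches can arise by chance, so a real analysis would need a length threshold and a comparison with the uninterrupted reference sequence.

```python
def flanking_direct_repeat(left_flank, right_flank, max_len=50):
    """Longest sequence that both ends the left human flank and starts the right human flank.

    left_flank: human sequence immediately 5' of the inserted viral DNA.
    right_flank: human sequence immediately 3' of the inserted viral DNA.
    Returns "" when no such repeat exists.
    """
    best = ""
    for k in range(1, min(max_len, len(left_flank), len(right_flank)) + 1):
        if left_flank[-k:] == right_flank[:k]:
            best = left_flank[-k:]  # keep the longest match seen so far
    return best


if __name__ == "__main__":
    # Invented flanks: "TAGC" appears on both sides of the insertion site.
    print(flanking_direct_repeat("GGGCATTAGC", "TAGCAACCT"))  # TAGC
```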
ABSTRACT
As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes fewer than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.