ABSTRACT
RNA has the intrinsic property to base pair, forming complex structures fundamental to its diverse functions. Here, we develop PARIS, a method based on reversible psoralen crosslinking for global mapping of RNA duplexes with near base-pair resolution in living cells. PARIS analysis in three human and mouse cell types reveals frequent long-range structures, higher-order architectures, and RNA-RNA interactions in trans across the transcriptome. PARIS determines base-pairing interactions on an individual-molecule level, revealing pervasive alternative conformations. We used PARIS-determined helices to guide phylogenetic analysis of RNA structures and discovered conserved long-range and alternative structures. XIST, a long noncoding RNA (lncRNA) essential for X chromosome inactivation, folds into evolutionarily conserved RNA structural domains that span many kilobases. XIST A-repeat forms complex inter-repeat duplexes that nucleate higher-order assembly of the key epigenetic silencing protein SPEN. PARIS is a generally applicable and versatile method that provides novel insights into the RNA structurome and interactome.
Subjects
Ficusin/chemistry , RNA, Double-Stranded/chemistry , Animals , Base Pairing , HEK293 Cells , HeLa Cells , Humans , Mice , Mouse Embryonic Stem Cells , RNA, Long Noncoding/chemistry
ABSTRACT
The human mitochondrial genome comprises a distinct genetic system transcribed as precursor polycistronic transcripts that are subsequently cleaved to generate individual mRNAs, tRNAs, and rRNAs. Here, we provide a comprehensive analysis of the human mitochondrial transcriptome across multiple cell lines and tissues. Using directional deep sequencing and parallel analysis of RNA ends, we demonstrate wide variation in mitochondrial transcript abundance and precisely resolve transcript processing and maturation events. We identify previously undescribed transcripts, including small RNAs, and observe the enrichment of several nuclear RNAs in mitochondria. Using high-throughput in vivo DNaseI footprinting, we establish the global profile of DNA-binding protein occupancy across the mitochondrial genome at single-nucleotide resolution, revealing regulatory features at mitochondrial transcription initiation sites and functional insights into disease-associated variants. This integrated analysis of the mitochondrial transcriptome reveals unexpected complexity in the regulation, expression, and processing of mitochondrial RNA and provides a resource for future studies of mitochondrial function (accessed at http://mitochondria.matticklab.com).
Subjects
Gene Expression Profiling , Mitochondria/genetics , RNA/analysis , Cell Nucleus/metabolism , DNA Footprinting , DNA-Binding Proteins/analysis , Deoxyribonuclease I/metabolism , Gene Expression Regulation , Genome, Mitochondrial , High-Throughput Nucleotide Sequencing , Humans , Locus Control Region , Mitochondrial Proteins/analysis , Nucleic Acid Conformation , RNA/metabolism , RNA, Mitochondrial , Sequence Analysis, RNA
ABSTRACT
BACKGROUND: The Leishmania genome harbors formerly active short interspersed degenerated retroposons (SIDERs) representing the largest family of repetitive elements among trypanosomatids. Their substantial expansion in Leishmania is a strong predictor of important biological functions. In this study, we combined multilevel bioinformatic predictions with high-throughput genomic and transcriptomic analyses to gain novel insights into the diversified roles retroposons of the SIDER2 subfamily play in Leishmania genome evolution and expression. RESULTS: We show that SIDER2 retroposons form various evolutionary divergent clusters, each harboring homologous SIDER2 sequences usually located nearby in the linear sequence of chromosomes. This intriguing genomic organization underscores the importance of SIDER2 proximity in shaping chromosome dynamics and co-regulation. Accordingly, we show that transcripts belonging to the same SIDER2 cluster can display similar levels of expression. SIDER2 retroposons are mostly transcribed as part of 3'UTRs and account for 13% of the Leishmania transcriptome. Genome-wide expression profiling studies underscore SIDER2 association generally with low mRNA expression. The remarkable link of SIDER2 retroposons with downregulation of gene expression supports their co-option as major regulators of mRNA abundance. SIDER2 sequences also add to the diversification of the Leishmania gene expression repertoire since ~ 35% of SIDER2-containing transcripts can be differentially regulated throughout the parasite development, with a few encoding key virulence factors. In addition, we provide evidence for a functional bias of SIDER2-containing transcripts with protein kinase and transmembrane transporter activities being most represented. CONCLUSIONS: Altogether, these findings provide important conceptual advances into evolutionary innovations of transcribed extinct retroposons acting as major RNA cis-regulators.
Subjects
Evolution, Molecular , Leishmania , RNA, Messenger , Retroelements , Retroelements/genetics , Leishmania/genetics , RNA, Messenger/genetics , RNA, Messenger/metabolism , Genome, Protozoan , Gene Expression Regulation, Developmental , RNA, Protozoan/genetics , RNA, Protozoan/metabolism
ABSTRACT
IMPORTANCE: The lack of a reliable method to accurately detect when replication-competent HIV has been cleared is a major challenge in developing a cure. This study introduces a new approach called the HIVepsilon-seq (HIVε-seq) assay, which uses long-read sequencing technology and bioinformatics to scrutinize the HIV genome at the nucleotide level, distinguishing between defective and intact HIV. This study included 30 participants on antiretroviral therapy, including 17 women, and was able to discriminate between defective and genetically intact viruses at the single DNA strand level. The HIVε-seq assay is an improvement over previous methods, as it requires minimal sample, less specialized lab equipment, and offers a shorter turnaround time. The HIVε-seq assay offers a promising new tool for researchers to measure the intact HIV reservoir, advancing efforts towards finding a cure for this devastating disease.
Subjects
HIV Infections , HIV , Proviruses , Female , Humans , CD4-Positive T-Lymphocytes , DNA, Viral/genetics , HIV Infections/drug therapy , HIV Infections/epidemiology , HIV Infections/virology , Nucleotides , Proviruses/genetics , Viral Load , Sequence Analysis, DNA , Male , Sex Factors , HIV/genetics
ABSTRACT
Nanopore sequencing enables direct measurement of RNA molecules without conversion to cDNA, thus opening the gates to a new era for RNA biology. However, the lack of molecular barcoding of direct RNA nanopore sequencing data sets severely affects the applicability of this technology to biological samples, where RNA availability is often limited. Here, we provide the first experimental protocol and associated algorithm to barcode and demultiplex direct RNA nanopore sequencing data sets. Specifically, we present a novel and robust approach to accurately classify raw nanopore signal data by transforming current intensities into images or arrays of pixels, followed by classification using a deep learning algorithm. We demonstrate the power of this strategy by developing the first experimental protocol for barcoding and demultiplexing direct RNA sequencing libraries. Our method, DeePlexiCon, can classify 93% of reads with 95.1% accuracy or 60% of reads with 99.9% accuracy. The availability of an efficient and simple multiplexing strategy for native RNA sequencing will improve the cost-effectiveness of this technology, as well as facilitate the analysis of lower-input biological samples. Overall, our work exemplifies the power, simplicity, and robustness of signal-to-image conversion for nanopore data analysis using deep learning.
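The signal-to-image step described above can be illustrated with a small sketch. The abstract does not say which transform is used, so the Gramian angular summation field below is an assumption chosen for illustration, and the function names and the example current values are hypothetical. The resulting 2D array is the kind of input an image classifier could consume.

```python
import math


def rescale(signal):
    """Linearly rescale a 1D signal to the interval [-1, 1]."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # A flat signal carries no shape information; map it to zeros.
        return [0.0] * len(signal)
    return [2.0 * (x - lo) / (hi - lo) - 1.0 for x in signal]


def gramian_angular_field(signal):
    """Turn a 1D signal into a square image (assumed transform, see lead-in).

    Each pixel (i, j) holds cos(phi_i + phi_j) with phi = arccos(x),
    computed as x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2).
    """
    scaled = rescale(signal)
    # max(0.0, ...) guards against tiny negative values from rounding.
    comp = [math.sqrt(max(0.0, 1.0 - x * x)) for x in scaled]
    n = len(scaled)
    return [
        [scaled[i] * scaled[j] - comp[i] * comp[j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical current intensities (pA) from one short signal segment.
current_pa = [80.0, 95.5, 110.0, 102.3, 88.1]
image = gramian_angular_field(current_pa)
```

The image is symmetric and its size grows quadratically with signal length, so a real pipeline would presumably downsample or segment the raw signal first.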
Subjects
Deep Learning , Nanopore Sequencing/methods , Sequence Analysis, RNA/methods , Algorithms
ABSTRACT
In vitro selection of remdesivir-resistant severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) revealed the emergence of a V166L substitution, located outside of the polymerase active site of the Nsp12 protein, after 9 passages of a single lineage. V166L remained the only Nsp12 substitution after 17 passages (10 µM remdesivir), conferring a 2.3-fold increase in 50% effective concentration (EC50). When V166L was introduced into a recombinant SARS-CoV-2 virus, a 1.5-fold increase in EC50 was observed, indicating a high in vitro barrier to remdesivir resistance.
Subjects
COVID-19 Drug Treatment , SARS-CoV-2 , Adenosine Monophosphate/analogs & derivatives , Adenosine Monophosphate/chemistry , Alanine/analogs & derivatives , Alanine/metabolism , Antiviral Agents/chemistry , Humans
ABSTRACT
Noncoding RNA has a proven ability to direct and regulate chromatin modifications by acting as scaffolds between DNA and histone-modifying complexes. However, it is unknown if ncRNA plays any role in DNA replication and epigenome maintenance, including histone eviction and reinstallment of histone modifications after genome duplication. Isolation of nascent chromatin has identified a large number of RNA-binding proteins in addition to unknown components of the replication and epigenetic maintenance machinery. Here, we isolated and characterized long and short RNAs associated with nascent chromatin at active replication forks and track RNA composition during chromatin maturation across the cell cycle. Shortly after fork passage, GA-rich-, alpha- and TElomeric Repeat-containing RNAs (TERRA) are associated with replicated DNA. These repeat containing RNAs arise from loci undergoing replication, suggesting an interaction in cis. Post-replication during chromatin maturation, and even after mitosis in G1, the repeats remain enriched on DNA. This suggests that specific types of repeat RNAs are transcribed shortly after DNA replication and stably associate with their loci of origin throughout the cell cycle. The presented method and data enable studies of RNA interactions with replication forks and post-replicative chromatin and provide insights into how repeat RNAs and their engagement with chromatin are regulated with respect to DNA replication and across the cell cycle.
Subjects
DNA Replication/genetics , DNA/genetics , Protein Processing, Post-Translational/genetics , RNA/genetics , Cell Cycle/genetics , Cell Line, Tumor , Chromatin/genetics , HeLa Cells , Histones/genetics , Humans
ABSTRACT
In hypersaline environments, Nanohaloarchaeota (Diapherotrites, Parvarchaeota, Aenigmarchaeota, Nanoarchaeota, Nanohaloarchaeota [DPANN] superphylum) are thought to be free-living microorganisms. We report cultivation of 2 strains of Antarctic Nanohaloarchaeota and show that they require the haloarchaeon Halorubrum lacusprofundi for growth. By performing growth using enrichments and fluorescence-activated cell sorting, we demonstrated successful cultivation of Candidatus Nanohaloarchaeum antarcticus, purification of Ca. Nha. antarcticus away from other species, and growth and verification of Ca. Nha. antarcticus with Hrr. lacusprofundi; these findings are analogous to those required for fulfilling Koch's postulates. We use fluorescent in situ hybridization and transmission electron microscopy to assess cell structures and interactions; metagenomics to characterize enrichment taxa, generate metagenome assembled genomes, and interrogate Antarctic communities; and proteomics to assess metabolic pathways and speculate about the roles of certain proteins. Metagenome analysis indicates the presence of a single species, which is endemic to Antarctic hypersaline systems that support the growth of haloarchaea. The presence of unusually large proteins predicted to function in attachment and invasion of hosts plus the absence of key biosynthetic pathways (e.g., lipids) in metagenome assembled genomes of globally distributed Nanohaloarchaeota indicate that all members of the lineage have evolved as symbionts. Our work expands the range of archaeal symbiotic lifestyles and provides a genetically tractable model system for advancing understanding of the factors controlling microbial symbiotic relationships.
Subjects
Halorubrum/physiology , Metagenome , Nanoarchaeota/physiology , Symbiosis/physiology , Antarctic Regions , DNA, Archaeal/genetics , DNA, Archaeal/isolation & purification , Flow Cytometry , Genome, Archaeal/genetics , Halorubrum/ultrastructure , Metagenomics , Microscopy, Electron, Transmission , Nanoarchaeota/ultrastructure , Phylogeny , Salinity
ABSTRACT
BACKGROUND: Hepatitis C (HCV) and many other RNA viruses exist as rapidly mutating quasi-species populations in a single infected host. High throughput characterization of full genome, within-host variants is still not possible despite advances in next generation sequencing. This limitation constrains viral genomic studies that depend on accurate identification of hemi-genome or whole genome, within-host variants, especially those occurring at low frequencies. With the advent of third generation long read sequencing technologies, including Oxford Nanopore Technology (ONT) and PacBio platforms, this problem is potentially surmountable. ONT is particularly attractive in this regard due to the portable nature of the MinION sequencer, which makes real-time sequencing in remote and resource-limited locations possible. However, this technology (termed here 'nanopore sequencing') has a comparatively high technical error rate. The present study aimed to assess the utility, accuracy and cost-effectiveness of nanopore sequencing for HCV genomes. We also introduce a new bioinformatics tool (Nano-Q) to differentiate within-host variants from nanopore sequencing. RESULTS: The Nanopore platform, when the coverage exceeded 300 reads, generated comparable consensus sequences to Illumina sequencing. Using HCV Envelope plasmids (~ 1800 nt) mixed in known proportions, the capacity of nanopore sequencing to reliably identify variants with an abundance as low as 0.1% was demonstrated, provided the autologous reference sequence was available to identify the matching reads. Successful pooling and nanopore sequencing of 52 samples from patients with HCV infection demonstrated its cost effectiveness (AUD$ 43 per sample with nanopore sequencing versus $100 with paired-end short read technology). 
The Nano-Q tool successfully separated between-host sequences, including those from the same subtype, by bulk sorting and phylogenetic clustering without an autologous reference sequence (using only a subtype-specific generic reference). The pipeline also identified within-host viral variants and their abundance when the parameters were appropriately adjusted. CONCLUSION: Cost effective HCV whole genome sequencing and within-host variant identification without haplotype reconstruction are potential advantages of nanopore sequencing.
Subjects
Hepatitis C , Nanopores , High-Throughput Nucleotide Sequencing , Humans , Phylogeny , Sequence Analysis, DNA , Technology , Whole Genome Sequencing
ABSTRACT
BACKGROUND: Nanopore sequencing enables portable, real-time sequencing applications, including point-of-care diagnostics and in-the-field genotyping. Achieving these outcomes requires efficient bioinformatic algorithms for the analysis of raw nanopore signal data. However, comparing raw nanopore signals to a biological reference sequence is a computationally complex task. The dynamic programming algorithm called Adaptive Banded Event Alignment (ABEA) is a crucial step in polishing sequencing data and identifying non-standard nucleotides, such as measuring DNA methylation. Here, we parallelise and optimise an implementation of the ABEA algorithm (termed f5c) to efficiently run on heterogeneous CPU-GPU architectures. RESULTS: By optimising memory, computations and load balancing between CPU and GPU, we demonstrate how f5c can perform ~3-5× faster than an optimised version of the original CPU-only implementation of ABEA in the Nanopolish software package. We also show that f5c enables DNA methylation detection on-the-fly using an embedded System on Chip (SoC) equipped with GPUs. CONCLUSIONS: Our work not only demonstrates that complex genomics analyses can be performed on lightweight computing systems, but also benefits High-Performance Computing (HPC). The associated source code for f5c along with GPU optimised ABEA is available at https://github.com/hasindu2008/f5c.
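To give a feel for why banding reduces the cost of signal-to-reference alignment, here is a minimal sketch of dynamic programming restricted to a band around the diagonal. It is not ABEA and not the f5c or Nanopolish code: the band here is fixed rather than adaptive, the cost is a plain absolute difference between hypothetical event levels and hypothetical expected levels, and the function name is made up for this example.

```python
import math


def banded_dtw(events, reference, band):
    """Align two numeric sequences, filling only cells with |i - j| <= band.

    Returns the total cost of the cheapest alignment path inside the band.
    A production implementation would store only the band itself; the full
    matrix is allocated here purely to keep the sketch short.
    """
    n, m = len(events), len(reference)
    if abs(n - m) > band:
        raise ValueError("band too narrow to reach the end of both sequences")
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - band)
        hi = min(m, i + band)
        for j in range(lo, hi + 1):
            cost = abs(events[i - 1] - reference[j - 1])
            # Extend the cheapest of: event repeats, reference skips, both advance.
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


# Hypothetical example: the first level is observed twice in the events.
events = [1.0, 1.0, 2.0, 3.0]
reference = [1.0, 2.0, 3.0]
total_cost = banded_dtw(events, reference, band=1)
```

Only about `n * (2 * band + 1)` cells are evaluated instead of `n * m`, which is where the saving comes from; an adaptive band additionally moves with the alignment path rather than staying on the diagonal.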
Subjects
Computer Graphics , Nanopores , Signal Processing, Computer-Assisted , Algorithms , Computational Biology , Databases as Topic , Genome, Human , Humans , Sequence Analysis
ABSTRACT
Over 1250 mutations in SCN1A, the Nav1.1 voltage-gated sodium channel gene, are associated with seizure disorders including GEFS+. To evaluate how a specific mutation, independent of genetic background, causes seizure activity we generated two pairs of isogenic human iPSC lines by CRISPR/Cas9 gene editing. One pair is a control line from an unaffected sibling, and the mutated control carrying the GEFS+ K1270T SCN1A mutation. The second pair is a GEFS+ patient line with the K1270T mutation, and the corrected patient line. By comparing the electrophysiological properties in inhibitory and excitatory iPSC-derived neurons from these pairs, we found the K1270T mutation causes cell type-specific alterations in sodium current density and evoked firing, resulting in hyperactive neural networks. We also identified differences associated with genetic background and interaction between the mutation and genetic background. Comparisons within and between dual pairs of isogenic iPSC-derived neuronal cultures provide a novel platform for evaluating cellular mechanisms underlying a disease phenotype and for developing patient-specific anti-seizure therapies.
Subjects
Epilepsy/genetics , NAV1.1 Voltage-Gated Sodium Channel/genetics , Neurons , Genotype , Humans , Induced Pluripotent Stem Cells , Mutation , Phenotype , Seizures, Febrile/genetics
ABSTRACT
SUMMARY: The management of raw nanopore sequencing data poses a challenge that must be overcome to facilitate the creation of new bioinformatics algorithms predicated on signal analysis. SquiggleKit is a toolkit for manipulating and interrogating nanopore data that simplifies file handling, data extraction, visualization and signal processing. AVAILABILITY AND IMPLEMENTATION: SquiggleKit is cross-platform and freely available from GitHub at https://github.com/Psy-Fer/SquiggleKit. Detailed documentation can be found at https://psy-fer.github.io/SquiggleKitDocs/. All tools have been designed to operate in Python 2.7+, with minimal additional libraries. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
Nanopores , Algorithms , Nanopore Sequencing , Software
ABSTRACT
RNA modifications have been historically considered as fine-tuning chemo-structural features of infrastructural RNAs, such as rRNAs, tRNAs, and snoRNAs. This view has changed dramatically in recent years, to a large extent as a result of systematic efforts to map and quantify various RNA modifications in a transcriptome-wide manner, revealing that RNA modifications are reversible, dynamically regulated, far more widespread than originally thought, and involved in major biological processes, including cell differentiation, sex determination, and stress responses. Here we summarize the state of knowledge and provide a catalog of RNA modifications and their links to neurological disorders, cancers, and other diseases. With the advent of direct RNA-sequencing technologies, we expect that this catalog will help prioritize those RNA modifications for transcriptome-wide maps.
Subjects
Disease/genetics , RNA Processing, Post-Transcriptional , RNA/chemistry , Animals , Humans
ABSTRACT
SUMMARY: The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. AVAILABILITY AND IMPLEMENTATION: Ngsane is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane. CONTACT: Denis.Bauer@csiro.au SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Subjects
High-Throughput Nucleotide Sequencing/methods , Automation, Laboratory , Humans , Software
ABSTRACT
Evolutionarily conserved RNA secondary structures are a robust indicator of purifying selection and, consequently, molecular function. Evaluating their genome-wide occurrence through comparative genomics has consistently been plagued by high false-positive rates and divergent predictions. We present a novel benchmarking pipeline aimed at calibrating the precision of genome-wide scans for consensus RNA structure prediction. The benchmarking data obtained from two refined structure prediction algorithms, RNAz and SISSIz, were then analyzed to fine-tune the parameters of an optimized workflow for genomic sliding window screens. When applied to consistency-based multiple genome alignments of 35 mammals, our approach confidently identifies >4 million evolutionarily constrained RNA structures using a conservative sensitivity threshold that entails historically low false discovery rates for such analyses (5-22%). These predictions comprise 13.6% of the human genome, 88% of which fall outside any known sequence-constrained element, suggesting that a large proportion of the mammalian genome is functional. As an example, our findings identify both known and novel conserved RNA structure motifs in the long noncoding RNA MALAT1. This study provides an extensive set of functional transcriptomic annotations that will assist researchers in uncovering the precise mechanisms underlying the developmental ontologies of higher eukaryotes.
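A genomic sliding window screen of the kind mentioned above can be sketched as a generator of overlapping coordinate windows over an alignment. The window and step sizes below are hypothetical, since the abstract does not state them, and the function name is an assumption; in a real screen each window would then be passed to a structure prediction tool.

```python
def sliding_windows(length, window, step):
    """Yield half-open (start, end) windows covering an alignment of `length` columns.

    Windows overlap whenever step < window. An alignment shorter than one
    window yields a single truncated window.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    last_start = max(length - window, 0)
    for start in range(0, last_start + 1, step):
        yield start, min(start + window, length)


# Hypothetical parameters: 200-column windows sliding by 100 columns.
windows = list(sliding_windows(length=500, window=200, step=100))
```

With these parameters every alignment column except those at the two ends is scored in two windows, which is the usual reason for choosing a step smaller than the window.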
Subjects
Algorithms , Genomics/methods , RNA, Long Noncoding/chemistry , Base Sequence , Evolution, Molecular , Genome, Human , Humans , Molecular Sequence Annotation , Molecular Sequence Data , Nucleic Acid Conformation
ABSTRACT
Pregnancy-induced noncoding RNA (PINC) and retinoblastoma-associated protein 46 (RbAp46) are upregulated in alveolar cells of the mammary gland during pregnancy and persist in alveolar cells that remain in the regressed lobules following involution. The cells that survive involution are thought to function as alveolar progenitor cells that rapidly differentiate into milk-producing cells in subsequent pregnancies, but it is unknown whether PINC and RbAp46 are involved in maintaining this progenitor population. Here, we show that, in the post-pubertal mouse mammary gland, mPINC is enriched in luminal and alveolar progenitors. mPINC levels increase throughout pregnancy and then decline in early lactation, when alveolar cells undergo terminal differentiation. Accordingly, mPINC expression is significantly decreased when HC11 mammary epithelial cells are induced to differentiate and produce milk proteins. This reduction in mPINC levels may be necessary for lactation, as overexpression of mPINC in HC11 cells blocks lactogenic differentiation, while knockdown of mPINC enhances differentiation. Finally, we demonstrate that mPINC interacts with RbAp46, as well as other members of the polycomb repressive complex 2 (PRC2), and identify potential targets of mPINC that are differentially expressed following modulation of mPINC expression levels. Taken together, our data suggest that mPINC inhibits terminal differentiation of alveolar cells during pregnancy to prevent abundant milk production and secretion until parturition. Additionally, a PRC2 complex that includes mPINC and RbAp46 may confer epigenetic modifications that maintain a population of mammary epithelial cells committed to the alveolar fate in the involuted gland.
Subjects
Cell Differentiation , Mammary Glands, Animal/metabolism , Pregnancy/metabolism , RNA, Untranslated/metabolism , Repressor Proteins/metabolism , Retinoblastoma-Binding Protein 7/metabolism , Animals , Female , Gene Knockdown Techniques , Mammary Glands, Animal/cytology , Mice , Mice, Inbred BALB C , Polycomb-Group Proteins , RNA, Untranslated/genetics , Rats
ABSTRACT
Hundreds of mutations in the SCN1A sodium channel gene confer a wide spectrum of epileptic disorders, requiring efficient model systems to study cellular mechanisms and identify potential therapeutic targets. We recently demonstrated that Drosophila knock-in flies carrying the K1270T SCN1A mutation known to cause a form of genetic epilepsy with febrile seizures plus (GEFS+) exhibit a heat-induced increase in sodium current activity and seizure phenotype. To determine whether different SCN1A mutations cause distinct phenotypes in Drosophila as they do in humans, this study focuses on a knock-in line carrying a mutation that causes a more severe seizure disorder termed Dravet syndrome (DS). Introduction of the DS SCN1A mutation (S1231R) into the Drosophila sodium channel gene para results in flies that exhibit spontaneous and heat-induced seizures with distinct characteristics and lower onset temperature than the GEFS+ flies. Electrophysiological studies of GABAergic interneurons in the brains of adult DS flies reveal, for the first time in an in vivo model system, that a missense DS mutation causes a constitutive and conditional reduction in sodium current activity and repetitive firing. In addition, feeding with the serotonin precursor 5-HTP suppresses heat-induced seizures in DS but not GEFS+ flies. The distinct alterations of sodium currents in DS and GEFS+ GABAergic interneurons demonstrate that both loss- and gain-of-function alterations in sodium currents are capable of causing reduced repetitive firing and seizure phenotypes. The mutation-specific effects of 5-HTP on heat-induced seizures suggest the serotonin pathway as a potential therapeutic target for DS.
Subjects
Action Potentials , Epilepsies, Myoclonic/genetics , NAV1.1 Voltage-Gated Sodium Channel/genetics , Sodium/metabolism , 5-Hydroxytryptophan/metabolism , Animals , Brain/cytology , Brain/metabolism , Brain/physiopathology , Drosophila/genetics , Drosophila/metabolism , Drosophila/physiology , Epilepsies, Myoclonic/metabolism , GABAergic Neurons/metabolism , GABAergic Neurons/physiology , Interneurons/metabolism , Interneurons/physiology , Mutation, Missense , NAV1.1 Voltage-Gated Sodium Channel/metabolism , Phenotype , Serotonin/metabolism
ABSTRACT
Convolutional Neural Networks (CNNs) have been central to the Deep Learning revolution and played a key role in initiating the new age of Artificial Intelligence. However, in recent years newer architectures such as Transformers have dominated both research and practical applications. While CNNs still play critical roles in many of the newer developments such as Generative AI, they are far from being thoroughly understood and utilised to their full potential. Here we show that CNNs can recognise patterns in images with scattered pixels and can be used to analyse complex datasets by transforming them into pseudo images with minimal processing for any high dimensional dataset, representing a more general approach to the application of CNNs to datasets such as in molecular biology, text, and speech. We introduce a pipeline called DeepMapper, which allows analysis of very high dimensional datasets without intermediate filtering and dimension reduction, thus preserving the full texture of the data, enabling detection of small variations normally deemed 'noise'. We demonstrate that DeepMapper can identify very small perturbations in large datasets with mostly random variables, and that it is superior in speed and on par in accuracy to prior work in processing large datasets with large numbers of features.
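The idea of turning a high-dimensional record into a pseudo image with minimal processing can be sketched as follows. This is not the DeepMapper code: the row-major layout, the zero padding, and the function name are assumptions made for illustration, as the abstract does not describe the exact mapping.

```python
import math


def to_pseudo_image(features, pad_value=0.0):
    """Lay a flat feature vector out as the smallest square grid that holds it.

    Features are placed row by row; leftover pixels are filled with pad_value.
    No filtering or dimension reduction is applied, so every feature is kept.
    """
    n = len(features)
    side = math.isqrt(n)
    if side * side < n:
        side += 1  # round the square root up
    padded = list(features) + [pad_value] * (side * side - n)
    return [padded[row * side:(row + 1) * side] for row in range(side)]


# Hypothetical record with 10 features becomes a 4 x 4 pseudo image.
record = [float(i) for i in range(10)]
pseudo_image = to_pseudo_image(record)
```

Because a CNN only needs a consistent layout across samples, the same feature must land on the same pixel for every record; a fixed row-major placement like this one satisfies that requirement.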
ABSTRACT
Acute megakaryoblastic leukemia (AMKL) is a rare, developmentally restricted, and highly lethal cancer of early childhood. The paucity and hypocellularity (due to myelofibrosis) of primary patient samples hamper the discovery of cell- and genotype-specific treatments. AMKL is driven by mutually exclusive chimeric fusion oncogenes in two-thirds of the cases, with CBFA2T3::GLIS2 (CG2) and NUP98 fusions (NUP98r) representing the highest-fatality subgroups. We established CD34+ cord blood-derived CG2 models (n = 6) that sustain serial transplantation and recapitulate human leukemia regarding immunophenotype, leukemia-initiating cell frequencies, comutational landscape, and gene expression signature, with distinct upregulation of the prosurvival factor B-cell lymphoma 2 (BCL2). Cell membrane proteomic analyses highlighted CG2 surface markers preferentially expressed on leukemic cells compared with CD34+ cells (eg, NCAM1 and CD151). AMKL differentiation block in the mega-erythroid progenitor space was confirmed by single-cell profiling. Although CG2 cells were rather resistant to BCL2 genetic knockdown or selective pharmacological inhibition with venetoclax, they were vulnerable to strategies that target the megakaryocytic prosurvival factor BCL-XL (BCL2L1), including in vitro and in vivo treatment with BCL2/BCL-XL/BCL-W inhibitor navitoclax and DT2216, a selective BCL-XL proteolysis-targeting chimera degrader developed to limit thrombocytopenia in patients. NUP98r AMKL were also sensitive to BCL-XL inhibition but not the NUP98r monocytic leukemia, pointing to a lineage-specific dependency. Navitoclax or DT2216 treatment in combination with low-dose cytarabine further reduced leukemic burden in mice. This work extends the cellular and molecular diversity set of human AMKL models and uncovers BCL-XL as a therapeutic vulnerability in CG2 and NUP98r AMKL.