ABSTRACT
INTRODUCTION: Adamantinoma-like Ewing sarcoma (ALES) is a rare aggressive malignancy occasionally diagnosed in the thyroid gland. ALES shows basaloid cytomorphology, expresses keratins, p63, p40, frequently CD99, and harbours the t(11;22) EWSR1::FLI1 translocation. There is debate on whether ALES more closely resembles sarcoma or carcinoma. METHODS: We performed RNA sequencing on two ALES cases and compared findings with skeletal Ewing's sarcomas and nonneoplastic thyroid tissue. ALES was investigated by in situ hybridization (ISH) for high-risk human papillomavirus (HPV) DNA and immunohistochemistry for the following antigens: keratin 7, keratin 20, keratin 5, keratins (AE1/AE3 and CAM5.2), CD45, CD20, CD5, CD99, chromogranin, synaptophysin, calcitonin, thyroglobulin, PAX8, TTF1, S100, p40, p63, p16, NUT, desmin, ER, FLI1, INI1, and myogenin. RESULTS: An uncommon EWSR1::FLI1 transcript with retained EWSR1 exon 8 was detected in both ALES cases. Regulators of EWSR1::FLI1 splicing (HNRNPH1, SUPT6H, SF3B1) necessary for production of a functional fusion oncoprotein, as well as 53 genes (including TNNT1, NKX2.2) activated downstream in the EWSR1::FLI1 cascade, were overexpressed. Eighty-six genes were uniquely overexpressed in ALES, most of which were related to squamous differentiation. Immunohistochemically, ALES strongly expressed keratins 5, AE1/AE3 and CAM5.2, p63, p40, p16, and focally CD99. INI1 was retained. The remaining immunostains and HPV DNA ISH were negative. CONCLUSION: Comparative transcriptomic profiling reveals overlapping features of ALES with skeletal Ewing's sarcoma and an epithelial carcinoma, as evidenced by immunohistochemical expression of keratin 5, p63, p40, CD99, the transcriptome profile, and detection of EWSR1::FLI1 fusion transcript by RNA sequencing.
Subjects
Adamantinoma; Carcinoma; Papillomavirus Infections; Sarcoma, Ewing; Humans; Sarcoma, Ewing/diagnosis; Sarcoma, Ewing/genetics; Adamantinoma/diagnosis; Adamantinoma/genetics; Adamantinoma/chemistry; Thyroid Gland/pathology; Transcriptome; Keratin-5/metabolism; RNA-Binding Protein EWS/genetics; RNA-Binding Protein EWS/metabolism; Transcription Factors/genetics; Oncogene Proteins, Fusion/genetics; Oncogene Proteins, Fusion/metabolism
ABSTRACT
Primary aneurysmal bone cyst (ABC) is a benign multiloculated cystic lesion of bone that is defined cytogenetically by USP6 gene rearrangements. Rearrangements involving USP6 are promoter swaps, usually generated by fusion of the noncoding upstream exons of different partner genes with exon 1 or 2 of USP6, thus leading to transcriptional upregulation of full-length USP6 coding sequence. Testing for USP6 rearrangements is used diagnostically to distinguish it from secondary ABC and other giant cell-rich primary bone tumors. In this report, we present a case of a 16-year-old male with a primary ABC of the left distal femur. USP6 break apart fluorescence in situ hybridization was positive for a rearrangement and conventional chromosome analysis identified a reciprocal X;17 translocation. In order to identify the putative USP6 fusion partner, we performed RNA sequencing and uncovered a novel USP9X-USP6 promoter swap fusion. This result was confirmed by reverse transcription-polymerase chain reaction (RT-PCR) and by mate pair sequencing thus showing the utility of these alternative methodologies in identifying novel fusion candidates. Ubiquitin-specific protease 9X (USP9X), like USP6, encodes a highly conserved substrate-specific deubiquitylating enzyme. USP9X is highly expressed in a number of tissue types and acts as both an oncogene and tumor suppressor in several human cancers. We conclude that oncogenic activation of USP6 via USP9X promoter exchange represents a novel driver of primary ABC formation.
Subjects
Bone Cysts, Aneurysmal/diagnosis; Bone Cysts, Aneurysmal/genetics; Genetic Predisposition to Disease; Oncogene Proteins, Fusion/genetics; Promoter Regions, Genetic; Proto-Oncogene Proteins/genetics; Ubiquitin Thiolesterase/genetics; Adolescent; Biomarkers, Tumor; Biopsy; Chromosome Banding; Computational Biology/methods; Genetic Association Studies; High-Throughput Nucleotide Sequencing; Humans; In Situ Hybridization, Fluorescence; Magnetic Resonance Imaging; Male; Tomography, X-Ray Computed
ABSTRACT
BACKGROUND: Archived formalin-fixed paraffin-embedded (FFPE) samples are valuable clinical resources for examining clinically relevant morphological features and for studying genetic changes. However, the DNA quality and quantity of FFPE samples are often sub-optimal, and the resulting NGS-based variant detection is prone to false positives. Evaluations of wet-lab and bioinformatics approaches are needed to optimize variant detection from FFPE samples. RESULTS: As a pilot study, we designed within-subject triplicate samples of DNA derived from paired FFPE and fresh frozen breast tissues to highlight FFPE-specific artifacts. For FFPE samples, we tested two FFPE DNA extraction methods to determine the impact of wet-lab procedures on variant calling: QIAGEN QIAamp DNA Mini Kit ("QA"), and QIAGEN GeneRead DNA FFPE Kit ("QGR"). We also used negative-control (NA12891) and positive-control samples (Horizon Discovery Reference Standard FFPE). All DNA sample libraries were prepared for NGS according to the QIAseq Human Breast Cancer Targeted DNA Panel protocol and sequenced on the HiSeq 4000. Variant calling and filtering were performed using the QIAGEN GeneGlobe Data Portal. Detailed variant concordance comparisons and mutational signature analysis were performed to investigate the effects of FFPE samples compared to paired fresh frozen samples, along with different DNA extraction methods. In this study, we found that at least five times as many variants were called in FFPE samples as in their paired fresh-frozen tissue samples, even after applying molecular barcoding error-correction and the default bioinformatics filtering recommended by the vendor. We also found that QGR, as an optimized FFPE-DNA extraction approach, leads to far fewer discordant variants between paired fresh frozen and FFPE samples. Approximately 92% of the uniquely called FFPE variants were in the low allelic frequency range (< 5%), and collectively shared a "C > T|G > A" mutational signature known to be representative of FFPE artifacts resulting from cytosine deamination. Based on control samples and FFPE-frozen replicates, we derived an effective filtering strategy with associated empirical false-discovery estimates. CONCLUSIONS: Through this study, we demonstrated the feasibility of calling and filtering genetic variants from FFPE tissue samples using a combined strategy with molecular barcodes, optimized DNA extraction, and bioinformatics methods incorporating genomic context such as mutational signature and variant allelic frequency.
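The two properties reported for FFPE-unique calls (allelic frequency below 5% and a C > T or G > A substitution) can be turned into a simple flagging rule. The sketch below is only an illustration of that idea, not the filtering strategy derived in the study; the record layout (`ref`, `alt`, `vaf` keys), the function names, and the 0.05 cutoff used as a default are assumptions.

```python
# Illustrative sketch only: flags variants that match the deamination pattern
# described in the abstract. The dict layout and names are hypothetical.

DEAMINATION_CHANGES = {("C", "T"), ("G", "A")}  # cytosine deamination, both strands


def is_likely_ffpe_artifact(variant, vaf_cutoff=0.05):
    """Return True for a low-frequency C>T or G>A single-nucleotide variant."""
    change = (variant["ref"].upper(), variant["alt"].upper())
    return change in DEAMINATION_CHANGES and variant["vaf"] < vaf_cutoff


def split_variants(variants, vaf_cutoff=0.05):
    """Partition variants into (kept, flagged) lists."""
    kept, flagged = [], []
    for v in variants:
        (flagged if is_likely_ffpe_artifact(v, vaf_cutoff) else kept).append(v)
    return kept, flagged


# Toy input, invented for illustration.
example = [
    {"ref": "C", "alt": "T", "vaf": 0.02},  # low-VAF C>T: flagged
    {"ref": "G", "alt": "A", "vaf": 0.40},  # high-VAF G>A: kept
    {"ref": "A", "alt": "G", "vaf": 0.01},  # low-VAF but not deamination-type: kept
]
kept, flagged = split_variants(example)
```

A rule this blunt would also discard true low-frequency C > T mutations, which is presumably why the study pairs filtering with empirical false-discovery estimates from controls and replicates.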
Subjects
Breast Neoplasms/genetics; DNA Mutational Analysis/methods; DNA, Neoplasm/isolation & purification; Breast/chemistry; Female; Fixatives; Formaldehyde; Genomics/methods; High-Throughput Nucleotide Sequencing; Humans; Paraffin Embedding; Tissue Fixation
ABSTRACT
Peripheral T-cell lymphomas (PTCLs) represent a heterogeneous group of T-cell malignancies that generally demonstrate aggressive clinical behavior, often are refractory to standard therapy, and remain significantly understudied. The most common World Health Organization subtype is PTCL, not otherwise specified (NOS), essentially a "wastebasket" category because of inadequate understanding to assign cases to a more specific diagnostic entity. Identification of novel fusion genes has contributed significantly to improving the classification, biologic understanding, and therapeutic targeting of PTCLs. Here, we integrated mate-pair DNA and RNA next-generation sequencing to identify chromosomal rearrangements encoding expressed fusion transcripts in PTCL, NOS. Two of 11 cases had novel fusions involving VAV1, encoding a truncated form of the VAV1 guanine nucleotide exchange factor important in T-cell receptor signaling. Fluorescence in situ hybridization studies identified VAV1 rearrangements in 10 of 148 PTCLs (7%). These were observed exclusively in PTCL, NOS (11%) and anaplastic large cell lymphoma (11%). In vitro, ectopic expression of a VAV1 fusion promoted cell growth and migration in a RAC1-dependent manner. This growth was inhibited by azathioprine, a clinically available RAC1 inhibitor. We also identified novel kinase gene fusions, ITK-FER and IKZF2-ERBB4, as candidate therapeutic targets that show similarities to known recurrent oncogenic ITK-SYK fusions and ERBB4 transcript variants in PTCLs, respectively. Additional novel and potentially clinically relevant fusions also were discovered. Together, these findings identify VAV1 fusions as recurrent and targetable events in PTCLs and highlight the potential for clinical sequencing to guide individualized therapy approaches for this group of aggressive malignancies.
Subjects
Lymphoma, T-Cell, Peripheral/genetics; Oncogene Proteins, Fusion/genetics; Aged; Animals; Female; High-Throughput Nucleotide Sequencing; Humans; Jurkat Cells; Lymphoma, T-Cell, Peripheral/metabolism; Male; Mice; Middle Aged; NIH 3T3 Cells; Oncogene Proteins, Fusion/metabolism
ABSTRACT
Gastroblastoma is a rare distinctive biphasic tumor of the stomach. The molecular biology of gastroblastoma has not been studied, and no affirmative diagnostic markers have been developed. We retrieved two gastroblastomas from the consultation practices of the authors and performed transcriptome sequencing on formalin-fixed paraffin-embedded tissue. Recurrent predicted fusion genes were validated at genomic and RNA levels. The presence of the fusion gene was confirmed on two additional paraffin-embedded cases of gastroblastoma. Control cases of histologic mimics (biphasic synovial sarcoma, leiomyoma, leiomyosarcoma, desmoid-type fibromatosis, EWSR1-FLI1-positive Ewing sarcoma, Wilms' tumor, gastrointestinal stromal tumor, plexiform fibromyxoma, and Sonic hedgehog-type medulloblastomas) and normal gastric mucosa and muscularis propria were also analyzed. The gastroblastomas affected two males and two females aged 9-56 years. Transcriptome sequencing identified recurrent somatic MALAT1-GLI1 fusion genes, which were predicted to retain the key domains of GLI1. The MALAT1-GLI1 fusion gene was validated by break-apart and dual-fusion FISH and RT-PCR. The additional two gastroblastomas were also positive for the MALAT1-GLI1 fusion gene. None of the control cases harbored MALAT1-GLI1. Overexpression of GLI1 in the gastroblastomas was confirmed at RNA and protein levels. Pathway analysis revealed activation of the Sonic hedgehog pathway in gastroblastoma, and gene expression profiling showed that gastroblastomas grouped together and were most similar to Sonic hedgehog-type medulloblastomas. In summary, we have identified an oncogenic MALAT1-GLI1 fusion gene in all cases of gastroblastoma that may serve as a diagnostic biomarker. The fusion gene is predicted to encode a protein that includes the zinc finger domains of GLI1 and results in overexpression of GLI1 protein and activation of the Sonic hedgehog pathway.
Subjects
Neoplasms, Complex and Mixed/genetics; Oncogene Proteins, Fusion/genetics; RNA, Long Noncoding/genetics; Stomach Neoplasms/genetics; Zinc Finger Protein GLI1/genetics; Adult; Child; Female; Humans; Male; Middle Aged; Neoplasms, Complex and Mixed/pathology; Stomach Neoplasms/pathology
ABSTRACT
To determine early somatic changes in high-grade serous ovarian cancer (HGSOC), we performed whole genome sequencing on a rare collection of 16 low stage HGSOCs. The majority showed extensive structural alterations (one had an ultramutated profile), exhibited high levels of p53 immunoreactivity, and harboured a TP53 mutation, deletion or inactivation. BRCA1 and BRCA2 mutations were observed in two tumours, with nine showing evidence of a homologous recombination (HR) defect. Combined analysis with The Cancer Genome Atlas (TCGA) indicated that low and late stage HGSOCs have similar mutation and copy number profiles. We also found evidence that deleterious TP53 mutations are the earliest events, followed by deletions or loss of heterozygosity (LOH) of chromosomes carrying TP53, BRCA1 or BRCA2. Inactivation of HR appears to be an early event, as 62.5% of tumours showed a LOH pattern suggestive of HR defects. Three tumours with the highest ploidy had little genome-wide LOH, yet one of these had a homozygous somatic frame-shift BRCA2 mutation, suggesting that some carcinomas begin as tetraploid then descend into diploidy accompanied by genome-wide LOH. Lastly, we found evidence that structural variants (SV) cluster in HGSOC, but are absent in one ultramutated tumour, providing insights into the pathogenesis of low stage HGSOC.
Subjects
Genes, p53; Mutation; Ovarian Neoplasms/genetics; Recombinational DNA Repair; Tetraploidy; Carcinoma/genetics; DNA Primase/genetics; Female; Humans; Loss of Heterozygosity; Mutation Rate
ABSTRACT
BACKGROUND: RNA-seq is a well-established method for studying the transcriptome. Popular methods for library preparation in RNA-seq such as Illumina TruSeq® RNA v2 kit use a poly-A pulldown strategy. Such methods can cause loss of coverage at the 5' end of genes, impacting the ability to detect fusions when used on degraded samples. The goal of this study was to quantify the effects RNA degradation has on fusion detection when using poly-A selected mRNA and to identify the variables involved in this process. RESULTS: Using both artificially and naturally degraded samples, we found that there is a reduced ability to detect fusions as the distance of the breakpoint from the 3' end of the gene increases. The median transcript coverage decreases exponentially as a function of the distance from the 3' end and there is a linear relationship between the coverage decay rate and the RNA integrity number (RIN). Based on these findings we developed plots that show the probability of detecting a gene fusion ("sensitivity") as a function of the distance of the fusion breakpoint from the 3' end. CONCLUSIONS: This study developed a strategy to assess the impact that RNA degradation has on the ability to detect gene fusions by RNA-seq.
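The relationships described above (coverage that decays exponentially with distance from the 3' end, and a decay rate that varies linearly with RIN) can be written down as a small model. This is a rough sketch under stated assumptions: the intercept, slope, and read depth are invented placeholders rather than fitted values from the study, and treating supporting reads as Poisson-distributed is a modelling choice made here for illustration, not something the abstract states.

```python
import math

# Hypothetical parameters: the abstract reports the functional forms only,
# not the fitted coefficients.


def decay_rate(rin, intercept=1.0e-3, slope=-1.0e-4):
    """Coverage decay rate per base, assumed linear in RIN (never negative)."""
    return max(intercept + slope * rin, 0.0)


def relative_coverage(distance_bp, rate):
    """Coverage relative to the 3' end: exp(-rate * distance)."""
    return math.exp(-rate * distance_bp)


def detection_probability(distance_bp, rin, reads_at_3prime=20.0, min_reads=3):
    """P(at least min_reads supporting reads), assuming Poisson read counts."""
    lam = reads_at_3prime * relative_coverage(distance_bp, decay_rate(rin))
    p_below = sum(math.exp(-lam) * lam ** k / math.factorial(k)
                  for k in range(min_reads))
    return 1.0 - p_below
```

With these placeholder values, `detection_probability(500, 8)` exceeds `detection_probability(5000, 8)`, mirroring the reported loss of sensitivity as the breakpoint moves away from the 3' end.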
Subjects
RNA Stability; RNA/genetics; Recombination, Genetic; Cell Line, Tumor; Chromosome Breakpoints; Fusion Proteins, bcr-abl/genetics; Gene Library; Humans; Leukemia, Myelogenous, Chronic, BCR-ABL Positive/genetics; RNA/metabolism; RNA, Messenger/genetics; Sequence Analysis, RNA
ABSTRACT
Cholangiocarcinoma (CCA) is a lethal hepatobiliary neoplasm originating from the biliary apparatus. In humans, CCA risk factors include hepatobiliary inflammation and fibrosis. The recently identified interleukin (IL)-1 family member, IL-33, has been shown to be a biliary mitogen which also promotes liver inflammation and fibrosis. Our aim was to generate a mouse model of CCA mimicking the human disease. Ectopic oncogene expression in the biliary tract was accomplished by the Sleeping Beauty transposon transfection system with transduction of constitutively active AKT (myr-AKT) and Yes-associated protein. Intrabiliary instillation of the transposon-transposase complex was coupled with lobar bile duct ligation in C57BL/6 mice, followed by administration of IL-33 for 3 consecutive days. Tumors developed in 72% of the male mice receiving both oncogenes plus IL-33 by 10 weeks but in only 20% of the male mice transduced with the oncogenes alone. Tumors expressed SOX9 and pancytokeratin (features of CCA) but were negative for HepPar1 (a marker of hepatocellular carcinoma). Substantive overlap with human CCA specimens was revealed by RNA profiling. Not only did IL-33 induce IL-6 expression by human cholangiocytes but it likely facilitated tumor development in vivo by an IL-6-sensitive process as tumor development was significantly attenuated in Il-6(-/-) male animals. Furthermore, tumor formation occurred at a similar rate when IL-6 was substituted for IL-33 in this model. CONCLUSION: The transposase-mediated transduction of constitutively active AKT and Yes-associated protein in the biliary epithelium coupled with lobar obstruction and IL-33 administration results in the development of CCA with morphological and biochemical features of the human disease; this model highlights the role of inflammatory cytokines in CCA oncogenesis.
Subjects
Bile Duct Neoplasms/genetics; Bile Ducts, Intrahepatic; Cholangiocarcinoma/genetics; Interleukin-6/physiology; Interleukins/physiology; Oncogenes/physiology; Adaptor Proteins, Signal Transducing/physiology; Animals; Cell Cycle Proteins; Humans; Interleukin-33; Male; Mice; Mice, Inbred C57BL; Models, Genetic; Phosphoproteins/physiology; Proto-Oncogene Proteins c-akt/physiology; Tumor Cells, Cultured; YAP-Signaling Proteins
ABSTRACT
Immunoglobulin light chain (LC) amyloidosis (AL) is caused by deposition of clonal LCs produced by an underlying plasma cell neoplasm. The clonotypic LC sequences are unique to each patient, and they cannot be reliably detected by either immunoassays or standard proteomic workflows that target the constant regions of LCs. We addressed this issue by developing a novel sequence template-based workflow to detect LC variable (LCV) region peptides directly from AL amyloid deposits. The workflow was implemented in a CAP/CLIA compliant clinical laboratory dedicated to proteomic subtyping of amyloid deposits extracted from either formalin-fixed paraffin-embedded tissues or subcutaneous fat aspirates. We evaluated the performance of the workflow on a validation cohort of 30 AL patients, whose amyloidogenic clone was identified using a novel proteogenomics method, and 30 controls. The recall and negative predictive values of the workflow, when identifying the gene family of the AL clone, were 93 and 98%, respectively. Application of the workflow on a clinical cohort of 500 AL amyloidosis samples highlighted a bias in the LCV gene families used by the AL clones. We also detected similarity between AL clones deposited in multiple organs of systemic AL patients. In summary, AL proteomic data sets are rich in LCV region peptides of potential clinical significance that are recoverable with advanced bioinformatics.
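For readers less familiar with the two metrics quoted above, recall and negative predictive value follow directly from confusion-matrix counts. The helper names and the counts in the usage lines are invented for illustration and are not the study's data.

```python
# Generic definitions; the counts used below are hypothetical.


def recall(true_pos, false_neg):
    """Fraction of actual positives that were detected: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def negative_predictive_value(true_neg, false_neg):
    """Fraction of negative calls that are truly negative: TN / (TN + FN)."""
    return true_neg / (true_neg + false_neg)


example_recall = recall(true_pos=9, false_neg=1)
example_npv = negative_predictive_value(true_neg=49, false_neg=1)
```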
Subjects
Amyloid/metabolism; Amyloidosis/diagnosis; Amyloidosis/metabolism; Immunoglobulin Light Chains/metabolism; Immunoglobulin Variable Region/metabolism; Peptides/isolation & purification; Proteomics/methods; Cohort Studies; Computational Biology; Humans; Peptides/metabolism
ABSTRACT
Cholangiocytes are the target of a heterogeneous group of liver diseases known as the cholangiopathies. An evolving understanding of the mechanisms driving biliary development provides the theoretical underpinnings for rational development of induced pluripotent stem cell (iPSC)-derived cholangiocytes (iDCs). Therefore, the aims of this study were to develop an approach to generate iDCs and to fully characterize the cells in vitro and in vivo. Human iPSC lines were generated by forced expression of the Yamanaka pluripotency factors. We then pursued a stepwise differentiation strategy toward iDCs, using precise temporal exposure to key biliary morphogens, and we characterized the cells, using a variety of morphologic, molecular, cell biologic, functional, and in vivo approaches. Morphology shows a stepwise phenotypic change toward an epithelial monolayer. Molecular analysis during differentiation shows appropriate enrichment in markers of iPSC, definitive endoderm, hepatic specification, hepatic progenitors, and ultimately cholangiocytes. Immunostaining, western blotting, and flow cytometry demonstrate enrichment of multiple functionally relevant biliary proteins. RNA sequencing reveals that the transcriptome moves progressively toward that of human cholangiocytes. iDCs generate intracellular calcium signaling in response to ATP, form intact primary cilia, and self-assemble into duct-like structures in three-dimensional culture. In vivo, the cells engraft within mouse liver, following retrograde intrabiliary infusion. In summary, we have developed a novel approach to generate mature cholangiocytes from iPSCs. In addition to providing a model of biliary differentiation, iDCs represent a platform for in vitro disease modeling, pharmacologic testing, and individualized, cell-based, regenerative therapies for the cholangiopathies.
Subjects
Bile Ducts/cytology; Epithelial Cells/cytology; Induced Pluripotent Stem Cells/cytology; Animals; Bile Ducts/chemistry; Bile Ducts/metabolism; Biomarkers/analysis; Biomarkers/metabolism; Calcium Signaling; Cell Differentiation; Cell Engineering; Cell Line; Epithelial Cells/chemistry; Epithelial Cells/metabolism; Humans; Induced Pluripotent Stem Cells/chemistry; Induced Pluripotent Stem Cells/metabolism; Liver/chemistry; Liver/cytology; Liver/metabolism; Mice; Real-Time Polymerase Chain Reaction
ABSTRACT
Intron-containing mRNAs are subject to restricted nuclear export in higher eukaryotes. Retroviral replication requires the nucleocytoplasmic transport of both spliced and unspliced RNA transcripts, and RNA export mechanisms of gammaretroviruses are poorly characterized. Here, we report the involvement of the nuclear export receptor NXF1/TAP in the nuclear export of gammaretroviral RNA transcripts. We identified a conserved cis-acting element in the pol gene of gammaretroviruses, including murine leukemia virus (MLV) and xenotropic murine leukemia virus-related virus (XMRV), named the CAE (cytoplasmic accumulation element). The CAE enhanced the cytoplasmic accumulation of viral RNA transcripts and the expression of viral proteins without significantly affecting the stability, splicing, or translation efficiency of the transcripts. Insertion of the CAE sequence also facilitated Rev-independent HIV Gag expression. We found that the CAE sequence interacted with NXF1, whereas disruption of NXF1 ablated CAE function. Thus, the CAE sequence mediates the cytoplasmic accumulation of gammaretroviral transcripts in an NXF1-dependent manner. Disruption of NXF1 expression impaired cytoplasmic accumulation of both spliced and unspliced RNA transcripts of XMRV and MLV, resulting in their nuclear retention or degradation. Thus, our results demonstrate that gammaretroviruses use NXF1 for the cytoplasmic accumulation of both spliced and unspliced viral RNA transcripts. IMPORTANCE: Murine leukemia virus (MLV) has been studied as one of the classic models of retrovirology. Although unspliced host messenger RNAs are rarely exported from the nucleus, MLV actively exports unspliced viral RNAs to the cytoplasm. Despite extensive studies, how MLV achieves this difficult task has remained a mystery. Here, we have studied the RNA export mechanism of MLV and found that (i) the genome contains a sequence which supports the efficient nuclear export of viral RNAs, (ii) the cellular factor NXF1 is involved in the nuclear export of both spliced and unspliced viral RNAs, and, finally, (iii) depletion of NXF1 results in nuclear retention or degradation of viral RNAs. Our study provides a novel insight into MLV nuclear export.
Subjects
Leukemia Virus, Murine/metabolism; Nucleocytoplasmic Transport Proteins/metabolism; RNA Splicing; RNA, Viral/metabolism; Retroviridae Infections/veterinary; Rodent Diseases/metabolism; Active Transport, Cell Nucleus; Animals; Base Sequence; Cell Line; Cell Nucleus/genetics; Cell Nucleus/metabolism; Cell Nucleus/virology; Gene Products, rev/genetics; Gene Products, rev/metabolism; Leukemia Virus, Murine/genetics; Mice; Molecular Sequence Data; Nucleocytoplasmic Transport Proteins/genetics; RNA, Viral/genetics; Retroviridae Infections/genetics; Retroviridae Infections/metabolism; Retroviridae Infections/virology; Rodent Diseases/genetics; Rodent Diseases/virology
ABSTRACT
MOTIVATION: RNA-seq has become the method of choice to quantify genes and exons, discover novel transcripts and detect fusion genes. However, reliable variant identification from RNA-seq data remains challenging because of the complexities of the transcriptome, the challenges of accurately mapping exon boundary spanning reads and the bias introduced during the sequencing library preparation. METHOD: We developed RVboost, a novel method specific for RNA variant prioritization. RVboost uses several attributes unique in the process of RNA library preparation, sequencing and RNA-seq data analyses. It uses a boosting method to train a model of 'good quality' variants using common variants from HapMap, and prioritizes and calls the RNA variants based on the trained model. We packaged RVboost in a comprehensive workflow, which integrates tools of variant calling, annotation and filtering. RESULTS: RVboost consistently outperforms the variant quality score recalibration from the Genome Analysis Tool Kit and the RNA-seq variant-calling pipeline SNPiR in 12 RNA-seq samples using ground-truth variants from paired exome sequencing data. Several RNA-seq-specific attributes were identified as critical to differentiate true and false variants, including the distance of the variant positions to exon boundaries, and the percent of the reads supporting the variant in the first six base pairs. The latter identifies false variants introduced by the random hexamer priming during the library construction. AVAILABILITY AND IMPLEMENTATION: The RVboost package is implemented to readily run in Mac or Linux environments. The software and user manual are available at http://bioinformaticstools.mayo.edu/research/rvboost/.
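Two of the RNA-seq-specific attributes named above are easy to picture as per-variant features. The sketch below shows one plausible way to compute them; it does not reproduce RVboost's code, and the function names, the 0-based read offsets, and the inclusive exon coordinates are assumptions made for this example.

```python
# Hypothetical feature helpers, not part of the RVboost package.


def fraction_in_read_start(read_offsets, window=6):
    """Share of variant-supporting reads where the variant base falls within
    the first `window` bases of the read (0-based offsets)."""
    if not read_offsets:
        return 0.0
    return sum(1 for offset in read_offsets if offset < window) / len(read_offsets)


def distance_to_exon_boundary(position, exons):
    """Distance from `position` to the nearest edge of the exon containing it.

    `exons` is a list of inclusive (start, end) pairs; returns None when the
    position is not inside any listed exon.
    """
    for start, end in exons:
        if start <= position <= end:
            return min(position - start, end - position)
    return None


# A variant seen at offsets 0 and 2 in two reads and deeper in two others.
start_bias = fraction_in_read_start([0, 2, 10, 40])
edge_distance = distance_to_exon_boundary(105, [(100, 200), (400, 450)])
```

A high `start_bias` would point toward the random-hexamer priming artifact described in the abstract, while a small `edge_distance` marks calls near splice junctions, where spliced-read alignment is least reliable.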
Subjects
Genetic Variation; Sequence Analysis, RNA/methods; Software; Exome; Exons; Gene Expression Profiling; High-Throughput Nucleotide Sequencing/methods
Subjects
Gastrointestinal Diseases/diagnosis; Gastrointestinal Diseases/genetics; Gene Fusion; Janus Kinase 2/genetics; Lymphoproliferative Disorders/diagnosis; Lymphoproliferative Disorders/genetics; STAT3 Transcription Factor/genetics; T-Lymphocytes/metabolism; Gastrointestinal Diseases/metabolism; Gene Rearrangement; Genetic Association Studies; Genetic Predisposition to Disease; Humans; Janus Kinase 2/metabolism; Lymphoproliferative Disorders/metabolism; STAT3 Transcription Factor/metabolism; T-Lymphocytes/pathology
ABSTRACT
BACKGROUND: Although the costs of next generation sequencing technology have decreased over the past years, there is still a lack of simple-to-use applications for comprehensive analysis of RNA sequencing data. There is no one-stop shop for transcriptomic genomics. We have developed MAP-RSeq, a comprehensive computational workflow that can be used for obtaining genomic features from transcriptomic sequencing data, for any genome. RESULTS: For optimization of tools and parameters, MAP-RSeq was validated using both simulated and real datasets. The MAP-RSeq workflow consists of six major modules: alignment of reads, quality assessment of reads, gene expression assessment and exon read counting, identification of expressed single nucleotide variants (SNVs), detection of fusion transcripts, and summarization of transcriptomics data and final report. This workflow is available for human transcriptome analysis and can be easily adapted and used for other genomes. Several clinical and research projects at the Mayo Clinic have applied the MAP-RSeq workflow for RNA-Seq studies. The results from MAP-RSeq have thus far enabled clinicians and researchers to understand the transcriptomic landscape of diseases for better diagnosis and treatment of patients. CONCLUSIONS: Our software provides gene counts, exon counts, fusion candidates, expressed single nucleotide variants, mapping statistics, visualizations, and a detailed research data report for RNA-Seq. The workflow can be executed on a standalone virtual machine or on a parallel Sun Grid Engine cluster. The software can be downloaded from http://bioinformaticstools.mayo.edu/research/maprseq/.
Subjects
Gene Expression Profiling; Genomics/methods; Health Facilities; High-Throughput Nucleotide Sequencing/methods; Sequence Analysis, RNA/methods; Software; Base Sequence; Exons/genetics; Humans
ABSTRACT
Mitochondrial-plastid interdependence within the plant cell is presumed to be essential, but measurable demonstration of this intimate interaction is difficult. At the level of cellular metabolism, several biosynthetic pathways involve both mitochondrial- and plastid-localized steps. However, at an environmental response level, it is not clear how the two organelles intersect in programmed cellular responses. Here, we provide evidence, using genetic perturbation of the MutS Homolog1 (MSH1) nuclear gene in five plant species, that MSH1 functions within the mitochondrion and plastid to influence organellar genome behavior and plant growth patterns. The mitochondrial form of the protein participates in DNA recombination surveillance, with disruption of the gene resulting in enhanced mitochondrial genome recombination at numerous repeated sequences. The plastid-localized form of the protein interacts with the plastid genome and influences genome stability and plastid development, with its disruption leading to variegation of the plant. These developmental changes include altered patterns of nuclear gene expression. Consistency of plastid and mitochondrial response across both monocot and dicot species indicates that the dual-functioning nature of MSH1 is well conserved. Variegated tissues show changes in redox status together with enhanced plant survival and reproduction under photooxidative light conditions, evidence that the plastid changes triggered in this study comprise an adaptive response to naturally occurring light stress.
Subjects
Arabidopsis Proteins/metabolism; Chloroplasts/metabolism; Light; Magnoliopsida/radiation effects; Mitochondria/metabolism; MutS DNA Mismatch-Binding Protein/metabolism; Oxidative Stress; DNA, Plant/genetics; Gene Expression Regulation, Plant; Genetic Complementation Test; Genome, Chloroplast; Genome, Mitochondrial; Genomic Instability; Magnoliopsida/genetics; Magnoliopsida/physiology; Oligonucleotide Array Sequence Analysis; Oxidation-Reduction; Plant Leaves/genetics; Plant Leaves/physiology; Plants, Genetically Modified/genetics; Plants, Genetically Modified/physiology; Plants, Genetically Modified/radiation effects; Quinones/analysis; Recombination, Genetic
ABSTRACT
The accurate detection of point mutations from pathology slides using sequencing data is of great importance in cancer genomics and precision oncology. Formalin fixation and paraffin embedding (FFPE) is a widely used technique to preserve pathology tissues. The FFPE process introduces artificial C > T mutations in next-generation sequencing, so we set out to develop excerno, a method to score and filter such spurious variants. FFPE mutational artifacts follow a mutational signature. By using the FFPE signature and Bayes' formula, we can calculate the probability of a mutation resulting from the FFPE process and use this probability to filter FFPE variants. We implement this method as the excerno R package. We tested excerno by simulating mutations across all 60 baseline mutational signatures from the Catalog of Somatic Mutations in Cancer (COSMIC) and combining them with mutations following the FFPE mutational signature. The sensitivity and specificity of excerno are adversely affected by the cosine similarity between the baseline and FFPE signatures (cosFFPE). Higher percentages of FFPE mutations (pctFFPE) result in increased sensitivity and reduced specificity. The specificity and sensitivity of excerno can be predicted with a linear model that includes an interaction term between cosFFPE and pctFFPE, with R² values of 0.84 and 0.79, respectively. Finally, we tested excerno using six RNA sequencing cancer samples and observed concordant trends of specificity and sensitivity with respect to our simulated data. The excerno R package can be used to annotate and filter FFPE-induced mutations in cancer genomics. Our method is adversely affected by cosFFPE and pctFFPE.
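The Bayes' formula step can be made concrete with a small sketch. Assuming each mutation falls in a channel c, an FFPE signature S_FFPE(c), a baseline signature S_base(c), and a prior FFPE fraction pctFFPE, the posterior is P(FFPE | c) = pctFFPE · S_FFPE(c) / (pctFFPE · S_FFPE(c) + (1 − pctFFPE) · S_base(c)). The code below is written in Python rather than R, does not reproduce the excerno package's API, and uses invented four-channel toy signatures in place of the full mutational-signature channels.

```python
# Illustrative only: toy signatures and hypothetical function names.


def ffpe_posterior(channel, ffpe_sig, base_sig, pct_ffpe):
    """P(mutation came from FFPE | its channel), by Bayes' formula."""
    numerator = pct_ffpe * ffpe_sig.get(channel, 0.0)
    denominator = numerator + (1.0 - pct_ffpe) * base_sig.get(channel, 0.0)
    return numerator / denominator if denominator > 0 else 0.0


def filter_ffpe(channels, ffpe_sig, base_sig, pct_ffpe, cutoff=0.5):
    """Keep mutations whose FFPE posterior is below the cutoff."""
    return [c for c in channels
            if ffpe_posterior(c, ffpe_sig, base_sig, pct_ffpe) < cutoff]


# Toy signatures (each sums to 1); real signatures have many more channels.
FFPE_SIG = {"C>T": 0.85, "C>A": 0.05, "T>C": 0.05, "T>A": 0.05}
BASE_SIG = {"C>T": 0.25, "C>A": 0.25, "T>C": 0.25, "T>A": 0.25}

kept = filter_ffpe(["C>T", "T>C", "C>T", "T>A"], FFPE_SIG, BASE_SIG, pct_ffpe=0.5)
```

The sketch also suggests why cosFFPE matters: the closer the baseline signature is to the FFPE signature, the closer every posterior sits to the prior, and the less a cutoff can separate artifacts from true mutations.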
Subjects
Neoplasms , Humans , Neoplasms/genetics , Bayes Theorem , Tissue Fixation , Precision Medicine , Mutation , Formaldehyde , High-Throughput Nucleotide Sequencing/methods
ABSTRACT
BACKGROUND: The mitochondrial genome of higher plants is unusually dynamic, with recombination and nonhomologous end-joining (NHEJ) activities producing variability in size and organization. Plant mitochondrial DNA also generally displays much lower nucleotide substitution rates than mammalian or yeast systems. Arabidopsis displays these features and expedites characterization of the mitochondrial recombination surveillance gene MSH1 (MutS 1 homolog), lending itself to detailed study of de novo mitochondrial genome activity. In the present study, we investigated the underlying basis for unusual plant features as they contribute to rapid mitochondrial genome evolution. RESULTS: We obtained evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis. On a larger scale, with mitochondrial comparisons across 72 Arabidopsis ecotypes, similar evidence of DSB repair activity differentiated ecotypes. Forty-seven repeat pairs were active in DNA exchange in the msh1 mutant. Recombination sites showed asymmetrical DNA exchange within lengths of 50- to 556-bp sharing sequence identity as low as 85%. De novo asymmetrical recombination involved heteroduplex formation, gene conversion and mismatch repair activities. Substoichiometric shifting by asymmetrical exchange created the appearance of rapid sequence gain and loss in association with particular repeat classes. CONCLUSIONS: Extensive mitochondrial genomic variation within a single plant species derives largely from DSB activity and its repair. Observed gene conversion and mismatch repair activity contribute to the low nucleotide substitution rates seen in these genomes. On a phenotypic level, these patterns of rearrangement likely contribute to the reproductive versatility of higher plants.
Subjects
Arabidopsis/genetics , Double-Stranded DNA Breaks , DNA Repair/genetics , Molecular Evolution , Mitochondrial Genome/genetics , Plant Genome/genetics , Arabidopsis Proteins/genetics , Base Sequence , DNA Mismatch Repair/genetics , Mitochondrial DNA/genetics , Ecotype , Gene Rearrangement/genetics , Plant Genes/genetics , Genetic Models , Molecular Sequence Data , MutS DNA Mismatch-Binding Protein/genetics , Mutation/genetics , Phylogeny , Polymerase Chain Reaction , Single Nucleotide Polymorphism/genetics , Genetic Recombination/genetics , Repetitive Nucleic Acid Sequences/genetics , DNA Sequence Analysis
ABSTRACT
BACKGROUND: Next-generation sequencing provides comprehensive information about individuals' genetic makeup and is commonplace in precision oncology practice. Because patients' disease conditions and treatment journeys are heterogeneous, targeted therapies are not always initiated despite the presence of actionable mutations. To better understand and support the clinical decision-making process in precision oncology, there is a need to examine real-world associations between patients' genetic information and treatment choices. METHODS: To address the insufficient use of real-world data (RWD) from electronic health records (EHRs), we generated a single Resource Description Framework (RDF) resource, called PO2RDF (precision oncology to RDF), by integrating information regarding genes, variants, diseases, and drugs from genetic reports and EHRs. RESULTS: PO2RDF contains a total of 2,309,014 triples, of which 32,815 are related to Gene, 34,695 to Variant, 8,787 to Disease, and 26,154 to Drug. We performed two use case analyses to demonstrate the usability of PO2RDF: (1) we examined real-world associations between EGFR mutations and targeted therapies to confirm existing knowledge and detect off-label use; and (2) we examined differences in prognosis for lung cancer patients with and without TP53 mutations. CONCLUSIONS: Our work proposes using RDF to organize and distribute clinical RWD that is otherwise inaccessible externally. It serves as a pilot study that will lead to new clinical applications and could ultimately stimulate progress in the field of precision oncology.
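To illustrate how gene, variant, and drug information might be linked as triples and queried, here is a minimal Python sketch using plain tuples. It does not reproduce the PO2RDF schema: the identifiers, predicate names (`hasVariant`, `inGene`, `receivedDrug`), and all records are hypothetical, and a real deployment would more likely use an RDF store queried with SPARQL.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples; not real patient data.
triples = [
    ("patient:1", "hasVariant", "variant:EGFR_L858R"),
    ("patient:1", "receivedDrug", "drug:erlotinib"),
    ("patient:2", "hasVariant", "variant:EGFR_L858R"),
    ("patient:2", "receivedDrug", "drug:carboplatin"),
    ("patient:3", "hasVariant", "variant:TP53_R175H"),
    ("patient:3", "receivedDrug", "drug:carboplatin"),
    ("variant:EGFR_L858R", "inGene", "gene:EGFR"),
    ("variant:TP53_R175H", "inGene", "gene:TP53"),
]

def match(store, s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

def drugs_by_gene(store, gene):
    """Count drugs received by patients carrying any variant in `gene`."""
    variants = {s for s, _, _ in match(store, p="inGene", o=gene)}
    patients = {s for s, _, o in match(store, p="hasVariant") if o in variants}
    counts = defaultdict(int)
    for s, _, o in match(store, p="receivedDrug"):
        if s in patients:
            counts[o] += 1
    return dict(counts)

egfr_drugs = drugs_by_gene(triples, "gene:EGFR")
```

A query of this shape corresponds to the first use case in the abstract: joining variant, gene, and drug triples to see which therapies patients with EGFR mutations actually received.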
Subjects
Neoplasms , High-Throughput Nucleotide Sequencing , Humans , Medical Oncology , Neoplasms/drug therapy , Neoplasms/genetics , Pilot Projects , Precision Medicine
ABSTRACT
The differentiation of B cells into antibody-secreting plasma cells (PCs) is governed by a strict regulatory network that results in expression of specific transcriptomes along the activation continuum. In vitro models yielding significant numbers of PCs phenotypically identical to the in vivo state enable investigation of pathways, metabolomes, and non-coding RNAs (ncRNAs) not previously identified. The objective of our study was to characterize ncRNA expression during human B cell activation and differentiation. To achieve this, we used an in vitro system and performed RNA-seq on resting and activated B cells and PCs. Characterization of coding gene transcripts, including immunoglobulin (Ig), validated our system and also demonstrated that memory B cells preferentially differentiated into PCs. Importantly, we identified more than 980 ncRNA transcripts that are differentially expressed across the stages of activation and differentiation, some of which are known to target transcription, proliferation, cytoskeletal, autophagy, and proteasome pathways. Interestingly, ncRNAs located within Ig loci may target both Ig and non-Ig-related transcripts. ncRNAs associated with B cell malignancies were also identified. Taken together, this system provides a platform to study the role of specific ncRNAs in B cell differentiation and the altered expression of those ncRNAs involved in B cell malignancies.
ABSTRACT
Topological data analysis (TDA) is a powerful method for reducing data dimensionality, mining underlying data relationships, and intuitively representing the data structure. The Mapper algorithm is one such tool that projects high-dimensional data to 1-dimensional space by using a filter function that is subsequently used to reconstruct the data topology relationships. However, domain context information and prior knowledge have not been considered in current TDA modeling frameworks. Here, we report the development and evaluation of a semi-supervised topological analysis (STA) framework that incorporates discrete or continuously labeled data points and selects the most relevant filter functions accordingly. We validate the proposed STA framework with simulation data and then apply it to samples from Genotype-Tissue Expression data and ovarian cancer transcriptome datasets. The graphs generated by STA for these 2 datasets, based on gene expression profiles, are consistent with prior knowledge, thereby supporting the effectiveness of the proposed framework.
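The basic Mapper construction mentioned in this abstract can be sketched in a few lines of Python. This is a simplified, unsupervised illustration only, not the STA framework: the cover parameters (`n_intervals`, `overlap`), the single-linkage clustering with threshold `eps`, and the toy data are all assumptions, and the label-driven filter selection that STA adds is not shown.

```python
import math

def cluster(points, idx, eps):
    """Single-linkage clusters: connected components of the eps-neighbor graph."""
    unvisited = set(idx)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, stack = {seed}, [seed]
        while stack:
            cur = stack.pop()
            near = {j for j in unvisited
                    if math.dist(points[cur], points[j]) <= eps}
            unvisited -= near
            component |= near
            stack.extend(near)
        clusters.append(frozenset(component))
    return clusters

def mapper(points, filter_fn, n_intervals=3, overlap=0.25, eps=1.5):
    """Build a Mapper graph: nodes are clusters, edges join clusters sharing points."""
    values = [filter_fn(p) for p in points]  # project each point to 1-D
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_intervals or 1.0   # fall back if all values are equal
    nodes = []
    for i in range(n_intervals):
        # Overlapping interval of the cover on the filter range.
        a = lo + i * width - overlap * width
        b = lo + (i + 1) * width + overlap * width
        idx = [j for j, v in enumerate(values) if a <= v <= b]
        nodes.extend(cluster(points, idx, eps))
    edges = {(u, v)
             for u in range(len(nodes))
             for v in range(u + 1, len(nodes))
             if nodes[u] & nodes[v]}
    return nodes, edges

# Toy data: seven points on a line, filtered by their x coordinate.
points = [(float(x), 0.0) for x in range(7)]
nodes, edges = mapper(points, filter_fn=lambda p: p[0])
```

On this toy input the three overlapping intervals each yield one cluster, and consecutive clusters share a point, so the resulting graph is a path. In a semi-supervised setting, one would compare several candidate `filter_fn` choices against the available labels before building the graph.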