Results 1 - 20 of 40
1.
Genome Res ; 30(8): 1191-1200, 2020 08.
Article in English | MEDLINE | ID: mdl-32817073

ABSTRACT

Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data while isoform sequence analysis at the single-cell level still remains fairly limited. Detection and discovery of isoforms in single cells is difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that leverages the rich information content aggregated from multiple single-cell transcriptomes to reconstruct cell-specific isoforms. Assembly with RNA-Bloom can be either reference-guided or reference-free, thus enabling unbiased discovery of novel isoforms or foreign transcripts. We compared both assembly strategies of RNA-Bloom against five state-of-the-art reference-free and reference-based transcriptome assembly methods. In our benchmarks on a simulated 384-cell data set, reference-free RNA-Bloom reconstructed 37.9%-38.3% more isoforms than the best reference-free assembler, whereas reference-guided RNA-Bloom reconstructed 4.1%-11.6% more isoforms than reference-based assemblers. When applied to a real 3840-cell data set consisting of more than 4 billion reads, RNA-Bloom reconstructed 9.7%-25.0% more isoforms than the best competing reference-based and reference-free approaches evaluated. We expect RNA-Bloom to boost the utility of scRNA-seq data beyond gene expression analysis, expanding what is informatically accessible now.


Subject(s)
Gene Expression Profiling/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Base Sequence , Humans , Mice , Protein Isoforms/genetics , Software
2.
Proc Natl Acad Sci U S A ; 117(29): 16961-16968, 2020 07 21.
Article in English | MEDLINE | ID: mdl-32641514

ABSTRACT

Alignment-free classification tools have enabled high-throughput processing of sequencing data in many bioinformatics analysis pipelines primarily due to their computational efficiency. Originally k-mer based, such tools often lack sensitivity when faced with sequencing errors and polymorphisms. In response, some tools have been augmented with spaced seeds, which are capable of tolerating mismatches. However, spaced seeds have seen little practical use in classification because they bring increased computational and memory costs compared to methods that use k-mers. These limitations have also caused the design and length of practical spaced seeds to be constrained, since storing spaced seeds can be costly. To address these challenges, we have designed a probabilistic data structure called a multiindex Bloom Filter (miBF), which can store multiple spaced seed sequences with a low memory cost that remains static regardless of seed length or seed design. We formalize how to minimize the false-positive rate of miBFs when classifying sequences from multiple targets or references. Available within BioBloom Tools, we illustrate the utility of miBF in two use cases: read-binning for targeted assembly, and taxonomic read assignment. In our benchmarks, an analysis pipeline based on miBF shows higher sensitivity and specificity for read-binning than sequence alignment-based methods, also executing in less time. Similarly, for taxonomic classification, miBF enables higher sensitivity than a conventional spaced seed-based approach, while using half the memory and an order of magnitude less computational time.


Subject(s)
Sequence Analysis, DNA/methods , Software , Animals , Base Pair Mismatch , Humans , Phylogeny , Sequence Alignment , Sequence Analysis, DNA/standards
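
The abstract above notes that spaced seeds tolerate mismatches where plain k-mers do not. The following is a minimal sketch of that idea only; it is not the miBF data structure or any BioBloom Tools code. The seed pattern, the sequences, and the helper names (`mask_with_seed`, `seed_keys`) are all hypothetical, chosen for illustration.

```python
def mask_with_seed(window: str, seed: str) -> str:
    """Keep bases at '1' positions of the seed and mask '0' positions with '*'."""
    return "".join(base if bit == "1" else "*" for base, bit in zip(window, seed))


def seed_keys(sequence: str, seed: str) -> set[str]:
    """Collect the masked key of every seed-length window in the sequence."""
    width = len(seed)
    return {
        mask_with_seed(sequence[i:i + width], seed)
        for i in range(len(sequence) - width + 1)
    }


SEED = "1110111"        # hypothetical seed with one don't-care position
KMER = "1" * len(SEED)  # an all-care seed behaves like a plain k-mer

reference = "ACGTACGGTCA"
read = "ACGAACGGTCA"    # differs from the reference at one base (T -> A)

shared_kmers = seed_keys(reference, KMER) & seed_keys(read, KMER)
shared_seeds = seed_keys(reference, SEED) & seed_keys(read, SEED)

print(len(shared_kmers), len(shared_seeds))  # prints: 1 2
```

In this toy case the single mismatch destroys four of the five shared 7-mers, while the spaced seed recovers one extra hit because the mismatch falls on its don't-care position. A real classifier would store hashed keys in a compact structure rather than Python sets.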
3.
Bioinformatics ; 36(7): 2256-2257, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31790154

ABSTRACT

SUMMARY: Presence or absence of gene fusions is one of the most important diagnostic markers in many cancer types. Consequently, fusion detection methods using various genomics data types, such as RNA sequencing (RNA-seq) are valuable tools for research and clinical applications. While information-rich RNA-seq data have proven to be instrumental in discovery of a number of hallmark fusion events, bioinformatics tools to detect fusions still have room for improvement. Here, we present Fusion-Bloom, a fusion detection method that leverages recent developments in de novo transcriptome assembly and assembly-based structural variant calling technologies (RNA-Bloom and PAVFinder, respectively). We benchmarked Fusion-Bloom against the performance of five other state-of-the-art fusion detection tools using multiple datasets. Overall, we observed Fusion-Bloom to display a good balance between detection sensitivity and specificity. We expect the tool to find applications in translational research and clinical genomics pipelines. AVAILABILITY AND IMPLEMENTATION: Fusion-Bloom is implemented as a UNIX Make utility, available at https://github.com/bcgsc/pavfinder and released under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Transcriptome , Genomics , RNA , Sequence Analysis, RNA
4.
BMC Genomics ; 19(1): 536, 2018 Jul 13.
Article in English | MEDLINE | ID: mdl-30005633

ABSTRACT

BACKGROUND: Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe. RESULTS: We built a cloud-based targeted de novo transcript assembly and analysis pipeline that incorporates our previously developed cleavage site prediction tool, KLEAT. We applied this pipeline to elucidate the APA profiles of 114 genes in 9939 tumor and 729 tissue normal samples from The Cancer Genome Atlas (TCGA). The full set of 10,668 RNA-Seq samples from 33 cancer types has not been utilized by previous APA studies. By comparing the frequencies of predicted cleavage sites between normal and tumor sample groups, we identified 77 events (i.e. gene-cancer type pairs) of tumor-specific APA regulation in 13 cancer types; for 15 genes, such regulation is recurrent across multiple cancers. Our results also support a previous report showing the 3' UTR shortening of FGF2 in multiple cancers. However, over half of the events we identified display complex changes to 3' UTR length that resist simple classification like shortening or lengthening. CONCLUSIONS: Recurrent tumor-specific regulation of APA is widespread in cancer. However, the regulation pattern that we observed in TCGA RNA-seq data cannot be described as straightforward 3' UTR shortening or lengthening. Continued investigation into this complex, nuanced regulatory landscape will provide further insight into its role in tumor formation and development.


Subject(s)
Neoplasms/genetics , RNA, Messenger/genetics , 3' Untranslated Regions , Cloud Computing , Databases, Genetic , Fibroblast Growth Factor 2/genetics , Gene Expression Regulation, Neoplastic , Humans , Neoplasm Recurrence, Local/genetics , Neoplasms/pathology , Polyadenylation , RNA Cleavage , RNA, Messenger/metabolism , Software
5.
Nature ; 488(7409): 49-56, 2012 Aug 02.
Article in English | MEDLINE | ID: mdl-22832581

ABSTRACT

Medulloblastoma, the most common malignant paediatric brain tumour, is currently treated with nonspecific cytotoxic therapies including surgery, whole-brain radiation, and aggressive chemotherapy. As medulloblastoma exhibits marked intertumoural heterogeneity, with at least four distinct molecular variants, previous attempts to identify targets for therapy have been underpowered because of small sample sizes. Here we report somatic copy number aberrations (SCNAs) in 1,087 unique medulloblastomas. SCNAs are common in medulloblastoma, and are predominantly subgroup-enriched. The most common region of focal copy number gain is a tandem duplication of SNCAIP, a gene associated with Parkinson's disease, which is exquisitely restricted to Group 4α. Recurrent translocations of PVT1, including PVT1-MYC and PVT1-NDRG1, that arise through chromothripsis are restricted to Group 3. Numerous targetable SCNAs, including recurrent events targeting TGF-β signalling in Group 3, and NF-κB signalling in Group 4, suggest future avenues for rational, targeted therapy.


Subject(s)
Cerebellar Neoplasms/classification , Cerebellar Neoplasms/genetics , Genome, Human/genetics , Genomic Structural Variation/genetics , Medulloblastoma/classification , Medulloblastoma/genetics , Carrier Proteins/genetics , Cerebellar Neoplasms/metabolism , Child , DNA Copy Number Variations/genetics , Gene Duplication/genetics , Genes, myc/genetics , Genomics , Hedgehog Proteins/metabolism , Humans , Medulloblastoma/metabolism , NF-kappa B/metabolism , Nerve Tissue Proteins/genetics , Oncogene Proteins, Fusion/genetics , Proteins/genetics , RNA, Long Noncoding , Signal Transduction , Transforming Growth Factor beta/metabolism , Translocation, Genetic/genetics
6.
Nature ; 476(7360): 298-303, 2011 Jul 27.
Article in English | MEDLINE | ID: mdl-21796119

ABSTRACT

Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most common non-Hodgkin lymphomas (NHLs). Here we sequenced tumour and matched normal DNA from 13 DLBCL cases and one FL case to identify genes with mutations in B-cell NHL. We analysed RNA-seq data from these and another 113 NHLs to identify genes with candidate mutations, and then re-sequenced tumour and matched normal DNA from these cases to confirm 109 genes with multiple somatic mutations. Genes with roles in histone modification were frequent targets of somatic mutation. For example, 32% of DLBCL and 89% of FL cases had somatic mutations in MLL2, which encodes a histone methyltransferase, and 11.4% and 13.4% of DLBCL and FL cases, respectively, had mutations in MEF2B, a calcium-regulated gene that cooperates with CREBBP and EP300 in acetylating histones. Our analysis suggests a previously unappreciated disruption of chromatin biology in lymphomagenesis.


Subject(s)
Histones/metabolism , Lymphoma, Non-Hodgkin/genetics , Mutation/genetics , Chromatin/genetics , Chromatin/metabolism , DNA-Binding Proteins/genetics , DNA-Binding Proteins/metabolism , Genome, Human/genetics , Histone Acetyltransferases/genetics , Histone Acetyltransferases/metabolism , Histone Methyltransferases , Histone-Lysine N-Methyltransferase/genetics , Histone-Lysine N-Methyltransferase/metabolism , Humans , Loss of Heterozygosity/genetics , Lymphoma, Follicular/enzymology , Lymphoma, Follicular/genetics , Lymphoma, Large B-Cell, Diffuse/enzymology , Lymphoma, Large B-Cell, Diffuse/genetics , Lymphoma, Non-Hodgkin/enzymology , MADS Domain Proteins/genetics , MADS Domain Proteins/metabolism , MEF2 Transcription Factors , Myogenic Regulatory Factors/genetics , Myogenic Regulatory Factors/metabolism , Neoplasm Proteins/genetics , Neoplasm Proteins/metabolism
7.
Plant J ; 83(2): 189-212, 2015 Jul.
Article in English | MEDLINE | ID: mdl-26017574

ABSTRACT

White spruce (Picea glauca), a gymnosperm tree, has been established as one of the models for conifer genomics. We describe the draft genome assemblies of two white spruce genotypes, PG29 and WS77111, innovative tools for the assembly of very large genomes, and the conifer genomics resources developed in this process. The two white spruce genotypes originate from distant geographic regions of western (PG29) and eastern (WS77111) North America, and represent elite trees in two Canadian tree-breeding programs. We present an update (V3 and V4) for a previously reported PG29 V2 draft genome assembly and introduce a second white spruce genome assembly for genotype WS77111. Assemblies of the PG29 and WS77111 genomes confirm the reconstructed white spruce genome size in the 20 Gbp range, and show broad synteny. Using the PG29 V3 assembly and additional white spruce genomics and transcriptomics resources, we performed MAKER-P annotation and meticulous expert annotation of very large gene families of conifer defense metabolism, the terpene synthases and cytochrome P450s. We also comprehensively annotated the white spruce mevalonate, methylerythritol phosphate and phenylpropanoid pathways. These analyses highlighted the large extent of gene and pseudogene duplications in a conifer genome, in particular for genes of secondary (i.e. specialized) metabolism, and the potential for gain and loss of function for defense and adaptation.


Subject(s)
Genome, Plant , Multigene Family , Phenols/metabolism , Picea/genetics , Terpenes/metabolism , Alkyl and Aryl Transferases/metabolism , Computational Biology , Cytochrome P-450 Enzyme System/metabolism , Transcriptome
8.
N Engl J Med ; 368(22): 2059-74, 2013 05 30.
Article in English | MEDLINE | ID: mdl-23634996

ABSTRACT

BACKGROUND: Many mutations that contribute to the pathogenesis of acute myeloid leukemia (AML) are undefined. The relationships between patterns of mutations and epigenetic phenotypes are not yet clear. METHODS: We analyzed the genomes of 200 clinically annotated adult cases of de novo AML, using either whole-genome sequencing (50 cases) or whole-exome sequencing (150 cases), along with RNA and microRNA sequencing and DNA-methylation analysis. RESULTS: AML genomes have fewer mutations than most other adult cancers, with an average of only 13 mutations found in genes. Of these, an average of 5 are in genes that are recurrently mutated in AML. A total of 23 genes were significantly mutated, and another 237 were mutated in two or more samples. Nearly all samples had at least 1 nonsynonymous mutation in one of nine categories of genes that are almost certainly relevant for pathogenesis, including transcription-factor fusions (18% of cases), the gene encoding nucleophosmin (NPM1) (27%), tumor-suppressor genes (16%), DNA-methylation-related genes (44%), signaling genes (59%), chromatin-modifying genes (30%), myeloid transcription-factor genes (22%), cohesin-complex genes (13%), and spliceosome-complex genes (14%). Patterns of cooperation and mutual exclusivity suggested strong biologic relationships among several of the genes and categories. CONCLUSIONS: We identified at least one potential driver mutation in nearly all AML samples and found that a complex interplay of genetic events contributes to AML pathogenesis in individual patients. The databases from this study are widely available to serve as a foundation for further investigations of AML pathogenesis, classification, and risk stratification. (Funded by the National Institutes of Health.).


Subject(s)
Leukemia, Myeloid, Acute/genetics , Mutation , Adult , CpG Islands , DNA Methylation , Epigenomics , Female , Gene Expression , Gene Fusion , Genome, Human , Humans , Leukemia, Myeloid, Acute/classification , Male , MicroRNAs/genetics , Middle Aged , Nucleophosmin , Sequence Analysis, DNA/methods
9.
Blood ; 122(7): 1256-65, 2013 Aug 15.
Article in English | MEDLINE | ID: mdl-23699601

ABSTRACT

Diffuse large B-cell lymphoma (DLBCL) is a genetically heterogeneous cancer composed of at least 2 molecular subtypes that differ in gene expression and distribution of mutations. Recently, application of genome/exome sequencing and RNA-seq to DLBCL has revealed numerous genes that are recurrent targets of somatic point mutation in this disease. Here we provide a whole-genome-sequencing-based perspective of DLBCL mutational complexity by characterizing 40 de novo DLBCL cases and 13 DLBCL cell lines and combining these data with DNA copy number analysis and RNA-seq from an extended cohort of 96 cases. Our analysis identified widespread genomic rearrangements including evidence for chromothripsis as well as the presence of known and novel fusion transcripts. We uncovered new gene targets of recurrent somatic point mutations and genes that are targeted by focal somatic deletions in this disease. We highlight the recurrence of germinal center B-cell-restricted mutations affecting genes that encode the S1P receptor and 2 small GTPases (GNA13 and GNAI2) that together converge on regulation of B-cell homing. We further analyzed our data to approximate the relative temporal order in which some recurrent mutations were acquired and demonstrate that ongoing acquisition of mutations and intratumoral clonal heterogeneity are common features of DLBCL. This study further improves our understanding of the processes and pathways involved in lymphomagenesis, and some of the pathways mutated here may indicate new avenues for therapeutic intervention.


Subject(s)
Biomarkers, Tumor/chemistry , Biomarkers, Tumor/genetics , DNA Copy Number Variations/genetics , Genome, Human , Lymphoma, Large B-Cell, Diffuse/genetics , Mutation/genetics , GTP-Binding Protein alpha Subunits, G12-G13/chemistry , GTP-Binding Protein alpha Subunits, G12-G13/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Humans , Oligonucleotide Array Sequence Analysis , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Reverse Transcriptase Polymerase Chain Reaction , Tumor Cells, Cultured
10.
Proc Natl Acad Sci U S A ; 108(22): 9166-71, 2011 May 31.
Article in English | MEDLINE | ID: mdl-21536894

ABSTRACT

Rust fungi are some of the most devastating pathogens of crop plants. They are obligate biotrophs, which extract nutrients only from living plant tissues and cannot grow apart from their hosts. Their lifestyle has slowed the dissection of molecular mechanisms underlying host invasion and avoidance or suppression of plant innate immunity. We sequenced the 101-Mb genome of Melampsora larici-populina, the causal agent of poplar leaf rust, and the 89-Mb genome of Puccinia graminis f. sp. tritici, the causal agent of wheat and barley stem rust. We then compared the 16,399 predicted proteins of M. larici-populina with the 17,773 predicted proteins of P. graminis f. sp. tritici. Genomic features related to their obligate biotrophic lifestyle include expanded lineage-specific gene families, a large repertoire of effector-like small secreted proteins, impaired nitrogen and sulfur assimilation pathways, and expanded families of amino acid and oligopeptide membrane transporters. The dramatic up-regulation of transcripts coding for small secreted proteins, secreted hydrolytic enzymes, and transporters in planta suggests that they play a role in host infection and nutrient acquisition. Some of these genomic hallmarks are mirrored in the genomes of other microbial eukaryotes that have independently evolved to infect plants, indicating convergent adaptation to a biotrophic existence inside plant cells.


Subject(s)
Basidiomycota/genetics , Fungi/genetics , Triticum/microbiology , Gene Expression Profiling , Genes, Fungal , Genome , Genome, Fungal , Models, Genetic , Nitrates/chemistry , Oligonucleotide Array Sequence Analysis , Phylogeny , Plant Diseases/microbiology , Plant Leaves/microbiology , Sequence Analysis, DNA , Sulfates/chemistry
11.
medRxiv ; 2024 Jun 20.
Article in English | MEDLINE | ID: mdl-38947075

ABSTRACT

With the increasing availability of long-read sequencing data, high-quality human genome assemblies, and software for fully characterizing tandem repeats, genome-wide genotyping of tandem repeat loci on a population scale becomes more feasible. Such efforts not only expand our knowledge of the tandem repeat landscape in the human genome but also enhance our ability to differentiate pathogenic tandem repeat mutations from benign polymorphisms. To this end, we analyzed 272 genomes assembled using datasets from three public initiatives that employed different long-read sequencing technologies. Here, we report a catalog of over 18 million tandem repeat loci, many of which were previously unannotated. Some of these loci are highly polymorphic, and many of them reside within coding sequences.

12.
BMC Genomics ; 14: 550, 2013 Aug 14.
Article in English | MEDLINE | ID: mdl-23941359

ABSTRACT

BACKGROUND: Chimeric transcripts, including partial and internal tandem duplications (PTDs, ITDs) and gene fusions, are important in the detection, prognosis, and treatment of human cancers. RESULTS: We describe Barnacle, a production-grade analysis tool that detects such chimeras in de novo assemblies of RNA-seq data, and supports prioritizing them for review and validation by reporting the relative coverage of co-occurring chimeric and wild-type transcripts. We demonstrate applications in large-scale disease studies, by identifying PTDs in MLL, ITDs in FLT3, and reciprocal fusions between PML and RARA, in two deeply sequenced acute myeloid leukemia (AML) RNA-seq datasets. CONCLUSIONS: Our analyses of real and simulated data sets show that, with appropriate filter settings, Barnacle makes highly specific predictions for three types of chimeric transcripts that are important in a range of cancers: PTDs, ITDs, and fusions. High specificity makes manual review and validation efficient, which is necessary in large-scale disease studies. Characterizing an extended range of chimera types will help generate insights into progression, treatment, and outcomes for complex diseases.


Subject(s)
Gene Duplication/genetics , Gene Expression Profiling/methods , Gene Fusion/genetics , Genomics , Breast Neoplasms/genetics , Exons/genetics , Humans , Leukemia, Myeloid, Acute/genetics , Molecular Sequence Annotation , RNA, Messenger/genetics , Statistics as Topic
13.
Nat Methods ; 7(11): 909-12, 2010 Nov.
Article in English | MEDLINE | ID: mdl-20935650

ABSTRACT

We describe Trans-ABySS, a de novo short-read transcriptome assembly and analysis pipeline that addresses variation in local read densities by assembling read substrings with varying stringencies and then merging the resulting contigs before analysis. Analyzing 7.4 gigabases of 50-base-pair paired-end Illumina reads from an adult mouse liver poly(A) RNA library, we identified known, new and alternative structures in expressed transcripts, and achieved high sensitivity and specificity relative to reference-based assembly methods.


Subject(s)
Computational Biology/methods , Gene Expression Profiling , Sequence Analysis, DNA/methods , Animals , Mice
14.
J Pathol ; 226(1): 7-16, 2012 Jan.
Article in English | MEDLINE | ID: mdl-22072542

ABSTRACT

Oligodendroglioma is characterized by unique clinical, pathological, and genetic features. Recurrent losses of chromosomes 1p and 19q are strongly associated with this brain cancer but knowledge of the identity and function of the genes affected by these alterations is limited. We performed exome sequencing on a discovery set of 16 oligodendrogliomas with 1p/19q co-deletion to identify new molecular features at base-pair resolution. As anticipated, there was a high rate of IDH mutations: all cases had mutations in either IDH1 (14/16) or IDH2 (2/16). In addition, we discovered somatic mutations and insertions/deletions in the CIC gene on chromosome 19q13.2 in 13/16 tumours. These discovery set mutations were validated by deep sequencing of 13 additional tumours, which revealed seven others with CIC mutations, thus bringing the overall mutation rate in oligodendrogliomas in this study to 20/29 (69%). In contrast, deep sequencing of astrocytomas and oligoastrocytomas without 1p/19q loss revealed that CIC alterations were otherwise rare (1/60; 2%). Of the 21 non-synonymous somatic mutations in 20 CIC-mutant oligodendrogliomas, nine were in exon 5 within an annotated DNA-interacting domain and three were in exon 20 within an annotated protein-interacting domain. The remaining nine were found in other exons and frequently included truncations. CIC mutations were highly associated with oligodendroglioma histology, 1p/19q co-deletion, and IDH1/2 mutation (p < 0.001). Although we observed no differences in the clinical outcomes of CIC mutant versus wild-type tumours, in a background of 1p/19q co-deletion, hemizygous CIC mutations are likely important. We hypothesize that the mutant CIC on the single retained 19q allele is linked to the pathogenesis of oligodendrogliomas with IDH mutation. Our detailed study of genetic aberrations in oligodendroglioma suggests a functional interaction between CIC mutation, IDH1/2 mutation, and 1p/19q co-deletion.


Subject(s)
Biomarkers, Tumor/genetics , Brain Neoplasms/genetics , Isocitrate Dehydrogenase/genetics , Oligodendroglioma/genetics , Repressor Proteins/genetics , Biomarkers, Tumor/analysis , Brain Neoplasms/mortality , Brain Neoplasms/pathology , Chromosomes, Human, Pair 1/genetics , Chromosomes, Human, Pair 19/genetics , Disease-Free Survival , Humans , Kaplan-Meier Estimate , Mutation , Neoplasm Grading , Oligodendroglioma/mortality , Oligodendroglioma/pathology
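
The mutation rates quoted in the abstract above combine a discovery set and a validation set. As a quick arithmetic check, the sketch below recomputes the pooled rate from the counts stated in the abstract; the variable names are illustrative only.

```python
# Counts taken directly from the abstract.
discovery_mutant, discovery_total = 13, 16    # CIC-mutant tumours in the discovery set
validation_mutant, validation_total = 7, 13   # additional tumours, deep sequenced
control_mutant, control_total = 1, 60         # astrocytomas/oligoastrocytomas without 1p/19q loss

overall_mutant = discovery_mutant + validation_mutant
overall_total = discovery_total + validation_total

overall_pct = 100 * overall_mutant / overall_total
control_pct = 100 * control_mutant / control_total

print(f"{overall_mutant}/{overall_total} = {overall_pct:.0f}%")  # 20/29 = 69%
print(f"{control_mutant}/{control_total} = {control_pct:.0f}%")  # 1/60 = 2%
```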
15.
Nat Commun ; 14(1): 2940, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217540

ABSTRACT

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there is little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Since our method does not rely on a reference, it further sets the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.


Subject(s)
RNA , Transcriptome , Transcriptome/genetics , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods
16.
Sci Rep ; 12(1): 9352, 2022 06 07.
Article in English | MEDLINE | ID: mdl-35672336

ABSTRACT

Detection of short tandem repeat (STR) expansions with standard short-read sequencing is challenging due to the difficulty in mapping multicopy repeat sequences. In this study, we explored how the long-range sequence information of barcode linked-read sequencing (BLRS) can be leveraged to improve repeat-read detection. We also devised a novel algorithm using BLRS barcodes for distance estimation and evaluated its application for STR genotyping. Both approaches were designed for genotyping large expansions (> 1 kb) that cannot be sized accurately by existing methods. Using simulated and experimental data of genomes with STR expansions from multiple BLRS platforms, we validated the utility of barcode and phasing information in attaining better STR genotypes compared to standard short-read sequencing. Although the coverage bias of extremely GC-rich STRs is an important limitation of BLRS, BLRS is an effective strategy for genotyping many other STR loci.


Subject(s)
High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Algorithms , High-Throughput Nucleotide Sequencing/methods , Microsatellite Repeats/genetics , Sequence Analysis, DNA/methods
17.
Genome Biol ; 22(1): 224, 2021 08 13.
Article in English | MEDLINE | ID: mdl-34389037

ABSTRACT

Tandem repeat (TR) expansion is the underlying cause of over 40 neurological disorders. Long-read sequencing offers an exciting avenue over conventional technologies for detecting TR expansions. Here, we present Straglr, a robust software tool for both targeted genotyping and novel expansion detection from long-read alignments. We benchmark Straglr using various simulations, targeted genotyping data of cell lines carrying expansions of known diseases, and whole genome sequencing data with chromosome-scale assembly. Our results suggest that Straglr may be useful for investigating disease-associated TR expansions using long-read sequencing.


Subject(s)
Genotype , Genotyping Techniques/methods , Software , DNA Repeat Expansion , Disease/genetics , Humans , Whole Genome Sequencing/methods
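
To make the notion of tandem repeat copy number in the abstract above concrete, here is a minimal sketch that counts the longest run of back-to-back copies of a motif in a single sequence. It is not Straglr's method, which works from long-read alignments; the function name and the toy sequence are hypothetical, and exact string matching like this ignores sequencing errors and interrupted repeats.

```python
import re


def longest_repeat_run(sequence: str, motif: str) -> int:
    """Return the largest number of consecutive exact copies of motif in sequence."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(sequence)]
    return max(runs, default=0)


toy_read = "TTCAGCAGCAGCAGGACAGTT"  # hypothetical read with a CAG repeat
copies = longest_repeat_run(toy_read, "CAG")
print(copies)  # 4
```

A genotyper would additionally need to aggregate copy counts across many reads per locus and separate them into alleles, which this sketch does not attempt.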
18.
Genome Med ; 13(1): 126, 2021 08 09.
Article in English | MEDLINE | ID: mdl-34372915

ABSTRACT

BACKGROUND: Screening for short tandem repeat (STR) expansions in next-generation sequencing data can enable diagnosis, optimal clinical management/treatment, and accurate genetic counseling of patients with repeat expansion disorders. We aimed to develop an efficient computational workflow for reliable detection of STR expansions in next-generation sequencing data and demonstrate its clinical utility. METHODS: We characterized the performance of eight STR analysis methods (lobSTR, HipSTR, RepeatSeq, ExpansionHunter, TREDPARSE, GangSTR, STRetch, and exSTRa) on next-generation sequencing datasets of samples with known disease-causing full-mutation STR expansions and genomes simulated to harbor repeat expansions at selected loci and optimized their sensitivity. We then used a machine learning decision tree classifier to identify an optimal combination of methods for full-mutation detection. In Burrows-Wheeler Aligner (BWA)-aligned genomes, the ensemble approach of using ExpansionHunter, STRetch, and exSTRa performed the best (precision = 82%, recall = 100%, F1-score = 90%). We applied this pipeline to screen 301 families of children with suspected genetic disorders. RESULTS: We identified 10 individuals with full-mutations in the AR, ATXN1, ATXN8, DMPK, FXN, or HTT disease STR locus in the analyzed families. Additional candidates identified in our analysis include two probands with borderline ATXN2 expansions between the established repeat size range for reduced-penetrance and full-penetrance full-mutation and seven individuals with FMR1 CGG repeats in the intermediate/premutation repeat size range. In 67 probands with a prior negative clinical PCR test for the FMR1, FXN, or DMPK disease STR locus, or the spinocerebellar ataxia disease STR panel, our pipeline did not falsely identify aberrant expansion. We performed clinical PCR tests on seven (out of 10) full-mutation samples identified by our pipeline and confirmed the expansion status in all, showing absolute concordance between our bioinformatics and molecular findings. CONCLUSIONS: We have successfully demonstrated the application of a well-optimized bioinformatics pipeline that promotes the utility of genome-wide sequencing as a first-tier screening test to detect expansions of known disease STRs. Interrogating clinical next-generation sequencing data for pathogenic STR expansions using our ensemble pipeline can improve diagnostic yield and enhance clinical outcomes for patients with repeat expansion disorders.


Subject(s)
DNA Repeat Expansion , Genome-Wide Association Study , High-Throughput Nucleotide Sequencing , Microsatellite Repeats , Whole Genome Sequencing , Algorithms , Alleles , Clinical Decision-Making , Computational Biology/methods , Databases, Genetic , Decision Trees , Genetic Diseases, Inborn/diagnosis , Genetic Diseases, Inborn/genetics , Genetic Loci , Genome-Wide Association Study/methods , Humans , Machine Learning , Molecular Diagnostic Techniques , Mutation , Reproducibility of Results
19.
Nat Commun ; 12(1): 2474, 2021 04 30.
Article in English | MEDLINE | ID: mdl-33931648

ABSTRACT

As more clinically relevant genomic features of myeloid malignancies are revealed, it has become clear that targeted clinical genetic testing is inadequate for risk stratification. Here, we develop and validate a clinical transcriptome-based assay for stratification of acute myeloid leukemia (AML). Comparison of ribonucleic acid sequencing (RNA-Seq) to whole genome and exome sequencing reveals that a standalone RNA-Seq assay offers the greatest diagnostic return, enabling identification of expressed gene fusions, single nucleotide and short insertion/deletion variants, and whole-transcriptome expression information. Expression data from 154 AML patients are used to develop a novel AML prognostic score, which is strongly associated with patient outcomes across 620 patients from three independent cohorts, and 42 patients from a prospective cohort. When combined with molecular risk guidelines, the risk score allows for the re-stratification of 22.1% to 25.3% of AML patients from three independent cohorts into correct risk groups. Within the adverse-risk subgroup, we identify a subset of patients characterized by dysregulated integrin signaling and RUNX1 or TP53 mutation. We show that these patients may benefit from therapy with inhibitors of focal adhesion kinase, encoded by PTK2, demonstrating additional utility of transcriptome-based testing for therapy selection in myeloid malignancy.


Subject(s)
Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic/genetics , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/metabolism , Biomarkers, Tumor/genetics , Cell Line, Tumor , Cohort Studies , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Female , Gene Fusion , Humans , INDEL Mutation , Integrins/genetics , Integrins/metabolism , Leukemia, Myeloid, Acute/genetics , Male , Polymorphism, Single Nucleotide , Prognosis , Prospective Studies , RNA-Seq , Risk Factors , Signal Transduction/genetics , Survival Analysis , Transcriptome , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Exome Sequencing , Whole Genome Sequencing
20.
Article in English | MEDLINE | ID: mdl-30833417

ABSTRACT

We report a case of early-onset pancreatic ductal adenocarcinoma in a patient harboring biallelic MUTYH germline mutations, whose tumor featured somatic mutational signatures consistent with defective MUTYH-mediated base excision repair and the associated driver KRAS transversion mutation p.Gly12Cys. Analysis of an additional 730 advanced cancer cases (N = 731) was undertaken to determine whether the mutational signatures were also present in tumors from germline MUTYH heterozygote carriers or if instead the signatures were only seen in those with biallelic loss of function. We identified two patients with breast cancer each carrying a pathogenic germline MUTYH variant with a somatic MUTYH copy loss leading to the germline variant being homozygous in the tumor and demonstrating the same somatic signatures. Our results suggest that monoallelic inactivation of MUTYH is not sufficient for C:G>A:T transversion signatures previously linked to MUTYH deficiency to arise (N = 9), but that biallelic complete loss of MUTYH function can cause such signatures to arise even in tumors not classically seen in MUTYH-associated polyposis (N = 3). Although defective MUTYH is not the only determinant of these signatures, MUTYH germline variants may be present in a subset of patients with tumors demonstrating elevated somatic signatures possibly suggestive of MUTYH deficiency (e.g., COSMIC Signature 18, SigProfiler SBS18/SBS36, SignatureAnalyzer SBS18/SBS36).


Subject(s)
Breast Neoplasms/genetics , Carcinoma, Pancreatic Ductal/genetics , DNA Glycosylases/genetics , Mutation , Pancreatic Neoplasms/genetics , Age of Onset , DNA Glycosylases/deficiency , Female , Germ-Line Mutation , Humans , Loss of Heterozygosity , Middle Aged , Proto-Oncogene Proteins p21(ras)/genetics