1.
bioRxiv ; 2024 May 02.
Article in English | MEDLINE | ID: mdl-38746190

ABSTRACT

Enabled by the explosion of data and a substantial increase in computational power, deep learning has transformed fields such as computer vision and natural language processing (NLP), and it has been applied successfully to many transcriptomic analysis tasks. A core advantage of deep learning is its inherent capability to incorporate feature computation within the machine learning models. This results in a comprehensive and machine-readable representation of sequences, facilitating downstream classification and clustering tasks. Compared to machine translation problems in NLP, feature embedding is particularly challenging for transcriptomic studies because the sequences are strings thousands of nucleotides long, which makes the long-term dependencies between features from different parts of the sequence even more difficult to capture. This highlights the need for nucleotide sequence embedding methods that are capable of learning input sequence features implicitly. Here we introduce ntEmbd, a deep learning embedding tool that captures dependencies between different features of the sequences and learns a latent representation for given nucleotide sequences. We further provide two sample use cases, describing how learned RNA features can be used in downstream analysis. The first use case demonstrates ntEmbd's utility in classifying coding and noncoding RNA, benchmarked against existing tools, and the second explores the utility of learned representations in identifying adapter sequences in nanopore RNA-seq reads. The tool as well as the trained models are freely available on GitHub at https://github.com/bcgsc/ntEmbd.

2.
bioRxiv ; 2023 Nov 05.
Article in English | MEDLINE | ID: mdl-37961641

ABSTRACT

Human papillomavirus (HPV) integration has been implicated in transforming HPV infection into cancer, but its genomic consequences have been difficult to study using short-read technologies. To resolve the dysregulation associated with HPV integration, we performed long-read sequencing on 63 cervical cancer genomes. We identified six categories of integration events based on HPV-human genomic structures. Of all HPV integrants, defined as two HPV-human breakpoints bridged by an HPV sequence, 24% contained variable copies of HPV between the breakpoints, a phenomenon we termed heterologous integration. Analysis of DNA methylation within and in proximity to the HPV genome at individual integration events revealed relationships between methylation status of the integrant and its orientation and structure. Dysregulation of the human epigenome and neighboring gene expression in cis with the HPV-integrated allele was observed over megabase-ranges of the genome. By elucidating the structural, epigenetic, and allele-specific impacts of HPV integration, we provide insight into the role of integrated HPV in cervical cancer.

3.
Nat Commun ; 14(1): 2906, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217507

ABSTRACT

Current state-of-the-art de novo long read genome assemblers follow the Overlap-Layout-Consensus paradigm. While read-to-read overlap - its most costly step - was improved in modern long read genome assemblers, these tools still often require excessive RAM when assembling a typical human dataset. Our work departs from this paradigm, foregoing all-vs-all sequence alignments in favor of a dynamic data structure implemented in GoldRush, a de novo long read genome assembly algorithm with linear time complexity. We tested GoldRush on Oxford Nanopore Technologies long sequencing read datasets with different base error profiles sourced from three human cell lines, rice, and tomato. Here, we show that GoldRush achieves assembly scaffold NGA50 lengths of 18.3-22.2, 0.3 and 2.6 Mbp, for the genomes of human, rice, and tomato, respectively, and assembles each genome within a day, using at most 54.5 GB of random-access memory, demonstrating the scalability of our genome assembly paradigm and its implementation.
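
The contiguity figures quoted in this abstract are NGA50 lengths. As a rough illustration of how such a length statistic is derived, the sketch below computes NG50 from a list of block lengths and a genome size: the blocks are sorted from longest to shortest and the statistic is the length of the block at which the running total first reaches half the genome size. NGA50 applies the same rule to aligned blocks, that is, after scaffolds have been broken at misassemblies against a reference; that alignment step is not shown. The function name `ng50` and the toy numbers are assumptions made for illustration and are not taken from GoldRush.

```python
def ng50(block_lengths, genome_size):
    """Return the NG50 of a set of block lengths for a given genome size.

    Blocks are visited from longest to shortest; the answer is the length of
    the block at which the running total first reaches half the genome size.
    Returns 0 when the blocks never cover half the genome.
    """
    running_total = 0
    for length in sorted(block_lengths, reverse=True):
        running_total += length
        if running_total >= genome_size / 2:
            return length
    return 0


# Toy example with hypothetical lengths (not data from the paper).
toy_lengths = [40, 30, 20, 10]
toy_result = ng50(toy_lengths, genome_size=100)
```

With these toy values the running total reaches 50 at the second block, so `toy_result` is 30.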


Subjects
Algorithms , Genome , Humans , Sequence Analysis, DNA , High-Throughput Nucleotide Sequencing

4.
Nat Commun ; 14(1): 2940, 2023 05 22.
Article in English | MEDLINE | ID: mdl-37217540

ABSTRACT

Long-read sequencing technologies have improved significantly since their emergence. Their read lengths, potentially spanning entire transcripts, are advantageous for reconstructing transcriptomes. Existing long-read transcriptome assembly methods are primarily reference-based, and to date there has been little focus on reference-free transcriptome assembly. We introduce RNA-Bloom2 (https://github.com/bcgsc/RNA-Bloom), a reference-free assembly method for long-read transcriptome sequencing data. Using simulated datasets and spike-in control data, we show that the transcriptome assembly quality of RNA-Bloom2 is competitive with that of reference-based methods. Furthermore, we find that RNA-Bloom2 requires 27.0 to 80.6% of the peak memory and 3.6 to 10.8% of the total wall-clock runtime of a competing reference-free method. Finally, we showcase RNA-Bloom2 in assembling a transcriptome sample of Picea sitchensis (Sitka spruce). Since our method does not rely on a reference, it further sets the groundwork for large-scale comparative transcriptomics where high-quality draft genome assemblies are not readily available.


Subjects
RNA , Transcriptome , Transcriptome/genetics , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods

5.
Invest Ophthalmol Vis Sci ; 64(4): 4, 2023 04 03.
Article in English | MEDLINE | ID: mdl-37022704

ABSTRACT

Purpose: This study aimed to assess the prevalence and characteristics of the peripapillary gamma zone in myopic, emmetropic, and hyperopic eyes of Chinese children. Methods: Overall, 1274 children aged 6 to 8 years from the Hong Kong Children Eye Study underwent ocular examinations, including measurements of cycloplegic auto-refraction and axial length (AL). The optic disc was imaged using a Spectralis optical coherence tomography (OCT) unit and a protocol involving 24 equally spaced radial B-scans. The Bruch's membrane opening (BMO) was identified in over 48 meridians in each eye. The peripapillary gamma zone was defined as the region between the BMO and the border of the optic disc, identified by the OCT. Results: The prevalence of the peripapillary gamma zone was higher in myopic eyes (36.3%) than in emmetropic (16.1%) and hyperopic eyes (11.5%, P < 0.001). AL (per 1 mm; odds ratio [OR] = 1.861, P < 0.001) and a more oval disc shape (OR = 3.144, P < 0.001) were associated with the presence of a peripapillary gamma zone after adjusting for demographic, systemic, and ocular variables. In the subgroup analysis, a longer AL was associated with the presence of a peripapillary gamma zone in myopic eyes (OR = 1.874, P < 0.001), but not in emmetropic (OR = 1.033, P = 0.913) or hyperopic eyes (OR = 1.044, P = 0.883). A peripapillary gamma zone was not observed in the region nasal to the optic nerve in myopic eyes, in contrast to its presence in the same region in 1.9% of emmetropic eyes and 9.3% of hyperopic eyes; these intergroup differences were statistically significant (P < 0.001). Conclusions: Although peripapillary gamma zones were observed in the eyes of both myopic and non-myopic children, their characteristics and distribution patterns were substantially different.


Subjects
Hyperopia , Myopia , Optic Disk , Humans , Child , Hong Kong/epidemiology , Prevalence , Myopia/epidemiology , Refraction, Ocular , Hyperopia/epidemiology , Tomography, Optical Coherence/methods

6.
Gigascience ; 12, 2023 03 20.
Article in English | MEDLINE | ID: mdl-36939007

ABSTRACT

BACKGROUND: Nanopore sequencing is crucial to metagenomic studies as its kilobase-long reads can contribute to resolving genomic structural differences among microbes. However, sequencing platform-specific challenges, including high base-call error rate, nonuniform read lengths, and the presence of chimeric artifacts, necessitate specifically designed analytical algorithms. The use of simulated datasets with characteristics that are true to the sequencing platform under evaluation is a cost-effective way to assess the performance of bioinformatics tools with the ground truth in a controlled environment. RESULTS: Here, we present Meta-NanoSim, a fast and versatile utility that characterizes and simulates the unique properties of nanopore metagenomic reads. It improves upon state-of-the-art methods on microbial abundance estimation through a base-level quantification algorithm. Meta-NanoSim can simulate complex microbial communities composed of both linear and circular genomes and can stream reference genomes from online servers directly. Simulated datasets showed high congruence with experimental data in terms of read length, error profiles, and abundance levels. We demonstrate that Meta-NanoSim simulated data can facilitate the development of metagenomic algorithms and guide experimental design through a metagenome assembly benchmarking task. CONCLUSIONS: The Meta-NanoSim characterization module investigates read features, including chimeric information and abundance levels, while the simulation module simulates large and complex multisample microbial communities with different abundance profiles. All trained models and the software are freely accessible at GitHub: https://github.com/bcgsc/NanoSim.


Subjects
Nanopore Sequencing , Nanopores , Metagenome , Nanopore Sequencing/methods , Sequence Analysis, DNA/methods , Computer Simulation , Metagenomics/methods , Software , Algorithms , High-Throughput Nucleotide Sequencing/methods

7.
Antibiotics (Basel) ; 11(7), 2022 Jul 15.
Article in English | MEDLINE | ID: mdl-35884206

ABSTRACT

Antibiotic resistance is a global health crisis increasing in prevalence every day. To combat this crisis, alternative antimicrobial therapeutics are urgently needed. Antimicrobial peptides (AMPs), a family of short defense proteins, are produced naturally by all organisms and hold great potential as effective alternatives to small molecule antibiotics. Here, we present rAMPage, a scalable bioinformatics discovery platform for identifying AMP sequences from RNA sequencing (RNA-seq) datasets. In our study, we demonstrate the utility and scalability of rAMPage, running it on 84 publicly available RNA-seq datasets from 75 amphibian and insect species, which are known to have rich AMP repertoires. Across these datasets, we identified 1137 putative AMPs, 1024 of which were deemed novel by a homology search against cataloged AMPs in public databases. We selected 21 peptide sequences from this set for antimicrobial susceptibility testing against Escherichia coli and Staphylococcus aureus and observed that seven of them have high antimicrobial activity. Our study illustrates how in silico methods such as rAMPage can enable the fast and efficient discovery of novel antimicrobial peptides as an effective first step in the strenuous process of antimicrobial drug development.

8.
Plant J ; 111(5): 1469-1485, 2022 09.
Article in English | MEDLINE | ID: mdl-35789009

ABSTRACT

Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related with presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature. 
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.


Subjects
Picea , Tracheophyta , Expressed Sequence Tags , Genome, Plant/genetics , Multigene Family/genetics , Phylogeny , Picea/genetics , Tracheophyta/genetics

9.
BMC Bioinformatics ; 23(1): 246, 2022 Jun 21.
Article in English | MEDLINE | ID: mdl-35729491

ABSTRACT

BACKGROUND: De novo genome assembly is essential to modern genomics studies. As it is not biased by a reference, it is also a useful method for studying genomes with high variation, such as cancer genomes. De novo short-read assemblers commonly use de Bruijn graphs, where nodes are sequences of equal length k, also known as k-mers. Edges in this graph are established between nodes that overlap by k - 1 bases, and nodes along unambiguous walks in the graph are subsequently merged. The selection of k is influenced by multiple factors, and optimizing this value results in a trade-off between graph connectivity and sequence contiguity. Ideally, multiple k sizes should be used, so lower values can provide good connectivity in lesser covered regions and higher values can increase contiguity in well-covered regions. However, current approaches that use multiple k values do not address the scalability issues inherent to the assembly of large genomes. RESULTS: Here we present RResolver, a scalable algorithm that takes a short-read de Bruijn graph assembly with a starting k as input and uses a k value closer to that of the read length to resolve repeats. RResolver builds a Bloom filter of sequencing reads which is used to evaluate the assembly graph path support at branching points and removes paths with insufficient support. RResolver runs efficiently, taking only 26 min on average for an ABySS human assembly with 48 threads and 60 GiB memory. Across all experiments, compared to a baseline assembly, RResolver improves scaffold contiguity (NGA50) by up to 15% and reduces misassemblies by up to 12%. CONCLUSIONS: RResolver adds a missing component to scalable de Bruijn graph genome assembly. By improving the initial and fundamental graph traversal outcome, all downstream ABySS algorithms greatly benefit by working with a more accurate and less complex representation of the genome. 
The RResolver code is integrated into ABySS and is available at https://github.com/bcgsc/abyss/tree/master/RResolver .
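
The two ideas named in this abstract, de Bruijn graph edges between k-mers and a Bloom filter of read k-mers used to score candidate paths, can be sketched in a few lines. The code below is an illustrative toy under stated assumptions, not the RResolver implementation: the `BloomFilter` class, the helper names (`kmers`, `has_edge`, `build_read_filter`, `path_support`), the filter size, the hashing scheme, and the example reads are all hypothetical, and two nodes are assumed to be joined when the last k - 1 bases of one equal the first k - 1 bases of the other.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array probed by several salted hashes."""

    def __init__(self, num_bits=1 << 16, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def kmers(seq, k):
    """All substrings of length k, in order of appearance."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_edge(node_a, node_b):
    """De Bruijn edge: the last k-1 bases of node_a equal the first k-1 of node_b."""
    return node_a[1:] == node_b[:-1]


def build_read_filter(reads, k):
    """Insert every k-mer of every read into a Bloom filter."""
    read_filter = BloomFilter()
    for read in reads:
        for kmer in kmers(read, k):
            read_filter.add(kmer)
    return read_filter


def path_support(path_seq, read_filter, k):
    """Fraction of the path's k-mers that are (probably) present in the reads."""
    path_kmers = kmers(path_seq, k)
    if not path_kmers:
        return 0.0
    return sum(kmer in read_filter for kmer in path_kmers) / len(path_kmers)


# Hypothetical reads and candidate paths at a branching point.
reads = ["ACGTACGGA", "CGTACGGAT"]
read_filter = build_read_filter(reads, k=6)
supported = path_support("ACGTACGGAT", read_filter, k=6)
unsupported = path_support("ACGTTTTGAT", read_filter, k=6)
```

In this sketch a path whose support falls below some chosen threshold would be the one to remove; how the threshold is set is left open here because the abstract does not specify it.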


Subjects
Genomics , Software , Algorithms , Genome , Genomics/methods , High-Throughput Nucleotide Sequencing/methods , Humans , Sequence Analysis, DNA/methods

10.
Sci Rep ; 12(1): 9419, 2022 06 08.
Article in English | MEDLINE | ID: mdl-35676317

ABSTRACT

Emu (Dromaius novaehollandiae) farming has been gaining wide interest for fat production. Oil rendered from this large flightless bird's fat is valued for its anti-inflammatory and antioxidant properties for uses in therapeutics and cosmetics. We analyzed the seasonal and sex-dependent differentially expressed (DE) genes involved in fat metabolism in emus. Samples were taken from back and abdominal fat tissues of a single set of four male and four female emus in April, June, and November for RNA-sequencing. We found 100 DE genes (47 seasonally in males; 34 seasonally in females; 19 between sexes). Seasonally DE genes with significant difference between the sexes in gene ontology terms suggested integrin beta chain-2 (ITGB2) influences fat changes, in concordance with earlier studies. Six seasonally DE genes functioned in more than two enriched pathways (two female: angiopoietin-like 4 (ANGPTL4) and lipoprotein lipase (LPL); four male: lumican (LUM), osteoglycin (OGN), aldolase B (ALDOB), and solute carrier family 37 member 2 (SLC37A2)). Two sexually DE genes, follicle stimulating hormone receptor (FSHR) and perilipin 2 (PLIN2), had functional investigations supporting their influence on fat gain and loss. The results suggested these nine genes influence fat metabolism and deposition in emus.


Subjects
Dromaiidae , Adipose Tissue/metabolism , Animals , Base Sequence , Dromaiidae/genetics , Female , Gene Expression , Male , Seasons

11.
NAR Genom Bioinform ; 3(4): lqab105, 2021 Dec.
Article in English | MEDLINE | ID: mdl-34859209

ABSTRACT

Recent advances in single-cell RNA sequencing technologies have made detection of transcripts in single cells possible. The level of resolution provided by these technologies can be used to study changes in transcript usage across cell populations and help investigate new biology. Here, we introduce RNA-Scoop, an interactive cell cluster and transcriptome visualization tool to analyze transcript usage across cell categories and clusters. The tool allows users to examine differential transcript expression across clusters and investigate how usage of specific transcript expression mechanisms varies across cell groups.

12.
Front Genet ; 12: 665888, 2021.
Article in English | MEDLINE | ID: mdl-34149808

ABSTRACT

RNA sequencing (RNAseq) has been widely used to generate bulk gene expression measurements collected from pools of cells. Only relatively recently have single-cell RNAseq (scRNAseq) methods provided opportunities for gene expression analyses at the single-cell level, allowing researchers to study heterogeneous mixtures of cells at unprecedented resolution. Tumors tend to be composed of heterogeneous cellular mixtures and are frequently the subjects of such analyses. Extensive method developments have led to several protocols for scRNAseq but, owing to the small amounts of RNA in single cells, technical constraints have required compromises. For example, the majority of scRNAseq methods are limited to sequencing only the 3' or 5' termini of transcripts. Other protocols that facilitate full-length transcript profiling tend to capture only polyadenylated mRNAs and are generally limited to processing only 96 cells at a time. Here, we address these limitations and present a novel protocol that allows for the high-throughput sequencing of full-length, total RNA at single-cell resolution. We demonstrate that our method produced strand-specific sequencing data for both polyadenylated and non-polyadenylated transcripts, enabled the profiling of transcript regions beyond only transcript termini, and yielded data rich enough to allow identification of cell types from heterogeneous biological samples.

13.
Nat Commun ; 12(1): 2474, 2021 04 30.
Article in English | MEDLINE | ID: mdl-33931648

ABSTRACT

As more clinically-relevant genomic features of myeloid malignancies are revealed, it has become clear that targeted clinical genetic testing is inadequate for risk stratification. Here, we develop and validate a clinical transcriptome-based assay for stratification of acute myeloid leukemia (AML). Comparison of ribonucleic acid sequencing (RNA-Seq) to whole genome and exome sequencing reveals that a standalone RNA-Seq assay offers the greatest diagnostic return, enabling identification of expressed gene fusions, single nucleotide and short insertion/deletion variants, and whole-transcriptome expression information. Expression data from 154 AML patients are used to develop a novel AML prognostic score, which is strongly associated with patient outcomes across 620 patients from three independent cohorts, and 42 patients from a prospective cohort. When combined with molecular risk guidelines, the risk score allows for the re-stratification of 22.1 to 25.3% of AML patients from three independent cohorts into correct risk groups. Within the adverse-risk subgroup, we identify a subset of patients characterized by dysregulated integrin signaling and RUNX1 or TP53 mutation. We show that these patients may benefit from therapy with inhibitors of focal adhesion kinase, encoded by PTK2, demonstrating additional utility of transcriptome-based testing for therapy selection in myeloid malignancy.


Subjects
Biomarkers, Tumor/metabolism , Gene Expression Regulation, Neoplastic/genetics , Leukemia, Myeloid, Acute/diagnosis , Leukemia, Myeloid, Acute/metabolism , Biomarkers, Tumor/genetics , Cell Line, Tumor , Cohort Studies , Core Binding Factor Alpha 2 Subunit/genetics , Core Binding Factor Alpha 2 Subunit/metabolism , Female , Gene Fusion , Humans , INDEL Mutation , Integrins/genetics , Integrins/metabolism , Leukemia, Myeloid, Acute/genetics , Male , Polymorphism, Single Nucleotide , Prognosis , Prospective Studies , RNA-Seq , Risk Factors , Signal Transduction/genetics , Survival Analysis , Transcriptome , Tumor Suppressor Protein p53/genetics , Tumor Suppressor Protein p53/metabolism , Exome Sequencing , Whole Genome Sequencing

14.
Genome Res ; 30(8): 1191-1200, 2020 08.
Article in English | MEDLINE | ID: mdl-32817073

ABSTRACT

Despite the rapid advance in single-cell RNA sequencing (scRNA-seq) technologies within the last decade, single-cell transcriptome analysis workflows have primarily used gene expression data while isoform sequence analysis at the single-cell level still remains fairly limited. Detection and discovery of isoforms in single cells is difficult because of the inherent technical shortcomings of scRNA-seq data, and existing transcriptome assembly methods are mainly designed for bulk RNA samples. To address this challenge, we developed RNA-Bloom, an assembly algorithm that leverages the rich information content aggregated from multiple single-cell transcriptomes to reconstruct cell-specific isoforms. Assembly with RNA-Bloom can be either reference-guided or reference-free, thus enabling unbiased discovery of novel isoforms or foreign transcripts. We compared both assembly strategies of RNA-Bloom against five state-of-the-art reference-free and reference-based transcriptome assembly methods. In our benchmarks on a simulated 384-cell data set, reference-free RNA-Bloom reconstructed 37.9%-38.3% more isoforms than the best reference-free assembler, whereas reference-guided RNA-Bloom reconstructed 4.1%-11.6% more isoforms than reference-based assemblers. When applied to a real 3840-cell data set consisting of more than 4 billion reads, RNA-Bloom reconstructed 9.7%-25.0% more isoforms than the best competing reference-based and reference-free approaches evaluated. We expect RNA-Bloom to boost the utility of scRNA-seq data beyond gene expression analysis, expanding what is informatically accessible now.


Subjects
Gene Expression Profiling/methods , RNA-Seq/methods , Single-Cell Analysis/methods , Transcriptome/genetics , Algorithms , Animals , Base Sequence , Humans , Mice , Protein Isoforms/genetics , Software

15.
Gigascience ; 9(6), 2020 06 01.
Article in English | MEDLINE | ID: mdl-32520350

ABSTRACT

BACKGROUND: Compared with second-generation sequencing technologies, third-generation single-molecule RNA sequencing has unprecedented advantages; the long reads it generates facilitate isoform-level transcript characterization. In particular, the Oxford Nanopore Technologies sequencing platforms have become more popular in recent years owing to their relatively high affordability and portability compared with other third-generation sequencing technologies. To aid the development of analytical tools that leverage the power of this technology, simulated data provide a cost-effective solution with ground truth. However, a nanopore sequence simulator targeting transcriptomic data is not available yet. FINDINGS: We introduce Trans-NanoSim, a tool that simulates reads with technical and transcriptome-specific features learnt from nanopore RNA-sequencing data. We comprehensively benchmarked Trans-NanoSim on direct RNA and complementary DNA datasets describing human and mouse transcriptomes. Through comparison against other nanopore read simulators, we show the unique advantage and robustness of Trans-NanoSim in capturing the characteristics of nanopore complementary DNA and direct RNA reads. CONCLUSIONS: As a cost-effective alternative to sequencing real transcriptomes, Trans-NanoSim will facilitate the rapid development of analytical tools for nanopore RNA-sequencing data. Trans-NanoSim and its pre-trained models are freely accessible at https://github.com/bcgsc/NanoSim.


Subjects
Computational Biology/methods , Genomics/methods , High-Throughput Nucleotide Sequencing , Software , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods , Reproducibility of Results , Transcriptome , Workflow

16.
Bioinformatics ; 36(7): 2256-2257, 2020 04 01.
Article in English | MEDLINE | ID: mdl-31790154

ABSTRACT

SUMMARY: Presence or absence of gene fusions is one of the most important diagnostic markers in many cancer types. Consequently, fusion detection methods using various genomics data types, such as RNA sequencing (RNA-seq) are valuable tools for research and clinical applications. While information-rich RNA-seq data have proven to be instrumental in discovery of a number of hallmark fusion events, bioinformatics tools to detect fusions still have room for improvement. Here, we present Fusion-Bloom, a fusion detection method that leverages recent developments in de novo transcriptome assembly and assembly-based structural variant calling technologies (RNA-Bloom and PAVFinder, respectively). We benchmarked Fusion-Bloom against the performance of five other state-of-the-art fusion detection tools using multiple datasets. Overall, we observed Fusion-Bloom to display a good balance between detection sensitivity and specificity. We expect the tool to find applications in translational research and clinical genomics pipelines. AVAILABILITY AND IMPLEMENTATION: Fusion-Bloom is implemented as a UNIX Make utility, available at https://github.com/bcgsc/pavfinder and released under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Software , Transcriptome , Genomics , RNA , Sequence Analysis, RNA

17.
BMC Med Genomics ; 11(1): 79, 2018 Sep 10.
Article in English | MEDLINE | ID: mdl-30200994

ABSTRACT

BACKGROUND: RNA-seq is a powerful and cost-effective technology for molecular diagnostics of cancer and other diseases, and it can reach its full potential when coupled with validated clinical-grade informatics tools. Despite recent advances in long-read sequencing, transcriptome assembly of short reads remains a useful and cost-effective methodology for unveiling transcript-level rearrangements and novel isoforms. One of the major concerns for adopting the proven de novo assembly approach for RNA-seq data in clinical settings has been the analysis turnaround time. To address this concern, we have developed a targeted approach to expedite assembly and analysis of RNA-seq data. RESULTS: Here we present our Targeted Assembly Pipeline (TAP), which consists of four stages: 1) alignment-free gene-level classification of RNA-seq reads using BioBloomTools, 2) de novo assembly of individual targets using Trans-ABySS, 3) alignment of assembled contigs to the reference genome and transcriptome with GMAP and BWA, and 4) structural and splicing variant detection using PAVFinder. We show that PAVFinder is a robust gene fusion detection tool when compared to established methods such as Tophat-Fusion and deFuse on simulated data of 448 events. Using the Leucegene acute myeloid leukemia (AML) RNA-seq data and a set of 580 COSMIC target genes, TAP identified a wide range of hallmark molecular anomalies including gene fusions, tandem duplications, insertions and deletions in agreement with published literature results. Moreover, in the same dataset, TAP captured AML-specific splicing variants such as skipped exons and novel splice sites reported in studies elsewhere. Running time of TAP on 100-150 million read pairs and a 580-gene set is 1 to 2 hours on a 48-core machine. CONCLUSIONS: We demonstrated that TAP is a fast and robust RNA-seq variant detection pipeline that is potentially amenable to clinical applications. 
TAP is available at http://www.bcgsc.ca/platform/bioinfo/software/pavfinder.


Subjects
Genetic Variation , Genomics/methods , RNA/metabolism , User-Computer Interface , Humans , INDEL Mutation , Leukemia, Myeloid, Acute/genetics , Leukemia, Myeloid, Acute/pathology , RNA/chemistry , RNA/genetics , RNA Splicing , Sequence Analysis, RNA

18.
BMC Genomics ; 19(1): 536, 2018 Jul 13.
Article in English | MEDLINE | ID: mdl-30005633

ABSTRACT

BACKGROUND: Alternative polyadenylation (APA) results in messenger RNA molecules with different 3' untranslated regions (3' UTRs), affecting the molecules' stability, localization, and translation. APA is pervasive and implicated in cancer. Earlier reports on APA focused on 3' UTR length modifications and commonly characterized APA events as 3' UTR shortening or lengthening. However, such characterization oversimplifies the processing of 3' ends of transcripts and fails to adequately describe the various scenarios we observe. RESULTS: We built a cloud-based targeted de novo transcript assembly and analysis pipeline that incorporates our previously developed cleavage site prediction tool, KLEAT. We applied this pipeline to elucidate the APA profiles of 114 genes in 9939 tumor and 729 tissue normal samples from The Cancer Genome Atlas (TCGA). The full set of 10,668 RNA-Seq samples from 33 cancer types has not been utilized by previous APA studies. By comparing the frequencies of predicted cleavage sites between normal and tumor sample groups, we identified 77 events (i.e. gene-cancer type pairs) of tumor-specific APA regulation in 13 cancer types; for 15 genes, such regulation is recurrent across multiple cancers. Our results also support a previous report showing the 3' UTR shortening of FGF2 in multiple cancers. However, over half of the events we identified display complex changes to 3' UTR length that resist simple classification like shortening or lengthening. CONCLUSIONS: Recurrent tumor-specific regulation of APA is widespread in cancer. However, the regulation pattern that we observed in TCGA RNA-seq data cannot be described as straightforward 3' UTR shortening or lengthening. Continued investigation into this complex, nuanced regulatory landscape will provide further insight into its role in tumor formation and development.


Subjects
Neoplasms/genetics , RNA, Messenger/genetics , 3' Untranslated Regions , Cloud Computing , Databases, Genetic , Fibroblast Growth Factor 2/genetics , Gene Expression Regulation, Neoplastic , Humans , Neoplasm Recurrence, Local/genetics , Neoplasms/pathology , Polyadenylation , RNA Cleavage , RNA, Messenger/metabolism , Software
19.
Nat Cell Biol ; 17(3): 311-21, 2015 Mar.
Article in English | MEDLINE | ID: mdl-25686251

ABSTRACT

Although recent studies have shown that adenosine-to-inosine (A-to-I) RNA editing occurs in microRNAs (miRNAs), its effects on tumour growth and metastasis are not well understood. We present evidence of CREB-mediated low expression of ADAR1 in metastatic melanoma cell lines and tumour specimens. Re-expression of ADAR1 resulted in the suppression of melanoma growth and metastasis in vivo. Consequently, we identified three miRNAs undergoing A-to-I editing in the weakly metastatic melanoma but not in strongly metastatic cell lines. One of these miRNAs, miR-455-5p, has two A-to-I RNA-editing sites. The biological function of edited miR-455-5p is different from that of the unedited form, as it recognizes a different set of genes. Indeed, wild-type miR-455-5p promotes melanoma metastasis through inhibition of the tumour suppressor gene CPEB1. Moreover, wild-type miR-455 enhances melanoma growth and metastasis in vivo, whereas the edited form inhibits these features. These results demonstrate a previously unrecognized role for RNA editing in melanoma progression.


Subjects
Adenosine/metabolism , Gene Expression Regulation, Neoplastic , Inosine/metabolism , Melanoma/genetics , RNA Editing , Skin Neoplasms/genetics , Adenosine Deaminase/genetics , Adenosine Deaminase/metabolism , Animals , Base Sequence , Cell Line, Tumor , Cyclic AMP Response Element-Binding Protein/genetics , Cyclic AMP Response Element-Binding Protein/metabolism , Disease Progression , Female , Genes, Reporter , Humans , Luciferases/genetics , Luciferases/metabolism , Melanoma/metabolism , Melanoma/pathology , Mice , Mice, Nude , MicroRNAs , Molecular Sequence Data , Neoplasm Metastasis , Neoplasm Transplantation , RNA-Binding Proteins/genetics , RNA-Binding Proteins/metabolism , Skin Neoplasms/metabolism , Skin Neoplasms/pathology , Transcription Factors/genetics , Transcription Factors/metabolism , mRNA Cleavage and Polyadenylation Factors/genetics , mRNA Cleavage and Polyadenylation Factors/metabolism
20.
Pac Symp Biocomput ; : 347-58, 2015.
Article in English | MEDLINE | ID: mdl-25592595

ABSTRACT

In eukaryotic cells, alternative cleavage of 3' untranslated regions (UTRs) can affect transcript stability, transport and translation. For polyadenylated (poly(A)) transcripts, cleavage sites can be characterized with short-read sequencing using specialized library construction methods. However, for large-scale cohort studies as well as for clinical sequencing applications, it is desirable to characterize such events using RNA-seq data, as the latter are already widely applied to identify other relevant information, such as mutations, alternative splicing and chimeric transcripts. Here we describe KLEAT, an analysis tool that uses de novo assembly of RNA-seq data to characterize cleavage sites on 3' UTRs. We demonstrate the performance of KLEAT on three cell line RNA-seq libraries constructed and sequenced by the ENCODE project, and assembled using Trans-ABySS. Validating the KLEAT predictions with matched ENCODE RNA-seq and RNA-PET libraries, we show that the tool has a positive predictive value of over 90% when at least three RNA-seq reads support a poly(A) tail and validation requires at least three RNA-PET reads mapping within 100 nucleotides. We also compare the performance of KLEAT with other popular RNA-seq analysis pipelines that reconstruct 3' UTR ends, and show that it performs favourably, based on an ROC-like curve.
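
The validation criterion described in this abstract can be illustrated with a minimal sketch. This is not KLEAT's code or the authors' evaluation script; the function name, the input layout (positions on a single chromosome and strand) and the default thresholds are assumptions chosen to mirror the numbers quoted above (three supporting RNA-seq reads, three validating reads, a 100-nucleotide window).

```python
from bisect import bisect_left, bisect_right


def positive_predictive_value(predicted, validation_reads,
                              min_polya_reads=3,
                              min_validation_reads=3,
                              window=100):
    """Estimate PPV of predicted cleavage sites (hypothetical helper).

    predicted: list of (position, supporting_read_count) tuples.
    validation_reads: list of positions, one per validating read
        (for example, RNA-PET read ends); assumed same chromosome/strand.
    Returns None when no prediction passes the support filter.
    """
    reads = sorted(validation_reads)

    # Keep only predictions with enough poly(A)-supporting reads.
    kept = [pos for pos, n in predicted if n >= min_polya_reads]
    if not kept:
        return None

    confirmed = 0
    for pos in kept:
        # Count validating reads within +/- window nucleotides of the site.
        lo = bisect_left(reads, pos - window)
        hi = bisect_right(reads, pos + window)
        if hi - lo >= min_validation_reads:
            confirmed += 1

    # PPV = confirmed predictions / all predictions that passed the filter.
    return confirmed / len(kept)
```

Raising `min_polya_reads` trades sensitivity for precision, which is presumably why the reported value is stated for a specific support threshold.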


Subjects
Transcriptome , 3' Untranslated Regions , Binding Sites , Cell Line , Computational Biology , Gene Library , Humans , ROC Curve , Sequence Alignment/statistics & numerical data , Sequence Analysis, RNA/statistics & numerical data