Results 1 - 20 of 27
1.
Nat Methods ; 2024 Jun 07.
Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

2.
Physiol Plant ; 176(3): e14398, 2024.
Article in English | MEDLINE | ID: mdl-38894544

ABSTRACT

RNA-seq data is currently generated in numerous non-model organisms that lack a reference genome. Nevertheless, the confirmation of gene expression levels using RT-qPCR remains necessary, and the existing techniques do not seamlessly interface with the omics pipeline workflow. Developing primers for many targets by utilising orthologous genes can be a laborious, imprecise, and subjective process, particularly for plant species that are not commonly studied and do not have a known genome. We have developed a primer design tool, named PABLOG, that analyses the alignments generated from long or short RNA-seq reads and a reference orthologous gene. PABLOG scans, much like a bee searching several flowers for pollen, and presents a sorted list of potential exon-exon junction locations, ranked according to their reliability. Through computational analysis across the whole genomes of several non-model species, we demonstrate that PABLOG performs more effectively than other methods in identifying exon-exon junctions since it generates significantly fewer false-positive results. Examination of candidate regions at the gene level, in conjunction with laboratory studies, shows that the suggested primers successfully amplified particular targets in non-model plants without any presence of genomic contamination. Our tool includes a consensus sequence feature that enables the complete process of primer design, from aligning with the target gene to determining amplification parameters. The utility can be accessed via the GitHub repository located at: https://github.com/tools4plant-omics/PABLOG.


Subject(s)
DNA Primers , Bees/genetics , DNA Primers/genetics , Exons/genetics , Software , Animals , Genes, Plant/genetics , Genome, Plant/genetics , Computational Biology/methods
3.
bioRxiv ; 2023 Jul 27.
Article in English | MEDLINE | ID: mdl-37546854

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project (LRGASP) Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. The consortium generated over 427 million long-read sequences from cDNA and direct RNA datasets, encompassing human, mouse, and manatee species, using different protocols and sequencing platforms. These data were utilized by developers to address challenges in transcript isoform detection and quantification, as well as de novo transcript isoform identification. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. When aiming to detect rare and novel transcripts or when using reference-free approaches, incorporating additional orthogonal data and replicate samples is advised. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.

4.
Cell Death Dis ; 14(1): 19, 2023 01 12.
Article in English | MEDLINE | ID: mdl-36635266

ABSTRACT

The abnormal tumor microenvironment (TME) often dictates the therapeutic response of cancer to chemo- and immuno-therapy. Aberrant expression of pericentromeric satellite repeats has been reported for epithelial cancers, including lung cancer. However, the transcription of tandemly repetitive elements in stromal cells of the TME has been unappreciated, limiting the optimal use of satellite transcripts as biomarkers or anti-cancer targets. We found that pericentromeric satellite DNA (satDNA) is transcribed in cancer-associated fibroblasts (CAFs) in mouse and human lung adenocarcinoma. In vivo, lung fibroblasts expressed pericentromeric satellite repeats HS2/HS3 specifically in tumors. In vitro, transcription of satDNA was induced in lung fibroblasts in response to TGFβ, IL1α, matrix stiffness, direct contact with tumor cells and treatment with chemotherapeutic drugs. Single-cell transcriptome analysis of human lung adenocarcinoma confirmed that CAFs were the cell type with the highest number of satellite transcripts. Human HS2/HS3 pericentromeric transcripts were detected in the nucleus, in the cytoplasm and extracellularly, and co-localized with extracellular vesicles in situ in human biopsies and in activated fibroblasts in vitro. The transcripts were transmitted into recipient cells and entered their nuclei. Knock-down of satellite transcripts in human lung fibroblasts attenuated cellular senescence and blocked the formation of an inflammatory CAF phenotype, which resulted in the inhibition of their pro-tumorigenic functions. In sum, our data suggest that satellite long non-coding (lnc) RNAs are induced in CAFs, regulate expression of inflammatory genes and can be secreted from the cells, which might represent a new element of cell-cell communication in the TME.


Subject(s)
Adenocarcinoma , Cancer-Associated Fibroblasts , Lung Neoplasms , RNA, Long Noncoding , Humans , Animals , Mice , Cancer-Associated Fibroblasts/metabolism , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Fibroblasts/metabolism , DNA, Satellite , Lung Neoplasms/pathology , Adenocarcinoma/genetics , Lung , Carcinogenesis/genetics , Tumor Microenvironment/genetics
5.
Int J Mol Sci ; 24(2)2023 Jan 11.
Article in English | MEDLINE | ID: mdl-36674941

ABSTRACT

Elaboration of protocols for differentiation of human pluripotent stem cells to dopamine neurons is an important issue for development of cell replacement therapy for Parkinson's disease. A number of protocols have already been developed; however, their efficiency and specificity can still be improved. Investigating the role of signaling cascades important for neurogenesis can help to solve this problem and provide a deeper understanding of their role in neuronal development. Notch signaling plays an essential role in development and maintenance of the central nervous system after birth. In our study, we analyzed the effect of Notch activation and inhibition at the early stages of differentiation of human induced pluripotent stem cells to dopaminergic neurons. We found that, during the first seven days of differentiation, the cells were not sensitive to Notch inhibition. In contrast, activation of Notch signaling during the same period led to significant changes and was associated with an increase in expression of genes specific for caudal parts of the brain, a decrease in expression of genes specific for the forebrain, and a decrease in expression of genes important for the formation of axons and dendrites and microtubule-stabilizing proteins.


Subject(s)
Induced Pluripotent Stem Cells , Pluripotent Stem Cells , Humans , Dopaminergic Neurons/metabolism , Induced Pluripotent Stem Cells/metabolism , Cell Differentiation , Pluripotent Stem Cells/metabolism , Signal Transduction , Receptors, Notch/metabolism
6.
Nat Biotechnol ; 41(7): 915-918, 2023 Jul.
Article in English | MEDLINE | ID: mdl-36593406

ABSTRACT

Annotating newly sequenced genomes and determining alternative isoforms from long-read RNA data are complex and incompletely solved problems. Here we present IsoQuant, a computational tool using intron graphs that accurately reconstructs transcripts both with and without reference genome annotation. For novel transcript discovery, IsoQuant reduces the false-positive rate fivefold and 2.5-fold for Oxford Nanopore reference-based or reference-free mode, respectively. IsoQuant also improves performance for Pacific Biosciences data.


Subject(s)
High-Throughput Nucleotide Sequencing , RNA , Protein Isoforms/genetics , Sequence Analysis, RNA , Genome , Sequence Analysis, DNA
7.
Front Microbiol ; 13: 981458, 2022.
Article in English | MEDLINE | ID: mdl-36386613

ABSTRACT

While metagenome sequencing may provide insights into the genome sequences and composition of microbial communities, metatranscriptome analysis can be useful for studying the functional activity of a microbiome. RNA-Seq data make it possible to determine which genes are active in the community and how their expression levels depend on external conditions. Although the field of metatranscriptomics is relatively young, the number of projects related to metatranscriptome analysis increases every year and the scope of its applications expands. However, there are several problems that complicate metatranscriptome analysis: the complexity of microbial communities, the wide dynamic range of transcriptome expression and, importantly, the lack of high-quality computational methods for assembling meta-RNA sequencing data. These factors degrade the contiguity and completeness of metatranscriptome assemblies, thereby affecting downstream analysis. Here we present MetaGT, a pipeline for de novo assembly of metatranscriptomes, which is based on the idea of combining both metatranscriptomic and metagenomic data sequenced from the same sample. MetaGT assembles metatranscriptomic contigs and fills in missing regions based on their alignments to the metagenome assembly. This approach makes it possible to overcome the described complexities, obtain complete RNA sequences and additionally estimate their abundances. Using various publicly available real and simulated datasets, we demonstrate that MetaGT yields significant improvement in coverage and completeness of metatranscriptome assemblies compared to existing methods that do not exploit metagenomic data. The pipeline is implemented in NextFlow and is freely available from https://github.com/ablab/metaGT.

8.
Nat Biotechnol ; 40(7): 1082-1092, 2022 07.
Article in English | MEDLINE | ID: mdl-35256815

ABSTRACT

Single-nuclei RNA sequencing characterizes cell types at the gene level. However, compared to single-cell approaches, many single-nuclei cDNAs are purely intronic, lack barcodes and hinder the study of isoforms. Here we present single-nuclei isoform RNA sequencing (SnISOr-Seq). Using microfluidics, PCR-based artifact removal, target enrichment and long-read sequencing, SnISOr-Seq increased barcoded, exon-spanning long reads 7.5-fold compared to naive long-read single-nuclei sequencing. We applied SnISOr-Seq to adult human frontal cortex and found that exons associated with autism exhibit coordinated and highly cell-type-specific inclusion. We found two distinct combination patterns: those distinguishing neural cell types, enriched in TSS-exon, exon-polyadenylation-site and non-adjacent exon pairs, and those with multiple configurations within one cell type, enriched in adjacent exon pairs. Finally, we observed that human-specific exons are almost as tightly coordinated as conserved exons, implying that coordination can be rapidly established during evolution. SnISOr-Seq enables cell-type-specific long-read isoform analysis in human brain and in any frozen or hard-to-dissociate sample.


Subject(s)
Brain , RNA , Alternative Splicing/genetics , Brain/metabolism , Exons/genetics , Humans , Protein Isoforms/genetics , RNA/genetics , Sequence Analysis, RNA
9.
Genome Res ; 32(4): 726-737, 2022 04.
Article in English | MEDLINE | ID: mdl-35301264

ABSTRACT

Long-read transcriptomics require understanding error sources inherent to technologies. Current approaches cannot compare methods for an individual RNA molecule. Here, we present a novel platform-comparison method that combines barcoding strategies and long-read sequencing to sequence cDNA copies representing an individual RNA molecule on both Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT). We compare these long-read pairs in terms of sequence content and isoform patterns. Although individual read pairs show high similarity, we find differences in (1) aligned length, (2) transcription start site (TSS), (3) polyadenylation site (poly(A)-site) assignment, and (4) exon-intron structures. Overall, 25% of read pairs disagree on either TSS, poly(A)-site, or splice site. Intron-chain disagreement typically arises from alignment errors of microexons and complicated splice sites. Our single-molecule technology comparison reveals that inconsistencies are often caused by sequencing error-induced inaccurate ONT alignments, especially to downstream GUNNGU donor motifs. However, annotation-disagreeing upstream shifts in NAGNAG acceptors in ONT are often confirmed by PacBio and are thus likely real. In both barcoded and nonbarcoded ONT reads, we find that intron number and proximity of GU/AGs better predict inconsistencies with the annotation than read quality alone. We summarize these findings in an annotation-based algorithm for spliced alignment correction that improves subsequent transcript construction with ONT reads.


Subject(s)
Nanopores , DNA, Complementary , High-Throughput Nucleotide Sequencing/methods , RNA , Sequence Analysis, DNA/methods , Technology
10.
Int J Mol Sci ; 22(16)2021 Aug 16.
Article in English | MEDLINE | ID: mdl-34445502

ABSTRACT

Trace amine-associated receptors (TAAR) recognize organic compounds, including primary, secondary, and tertiary amines. The TAAR5 receptor is known to be involved in the olfactory sensing of innate socially relevant odors encoded by volatile amines. However, emerging data point to the involvement of TAAR5 in brain functions, particularly in the emotional behaviors mediated by the limbic system which suggests its potential contribution to the pathogenesis of neuropsychiatric diseases. TAAR5 expression was explored in datasets available in the Gene Expression Omnibus, Allen Brain Atlas, and Human Protein Atlas databases. Transcriptomic data demonstrate ubiquitous low TAAR5 expression in the cortical and limbic brain areas, the amygdala and the hippocampus, the nucleus accumbens, the thalamus, the hypothalamus, the basal ganglia, the cerebellum, the substantia nigra, and the white matter. Altered TAAR5 expression is identified in Down syndrome, major depressive disorder, or HIV-associated encephalitis. Taken together, these data indicate that TAAR5 in humans is expressed not only in the olfactory system but also in certain brain structures, including the limbic regions receiving olfactory input and involved in critical brain functions. Thus, TAAR5 can potentially be involved in the pathogenesis of brain disorders and represents a valuable novel target for neuropsychopharmacology.


Subject(s)
Brain/metabolism , Depressive Disorder, Major/genetics , Down Syndrome/genetics , Down-Regulation , Encephalitis, Viral/genetics , HIV Infections/complications , Receptors, G-Protein-Coupled/genetics , Databases, Genetic , Encephalitis, Viral/etiology , Gene Expression Profiling , Gene Expression Regulation , HIV Infections/genetics , Humans , Oligonucleotide Array Sequence Analysis , Sequence Analysis, RNA , Tissue Distribution
11.
Nat Commun ; 12(1): 463, 2021 01 19.
Article in English | MEDLINE | ID: mdl-33469025

ABSTRACT

Splicing varies across brain regions, but the single-cell resolution of regional variation is unclear. We present a single-cell investigation of differential isoform expression (DIE) between brain regions using single-cell long-read sequencing in mouse hippocampus and prefrontal cortex in 45 cell types at postnatal day 7 (www.isoformAtlas.com). Isoform tests for DIE show better performance than exon tests. We detect hundreds of DIE events traceable to cell types, often corresponding to functionally distinct protein isoforms. Mostly, one cell type is responsible for brain-region specific DIE. However, for fewer genes, multiple cell types influence DIE. Thus, regional identity can, although rarely, override cell-type specificity. Cell types indigenous to one anatomic structure display distinctive DIE, e.g. the choroid plexus epithelium manifests distinct transcription-start-site usage. Spatial transcriptomics and long-read sequencing yield a spatially resolved splicing map. Our methods quantify isoform expression with cell-type and spatial resolution and contribute to furthering our understanding of how the brain integrates molecular and cellular complexity.


Subject(s)
Alternative Splicing/physiology , Gene Expression Regulation, Developmental/physiology , Hippocampus/metabolism , Prefrontal Cortex/metabolism , Protein Isoforms/metabolism , Animals , Animals, Newborn , Computational Biology , Female , Hippocampus/cytology , Hippocampus/growth & development , Mice , Models, Animal , Prefrontal Cortex/cytology , Prefrontal Cortex/growth & development , Protein Isoforms/analysis , Protein Isoforms/genetics , Single-Cell Analysis/methods , Spatial Analysis
12.
Sci Rep ; 10(1): 19981, 2020 11 17.
Article in English | MEDLINE | ID: mdl-33203921

ABSTRACT

Stress-related neuropsychiatric disorders are widespread, debilitating and often treatment-resistant illnesses that represent an urgent unmet biomedical problem. Animal models of these disorders are widely used to study stress pathogenesis. A more recent and historically less utilized model organism, the zebrafish (Danio rerio), is a valuable tool in stress neuroscience research. Utilizing the 5-week chronic unpredictable stress (CUS) model, here we examined brain transcriptomic profiles and complex dynamic behavioral stress responses, as well as neurochemical alterations in adult zebrafish and their correction by chronic treatment with the antidepressant fluoxetine. Overall, CUS induced complex neurochemical and behavioral alterations in zebrafish, including stable anxiety-like behaviors and serotonin metabolism deficits. Chronic fluoxetine (0.1 mg/L for 11 days) rescued most of the observed behavioral and neurochemical responses. Finally, whole-genome brain transcriptomic analyses revealed altered expression of various CNS genes (partially rescued by chronic fluoxetine), including inflammation-, ubiquitin- and arrestin-related genes. Collectively, this supports zebrafish as a valuable translational tool to study stress-related pathogenesis, whose anxiety and serotonergic deficits parallel rodent and clinical studies, and genomic analyses implicate neuroinflammation, structural neuronal remodeling and arrestin/ubiquitin pathways in both stress pathogenesis and its potential therapy.


Subject(s)
Behavior, Animal/physiology , Stress, Psychological/physiopathology , Transcriptome/physiology , Zebrafish/physiology , Animals , Antidepressive Agents/pharmacology , Anxiety/drug therapy , Anxiety/physiopathology , Behavior, Animal/drug effects , Brain/drug effects , Brain/physiopathology , Disease Models, Animal , Female , Fluoxetine/pharmacology , Male , Stress, Psychological/drug therapy , Transcriptome/drug effects
13.
Proc Natl Acad Sci U S A ; 117(44): 27300-27306, 2020 11 03.
Article in English | MEDLINE | ID: mdl-33087570

ABSTRACT

Conventional "bulk" PCR often yields inefficient and nonuniform amplification of complex templates in DNA libraries, introducing unwanted biases. Amplification of single DNA molecules encapsulated in a myriad of emulsion droplets (emulsion PCR, ePCR) allows the mitigation of this problem. Different ePCR regimes were experimentally analyzed to identify the most robust techniques for enhanced amplification of DNA libraries. A phenomenological mathematical model that forms an essential basis for optimal use of ePCR for library amplification was developed. A detailed description by high-throughput sequencing of amplified DNA-encoded libraries highlights the principal advantages of ePCR over bulk PCR. ePCR outperforms PCR, reduces gross DNA errors, and provides a more uniform distribution of the amplified sequences. The quasi single-molecule amplification achieved via ePCR represents the fundamental requirement for complex DNA templates that are prone to diversity degeneration and provides a way to preserve the quality of DNA libraries.


Subject(s)
Emulsions/chemistry , High-Throughput Nucleotide Sequencing/methods , Polymerase Chain Reaction/methods , DNA/genetics , DNA Primers/genetics , Gene Library , Genome/genetics , Humans , Models, Theoretical , Nucleic Acid Amplification Techniques/methods , Templates, Genetic
14.
Plants (Basel) ; 9(9)2020 Sep 18.
Article in English | MEDLINE | ID: mdl-32961840

ABSTRACT

The association among environmental cues, ethylene response, ABA signaling, and reactive oxygen species (ROS) homeostasis in the process of seed dormancy release is nowadays well-established in many species. Alternating temperatures are recognized as one of the main environmental signals determining dormancy release, but their underlying mechanisms are scarcely known. Dry after-ripened wild cardoon achenes germinated poorly at a constant temperature of 20, 15, or 10 °C, whereas germination was stimulated by 80% at alternating temperatures of 20/10 °C. Using an RNA-Seq approach, we identified 23,640 and annotated 14,078 gene transcripts expressed in dry achenes and achenes exposed to constant or alternating temperatures. Transcriptional patterns identified in dry condition included seed reserve and response to dehydration stress genes (i.e., HSPs, peroxidases, and LEAs). At a constant temperature, we observed an upregulation of ABA biosynthesis genes (i.e., NCED9), ABA-responsive genes (i.e., ABI5 and TAP), as well as other genes previously related to physiological dormancy and inhibition of germination. However, the alternating temperatures were associated with the upregulation of ethylene metabolism (i.e., ACO1, 4, and ACS10) and signaling (i.e., EXPs) genes and ROS homeostasis regulator genes (i.e., RBOH and CAT). Accordingly, the ethylene production was twice as high at alternating as at constant temperatures. The presence in the germination medium of ethylene or ROS synthesis and signaling inhibitors reduced significantly, but not completely, germination at 20/10 °C. Conversely, the presence of methyl viologen and salicylhydroxamic acid (SHAM), a peroxidase inhibitor, partially increased germination at constant temperature. Taken together, the present study provides the first insights into the gene expression patterns and physiological response associated with dormancy release at alternating temperatures in wild cardoon (Cynara cardunculus var. sylvestris).

15.
BMC Genomics ; 21(1): 317, 2020 Aug 21.
Article in English | MEDLINE | ID: mdl-32819282

ABSTRACT

BACKGROUND: The investigation of transcriptome profiles using short reads in non-model organisms, which lack well-annotated genomes, is limited by partial gene reconstruction and isoform detection. In contrast, long-read sequencing techniques have shown their potential to generate complete transcript assemblies even when a reference genome is lacking. Cynara cardunculus var. altilis (DC) (cultivated cardoon) is a perennial hardy crop adapted to dry environments with many industrial and nutraceutical applications due to the richness of secondary metabolites mostly produced in flower heads. The investigation of this species benefited from the recent release of a draft genome, but the transcriptome profile during capitulum formation remains unexplored. In the present study we show a transcriptome analysis of vegetative and inflorescence organs of cultivated cardoon through a novel hybrid RNA-seq assembly approach utilizing both long and short RNA-seq reads. RESULTS: The inclusion of a single Nanopore flow-cell output in a hybrid sequencing approach led to a 15% increase in complete assembled genes and an 18% increase in transcript isoforms with respect to short reads alone. Among 25,463 assembled unigenes, we identified 578 new genes and updated 13,039 gene models, 11,169 of which were alternatively spliced isoforms. During capitulum development, 3424 genes were differentially expressed and approximately two-thirds were identified as transcription factors including bHLH, MYB, NAC, C2H2 and MADS-box which were highly expressed especially after capitulum opening. We also show the expression dynamics of key genes involved in the production of valuable secondary metabolites in which the capitulum is rich, such as phenylpropanoids, flavonoids and sesquiterpene lactones. Most of their biosynthetic genes were strongly transcribed in the flower heads with alternative isoforms exhibiting differential expression levels across the tissues.
CONCLUSIONS: This novel hybrid sequencing approach made it possible to improve the transcriptome assembly, to update more than half of the annotated genes and to identify many novel genes and different alternatively spliced isoforms. This study provides new insights into the flowering cycle in an Asteraceae plant, a valuable resource for plant biology and breeding in Cynara and an effective method for improving gene annotation.


Subject(s)
Cynara , Transcriptome , Cynara/genetics , Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Molecular Sequence Annotation , Plant Breeding
16.
BMC Bioinformatics ; 21(Suppl 12): 302, 2020 Jul 24.
Article in English | MEDLINE | ID: mdl-32703149

ABSTRACT

BACKGROUND: De novo RNA-Seq assembly is a powerful method for analysing transcriptomes when the reference genome is not available or poorly annotated. However, due to the short length of Illumina reads it is usually impossible to reconstruct complete sequences of complex genes and alternative isoforms. The recently emerged ability to generate long RNA reads, for example with PacBio and Oxford Nanopore technologies, may dramatically improve the assembly quality, and thus the subsequent analysis. While reference-based tools for analysing long RNA reads were recently developed, there is no established pipeline for de novo assembly of such data. RESULTS: In this work we present a novel method that enables high-quality de novo transcriptome assemblies by combining the accuracy and reliability of short reads with exon structure information derived from long error-prone reads. The algorithm is designed by incorporating the existing hybridSPAdes approach into the rnaSPAdes pipeline and adapting it for transcriptomic data. CONCLUSION: To evaluate the benefit of using long RNA reads we selected several datasets containing both Illumina and Iso-seq or Oxford Nanopore Technologies (ONT) reads. Using existing quality assessment software, we show that hybrid assemblies performed with rnaSPAdes contain more full-length genes and alternative isoforms compared to the case when only short-read data is used.


Subject(s)
Algorithms , Transcriptome/genetics , Databases, Genetic , Humans , MCF-7 Cells , Nanopores , RNA-Seq , Reproducibility of Results
17.
Curr Protoc Bioinformatics ; 70(1): e102, 2020 06.
Article in English | MEDLINE | ID: mdl-32559359

ABSTRACT

SPAdes (St. Petersburg genome Assembler) was originally developed for de novo assembly of genome sequencing data produced for cultivated microbial isolates and for single-cell genomic DNA sequencing. With time, the functionality of SPAdes was extended to enable assembly of IonTorrent data, as well as hybrid assembly from short and long reads (PacBio and Oxford Nanopore). In this article we present protocols for five different assembly pipelines that comprise the SPAdes package and that are used for assembly of metagenomes and transcriptomes as well as assembly of putative plasmids and biosynthetic gene clusters from whole-genome sequencing and metagenomic datasets. In addition, we present guidelines for understanding results with use cases for each pipeline, and several additional support protocols that help in using SPAdes properly. © 2020 Wiley Periodicals LLC.
Basic Protocol 1: Assembling isolate bacterial datasets
Basic Protocol 2: Assembling metagenomic datasets
Basic Protocol 3: Assembling sets of putative plasmids
Basic Protocol 4: Assembling transcriptomes
Basic Protocol 5: Assembling putative biosynthetic gene clusters
Support Protocol 1: Installing SPAdes
Support Protocol 2: Providing input via command line
Support Protocol 3: Providing input data via YAML format
Support Protocol 4: Restarting previous run
Support Protocol 5: Determining strand-specificity of RNA-seq data


Subject(s)
Algorithms , Sequence Analysis, DNA/methods , Bacteria/genetics , Biosynthetic Pathways/genetics , Databases, Genetic , Metagenome , Multigene Family , Plasmids/genetics , RNA-Seq , Transcriptome/genetics
18.
Cell Syst ; 10(1): 99-108.e5, 2020 01 22.
Article in English | MEDLINE | ID: mdl-31864964

ABSTRACT

Cyclic and branch cyclic peptides (cyclopeptides) represent a class of bioactive natural products that include many antibiotics and anti-tumor compounds. Despite the recent advances in metabolomics analysis, still little is known about the cyclopeptides in the human gut and their possible interactions due to a lack of computational analysis pipelines that are applicable to such compounds. Here, we introduce CycloNovo, an algorithm for automated de novo cyclopeptide analysis and sequencing that employs de Bruijn graphs, the workhorse of DNA sequencing algorithms, to identify cyclopeptides in spectral datasets. CycloNovo reconstructed 32 previously unreported cyclopeptides (to the best of our knowledge) in the human gut and reported over a hundred cyclopeptides in other environments represented by various spectra on Global Natural Products Social Molecular Network (GNPS). https://github.com/bbehsaz/cyclonovo.


Subject(s)
Amino Acid Sequence/genetics , Gastrointestinal Microbiome/genetics , Peptides, Cyclic/chemistry , Humans , Mass Spectrometry
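
The abstract above calls de Bruijn graphs "the workhorse of DNA sequencing algorithms". As a rough illustration of that general idea only, the following Python sketch builds a de Bruijn graph from short DNA reads. It is not CycloNovo's algorithm, which works on mass spectra; the function name `de_bruijn_graph`, the toy reads and the choice k = 3 are assumptions made for this example.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Build a de Bruijn graph from reads (illustrative sketch).

    Nodes are (k-1)-mers; every k-mer found in a read adds a directed
    edge from its (k-1)-length prefix to its (k-1)-length suffix.
    """
    graph = defaultdict(set)
    for read in reads:
        # Slide a window of length k over the read.
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Two overlapping toy reads (hypothetical data).
graph = de_bruijn_graph(["ACGTC", "CGTCA"], k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

Overlapping reads share k-mers, so their edges merge in the graph, and a walk through it spells out a longer sequence than either read alone.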
19.
Gigascience ; 8(9)2019 09 01.
Article in English | MEDLINE | ID: mdl-31494669

ABSTRACT

BACKGROUND: The possibility of generating large RNA-sequencing datasets has led to the development of various reference-based and de novo transcriptome assemblers with their own strengths and limitations. While reference-based tools are widely used in various transcriptomic studies, their application is limited to organisms with finished and well-annotated genomes. De novo transcriptome reconstruction from short reads remains an open and challenging problem, which is complicated by the varying expression levels across different genes, alternative splicing, and paralogous genes. RESULTS: Herein we describe the novel transcriptome assembler rnaSPAdes, which has been developed on top of the SPAdes genome assembler and explores computational parallels between assembly of transcriptomes and single-cell genomes. We also present quality assessment reports for rnaSPAdes assemblies, compare it with modern transcriptome assembly tools using several evaluation approaches on various RNA-sequencing datasets, and briefly highlight strong and weak points of different assemblers. CONCLUSIONS: Based on the performed comparison between different assembly methods, we infer that it is not possible to detect the absolute leader according to all quality metrics and all used datasets. However, rnaSPAdes typically outperforms other assemblers in such an important property as the number of assembled genes and isoforms, and at the same time has higher accuracy statistics on average compared to the closest competitors.


Subject(s)
Algorithms , RNA-Seq , Transcriptome , Animals , Arabidopsis/genetics , Caenorhabditis elegans/genetics , Humans , Mice , Zea mays/genetics
20.
Bioinformatics ; 35(13): 2303-2305, 2019 07 01.
Article in English | MEDLINE | ID: mdl-30475983

ABSTRACT

SUMMARY: Scaffolding is an important step in every genome assembly pipeline, which makes it possible to order contigs into longer sequences using various types of linkage information, such as mate-pair libraries and long reads. In this work, we operate with the notion of a scaffold graph: a graph whose vertices correspond to the assembled contigs and whose edges represent connections between them. We present a software package called Scaffold Graph ToolKit that allows users to construct and visualize scaffold graphs using different kinds of sequencing data. We show that the scaffold graph appears to be useful for analyzing and assessing genome assemblies, and demonstrate several use cases that can be helpful for both assembly software developers and their users. AVAILABILITY AND IMPLEMENTATION: SGTK is implemented in C++, Python and JavaScript and is freely available at https://github.com/olga24912/SGTK. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
Software , Sequence Analysis, DNA
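
The scaffold graph described in the abstract above (contigs as vertices, linkage evidence as edges) can be pictured with a small Python sketch. This is a hypothetical illustration and not SGTK's API: the class `ScaffoldGraph`, its methods, the contig names and the edge weights are all assumptions, and the greedy ordering is a deliberately simple stand-in for real scaffolding logic.

```python
from collections import defaultdict


class ScaffoldGraph:
    """Toy scaffold graph: vertices are contigs, edges are connections."""

    def __init__(self):
        # contig -> list of (next_contig, evidence_type, weight)
        self.edges = defaultdict(list)

    def add_connection(self, src, dst, evidence, weight):
        """Record that `evidence` (e.g. mate pairs) links src to dst."""
        self.edges[src].append((dst, evidence, weight))

    def best_successor(self, contig):
        """Return the successor with the highest weight, or None."""
        candidates = self.edges.get(contig, [])
        if not candidates:
            return None
        return max(candidates, key=lambda edge: edge[2])[0]

    def greedy_path(self, start):
        """Order contigs by repeatedly following the best-supported edge."""
        path, seen = [start], {start}
        current = self.best_successor(start)
        while current is not None and current not in seen:
            path.append(current)
            seen.add(current)
            current = self.best_successor(current)
        return path


# Hypothetical contigs and link weights (e.g. number of supporting reads).
g = ScaffoldGraph()
g.add_connection("contig_1", "contig_2", "mate-pair", 12)
g.add_connection("contig_1", "contig_3", "long-read", 3)
g.add_connection("contig_2", "contig_3", "long-read", 7)
print(g.greedy_path("contig_1"))
```

Keeping the evidence type on every edge is what lets connections from different data sources, such as mate-pair libraries and long reads, sit side by side in one graph and be compared.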