Results 1 - 20 of 167
1.
BMC Genomics ; 25(1): 455, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720252

ABSTRACT

BACKGROUND: Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This usual practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters composed of highly similar members are left unexplored. RESULTS: In particular, disregarding multimappers leads to the underrepresentation in epigenetic studies of recently active transposable elements, such as AluYa5, L1HS and SVAs. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as the ones including major histocompatibility complex (MHC) class I and II genes, are under-quantified. CONCLUSION: Revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.
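
One simple way to make read counting multimapper-aware is to split each ambiguous read equally among the loci it aligns to, rather than discarding it. The sketch below contrasts that with unique-only counting. The equal-split scheme, the input format, and the function and locus names are illustrative assumptions, not the pipelines evaluated in the study.

```python
from collections import defaultdict


def count_reads(alignments, keep_multimappers=True):
    """Count reads per locus.

    alignments: one list of candidate loci per read (hypothetical input format).
    With keep_multimappers=False, reads that hit more than one locus are
    dropped, mimicking the standard practice described in the abstract.
    """
    counts = defaultdict(float)
    for loci in alignments:
        if not loci:
            continue  # unmapped read
        if len(loci) > 1 and not keep_multimappers:
            continue  # ambiguous read discarded
        weight = 1.0 / len(loci)  # assumption: equal split across candidate loci
        for locus in loci:
            counts[locus] += weight
    return dict(counts)


# Toy data: two of four reads are ambiguous between two similar repeat copies.
reads = [["AluYa5_a"], ["AluYa5_a", "AluYa5_b"], ["AluYa5_a", "AluYa5_b"], ["GAPDH"]]
unique_only = count_reads(reads, keep_multimappers=False)
fractional = count_reads(reads, keep_multimappers=True)
```

In the toy data, the second repeat copy receives no counts at all under unique-only counting, which is the kind of underrepresentation the abstract describes.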


Subjects
High-Throughput Nucleotide Sequencing , Humans , DNA Transposable Elements/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Genomics/methods , Sequence Analysis, RNA/methods
2.
RNA Biol ; 21(1): 1-13, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38797889

ABSTRACT

Although circular RNAs (circRNAs) play important roles in regulating gene expression, the understanding of circRNAs in livestock animals is scarce because characterizing them from a biological sample is a significant challenge. In this study, we assessed the outcomes of bovine circRNA identification using six enrichment approaches based on combinations of treatments: ribosomal RNA removal (Ribo); linear RNA degradation (R); degradation of linear RNAs and RNAs with structured 3' ends (RTP); ribosomal RNA removal coupled with linear RNA elimination (Ribo-R); elimination of ribosomal RNA, linear RNAs and RNAs with poly(A) tailing (Ribo-RP); and elimination of ribosomal RNA, linear RNAs and RNAs with structured 3' ends (Ribo-RTP). RNA-sequencing analysis revealed that the approaches differed in the ratio of uniquely mapped reads, the false-positive rate of circRNA identification, and the number of circRNAs per million clean reads (Padj <0.05). Out of 2,285 and 2,939 highly confident circRNAs identified in liver and rumen tissues, respectively, 308 and 260 were commonly identified by five methods, with the Ribo-RTP method identifying the highest number of circRNAs. In addition, 507 of the 4,051 highly confident bovine circRNAs shared splicing sites with human circRNAs. The findings from this work provide optimized methods to identify circRNAs from cattle tissues for downstream research on their biological roles in cattle.


Subjects
RNA, Circular , Cattle , RNA, Circular/genetics , Animals , RNA, Ribosomal/genetics , Sequence Analysis, RNA/methods , Liver/metabolism , Rumen/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Humans
3.
Nat Commun ; 15(1): 3946, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729950

ABSTRACT

Disease modeling with isogenic Induced Pluripotent Stem Cell (iPSC)-differentiated organoids serves as a powerful technique for studying disease mechanisms. Multiplexed coculture is crucial to mitigate batch effects when studying the genetic effects of disease-causing variants in differentiated iPSCs or organoids, and demultiplexing at the single-cell level can be conveniently achieved by assessing natural genetic barcodes. Here, to enable cost-efficient time-series experimental designs via multiplexed bulk and single-cell RNA-seq of hybrids, we introduce a computational method in our Vireo Suite, Vireo-bulk, to effectively deconvolve pooled bulk RNA-seq data by genotype reference, and thereby quantify donor abundance over the course of differentiation and identify differentially expressed genes among donors. Furthermore, with multiplexed scRNA-seq and bulk RNA-seq, we demonstrate the usefulness and necessity of a pooled design to reveal donor iPSC line heterogeneity during macrophage cell differentiation and to model rare WT1 mutation-driven kidney disease with chimeric organoids. Our work provides an experimental and analytic pipeline for dissecting disease mechanisms with chimeric organoids.


Subjects
Cell Differentiation , Induced Pluripotent Stem Cells , Organoids , RNA-Seq , Single-Cell Analysis , Organoids/metabolism , Single-Cell Analysis/methods , Induced Pluripotent Stem Cells/metabolism , Induced Pluripotent Stem Cells/cytology , Humans , Cell Differentiation/genetics , RNA-Seq/methods , Sequence Analysis, RNA/methods , Macrophages/metabolism , Macrophages/cytology , Animals , Single-Cell Gene Expression Analysis
4.
Nat Commun ; 15(1): 3972, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730241

ABSTRACT

The advancement of Long-Read Sequencing (LRS) techniques has significantly increased the length of sequencing to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expressions. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, there remains a deficiency in comparative studies that systemically evaluate the performance of these tools, which are implemented with different algorithms, under various simulations that encompass potential influencing factors. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performances using simulated data, which represented diverse sequencing platforms generated by an in-house simulator, RNA sequins (sequencing spike-ins) data, as well as experimental data. Our findings demonstrate IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and the ongoing improvement of tools for isoform detection using LRS data.


Subjects
Algorithms , Alternative Splicing , RNA, Messenger , Sequence Analysis, RNA , Humans , RNA, Messenger/genetics , RNA, Messenger/analysis , Sequence Analysis, RNA/methods , RNA Isoforms/genetics , Software , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics
5.
Nucleic Acids Res ; 52(3): e13, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38059347

ABSTRACT

Differential expression analysis of RNA-seq is one of the most commonly performed bioinformatics analyses. Transcript-level quantifications are inherently more uncertain than gene-level read counts because of ambiguous assignment of sequence reads to transcripts. While sequence reads can usually be assigned unambiguously to a gene, reads are very often compatible with multiple transcripts for that gene, particularly for genes with many isoforms. Software tools designed for gene-level differential expression do not perform optimally on transcript counts because the read-to-transcript ambiguity (RTA) disrupts the mean-variance relationship normally observed for gene level RNA-seq data and interferes with the efficiency of the empirical Bayes dispersion estimation procedures. The pseudoaligners kallisto and Salmon provide bootstrap samples from which quantification uncertainty can be assessed. We show that the overdispersion arising from RTA can be elegantly estimated by fitting a quasi-Poisson model to the bootstrap counts for each transcript. The technical overdispersion arising from RTA can then be divided out of the transcript counts, leading to scaled counts that can be input for analysis by established gene-level software tools with full statistical efficiency. Comprehensive simulations and test data show that an edgeR analysis of the scaled counts is more powerful and efficient than previous differential transcript expression pipelines while providing correct control of the false discovery rate. Simulations explore a wide range of scenarios including the effects of paired vs single-end reads, different read lengths and different numbers of replicates.
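
The idea of estimating a per-transcript overdispersion from bootstrap counts and dividing it out can be sketched as follows. This is a simplified illustration, not the published implementation: it pools a Pearson-type quasi-Poisson statistic over samples without any moderation, and it floors the estimate at 1 so that counts are never inflated (both are assumptions). The function names and input layout are hypothetical.

```python
def rta_overdispersion(boot):
    """Quasi-Poisson overdispersion for one transcript.

    boot[s][b] is the bootstrap count in sample s, resample b
    (hypothetical layout). Within each sample, the statistic
    sum((x - mean)^2 / mean) has (n_boot - 1) degrees of freedom.
    """
    stat, df = 0.0, 0
    for resamples in boot:
        mean = sum(resamples) / len(resamples)
        if mean == 0:
            continue  # an all-zero sample carries no information
        stat += sum((x - mean) ** 2 / mean for x in resamples)
        df += len(resamples) - 1
    if df == 0:
        return 1.0
    # Assumption: floor at 1 so scaling can only shrink counts.
    return max(stat / df, 1.0)


def scale_count(count, dispersion):
    """Divide the technical overdispersion out of an estimated count."""
    return count / dispersion


stable = rta_overdispersion([[10, 10, 10]])          # resamples agree
ambiguous = rta_overdispersion([[0, 20], [10, 10]])  # resamples disagree in one sample
```

A transcript whose bootstrap resamples agree keeps its count unchanged, while one whose resamples swing widely is down-weighted before it reaches a gene-level tool.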


Subjects
Gene Expression Profiling , Software , Gene Expression Profiling/methods , Bayes Theorem , Uncertainty , Sequence Analysis, RNA/methods
6.
Bioinformatics ; 39(11), 2023 Nov 01.
Article in English | MEDLINE | ID: mdl-37944045

ABSTRACT

MOTIVATION: The recent development of spatially resolved transcriptomics (SRT) technologies has facilitated research on gene expression in the spatial context. Annotating cell types is one crucial step for downstream analysis. However, many existing algorithms use an unsupervised strategy to assign cell types for SRT data. They first conduct clustering analysis and then aggregate cluster-level expression based on the clustering results. This workflow fails to leverage the marker gene information efficiently. On the other hand, other cell annotation methods designed for single-cell RNA-seq data utilize the cell-type marker genes information but fail to use spatial information in SRT data. RESULTS: We introduce a statistical spatial transcriptomics cell assignment model, SPAN, to annotate clusters of cells or spots into known types in SRT data with prior knowledge of predefined marker genes and spatial information. The SPAN model annotates cells or spots from SRT data using predefined overexpressed marker genes and combines a mixture model with a hidden Markov random field to model the spatial dependency between neighboring spots. We demonstrate the effectiveness of SPAN against spatial and nonspatial clustering algorithms through extensive simulation and real data experiments. AVAILABILITY AND IMPLEMENTATION: https://github.com/ChengZ352/SPAN.


Subjects
Single-Cell Analysis , Transcriptome , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Algorithms , Cluster Analysis
7.
Nat Commun ; 14(1): 4760, 2023 Aug 08.
Article in English | MEDLINE | ID: mdl-37553321

ABSTRACT

Long-read RNA sequencing (RNA-seq) is a powerful technology for transcriptome analysis, but the relatively low throughput of current long-read sequencing platforms limits transcript coverage. One strategy for overcoming this bottleneck is targeted long-read RNA-seq for preselected gene panels. We present TEQUILA-seq, a versatile, easy-to-implement, and low-cost method for targeted long-read RNA-seq utilizing isothermally linear-amplified capture probes. When performed on the Oxford nanopore platform with multiple gene panels of varying sizes, TEQUILA-seq consistently and substantially enriches transcript coverage while preserving transcript quantification. We profile full-length transcript isoforms of 468 actionable cancer genes across 40 representative breast cancer cell lines. We identify transcript isoforms enriched in specific subtypes and discover novel transcript isoforms in extensively studied cancer genes such as TP53. Among cancer genes, tumor suppressor genes (TSGs) are significantly enriched for aberrant transcript isoforms targeted for degradation via mRNA nonsense-mediated decay, revealing a common RNA-associated mechanism for TSG inactivation. TEQUILA-seq reduces the per-reaction cost of targeted capture by 2-3 orders of magnitude, as compared to a standard commercial solution. TEQUILA-seq can be broadly used for targeted sequencing of full-length transcripts in diverse biomedical research settings.


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , RNA/genetics , Protein Isoforms/genetics , Transcriptome/genetics
8.
Brief Bioinform ; 24(5), 2023 Sep 20.
Article in English | MEDLINE | ID: mdl-37507115

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
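
For orientation, the zero-inflated negative binomial likelihood and the Akaike information criterion mentioned above can be written in a few lines. This is a minimal sketch with fixed, made-up parameter values. It does not fit the model and is unrelated to the TensorZINB code. The mean/size parameterization (`mu`, `r`), the zero-inflation weight `pi`, and the toy counts are assumptions.

```python
from math import exp, lgamma, log


def nb_logpmf(k, mu, r):
    """Negative binomial log-probability with mean mu > 0 and size r > 0."""
    return (lgamma(k + r) - lgamma(r) - lgamma(k + 1)
            + r * log(r / (r + mu)) + k * log(mu / (r + mu)))


def zinb_logpmf(k, mu, r, pi):
    """ZINB: with probability pi emit a structural zero, else draw from NB."""
    nb = nb_logpmf(k, mu, r)
    if k == 0:
        return log(pi + (1.0 - pi) * exp(nb))
    return log(1.0 - pi) + nb


def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Toy counts for one gene across cells, evaluated at illustrative parameters.
counts = [0, 0, 0, 1, 3, 0, 7, 2]
ll = sum(zinb_logpmf(k, mu=2.0, r=1.5, pi=0.3) for k in counts)
score = aic(ll, n_params=3)  # three free parameters: mu, r, pi
```

Comparing `score` across candidate models fitted to the same counts is how an AIC-based ranking of hurdle and zero-inflated models would proceed.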


Subjects
Gene Expression Profiling , Models, Statistical , Poisson Distribution , Gene Expression Profiling/methods , RNA , Sequence Analysis, RNA/methods
9.
Methods Mol Biol ; 2691: 279-325, 2023.
Article in English | MEDLINE | ID: mdl-37355554

ABSTRACT

Transcriptomic profiling has fundamentally influenced our understanding of cancer pathophysiology and response to therapeutic intervention and has become a relatively routine approach. However, standard protocols are usually low-throughput, single-plex assays and costs are still quite prohibitive. With the evolving complexity of in vitro cell model systems, there is a need for resource-efficient high-throughput approaches that can support detailed time-course analytics, accommodate limited sample availability, and provide the capacity to correlate phenotype to genotype at scale. MAC-seq (multiplexed analysis of cells) is a low-cost, ultrahigh-throughput RNA-seq workflow in plate format to measure cell perturbations and is compatible with high-throughput imaging. Here we describe the steps to perform MAC-seq in 384-well format and apply it to 2D and 3D cell cultures. On average, our experimental conditions identified over ten thousand expressed genes per well when sequenced to a depth of one million reads. We discuss technical aspects, make suggestions on experimental design, and document critical operational procedures. Our protocol highlights the potential to couple MAC-seq with high-throughput screening applications including cell phenotyping using high-content cell imaging.


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , RNA-Seq/methods , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Phenotype , High-Throughput Screening Assays/methods , Sequence Analysis, RNA/methods
10.
PLoS Biol ; 21(3): e3002007, 2023 Mar.
Article in English | MEDLINE | ID: mdl-36862747

ABSTRACT

We assess inferential quality in the field of differential expression profiling by high-throughput sequencing (HT-seq) based on analysis of datasets submitted from 2008 to 2020 to the NCBI GEO data repository. We take advantage of the parallel differential expression testing over thousands of genes, whereby each experiment leads to a large set of p-values, the distribution of which can indicate the validity of assumptions behind the test. From a well-behaved p-value set, π0, the fraction of genes that are not differentially expressed, can be estimated. We found that only 25% of experiments resulted in theoretically expected p-value histogram shapes, although there is a marked improvement over time. Uniform p-value histogram shapes, indicative of <100 actual effects, were extremely rare. Furthermore, although many HT-seq workflows assume that most genes are not differentially expressed, 37% of experiments have π0 values of less than 0.5, as if most genes changed their expression level. Most HT-seq experiments have very small sample sizes and are expected to be underpowered. Nevertheless, the estimated π0 values do not have the expected association with N, suggesting widespread problems with controlling the false discovery rate (FDR) in these experiments. Both the fractions of different p-value histogram types and the π0 values are strongly associated with the differential expression analysis program used by the original authors. While we could double the proportion of theoretically expected p-value distributions by removing low-count features from the analysis, this treatment did not remove the association with the analysis program. Taken together, our results indicate widespread bias in the differential expression profiling field and the unreliability of statistical methods used to analyze HT-seq data.
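
To make the π0 idea concrete: under the null hypothesis p-values are uniform on [0, 1], so the density of the flat right-hand part of the histogram estimates the fraction of true nulls. The sketch below uses a Storey-type estimator at a single threshold. The abstract does not say which estimator the authors used, so the choice of estimator, the threshold `lam`, and the function names are assumptions.

```python
def pvalue_histogram(pvalues, n_bins=20):
    """Bin p-values on [0, 1]; a well-behaved set is flat except near zero."""
    counts = [0] * n_bins
    for p in pvalues:
        # p == 1.0 would index one past the end, so clamp to the last bin.
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    return counts


def estimate_pi0(pvalues, lam=0.5):
    """Storey-type estimate: share of p-values above lam, rescaled by (1 - lam)."""
    above = sum(1 for p in pvalues if p > lam)
    return min(above / (len(pvalues) * (1.0 - lam)), 1.0)


# Toy set: six clearly small p-values plus four spread over the upper half.
pvals = [0.001] * 6 + [0.6, 0.7, 0.8, 0.9]
hist = pvalue_histogram(pvals, n_bins=2)
pi0 = estimate_pi0(pvals)
```

An estimate below 0.5 would correspond to the situation flagged in the abstract, where a dataset behaves as if most genes were differentially expressed.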


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Sample Size
11.
Cancer Cytopathol ; 131(5): 289-299, 2023 May.
Article in English | MEDLINE | ID: mdl-36650408

ABSTRACT

BACKGROUND: Rather than surgical resection, cytologic specimens are often used as first-line clinical diagnostic procedures due to higher safety, speed, and cost-effectiveness. Archival diagnostic cytology slides containing cancer can be equivalent to tissue biopsies for DNA mutation testing, but the accuracy of transcriptomic profiling by RNA sequencing (RNA-seq) is less understood. METHODS: This study compares the results from whole transcriptome RNA-seq and a targeted RNA-seq assay of stained cytology smears (CS) versus matched tumor tissue samples preserved fresh-frozen (FF) and processed as formalin-fixed paraffin-embedded (FFPE) sections. Cellular cytology scrapes from all 11 breast cancers were fixed and stained using three common protocols: Carnoy's (CS_C) or 95% ethanol (CS_E) fixation and then Papanicolaou stain or air-dried then methanol fixation and DiffQuik stain (CS_DQ). Agreement between samples was assessed using Lin's concordance correlation coefficient. RESULTS: Library yield for CS_DQ was too low, therefore it was not sequenced. The distributions of concordance correlation coefficient of gene expression levels in comparison to FF were comparable between CS_C and CS_E, but expression of genes enriched in stroma was lower in cytosmear samples than in FF or FFPE. Six signatures showed similar concordance to FF for all methods and two were slightly worse in CS_C and CS_E. Genomic signatures were highly concordant using targeted RNA-seq. The allele fraction of selected mutations calculated on cytosmear specimens was highly correlated with FF tissues using both RNA-seq methods. CONCLUSION: RNA can be reliably extracted from cytology smears and is suitable for transcriptome profiling or mutation detection, except for signatures of tumor stroma.
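
Lin's concordance correlation coefficient, the agreement measure named in the methods, penalizes both scatter and systematic shift between two sets of measurements. A minimal sketch of the standard formula follows. The function name and the toy values are illustrative, and the inputs are assumed to be non-constant so that the denominator is not zero.

```python
def concordance_ccc(x, y):
    """Lin's CCC: 2*cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2).

    Uses population (1/n) moments, as in Lin's original definition.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x) / n
    syy = sum((b - my) ** 2 for b in y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    return 2 * sxy / (sxx + syy + (mx - my) ** 2)


identical = concordance_ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = concordance_ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

The shifted pair is perfectly correlated in the Pearson sense, yet its concordance drops to 4/7 because every value is offset by one unit, which is why the coefficient suits comparisons of expression levels between specimen types.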


Subjects
Breast Neoplasms , Humans , Female , Breast Neoplasms/diagnosis , Breast Neoplasms/genetics , Breast Neoplasms/pathology , Transcriptome , Tissue Fixation/methods , Formaldehyde , RNA/genetics , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Paraffin Embedding/methods
12.
Methods Mol Biol ; 2584: 57-104, 2023.
Article in English | MEDLINE | ID: mdl-36495445

ABSTRACT

Seq-Well is a high-throughput, picowell-based single-cell RNA-seq technology that can be used to simultaneously profile the transcriptomes of thousands of cells (Gierahn et al. Nat Methods 14(4):395-398, 2017). Relative to its reverse-emulsion-droplet-based counterparts, Seq-Well addresses key cost, portability, and scalability limitations. Recently, we introduced an improved molecular biology for Seq-Well to enhance the information content that can be captured from individual cells using the platform. This update, which we call Seq-Well S3 (S3: Second-Strand Synthesis), incorporates a second-strand-synthesis step after reverse transcription to boost the detection of cellular transcripts normally missed when running the original Seq-Well protocol (Hughes et al. Immunity 53(4):878-894.e7, 2020). This chapter provides details and tips on how to perform Seq-Well S3, along with general pointers on how to subsequently analyze the resultant single-cell RNA-seq data.


Subjects
Single-Cell Analysis , Single-Cell Gene Expression Analysis , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods , Transcriptome , Reverse Transcription , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods
13.
Brief Bioinform ; 24(1), 2023 Jan 19.
Article in English | MEDLINE | ID: mdl-36472568

ABSTRACT

Accounting for cell type compositions has been very successful in the analysis of high-throughput data from heterogeneous tissues. Differential gene expression analysis at cell type level is becoming increasingly popular, yielding biomarker discovery at a finer granularity within a particular cell type. Although several computational methods have been developed to identify cell type-specific differentially expressed genes (csDEG) from RNA-seq data, a systematic evaluation is yet to be performed. Here, we thoroughly benchmark six recently published methods: CellDMC, CARseq, TOAST, LRCDE, CeDAR and TCA, together with two classical methods, csSAM and DESeq2, for a comprehensive comparison. We aim to systematically evaluate the performance of popular csDEG detection methods and provide guidance to researchers. In simulation studies, we benchmark available methods under various scenarios of baseline expression levels, sample sizes, cell type compositions, expression level alterations, technical noise and biological dispersions. Real data analyses of three large datasets on inflammatory bowel disease, lung cancer and autism provide evaluation at both the gene level and the pathway level. We find that csDEG calling is strongly affected by effect size, baseline expression level and cell type compositions. Results imply that csDEG discovery is a challenging task in itself, with room for improvement in handling low signal-to-noise ratios and lowly expressed genes.


Subjects
Gene Expression Profiling , Software , Gene Expression Profiling/methods , RNA-Seq , Computer Simulation , Signal-To-Noise Ratio , Sequence Analysis, RNA/methods
14.
Brief Bioinform ; 25(1), 2023 Nov 22.
Article in English | MEDLINE | ID: mdl-38168839

ABSTRACT

Cell clustering is typically the initial step in single-cell RNA sequencing (scRNA-seq) analyses. The performance of clustering considerably impacts the validity and reproducibility of cell identification. A variety of clustering algorithms have been developed for scRNA-seq data. These algorithms generate cell label sets that assign each cell to a cluster. However, different algorithms usually yield different label sets, which can introduce variations in cell-type identification based on the generated label sets. Currently, the performance of these algorithms has not been systematically evaluated in single-cell transcriptome studies. Herein, we performed a critical assessment of seven state-of-the-art clustering algorithms including four deep learning-based clustering algorithms and commonly used methods Seurat, Cosine-based Tanimoto similarity-refined graph for community detection using Leiden's algorithm (CosTaL) and Single-cell consensus clustering (SC3). We used diverse evaluation indices based on 10 different scRNA-seq benchmarks to systematically evaluate their clustering performance. Our results show that CosTaL, Seurat, Deep Embedding for Single-cell Clustering (DESC) and SC3 consistently outperformed Single-Cell Clustering Assessment Framework and scDeepCluster based on nine effectiveness scores. Notably, CosTaL and DESC demonstrated superior performance in clustering specific cell types. The performance of the single-cell Variational Inference tools varied across different datasets, suggesting its sensitivity to certain dataset characteristics. Notably, DESC exhibited promising results for cell subtype identification and capturing cellular heterogeneity. In addition, SC3 requires more memory and exhibits slower computation speed compared to other algorithms for the same dataset. In sum, this study provides useful guidance for selecting appropriate clustering methods in scRNA-seq data analysis.
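
Comparing the label sets produced by different clustering algorithms requires an index that ignores how clusters are named. The adjusted Rand index is one widely used choice and is sketched below. Whether it is among the evaluation indices used in this study is not stated in the abstract, so treat it as an assumed example. The function name and toy labels are illustrative.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two labelings of the same cells."""
    n = len(labels_a)
    if n != len(labels_b) or n < 2:
        raise ValueError("need two labelings of the same two or more cells")
    # Number of cell pairs placed together by both, by A, and by B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    max_index = (together_a + together_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put all cells in one cluster
    return (together_both - expected) / (max_index - expected)


same_partition = adjusted_rand_index([0, 0, 1, 1], ["b", "b", "a", "a"])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Renaming clusters leaves the index at 1, whereas a labeling that splits every original cluster scores below zero, that is, worse than chance.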


Subjects
Single-Cell Analysis , Transcriptome , Sequence Analysis, RNA/methods , Reproducibility of Results , Single-Cell Analysis/methods , Algorithms , Cluster Analysis , Gene Expression Profiling/methods
15.
Genes (Basel) ; 13(12), 2022 Dec 14.
Article in English | MEDLINE | ID: mdl-36553629

ABSTRACT

The ever-growing number of methods for the generation of synthetic bulk and single cell RNA-seq data have multiple and diverse applications. They are often aimed at benchmarking bioinformatics algorithms for purposes such as sample classification, differential expression analysis, correlation and network studies and the optimization of data integration and normalization techniques. Here, we propose a general framework to compare synthetically generated RNA-seq data and select a data-generating tool that is suitable for a set of specific study goals. As there are multiple methods for synthetic RNA-seq data generation, researchers can use the proposed framework to make an informed choice of an RNA-seq data simulation algorithm and software that are best suited for their specific scientific questions of interest.


Subjects
Algorithms , Software , RNA-Seq , Sequence Analysis, RNA/methods , Computer Simulation
16.
BMC Genomics ; 23(1): 613, 2022 Aug 24.
Article in English | MEDLINE | ID: mdl-35999507

ABSTRACT

DNA and RNA sequencing are widely used techniques to investigate genomic modifications and gene expression. The costs for sequencing have dropped dramatically in the last decade. However, due to material- and labor-intensive steps, the sample preparation costs could not keep up with that pace. About 80% of the total costs are incurred prior to sequencing during DNA/RNA extraction, enrichment steps and subsequent library preparation. In this study, we investigate the potential of pooling samples from different organisms prior to DNA/RNA extraction to significantly reduce costs in preparative steps. Similar to the common procedure of ligating DNA tags to pool (c)DNA samples, the sequence diversity of different organisms intrinsically provides unique sequences that allow separation of reads after sequencing. With this approach, sample pooling can occur before DNA/RNA isolation and library preparation. We show that pooled sequencing of three related bacterial organisms is possible without loss of data quality at a cost reduction of approx. 50% in DNA- and RNA-seq approaches. Furthermore, we show that this approach is highly efficient down to the level of a shared genus and is, therefore, widely applicable in sequencing facilities and companies with diverse sample pools.
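
As a back-of-the-envelope check on how the quoted figures relate, the snippet below computes the per-sample cost when several samples share one preparation. It assumes that preparation cost is shared equally among pooled samples and that the sequencing cost per sample is unchanged. Neither assumption is stated in the abstract, and the function name is hypothetical.

```python
def pooled_cost_fraction(prep_share, n_pooled):
    """Per-sample cost relative to unpooled processing.

    prep_share: fraction of total cost incurred before sequencing.
    n_pooled:   number of samples sharing one extraction and library prep.
    """
    return prep_share / n_pooled + (1.0 - prep_share)


# 80% pre-sequencing cost and three pooled organisms, as quoted in the abstract.
saving = 1.0 - pooled_cost_fraction(prep_share=0.8, n_pooled=3)
```

Under these assumptions the saving comes out at roughly 53%, which is in the same range as the reported reduction of approximately 50%.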


Subjects
Metagenome , Metagenomics , DNA/genetics , High-Throughput Nucleotide Sequencing/methods , RNA/genetics , Sequence Analysis, DNA , Sequence Analysis, RNA/methods
17.
Brief Bioinform ; 23(5), 2022 Sep 20.
Article in English | MEDLINE | ID: mdl-35514182

ABSTRACT

The development of single-cell RNA-sequencing (scRNA-seq) technologies has offered insights into complex biological systems at the single-cell resolution. In particular, these techniques facilitate the identifications of genes showing cell-type-specific differential expressions (DE). In this paper, we introduce MARBLES, a novel statistical model for cross-condition DE gene detection from scRNA-seq data. MARBLES employs a Markov Random Field model to borrow information across similar cell types and utilizes cell-type-specific pseudobulk count to account for sample-level variability. Our simulation results showed that MARBLES is more powerful than existing methods to detect DE genes with an appropriate control of false positive rate. Applications of MARBLES to real data identified novel disease-related DE genes and biological pathways from both a single-cell lipopolysaccharide mouse dataset with 24 381 cells and 11 076 genes and a Parkinson's disease human data set with 76 212 cells and 15 891 genes. Overall, MARBLES is a powerful tool to identify cell-type-specific DE genes across conditions from scRNA-seq data.


Subjects
Lipopolysaccharides , Single-Cell Analysis , Animals , Gene Expression Profiling/methods , Humans , Mice , RNA/genetics , RNA-Seq , Sequence Analysis, RNA/methods , Single-Cell Analysis/methods
18.
J Med Virol ; 94(1): 327-334, 2022 Jan.
Article in English | MEDLINE | ID: mdl-34524690

ABSTRACT

Genomic surveillance of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) plays an important role in COVID-19 pandemic control and elimination efforts, especially by elucidating its global transmission network and illustrating its viral evolution. The deployment of multiplex PCR assays that target SARS-CoV-2 followed by either massively parallel or nanopore sequencing is a widely-used strategy to obtain genome sequences from primary samples. However, multiplex PCR-based sequencing carries an inherent bias of sequencing depth among different amplicons, which may cause uneven coverage. Here we developed a two-pool, long-amplicon 36-plex PCR primer panel with ~1000-bp amplicon lengths for full-genome sequencing of SARS-CoV-2. We validated the panel by assessing nasopharyngeal swab samples with a <30 quantitative reverse transcription PCR cycle threshold value and found that ≥90% of viral genomes could be covered with high sequencing depths (≥20% mean depth). In comparison, the widely-used ARTIC panel yielded 79%-88% high-depth genome regions. We estimated that ~5 Mbp nanopore sequencing data may ensure a >95% viral genome coverage with a ≥10-fold depth and may generate reliable genomes at consensus sequence levels. Nanopore sequencing yielded false-positive variations with frequencies of supporting reads <0.8, and the sequencing errors mostly occurred on the 5' or 3' ends of reads. Thus, nanopore sequencing could not elucidate intra-host viral diversity.
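
The two coverage summaries used above, the share of the genome at or above a fixed depth and the share at or above 20% of the mean depth, can be computed from a per-position depth vector as sketched below. The input format and function names are assumptions, and the toy depths are not data from the study.

```python
def breadth_at_depth(depths, min_depth):
    """Fraction of genome positions covered by at least min_depth reads."""
    return sum(1 for d in depths if d >= min_depth) / len(depths)


def high_depth_fraction(depths, rel=0.2):
    """Fraction of positions with depth >= rel * mean depth (rel=0.2 as quoted above)."""
    mean = sum(depths) / len(depths)
    if mean == 0:
        return 0.0  # nothing sequenced, so no position counts as high depth
    return breadth_at_depth(depths, rel * mean)


# Toy per-position depths with one amplicon dropout at the end.
toy_depths = [100, 100, 100, 100, 0]
at_10x = breadth_at_depth(toy_depths, 10)
high = high_depth_fraction(toy_depths)
```

Uneven amplicon depth shows up directly in the second measure, since a few very deep amplicons raise the mean and push shallow regions below the relative threshold.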


Subjects
Genome, Viral/genetics , Multiplex Polymerase Chain Reaction/methods , Nanopore Sequencing/methods , SARS-CoV-2/genetics , Whole Genome Sequencing/methods , COVID-19 , High-Throughput Nucleotide Sequencing/methods , Humans , Nasopharynx/virology , RNA, Viral/genetics , Sequence Analysis, RNA/methods
19.
PLoS Genet ; 17(9): e1009821, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34570751

ABSTRACT

RNA sequencing techniques have enabled the systematic elucidation of gene expression (RNA-Seq), transcription start sites (differential RNA-Seq), transcript 3' ends (Term-Seq), and post-transcriptional processes (ribosome profiling). The main challenge of transcriptomic studies is to remove ribosomal RNAs (rRNAs), which comprise more than 90% of the total RNA in a cell. Here, we report a low-cost and robust bacterial rRNA depletion method, RiboRid, based on the enzymatic degradation of rRNA by thermostable RNase H. This method implemented experimental considerations to minimize nonspecific degradation of mRNA and is capable of depleting pre-rRNAs that often comprise a large portion of RNA, even after rRNA depletion. We demonstrated the highly efficient removal of rRNA up to a removal efficiency of 99.99% for various transcriptome studies, including RNA-Seq, Term-Seq, and ribosome profiling, with a cost of approximately $10 per sample. This method is expected to be a robust method for large-scale high-throughput bacterial transcriptomic studies.


Subjects
Bacteria/genetics , Costs and Cost Analysis , RNA, Bacterial/isolation & purification , RNA, Ribosomal/isolation & purification , Transcriptome , RNA, Bacterial/genetics , RNA, Ribosomal/genetics , Sequence Analysis, RNA/methods
20.
J Mol Diagn ; 23(10): 1269-1278, 2021 Oct.
Article in English | MEDLINE | ID: mdl-34325058

ABSTRACT

Alterations in the BCOR gene, including internal tandem duplications (ITDs) of exon 15 have emerged as important oncogenic changes that define several diagnostic entities. In pediatric cancers, BCOR ITDs have recurrently been described in clear cell sarcoma of kidney (CCSK), primitive myxoid mesenchymal tumor of infancy (PMMTI), and central nervous system high-grade neuroepithelial tumor with BCOR ITD in exon 15 (HGNET-BCOR ITDex15). In adults, BCOR ITDs are also reported in endometrial and other sarcomas. The utility of multiplex targeted RNA sequencing for the identification of BCOR ITD in pediatric cancers was investigated. All available archival cases of CCSK, PMMTI, and HGNET-BCOR ITDex15 were collected. Each case underwent anchored multiplex PCR library preparation with a custom-designed panel, with BCOR targeted for both fusions and ITDs. BCOR ITD was detected in all cases across three histologic subtypes using the RNA panel, with no other fusions identified in any of the cases. All BCOR ITDs occurred in the final exon, within 16 codons from the stop sequence. Multiplex targeted RNA sequencing from formalin-fixed, paraffin-embedded tissue is successful at identifying BCOR internal tandem duplications. This analysis supports the use of anchored multiplex PCR targeted RNA next-generation sequencing panels for identification of BCOR ITDs in pediatric tumors. The use of post-analytic algorithms to improve the detection of BCOR ITD using DNA panels was also explored.


Subjects
Brain Neoplasms/genetics , High-Throughput Nucleotide Sequencing/methods , Kidney Neoplasms/genetics , Neoplasms, Neuroepithelial/genetics , Proto-Oncogene Proteins/genetics , Repressor Proteins/genetics , Sarcoma, Clear Cell/genetics , Sequence Analysis, RNA/methods , Soft Tissue Neoplasms/genetics , Tandem Repeat Sequences/genetics , Child , Child, Preschool , Codon/genetics , Exons , Female , Humans , Infant , Male , Multiplex Polymerase Chain Reaction/methods , Oncogenes , Reproducibility of Results