Results 1 - 20 of 361
1.
Bioinformatics ; 40(Supplement_1): i446-i452, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940162

ABSTRACT

BACKGROUND: Charting cellular trajectories over gene expression is key to understanding dynamic cellular processes and their underlying mechanisms. While advances in single-cell RNA-sequencing technologies and computational methods have pushed forward the recovery of such trajectories, trajectory inference remains challenging because single-cell data are noisy, sparse, and high-dimensional. This challenge can be alleviated by increasing either the number of cells sampled along the trajectory (breadth) or the sequencing depth, i.e. the number of reads captured per cell (depth). Generally, these two factors are coupled by an inherent breadth-depth tradeoff that arises when the sequencing budget is constrained by financial or technical limitations. RESULTS: Here we study how to allocate a fixed sequencing budget so as to best recover trajectory attributes. Empirical results reveal that the reconstruction accuracy of internal cell structure in expression space scales with the logarithm of either the breadth or the depth of sequencing. We additionally observe a power-law relationship between the optimal number of sampled cells and the corresponding sequencing budget. For linear trajectories, non-monotonicity in trajectory reconstruction across the breadth-depth tradeoff can affect downstream inference, such as expression pattern analysis along the trajectory. We demonstrate these results for five single-cell RNA-sequencing datasets encompassing differentiation of embryonic stem cells, pancreatic beta cells, hepatoblasts and multipotent hematopoietic cells, as well as induced reprogramming of embryonic fibroblasts into neurons. By addressing the challenges of single-cell data, our study offers insights into maximizing the efficiency of cellular trajectory analysis through strategic allocation of sequencing resources.
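The abstract reports a power-law relationship between the optimal number of sampled cells and the sequencing budget. A relationship of the form n* = a·B^b can be fitted by ordinary least squares in log-log space. The sketch below is illustrative only: it assumes the budget is counted in total reads (cells × reads per cell), and the function names `fit_power_law` and `cells_and_depth` are hypothetical, not the authors' fitting procedure.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(budgets: Sequence[float], optimal_cells: Sequence[float]) -> Tuple[float, float]:
    """Fit n* = a * B**b by least squares on log(n*) = log(a) + b * log(B).

    Returns (a, b). All inputs must be positive; at least two distinct budgets are needed.
    """
    if len(budgets) != len(optimal_cells) or len(budgets) < 2:
        raise ValueError("need at least two paired observations")
    xs = [math.log(b) for b in budgets]
    ys = [math.log(n) for n in optimal_cells]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("budgets must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope in log-log space is the power-law exponent
    a = math.exp(y_mean - b * x_mean)  # back-transform the intercept
    return a, b


def cells_and_depth(budget_reads: float, a: float, b: float) -> Tuple[float, float]:
    """Split a total read budget into (cells, reads per cell) using the fitted law.

    Assumes the hypothetical budget model: total reads = cells * reads per cell.
    """
    cells = a * budget_reads ** b
    return cells, budget_reads / cells
```

For example, with synthetic points lying exactly on n* = 2·B^0.5, the fit recovers a ≈ 2 and b ≈ 0.5, and a budget of 10^6 reads would then be split into 2,000 cells at 500 reads per cell.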


Subjects
Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Animals , High-Throughput Nucleotide Sequencing/methods
2.
Nat Methods ; 21(7): 1349-1363, 2024 Jul.
Article in English | MEDLINE | ID: mdl-38849569

ABSTRACT

The Long-read RNA-Seq Genome Annotation Assessment Project Consortium was formed to evaluate the effectiveness of long-read approaches for transcriptome analysis. Using different protocols and sequencing platforms, the consortium generated over 427 million long-read sequences from complementary DNA and direct RNA datasets, encompassing human, mouse and manatee species. Developers utilized these data to address challenges in transcript isoform detection, quantification and de novo transcript detection. The study revealed that libraries with longer, more accurate sequences produce more accurate transcripts than those with increased read depth, whereas greater read depth improved quantification accuracy. In well-annotated genomes, tools based on reference sequences demonstrated the best performance. Incorporating additional orthogonal data and replicate samples is advised when aiming to detect rare and novel transcripts or using reference-free approaches. This collaborative study offers a benchmark for current practices and provides direction for future method development in transcriptome analysis.


Subjects
Gene Expression Profiling , RNA-Seq , Humans , Animals , Mice , RNA-Seq/methods , Gene Expression Profiling/methods , Transcriptome , Sequence Analysis, RNA/methods , Molecular Sequence Annotation/methods
3.
Nat Commun ; 15(1): 3972, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730241

ABSTRACT

The advancement of Long-Read Sequencing (LRS) techniques has increased read lengths to several kilobases, thereby facilitating the identification of alternative splicing events and isoform expression. Recently, numerous computational tools for isoform detection using long-read sequencing data have been developed. Nevertheless, comparative studies that systematically evaluate the performance of these tools, which implement different algorithms, under various simulations encompassing potential influencing factors are still lacking. In this study, we conducted a benchmark analysis of thirteen methods implemented in nine tools capable of identifying isoform structures from long-read RNA-seq data. We evaluated their performance using simulated data generated by an in-house simulator to represent diverse sequencing platforms, RNA sequins (sequencing spike-ins) data, and experimental data. Our findings identify IsoQuant as a highly effective tool for isoform detection with LRS, with Bambu and StringTie2 also exhibiting strong performance. These results offer valuable guidance for future research on alternative splicing analysis and for the ongoing improvement of tools for isoform detection using LRS data.


Subjects
Algorithms , Alternative Splicing , RNA, Messenger , Sequence Analysis, RNA , Humans , RNA, Messenger/genetics , RNA, Messenger/analysis , Sequence Analysis, RNA/methods , RNA Isoforms/genetics , Software , Computational Biology/methods , High-Throughput Nucleotide Sequencing/methods , Protein Isoforms/genetics
4.
BMC Genomics ; 25(1): 455, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720252

ABSTRACT

BACKGROUND: Standard ChIP-seq and RNA-seq processing pipelines typically disregard sequencing reads whose origin is ambiguous ("multimappers"). This widespread practice has potentially important consequences for the functional interpretation of the data: genomic elements belonging to clusters of highly similar members are left unexplored. RESULTS: In particular, disregarding multimappers leads to the underrepresentation of recently active transposable elements, such as AluYa5, L1HS and SVAs, in epigenetic studies. Furthermore, this common strategy also has implications for transcriptomic analysis: members of repetitive gene families, such as those including major histocompatibility complex (MHC) class I and II genes, are under-quantified. CONCLUSION: By revealing inherent biases that permeate routine tasks such as functional enrichment analysis, our results underscore the urgency of broadly adopting multimapper-aware bioinformatic pipelines (currently restricted to specific contexts or communities) to ensure the reliability of genomic and transcriptomic studies.
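To make the effect of discarding multimappers concrete, the sketch below contrasts unique-only counting with the simplest multimapper-aware alternative, an even split of each ambiguous read across its candidate features. The input format (read ID mapped to the list of features it aligns to) and the gene names are hypothetical, and real multimapper-aware tools typically use more refined allocation (for example EM-based) rather than a uniform split.

```python
from collections import defaultdict
from typing import Dict, List


def count_unique_only(alignments: Dict[str, List[str]]) -> Dict[str, float]:
    """Common practice: keep only reads that align to exactly one feature."""
    counts: Dict[str, float] = defaultdict(float)
    for features in alignments.values():
        if len(set(features)) == 1:
            counts[features[0]] += 1.0
    return dict(counts)


def count_fractional(alignments: Dict[str, List[str]]) -> Dict[str, float]:
    """Multimapper-aware alternative: split each read evenly over its distinct targets."""
    counts: Dict[str, float] = defaultdict(float)
    for features in alignments.values():
        targets = sorted(set(features))
        if not targets:
            continue  # unaligned read
        share = 1.0 / len(targets)
        for feature in targets:
            counts[feature] += share
    return dict(counts)
```

With two reads shared between two similar hypothetical genes, `HLA-A` and `HLA-B`, unique-only counting assigns `HLA-B` nothing at all, whereas fractional counting credits it with one read, which mirrors the under-quantification of repetitive gene families described above.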


Subjects
High-Throughput Nucleotide Sequencing , Humans , DNA Transposable Elements/genetics , Computational Biology/methods , Gene Expression Profiling/methods , Genomics/methods , Sequence Analysis, RNA/methods
5.
Nat Commun ; 15(1): 3946, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729950

ABSTRACT

Disease modeling with isogenic Induced Pluripotent Stem Cell (iPSC)-differentiated organoids serves as a powerful technique for studying disease mechanisms. Multiplexed coculture is crucial to mitigate batch effects when studying the genetic effects of disease-causing variants in differentiated iPSCs or organoids, and demultiplexing at the single-cell level can be conveniently achieved by assessing natural genetic barcodes. Here, to enable cost-efficient time-series experimental designs via multiplexed bulk and single-cell RNA-seq of hybrids, we introduce a computational method in our Vireo Suite, Vireo-bulk, to effectively deconvolve pooled bulk RNA-seq data by genotype reference, and thereby quantify donor abundance over the course of differentiation and identify differentially expressed genes among donors. Furthermore, with multiplexed scRNA-seq and bulk RNA-seq, we demonstrate the usefulness and necessity of a pooled design to reveal donor iPSC line heterogeneity during macrophage cell differentiation and to model rare WT1 mutation-driven kidney disease with chimeric organoids. Our work provides an experimental and analytic pipeline for dissecting disease mechanisms with chimeric organoids.


Subjects
Cell Differentiation , Induced Pluripotent Stem Cells , Organoids , RNA-Seq , Single-Cell Analysis , Organoids/metabolism , Single-Cell Analysis/methods , Induced Pluripotent Stem Cells/metabolism , Induced Pluripotent Stem Cells/cytology , Humans , Cell Differentiation/genetics , RNA-Seq/methods , Sequence Analysis, RNA/methods , Macrophages/metabolism , Macrophages/cytology , Animals , Single-Cell Gene Expression Analysis
6.
RNA Biol ; 21(1): 1-13, 2024 Jan.
Article in English | MEDLINE | ID: mdl-38797889

ABSTRACT

Although circular RNAs (circRNAs) play important roles in regulating gene expression, circRNAs in livestock animals remain poorly understood because they are difficult to characterize from biological samples. In this study, we assessed bovine circRNA identification using six enrichment approaches combining ribosomal RNA removal (Ribo); linear RNA degradation (R); degradation of linear RNAs and RNAs with structured 3' ends (RTP); ribosomal RNA removal coupled with linear RNA elimination (Ribo-R); elimination of ribosomal RNA, linear RNAs and RNAs with poly(A) tailing (Ribo-RP); and elimination of ribosomal RNA, linear RNAs and RNAs with structured 3' ends (Ribo-RTP). RNA-sequencing analysis revealed that the different approaches led to varied ratios of uniquely mapped reads, false-positive rates of circRNA identification, and numbers of circRNAs per million clean reads (Padj < 0.05). Of the 2,285 and 2,939 highly confident circRNAs identified in liver and rumen tissues, respectively, 308 and 260 were identified in common by five methods, with the Ribo-RTP method identifying the highest number of circRNAs. In addition, 507 of the 4,051 identified highly confident bovine circRNAs shared splicing sites with human circRNAs. The findings from this work provide optimized methods for identifying bovine circRNAs in cattle tissues, supporting downstream research on their biological roles in cattle.


Subjects
RNA, Circular , Cattle , RNA, Circular/genetics , Animals , RNA, Ribosomal/genetics , Sequence Analysis, RNA/methods , Liver/metabolism , Rumen/metabolism , Computational Biology/methods , Gene Expression Profiling/methods , Humans
7.
JCO Glob Oncol ; 10: e2300269, 2024 May.
Article in English | MEDLINE | ID: mdl-38754050

ABSTRACT

PURPOSE: Molecular characterization is key to optimally diagnosing and managing cancer. The complexity and cost of routine genomic analysis have unfortunately limited its use and denied many patients access to precision medicine. A possible solution is to rationalize use by creating a tiered approach to testing that uses inexpensive techniques for most patients and limits expensive testing to patients with the highest needs. Here, we tested the utility of this approach to molecularly characterize pediatric glioma in a cost- and time-sensitive manner. METHODS: We used a tiered testing pipeline of immunohistochemistry (IHC), customized fusion panels or fluorescence in situ hybridization (FISH), and targeted RNA sequencing in pediatric gliomas. Two distinct diagnostic algorithms were used for low- and high-grade gliomas (LGGs and HGGs). The percentage of driver alterations identified, associated testing costs, and turnaround time (TAT) are reported. RESULTS: The tiered approach successfully characterized 96% (95 of 99) of gliomas. For 82 LGGs, IHC, targeted fusion panel or FISH, and targeted RNA sequencing solved 35% (29 of 82), 29% (24 of 82), and 30% (25 of 82) of cases, respectively. A total of 64% (53 of 82) of samples were characterized without targeted RNA sequencing. Of 17 HGG samples, 13 were characterized by IHC and four by targeted RNA sequencing. The average cost per sample was lower with the tiered approach than with up-front targeted RNA sequencing for LGGs ($405 US dollars [USD] v $745 USD) and HGGs ($282 USD v $745 USD). The average TAT per sample was also shorter with the tiered approach (10 days for LGGs and 5 days for HGGs v 14 days for targeted RNA sequencing). CONCLUSION: Our tiered approach molecularly characterized 96% of samples in a cost- and time-sensitive manner. Such an approach may be feasible in neuro-oncology centers worldwide, particularly in resource-limited settings.
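The cost advantage of a tiered pipeline follows from simple arithmetic: every sample pays for the first tier, but only samples left unresolved pay for later, more expensive tiers. The sketch below computes the average per-sample cost under that assumption. The tier costs used in the example are hypothetical round numbers, not the study's actual per-test prices, and the model ignores details such as choosing between a fusion panel and FISH at the second tier.

```python
from typing import Sequence


def tiered_average_cost(tier_costs: Sequence[float], solved_per_tier: Sequence[int], n_samples: int) -> float:
    """Average per-sample cost of a sequential tiered pipeline.

    Every sample starts at tier 1 and moves to the next tier only if the current
    tier fails to identify a driver alteration.
    """
    if len(tier_costs) != len(solved_per_tier):
        raise ValueError("one solved count per tier is required")
    if sum(solved_per_tier) > n_samples:
        raise ValueError("more samples solved than tested")
    remaining = n_samples
    total = 0.0
    for cost, solved in zip(tier_costs, solved_per_tier):
        total += cost * remaining  # every still-unresolved sample pays for this tier
        remaining -= solved
    return total / n_samples
```

For instance, with a hypothetical cheap tier costing 10 units that resolves 6 of 10 samples and an expensive tier costing 50 units for the rest, the tiered average is 30 units per sample, compared with 50 units if every sample went straight to the expensive test.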


Subjects
Glioma , Humans , Glioma/genetics , Glioma/diagnosis , Glioma/pathology , Child , Male , Child, Preschool , Female , Adolescent , Brain Neoplasms/genetics , Brain Neoplasms/pathology , Brain Neoplasms/economics , Brain Neoplasms/diagnosis , In Situ Hybridization, Fluorescence/economics , Infant , Immunohistochemistry/economics , Health Resources/economics , Sequence Analysis, RNA/economics , Resource-Limited Settings
8.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38628114

ABSTRACT

Spatial transcriptomics (ST) has become a powerful tool for exploring the spatial organization of gene expression in tissues. Imaging-based methods, though offering superior spatial resolution at the single-cell level, are limited in either the number of imaged genes or the sensitivity of gene detection. Existing approaches for enhancing ST rely on the similarity between ST cells and reference single-cell RNA sequencing (scRNA-seq) cells. In contrast, we introduce stDiff, which leverages relationships between gene expression abundances in scRNA-seq data to enhance ST. stDiff employs a conditional diffusion model that captures gene expression abundance relationships in scRNA-seq data through two Markov processes: one that adds noise to transcriptomics data and another that denoises them to recover the original data. The missing portion of the ST data is predicted by incorporating the original ST data into the denoising process. In a comprehensive performance evaluation across 16 datasets using multiple clustering and similarity metrics, stDiff stands out for its ability to preserve topological structures among cells, making it a robust solution for cell population identification. Moreover, stDiff's enhanced outputs closely mirror the actual ST data within the batch space. Our model accurately reconstructs diverse spatial expression patterns, delineating distinct spatial boundaries. This highlights stDiff's capability to unify the observed and predicted segments of ST data for subsequent analysis. We anticipate that stDiff will contribute to advancing ST imputation methodologies.
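The noise-adding Markov process described above can be illustrated with the standard closed-form forward step of a denoising diffusion model, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε ~ N(0, I). This is a generic sketch assuming a linear noise schedule; the schedule values and function names are hypothetical, and stDiff's conditioning on scRNA-seq data and its denoising network are not shown.

```python
import math
import random
from typing import List, Sequence


def linear_betas(n_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> List[float]:
    """Linearly increasing per-step noise variances beta_1..beta_T (a common default)."""
    if n_steps < 2:
        raise ValueError("need at least two steps")
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]


def alpha_bars(betas: Sequence[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out: List[float] = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def noise_expression(x0: Sequence[float], t: int, abar: Sequence[float],
                     rng: random.Random) -> List[float]:
    """Sample x_t ~ N(sqrt(abar_t) * x0, (1 - abar_t) I) in one step (t is a 0-based index)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]
```

Because ᾱ_t shrinks toward zero as t grows, late steps are almost pure noise; the reverse (denoising) process is what a trained model learns, and in stDiff the observed ST values are fed into that reverse process to predict the missing portion.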


Subjects
Benchmarking , Gene Expression Profiling , Cluster Analysis , Diffusion , Markov Chains , Sequence Analysis, RNA , Transcriptome
9.
Funct Integr Genomics ; 24(2): 56, 2024 Mar 13.
Article in English | MEDLINE | ID: mdl-38472459

ABSTRACT

Bladder cancer is a malignancy characterized by significant heterogeneity. RNA methylation has received increasing attention in recent years. RNA data were collected from the GEO database, and cell subsets were classified according to specific cell markers. Epithelial, immune, and fibroblast cells were clustered individually to explore tumor heterogeneity. The InferCNV R package was used to distinguish malignant from benign cells, and the monocle2 R package was used for pseudotime analysis. The Decouple R package was used for transcription factor analysis of each cell subgroup, and PROGENy was used to predict the activity of tumor-related pathways. Target lncRNAs were screened for model construction, and qPCR was used to measure lncRNA transcription levels. Epithelial cells, fibroblasts, and T cells differed significantly between tumor and normal tissues. lncRNAs related to m6A/m5C/m1A were intersected to construct the model, and six model lncRNAs (PSMB8-AS1, THUMPD3-AS1, U47924.27, XXbac-B135H6.15, MIR99AHG, and C14orf132) were finally selected. High-risk individuals were shown to have a better prognosis. qPCR experiments showed that the model lncRNAs were differentially expressed between normal and tumor cells. Immunotherapy with 4 candidate drugs is predicted to be more effective in lower-risk than in higher-risk individuals. This prognostic m6A/m5C/m1A-related lncRNA model was constructed to evaluate the clinical outcomes of bladder cancer patients and to guide clinical medication.


Subjects
RNA, Long Noncoding , Urinary Bladder Neoplasms , Humans , Prognosis , RNA Methylation , Immunotherapy , Sequence Analysis, RNA
10.
Science ; 383(6690): 1398, 2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38547270
12.
Nucleic Acids Res ; 52(3): e13, 2024 Feb 09.
Article in English | MEDLINE | ID: mdl-38059347

ABSTRACT

Differential expression analysis of RNA-seq is one of the most commonly performed bioinformatics analyses. Transcript-level quantifications are inherently more uncertain than gene-level read counts because of ambiguous assignment of sequence reads to transcripts. While sequence reads can usually be assigned unambiguously to a gene, reads are very often compatible with multiple transcripts for that gene, particularly for genes with many isoforms. Software tools designed for gene-level differential expression do not perform optimally on transcript counts because the read-to-transcript ambiguity (RTA) disrupts the mean-variance relationship normally observed for gene level RNA-seq data and interferes with the efficiency of the empirical Bayes dispersion estimation procedures. The pseudoaligners kallisto and Salmon provide bootstrap samples from which quantification uncertainty can be assessed. We show that the overdispersion arising from RTA can be elegantly estimated by fitting a quasi-Poisson model to the bootstrap counts for each transcript. The technical overdispersion arising from RTA can then be divided out of the transcript counts, leading to scaled counts that can be input for analysis by established gene-level software tools with full statistical efficiency. Comprehensive simulations and test data show that an edgeR analysis of the scaled counts is more powerful and efficient than previous differential transcript expression pipelines while providing correct control of the false discovery rate. Simulations explore a wide range of scenarios including the effects of paired vs single-end reads, different read lengths and different numbers of replicates.
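The core idea of dividing out read-to-transcript ambiguity can be sketched per transcript: under a Poisson model the bootstrap variance equals the bootstrap mean, so the quasi-Poisson dispersion (the Pearson statistic divided by its B − 1 degrees of freedom, which reduces to sample variance over mean) measures extra technical variability, and the observed count is divided by it. This is a simplified illustration; the published method may share information across transcripts or apply further adjustments that this sketch omits, and the function names are hypothetical.

```python
from statistics import mean, variance
from typing import List, Sequence


def bootstrap_overdispersion(boot_counts: Sequence[float]) -> float:
    """Quasi-Poisson dispersion of one transcript's bootstrap counts.

    sum((b - m)**2 / m) / (B - 1) equals the sample variance divided by the mean m.
    The value is floored at 1 so that counts are never inflated.
    """
    if len(boot_counts) < 2:
        raise ValueError("need at least two bootstrap samples")
    m = mean(boot_counts)
    if m == 0:
        return 1.0  # no reads: nothing to rescale
    return max(1.0, variance(boot_counts) / m)


def scale_counts(counts: Sequence[float], bootstraps: Sequence[Sequence[float]]) -> List[float]:
    """Divide each transcript's observed count by its estimated overdispersion."""
    return [c / bootstrap_overdispersion(b) for c, b in zip(counts, bootstraps)]
```

A transcript whose bootstrap counts never vary keeps its count unchanged, whereas one whose bootstrap counts vary twice as much as Poisson noise would predict has its count halved; the scaled counts can then be passed to gene-level tools such as edgeR, as the abstract describes.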


Subjects
Gene Expression Profiling , Software , Gene Expression Profiling/methods , Bayes Theorem , Uncertainty , Sequence Analysis, RNA/methods
13.
BMC Genomics ; 24(1): 777, 2023 Dec 15.
Article in English | MEDLINE | ID: mdl-38102591

ABSTRACT

RNA-Seq analysis of Formalin-Fixed and Paraffin-Embedded (FFPE) samples has emerged as a highly effective approach and is increasingly used in clinical research and drug development. However, the processing and storage of FFPE samples are known to cause extensive RNA degradation, which limits the discovery of gene expression or gene fusion-based biomarkers using RNA sequencing, particularly with methods reliant on Poly(A) enrichment. Recently, researchers have developed an exome-targeted RNA-Seq methodology that uses biotinylated oligonucleotide probes to enrich RNA transcripts of interest, which could overcome these limitations. Nevertheless, this experimental framework, including probe design, sample multiplexing, sequencing read length, and bioinformatic pipelines, still needs to be standardized. In this study, we conducted a comprehensive comparison of three main commercially available exome capture kits and evaluated key experimental parameters to provide an overview of the advantages and limitations associated with the choice of library preparation protocols and sequencing platforms. The results provide valuable insights into best practices for obtaining high-quality data from FFPE samples.


Subjects
Exome , Formaldehyde , Gene Expression Profiling/methods , Paraffin , Paraffin Embedding/methods , RNA/genetics , Sequence Analysis, RNA , Tissue Fixation/methods
14.
Bioinformatics ; 39(11)2023 11 01.
Article in English | MEDLINE | ID: mdl-37944045

ABSTRACT

MOTIVATION: The recent development of spatially resolved transcriptomics (SRT) technologies has facilitated research on gene expression in the spatial context. Annotating cell types is one crucial step for downstream analysis. However, many existing algorithms use an unsupervised strategy to assign cell types in SRT data: they first perform clustering analysis and then aggregate cluster-level expression based on the clustering results. This workflow fails to leverage marker gene information efficiently. On the other hand, cell annotation methods designed for single-cell RNA-seq data use cell-type marker gene information but fail to use the spatial information in SRT data. RESULTS: We introduce SPAN, a statistical spatial transcriptomics cell assignment model that annotates clusters of cells or spots in SRT data as known types using prior knowledge of predefined marker genes together with spatial information. SPAN annotates cells or spots from SRT data using predefined overexpressed marker genes and combines a mixture model with a hidden Markov random field to model the spatial dependency between neighboring spots. We demonstrate the effectiveness of SPAN against spatial and nonspatial clustering algorithms through extensive simulations and real-data experiments. AVAILABILITY AND IMPLEMENTATION: https://github.com/ChengZ352/SPAN.
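How a hidden Markov random field encodes spatial dependency can be seen in a single iterated-conditional-modes (ICM) sweep under a Potts prior: each spot's label balances its own expression likelihood against agreement with its neighbors' labels. ICM is one common way to optimize such models and is used here purely for illustration; SPAN's actual inference procedure, likelihoods and parameter names may differ, and `icm_step` with its `beta` smoothing weight are hypothetical.

```python
from typing import Dict, List, Sequence


def icm_step(labels: List[int], loglik: Sequence[Sequence[float]],
             neighbors: Dict[int, Sequence[int]], beta: float) -> List[int]:
    """One iterated-conditional-modes sweep for a Potts hidden Markov random field.

    Spot i takes the cell type k maximizing
        loglik[i][k] + beta * (number of neighbors currently labeled k).
    Labels are updated in place during the sweep, so later spots see earlier updates.
    """
    new_labels = list(labels)
    for i, scores in enumerate(loglik):
        best_k, best_val = 0, float("-inf")
        for k, ll in enumerate(scores):
            agree = sum(1 for j in neighbors.get(i, ()) if new_labels[j] == k)
            val = ll + beta * agree
            if val > best_val:
                best_k, best_val = k, val
        new_labels[i] = best_k
    return new_labels
```

For three spots in a row where the middle spot's expression weakly favors a different type than both neighbors, `beta = 0` keeps the expression-only label, while `beta = 1` pulls the middle spot toward its neighbors' type, which is the smoothing effect the spatial component contributes.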


Subjects
Single-Cell Analysis , Transcriptome , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Algorithms , Cluster Analysis
15.
Nat Commun ; 14(1): 4760, 2023 08 08.
Article in English | MEDLINE | ID: mdl-37553321

ABSTRACT

Long-read RNA sequencing (RNA-seq) is a powerful technology for transcriptome analysis, but the relatively low throughput of current long-read sequencing platforms limits transcript coverage. One strategy for overcoming this bottleneck is targeted long-read RNA-seq for preselected gene panels. We present TEQUILA-seq, a versatile, easy-to-implement, and low-cost method for targeted long-read RNA-seq utilizing isothermally linear-amplified capture probes. When performed on the Oxford Nanopore platform with multiple gene panels of varying sizes, TEQUILA-seq consistently and substantially enriches transcript coverage while preserving transcript quantification. We profile full-length transcript isoforms of 468 actionable cancer genes across 40 representative breast cancer cell lines. We identify transcript isoforms enriched in specific subtypes and discover novel transcript isoforms in extensively studied cancer genes such as TP53. Among cancer genes, tumor suppressor genes (TSGs) are significantly enriched for aberrant transcript isoforms targeted for degradation via mRNA nonsense-mediated decay, revealing a common RNA-associated mechanism for TSG inactivation. TEQUILA-seq reduces the per-reaction cost of targeted capture by 2-3 orders of magnitude compared with a standard commercial solution. TEQUILA-seq can be broadly used for targeted sequencing of full-length transcripts in diverse biomedical research settings.


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , RNA/genetics , Protein Isoforms/genetics , Transcriptome/genetics
16.
Circulation ; 148(9): 778-797, 2023 08 29.
Article in English | MEDLINE | ID: mdl-37427428

ABSTRACT

BACKGROUND: Cardiac fibroblasts have crucial roles in the heart. In particular, fibroblasts differentiate into myofibroblasts in the damaged myocardium, contributing to scar formation and interstitial fibrosis. Fibrosis is associated with heart dysfunction and failure. Myofibroblasts therefore represent attractive therapeutic targets. However, the lack of myofibroblast-specific markers has precluded the development of targeted therapies. In this context, most of the noncoding genome is transcribed into long noncoding RNAs (lncRNAs). A number of lncRNAs have pivotal functions in the cardiovascular system. lncRNAs are globally more cell-specific than protein-coding genes, supporting their importance as key determinants of cell identity. METHODS: In this study, we evaluated the value of the lncRNA transcriptome in very deep single-cell RNA sequencing. We profiled the lncRNA transcriptome in cardiac nonmyocyte cells after infarction and probed heterogeneity in the fibroblast and myofibroblast populations. In addition, we searched for subpopulation-specific markers that could constitute novel therapeutic targets for heart disease. RESULTS: We demonstrated that cardiac cell identity can be defined by lncRNA expression alone in single-cell experiments. In this analysis, we identified lncRNAs enriched in relevant myofibroblast subpopulations. Selecting one candidate, which we named FIXER (fibrogenic LOX-locus enhancer RNA), we showed that its silencing limits fibrosis and improves heart function after infarction. Mechanistically, FIXER interacts with CBX4, an E3 SUMO protein ligase and transcription factor, guiding CBX4 to the promoter of the transcription factor RUNX1 to control its expression and, consequently, the expression of a fibrogenic gene program. FIXER is conserved in humans, supporting its translational value. CONCLUSIONS: Our results demonstrated that lncRNA expression is sufficient to identify the various cell types composing the mammalian heart. Focusing on cardiac fibroblasts and their derivatives, we identified lncRNAs uniquely expressed in myofibroblasts. In particular, the lncRNA FIXER represents a novel therapeutic target for cardiac fibrosis.


Subjects
Cardiomyopathies , RNA, Long Noncoding , Animals , Humans , Transcriptome , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Cardiomyopathies/genetics , Fibrosis , Sequence Analysis, RNA , Transcription Factors/genetics , Infarction , Mammals/genetics , Mammals/metabolism , Ligases/genetics , Ligases/metabolism , Polycomb-Group Proteins/genetics , Polycomb-Group Proteins/metabolism
17.
Brief Bioinform ; 24(5)2023 09 20.
Article in English | MEDLINE | ID: mdl-37507115

ABSTRACT

Single cell RNA-sequencing (scRNA-seq) technology has significantly advanced the understanding of transcriptomic signatures. Although various statistical models have been used to describe the distribution of gene expression across cells, a comprehensive assessment of the different models is missing. Moreover, the growing number of features associated with scRNA-seq datasets creates new challenges for analytical accuracy and computing speed. Here, we developed a Python-based package (TensorZINB) to solve the zero-inflated negative binomial (ZINB) model using the TensorFlow deep learning framework. We used a sequential initialization method to solve the numerical stability issues associated with hurdle and zero-inflated models. A recursive feature selection protocol was used to optimize feature selections for data processing and downstream differentially expressed gene (DEG) analysis. We proposed a class of hybrid models combining nested models to further improve the model's performance. Additionally, we developed a new method to convert a continuous distribution to its equivalent discrete form, so that statistical models can be fairly compared. Finally, we showed that the proposed TensorFlow algorithm (TensorZINB) was numerically stable and that its computing speed and performance were superior to those of existing ZINB solvers. Moreover, we implemented seven hurdle and zero-inflated statistical models in Python and systematically assessed their performance using a real scRNA-seq dataset. We demonstrated that the ZINB model achieved the lowest Akaike information criterion compared with other models tested. Taken together, TensorZINB was accurate, efficient and scalable for the implementation of ZINB and for large-scale scRNA-seq data analysis with DEG identification.
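For reference, the zero-inflated negative binomial (ZINB) model mixes a point mass at zero (probability π) with a negative binomial parameterized by mean μ and size θ, and models are compared by the Akaike information criterion, AIC = 2k − 2 ln L. The sketch below evaluates these quantities in plain Python with `math.lgamma`; it is not TensorZINB's TensorFlow implementation, does not fit any parameters, and the function names are hypothetical.

```python
import math


def nb_logpmf(k: int, mu: float, theta: float) -> float:
    """log P(K = k) for a negative binomial with mean mu > 0 and size theta > 0."""
    if mu <= 0 or theta <= 0:
        raise ValueError("mu and theta must be positive")
    return (
        math.lgamma(k + theta) - math.lgamma(theta) - math.lgamma(k + 1)
        + theta * math.log(theta / (theta + mu))
        + k * math.log(mu / (theta + mu))
    )


def zinb_logpmf(k: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: with probability pi the count is a structural zero."""
    if not 0.0 <= pi < 1.0:
        raise ValueError("pi must be in [0, 1)")
    if k == 0:
        # zeros come either from the point mass or from the NB component
        return math.log(pi + (1.0 - pi) * math.exp(nb_logpmf(0, mu, theta)))
    return math.log(1.0 - pi) + nb_logpmf(k, mu, theta)


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood
```

Summing `zinb_logpmf` over a gene's observed counts at fitted parameters gives ln L; with π = 0 the model reduces to the plain negative binomial, which is one reason nested and hybrid models of this family can be compared on a common AIC scale.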


Subjects
Gene Expression Profiling , Models, Statistical , Poisson Distribution , Gene Expression Profiling/methods , RNA , Sequence Analysis, RNA/methods
18.
Methods Mol Biol ; 2691: 279-325, 2023.
Article in English | MEDLINE | ID: mdl-37355554

ABSTRACT

Transcriptomic profiling has fundamentally influenced our understanding of cancer pathophysiology and response to therapeutic intervention and has become a relatively routine approach. However, standard protocols are usually low-throughput, single-plex assays and costs are still quite prohibitive. With the evolving complexity of in vitro cell model systems, there is a need for resource-efficient high-throughput approaches that can support detailed time-course analytics, accommodate limited sample availability, and provide the capacity to correlate phenotype to genotype at scale. MAC-seq (multiplexed analysis of cells) is a low-cost, ultrahigh-throughput RNA-seq workflow in plate format to measure cell perturbations and is compatible with high-throughput imaging. Here we describe the steps to perform MAC-seq in 384-well format and apply it to 2D and 3D cell cultures. On average, our experimental conditions identified over ten thousand expressed genes per well when sequenced to a depth of one million reads. We discuss technical aspects, make suggestions on experimental design, and document critical operational procedures. Our protocol highlights the potential to couple MAC-seq with high-throughput screening applications including cell phenotyping using high-content cell imaging.


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , RNA-Seq/methods , High-Throughput Nucleotide Sequencing/methods , Gene Expression Profiling/methods , Phenotype , High-Throughput Screening Assays/methods , Sequence Analysis, RNA/methods
19.
Bioinformatics ; 39(3)2023 03 01.
Article in English | MEDLINE | ID: mdl-36857576

ABSTRACT

MOTIVATION: The increasing availability of RNA structural information that spans many kilobases of transcript sequence imposes a need for tools that can rapidly screen, identify, and prioritize structural modules of interest. RESULTS: We describe RNA Structural Content Scanner (RSCanner), an automated tool that scans RNA transcripts for regions that contain high levels of secondary structure and then classifies each region for its relative propensity to adopt stable or dynamic structures. RSCanner then generates an intuitive heatmap enabling users to rapidly pinpoint regions likely to contain a high or low density of discrete RNA structures, thereby informing downstream functional or structural investigation. AVAILABILITY AND IMPLEMENTATION: RSCanner is freely available as both R script and R Markdown files, along with full documentation and test data (https://github.com/pylelab/RSCanner).


Subjects
RNA , Software , Protein Structure, Secondary , Documentation , Sequence Analysis, RNA
20.
PLoS Biol ; 21(3): e3002007, 2023 03.
Article in English | MEDLINE | ID: mdl-36862747

ABSTRACT

We assess inferential quality in the field of differential expression profiling by high-throughput sequencing (HT-seq) based on analysis of datasets submitted from 2008 to 2020 to the NCBI GEO data repository. We take advantage of the parallel differential expression testing over thousands of genes, whereby each experiment leads to a large set of p-values, the distribution of which can indicate the validity of assumptions behind the test. From a well-behaved p-value set, π0, the fraction of genes that are not differentially expressed, can be estimated. We found that only 25% of experiments resulted in theoretically expected p-value histogram shapes, although there is a marked improvement over time. Uniform p-value histogram shapes, indicative of <100 actual effects, were extremely rare. Furthermore, although many HT-seq workflows assume that most genes are not differentially expressed, 37% of experiments have π0 values below 0.5, as if most genes changed their expression level. Most HT-seq experiments have very small sample sizes and are expected to be underpowered. Nevertheless, the estimated π0 values do not show the expected association with sample size N, suggesting widespread problems with controlling the false discovery rate (FDR). Both the fractions of different p-value histogram types and the π0 values are strongly associated with the differential expression analysis program used by the original authors. While we could double the proportion of theoretically expected p-value distributions by removing low-count features from the analysis, this treatment did not remove the association with the analysis program. Taken together, our results indicate widespread bias in the differential expression profiling field and the unreliability of statistical methods used to analyze HT-seq data.
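One standard way to estimate π0 from a well-behaved p-value set is Storey's estimator: p-values of non-differentially expressed genes are uniform on [0, 1], so nearly all p-values above a threshold λ come from null genes, giving π0 ≈ #{p > λ} / (m(1 − λ)). The abstract does not say which estimator was used, so this sketch, with λ = 0.5 as an assumed default, is only an illustration of the quantity being estimated.

```python
from typing import Sequence


def storey_pi0(p_values: Sequence[float], lam: float = 0.5) -> float:
    """Storey's estimate of pi0, the fraction of genes that are not differentially expressed.

    Null p-values are uniform on [0, 1], so the density above lam comes (almost)
    only from nulls: pi0 ~= #{p > lam} / (m * (1 - lam)). The result is capped at 1.
    """
    if not 0.0 <= lam < 1.0:
        raise ValueError("lam must be in [0, 1)")
    m = len(p_values)
    if m == 0:
        raise ValueError("no p-values supplied")
    above = sum(1 for p in p_values if p > lam)
    return min(1.0, above / (m * (1.0 - lam)))
```

With 100 evenly spread p-values the estimate is 1 (no effects), while replacing half of them with very small p-values yields 0.5; estimates well below 0.5, as reported for 37% of experiments, would imply that most genes changed expression. The estimate is only meaningful when the p-value histogram has the expected shape.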


Subjects
Gene Expression Profiling , High-Throughput Nucleotide Sequencing , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Sample Size