Results 1 - 5 of 5
1.
Genome Biol ; 25(1): 159, 2024 06 17.
Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset? RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful. CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models which will further guide users.


Subject(s)
RNA-Seq , Single-Cell Gene Expression Analysis , Animals , Humans , Cluster Analysis , Computational Biology/methods , Machine Learning , RNA-Seq/methods , Sequence Analysis, RNA/methods , Supervised Machine Learning
2.
Pac Symp Biocomput ; 25: 238-249, 2020.
Article in English | MEDLINE | ID: mdl-31797600

ABSTRACT

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.


Subject(s)
Genome, Human , Neoplasms , Computational Biology , Gene Frequency , Humans , Mutation , Neoplasms/genetics
3.
Nat Commun ; 8(1): 12, 2017 04 11.
Article in English | MEDLINE | ID: mdl-28400552

ABSTRACT

RNA-binding proteins play a key role in shaping gene expression profiles during stress; however, little is known about the dynamic nature of these interactions and how this influences the kinetics of gene expression. To address this, we developed kinetic cross-linking and analysis of cDNAs (χCRAC), an ultraviolet cross-linking method that enabled us to quantitatively measure the dynamics of protein-RNA interactions in vivo on a minute time-scale. Here, using χCRAC we measure the global RNA-binding dynamics of the yeast transcription termination factor Nab3 in response to glucose starvation. These measurements reveal rapid changes in protein-RNA interactions within 1 min following stress imposition. Changes in Nab3 binding are largely independent of alterations in transcription rate during the early stages of stress response, indicating orthogonal transcriptional control mechanisms. We also uncover a function for Nab3 in dampening expression of stress-responsive genes. χCRAC has the potential to greatly enhance our understanding of in vivo dynamics of protein-RNA interactions. Protein-RNA interactions are dynamic and regulated in response to environmental changes. Here the authors describe 'kinetic CRAC', an approach that allows time-resolved analyses of protein-RNA interactions with minute time point resolution and apply it to gain insight into the function of the RNA-binding protein Nab3.


Subject(s)
Gene Expression Regulation, Fungal , Nuclear Proteins/genetics , RNA, Fungal/genetics , RNA-Binding Proteins/genetics , Saccharomyces cerevisiae Proteins/genetics , Saccharomyces cerevisiae/genetics , Transcriptome , Culture Media/pharmacology , DNA, Complementary/genetics , DNA, Complementary/metabolism , Gene Expression Profiling , Glucose/deficiency , Kinetics , Nuclear Proteins/metabolism , Protein Binding , RNA, Fungal/metabolism , RNA-Binding Proteins/metabolism , Saccharomyces cerevisiae/drug effects , Saccharomyces cerevisiae/metabolism , Saccharomyces cerevisiae/radiation effects , Saccharomyces cerevisiae Proteins/metabolism , Stress, Physiological , Time Factors , Ultraviolet Rays
4.
Nat Methods ; 14(1): 83-89, 2017 01.
Article in English | MEDLINE | ID: mdl-27819660

ABSTRACT

Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.


Subject(s)
Algorithms , High-Throughput Nucleotide Sequencing/methods , Models, Statistical , RNA/chemistry , RNA/genetics , Sequence Analysis, RNA/methods , Transcriptome/genetics , Base Pairing , Base Sequence , Computational Biology/methods , Humans , Nucleic Acid Conformation
5.
Genome Biol ; 17(1): 253, 2016 12 07.
Article in English | MEDLINE | ID: mdl-27927225

ABSTRACT

A report on the Wellcome Trust Conference on Computational RNA Biology, held in Hinxton, UK, on 17-19 October 2016.


Subject(s)
Computational Biology/trends , Genomics , RNA/genetics , Humans , Proteomics