Search | VHL Regional Portal

Beyond benchmarking and towards predictive models of dataset-specific single-cell RNA-seq pipeline performance.

Fang, Cindy; Selega, Alina; Campbell, Kieran R.

Genome Biol ; 25(1): 159, 2024 06 17.

Article in English | MEDLINE | ID: mdl-38886757

ABSTRACT

BACKGROUND: The advent of single-cell RNA-sequencing (scRNA-seq) has driven significant computational methods development for all steps in the scRNA-seq data analysis pipeline, including filtering, normalization, and clustering. The large number of methods and their resulting parameter combinations has created a combinatorial set of possible pipelines to analyze scRNA-seq data, which leads to the obvious question: which is best? Several benchmarking studies compare methods but frequently find variable performance depending on dataset and pipeline characteristics. Alternatively, the large number of scRNA-seq datasets along with advances in supervised machine learning raise a tantalizing possibility: could the optimal pipeline be predicted for a given dataset?

RESULTS: Here, we begin to answer this question by applying 288 scRNA-seq analysis pipelines to 86 datasets and quantifying pipeline success via a range of measures evaluating cluster purity and biological plausibility. We build supervised machine learning models to predict pipeline success given a range of dataset and pipeline characteristics. We find that prediction performance is significantly better than random and that in many cases pipelines predicted to perform well provide clustering outputs similar to expert-annotated cell type labels. We identify characteristics of datasets that correlate with strong prediction performance that could guide when such prediction models may be useful.

CONCLUSIONS: Supervised machine learning models have utility for recommending analysis pipelines and therefore the potential to alleviate the burden of choosing from the near-infinite number of possibilities. Different aspects of datasets influence the predictive performance of such models, which will further guide users.

Subjects

RNA-Seq, Single-Cell Gene Expression Analysis, Animals, Humans, Cluster Analysis, Computational Biology/methods, Machine Learning, RNA-Seq/methods, RNA Sequence Analysis/methods, Supervised Machine Learning

TrackSigFreq: subclonal reconstructions based on mutation signatures and allele frequencies.

Harrigan, Caitlin F; Rubanova, Yulia; Morris, Quaid; Selega, Alina.

Pac Symp Biocomput ; 25: 238-249, 2020.

Article in English | MEDLINE | ID: mdl-31797600

ABSTRACT

Mutational signatures are patterns of mutation types, many of which are linked to known mutagenic processes. Signature activity represents the proportion of mutations a signature generates. In cancer, cells may gain advantageous phenotypes through mutation accumulation, causing rapid growth of that subpopulation within the tumour. The presence of many subclones can make cancers harder to treat and have other clinical implications. Reconstructing changes in signature activities can give insight into the evolution of cells within a tumour. Recently, we introduced a new method, TrackSig, to detect changes in signature activities across time from a single bulk tumour sample. By design, TrackSig is unable to identify mutation populations with different frequencies but little to no difference in signature activity. Here we present an extension of this method, TrackSigFreq, which enables trajectory reconstruction based on both observed density of mutation frequencies and changes in mutational signature activities. TrackSigFreq preserves the advantages of TrackSig, namely optimal and rapid mutation clustering through segmentation, while extending it so that it can identify distinct mutation populations that share similar signature activities.

Subjects

Human Genome, Neoplasms, Computational Biology, Gene Frequency, Humans, Mutation, Neoplasms/genetics

Kinetic CRAC uncovers a role for Nab3 in determining gene expression profiles during stress.

van Nues, Rob; Schweikert, Gabriele; de Leau, Erica; Selega, Alina; Langford, Andrew; Franklin, Ryan; Iosub, Ira; Wadsworth, Peter; Sanguinetti, Guido; Granneman, Sander.

Nat Commun ; 8(1): 12, 2017 04 11.

Article in English | MEDLINE | ID: mdl-28400552

ABSTRACT

RNA-binding proteins play a key role in shaping gene expression profiles during stress; however, little is known about the dynamic nature of these interactions and how this influences the kinetics of gene expression. To address this, we developed kinetic cross-linking and analysis of cDNAs (χCRAC), an ultraviolet cross-linking method that enabled us to quantitatively measure the dynamics of protein-RNA interactions in vivo on a minute time-scale. Here, using χCRAC we measure the global RNA-binding dynamics of the yeast transcription termination factor Nab3 in response to glucose starvation. These measurements reveal rapid changes in protein-RNA interactions within 1 min following stress imposition. Changes in Nab3 binding are largely independent of alterations in transcription rate during the early stages of stress response, indicating orthogonal transcriptional control mechanisms. We also uncover a function for Nab3 in dampening expression of stress-responsive genes. χCRAC has the potential to greatly enhance our understanding of in vivo dynamics of protein-RNA interactions.

Protein-RNA interactions are dynamic and regulated in response to environmental changes. Here the authors describe 'kinetic CRAC', an approach that allows time-resolved analyses of protein-RNA interactions with minute time point resolution, and apply it to gain insight into the function of the RNA-binding protein Nab3.

Subjects

Fungal Gene Expression Regulation, Nuclear Proteins/genetics, Fungal RNA/genetics, RNA-Binding Proteins/genetics, Saccharomyces cerevisiae Proteins/genetics, Saccharomyces cerevisiae/genetics, Transcriptome, Culture Media/pharmacology, Complementary DNA/genetics, Complementary DNA/metabolism, Gene Expression Profiling, Glucose/deficiency, Kinetics, Nuclear Proteins/metabolism, Protein Binding, Fungal RNA/metabolism, RNA-Binding Proteins/metabolism, Saccharomyces cerevisiae/drug effects, Saccharomyces cerevisiae/metabolism, Saccharomyces cerevisiae/radiation effects, Saccharomyces cerevisiae Proteins/metabolism, Physiological Stress, Time Factors, Ultraviolet Rays

Robust statistical modeling improves sensitivity of high-throughput RNA structure probing experiments.

Selega, Alina; Sirocchi, Christel; Iosub, Ira; Granneman, Sander; Sanguinetti, Guido.

Nat Methods ; 14(1): 83-89, 2017 01.

Article in English | MEDLINE | ID: mdl-27819660

ABSTRACT

Structure probing coupled with high-throughput sequencing could revolutionize our understanding of the role of RNA structure in regulation of gene expression. Despite recent technological advances, intrinsic noise and high sequence coverage requirements greatly limit the applicability of these techniques. Here we describe a probabilistic modeling pipeline that accounts for biological variability and biases in the data, yielding statistically interpretable scores for the probability of nucleotide modification transcriptome wide. Using two yeast data sets, we demonstrate that our method has increased sensitivity, and thus our pipeline identifies modified regions on many more transcripts than do existing pipelines. Our method also provides confident predictions at much lower sequence coverage levels than those recommended for reliable structural probing. Our results show that statistical modeling extends the scope and potential of transcriptome-wide structure probing experiments.

Subjects

Algorithms, High-Throughput Nucleotide Sequencing/methods, Statistical Models, RNA/chemistry, RNA/genetics, RNA Sequence Analysis/methods, Transcriptome/genetics, Base Pairing, Base Sequence, Computational Biology/methods, Humans, Nucleic Acid Conformation

Trends and challenges in computational RNA biology.

Selega, Alina; Sanguinetti, Guido.

Genome Biol ; 17(1): 253, 2016 12 07.

Article in English | MEDLINE | ID: mdl-27927225

ABSTRACT

A report on the Wellcome Trust Conference on Computational RNA Biology, held in Hinxton, UK, on 17-19 October 2016.

Subjects

Computational Biology/trends, Genomics, RNA/genetics, Humans, Proteomics
