Search | BVS IEC

1.

scATAC-seq preprocessing and imputation evaluation system for visualization, clustering and digital footprinting.

Akhtyamov, Pavel; Shaheen, Layal; Raevskiy, Mikhail; Stupnikov, Alexey; Medvedeva, Yulia A.

Brief Bioinform ; 25(1), 2023 11 22.

Article in English | MEDLINE | ID: mdl-38084919

ABSTRACT

Single-cell ATAC-seq (scATAC-seq) is a recently developed approach for investigating open chromatin at the single-cell level and for assessing epigenetic regulation and transcription factor binding landscapes. The sparsity of scATAC-seq data calls for imputation. Similarly, preprocessing (filtering) may be required to reduce the computational load caused by the large number of open regions. However, optimal strategies for imputation and preprocessing have not yet been evaluated together. We present SAPIEnS (scATAC-seq Preprocessing and Imputation Evaluation System), a benchmark for scATAC-seq imputation frameworks that combines state-of-the-art imputation methods with commonly used preprocessing techniques. We assess different types of scATAC-seq analysis, i.e. clustering, visualization and digital genomic footprinting, and identify optimal preprocessing-imputation strategies. We discuss the benefits of the imputation framework depending on the task and the number of dataset features (peaks). We conclude that preprocessing with the Boruta method is beneficial for the majority of tasks, while imputation is helpful mostly for small datasets. We also implement a SAPIEnS database with pre-computed transcription factor footprints based on imputed data, together with their activity scores in a specific cell type. SAPIEnS is published at: https://github.com/lab-medvedeva/SAPIEnS. The SAPIEnS database is available at: https://sapiensdb.com.

Subjects

Epigenesis, Genetic; Genomics; Genomics/methods; Transcription Factors/genetics; Transcription Factors/metabolism; Gene Expression Regulation; Cluster Analysis

2.

Assessing the Differential Methylation Analysis Quality for Microarray and NGS Platforms.

Budkina, Anna; Medvedeva, Yulia A; Stupnikov, Alexey.

Int J Mol Sci ; 24(10), 2023 May 11.

Article in English | MEDLINE | ID: mdl-37239934

ABSTRACT

Differential methylation (DM) is actively recruited in different types of fundamental and translational studies. Currently, microarray- and NGS-based approaches for methylation analysis are the most widely used with multiple statistical models designed to extract differential methylation signatures. The benchmarking of DM models is challenging due to the absence of gold standard data. In this study, we analyze an extensive number of publicly available NGS and microarray datasets with divergent and widely utilized statistical models and apply the recently suggested and validated rank-statistic-based approach Hobotnica to evaluate the quality of their results. Overall, microarray-based methods demonstrate more robust and convergent results, while NGS-based models are highly dissimilar. Tests on the simulated NGS data tend to overestimate the quality of the DM methods and therefore are recommended for use with caution. Evaluation of the top 10 DMC and top 100 DMC in addition to the not-subset signature also shows more stable results for microarray data. Summing up, given the observed heterogeneity in NGS methylation data, the evaluation of newly generated methylation signatures is a crucial step in DM analysis. The Hobotnica metric is coordinated with previously developed quality metrics and provides a robust, sensitive, and informative estimation of methods' performance and DM signatures' quality in the absence of gold standard data solving a long-existing problem in DM analysis.

Subjects

DNA Methylation; Models, Statistical; Microarray Analysis

3.

In Silico Drug Repurposing in Multiple Sclerosis Using scRNA-Seq Data.

Shevtsov, Andrey; Raevskiy, Mikhail; Stupnikov, Alexey; Medvedeva, Yulia.

Int J Mol Sci ; 24(2), 2023 Jan 04.

Article in English | MEDLINE | ID: mdl-36674506

ABSTRACT

Multiple sclerosis (MS) is an autoimmune disease of the central nervous system still lacking a cure. Treatment typically focuses on slowing the progression and managing MS symptoms. Single-cell transcriptomics allows the investigation of the immune system, the key player in MS onset and development, in great detail, increasing our understanding of MS mechanisms and stimulating the discovery of targets for potential therapies. Still, de novo drug development takes decades; however, this can be reduced by drug repositioning. A promising approach is to select potential drugs based on activated or inhibited genes and pathways. In this study, we explored public data from a single-cell RNA experiment with six patients on peripheral blood mononuclear cells (PBMC) and cerebrospinal fluid (CSF) cells of patients with MS and idiopathic intracranial hypertension. We demonstrate that AIM2 inflammasome, SMAD2/3 signaling, and complement activation pathways are activated in MS in different CSF and PBMC immune cells. Using genes from top-activated pathways, we detected several promising small molecules to reverse MS immune cells' transcriptomic signatures, including AG14361, FGIN-1-27, CA-074, ARP 101, Flunisolide, and JAK3 Inhibitor VI. Among these molecules, we also detected an FDA-approved MS drug, Mitoxantrone, supporting the reliability of our approach.

Subjects

Multiple Sclerosis; Humans; Multiple Sclerosis/drug therapy; Multiple Sclerosis/genetics; Drug Repositioning; Leukocytes, Mononuclear/metabolism; Reproducibility of Results; Single-Cell Gene Expression Analysis; RNA/metabolism

4.

Approaches for sRNA Analysis of Human RNA-Seq Data: Comparison, Benchmarking.

Bezuglov, Vitalik; Stupnikov, Alexey; Skakov, Ivan; Shtratnikova, Victoria; Pilsner, J Richard; Suvorov, Alexander; Sergeyev, Oleg.

Int J Mol Sci ; 24(4), 2023 Feb 20.

Article in English | MEDLINE | ID: mdl-36835604

ABSTRACT

Expression analysis of small noncoding RNA (sRNA), including microRNA, piwi-interacting RNA, small rRNA-derived RNA, and tRNA-derived small RNA, is a novel and quickly developing field. Despite a range of proposed approaches, selecting and adapting a particular pipeline for transcriptomic analysis of sRNA remains a challenge. This paper focuses on the identification of the optimal pipeline configurations for each step of human sRNA analysis, including reads trimming, filtering, mapping, transcript abundance quantification and differential expression analysis. Based on our study, we suggest the following parameters for the analysis of human sRNA in relation to categorical analyses with two groups of biosamples: (1) trimming with the lower length bound = 15 and the upper length bound = Read length - 40% Adapter length; (2) mapping on a reference genome with bowtie aligner with one mismatch allowed (-v 1 parameter); (3) filtering by mean threshold > 5; (4) analyzing differential expression with DESeq2 with adjusted p-value < 0.05 or limma with p-value < 0.05 if there is very little signal and few transcripts.
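The trimming and filtering parameters listed in this abstract can be sketched in a few lines of Python. This is an illustration only, not code from the paper: the function names are hypothetical, and the upper bound is read here as "read length minus 40% of the adapter length", which is one interpretation of the wording above.

```python
from statistics import mean


def trimming_bounds(read_length, adapter_length, lower=15, adapter_fraction=0.4):
    """Return (lower, upper) length bounds for trimmed reads.

    Assumption: upper bound = read length - 40% of the adapter length.
    """
    upper = read_length - adapter_fraction * adapter_length
    return lower, int(upper)


def filter_by_mean(counts, threshold=5):
    """Keep transcripts whose mean count across samples exceeds the threshold.

    `counts` maps a transcript name to a list of per-sample read counts.
    """
    return {name: c for name, c in counts.items() if mean(c) > threshold}


if __name__ == "__main__":
    # Hypothetical 50 nt reads with a 20 nt adapter.
    print(trimming_bounds(50, 20))
    # Hypothetical counts for two transcripts across three samples.
    print(filter_by_mean({"miR-a": [10, 2, 6], "piR-b": [1, 2, 3]}))
```

Mapping and differential expression (bowtie with -v 1, DESeq2 or limma) are left out of the sketch because they depend on external tools.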

Subjects

RNA, Small Untranslated; Humans; Benchmarking; High-Throughput Nucleotide Sequencing; RNA, Small Untranslated/genetics; RNA-Seq; Sequence Analysis, RNA

5.

Cooption of heat shock regulatory system for anhydrobiosis in the sleeping chironomid Polypedilum vanderplanki.

Mazin, Pavel V; Shagimardanova, Elena; Kozlova, Olga; Cherkasov, Alexander; Sutormin, Roman; Stepanova, Vita V; Stupnikov, Alexey; Logacheva, Maria; Penin, Aleksey; Sogame, Yoichiro; Cornette, Richard; Tokumoto, Shoko; Miyata, Yugo; Kikawada, Takahiro; Gelfand, Mikhail S; Gusev, Oleg.

Proc Natl Acad Sci U S A ; 115(10): E2477-E2486, 2018 03 06.

Article in English | MEDLINE | ID: mdl-29463761

ABSTRACT

Polypedilum vanderplanki is a striking and unique example of an insect that can survive almost complete desiccation. Its genome and a set of dehydration-rehydration transcriptomes, together with the genome of Polypedilum nubifer (a congeneric desiccation-sensitive midge), were recently released. Here, using published and newly generated datasets reflecting detailed transcriptome changes during anhydrobiosis, as well as a developmental series, we show that the TCTAGAA DNA motif, which closely resembles the binding motif of the Drosophila melanogaster heat shock transcription activator (Hsf), is significantly enriched in the promoter regions of desiccation-induced genes in P. vanderplanki, such as genes encoding late embryogenesis abundant (LEA) proteins, thioredoxins, or trehalose metabolism-related genes, but not in P. nubifer. Unlike P. nubifer, P. vanderplanki has double TCTAGAA sites upstream of the Hsf gene itself, which is probably responsible for the stronger activation of Hsf in P. vanderplanki during desiccation compared with P. nubifer. To confirm the role of Hsf in desiccation-induced gene activation, we used the Pv11 cell line, derived from P. vanderplanki embryo. After preincubation with trehalose, Pv11 cells can enter anhydrobiosis and survive desiccation. We showed that Hsf knockdown suppresses trehalose-induced activation of multiple predicted Hsf targets (including P. vanderplanki-specific LEA protein genes) and reduces the desiccation survival rate of Pv11 cells fivefold. Thus, cooption of the heat shock regulatory system has been an important evolutionary mechanism for adaptation to desiccation in P. vanderplanki.

Subjects

Chironomidae/physiology; Heat Shock Transcription Factors/metabolism; Insect Proteins/metabolism; Animals; Biological Evolution; Chironomidae/genetics; Dehydration; Female; Heat Shock Transcription Factors/genetics; Heat-Shock Response; Insect Proteins/genetics; Male; Stress, Physiological

6.

NUQA: Estimating Cancer Spatial and Temporal Heterogeneity and Evolution through Alignment-Free Methods.

Roddy, Aideen C; Jurek-Loughrey, Anna; Souza, Jose; Gilmore, Alan; O'Reilly, Paul G; Stupnikov, Alexey; Gonzalez de Castro, David; Prise, Kevin M; Salto-Tellez, Manuel; McArt, Darragh G.

Mol Biol Evol ; 36(12): 2883-2889, 2019 12 01.

Article in English | MEDLINE | ID: mdl-31424551

ABSTRACT

Longitudinal next-generation sequencing of cancer patient samples has enhanced our understanding of the evolution and progression of various cancers. As a result, and due to our increasing knowledge of heterogeneity, such sampling is becoming increasingly common in research and clinical trial sample collections. Traditionally, the evolutionary analysis of these cohorts involves the use of an aligner followed by stringent downstream analyses. However, this can lead to substantial information loss due to the vast mutational landscape that characterizes tumor samples. Here, we propose an alignment-free approach for sequence comparison, a well-established approach in a range of biological applications including typical phylogenetic classification. Such methods could be used to compare information collated in raw sequence files to allow an unsupervised assessment of the evolutionary trajectory of patient genomic profiles. To highlight this utility in cancer research, we have applied our alignment-free approach, using a previously established metric, Jensen-Shannon divergence, and a metric novel to this area, Hellinger distance, to two longitudinal cancer patient cohorts in glioma and clear cell renal cell carcinoma using our software, NUQA. We hypothesize that this approach has the potential to reveal novel information about the heterogeneity and evolutionary trajectory of spatiotemporal tumor samples, potentially revealing early events in tumorigenesis and the origins of metastases and recurrences. Key words: alignment-free, Hellinger distance, exome-seq, evolution, phylogenetics, longitudinal.
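The two metrics named in this abstract can be illustrated on k-mer frequency profiles. The sketch below is not NUQA itself: the function names, the choice of k, and the toy sequences are assumptions made for illustration, and a real analysis would build profiles from raw sequence files rather than short strings.

```python
from collections import Counter
from math import log2, sqrt


def kmer_distribution(seq, k=3):
    """Relative frequency of each k-mer in a sequence."""
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def jensen_shannon(p, q):
    """Jensen-Shannon divergence (base 2, so the value lies in [0, 1])."""
    keys = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in keys}

    def kl_to_m(a):
        # Kullback-Leibler divergence of a from the mixture m.
        return sum(v * log2(v / m[x]) for x, v in a.items() if v > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (in [0, 1])."""
    keys = set(p) | set(q)
    total = sum((sqrt(p.get(x, 0.0)) - sqrt(q.get(x, 0.0))) ** 2 for x in keys)
    return sqrt(total / 2)


if __name__ == "__main__":
    a = kmer_distribution("ACGTACGTACGGTACG")
    b = kmer_distribution("ACGTTCGTACGATACG")
    print(jensen_shannon(a, b), hellinger(a, b))
```

Both functions return 0 for identical profiles and 1 for profiles that share no k-mers, which makes them usable as pairwise distances between samples.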

Subjects

Biological Evolution; Genetic Heterogeneity; Genetic Techniques; Neoplasms/genetics; Software; Humans

7.

Comparing biological information contained in mRNA and non-coding RNAs for classification of lung cancer patients.

Smolander, Johannes; Stupnikov, Alexey; Glazko, Galina; Dehmer, Matthias; Emmert-Streib, Frank.

BMC Cancer ; 19(1): 1176, 2019 Dec 03.

Article in English | MEDLINE | ID: mdl-31796020

ABSTRACT

BACKGROUND: Deciphering the meaning of the human DNA is an outstanding goal which would revolutionize medicine and our way for treating diseases. In recent years, non-coding RNAs have attracted much attention and shown to be functional in part. Yet the importance of these RNAs especially for higher biological functions remains under investigation. METHODS: In this paper, we analyze RNA-seq data, including non-coding and protein coding RNAs, from lung adenocarcinoma patients, a histologic subtype of non-small-cell lung cancer, with deep learning neural networks and other state-of-the-art classification methods. The purpose of our paper is three-fold. First, we compare the classification performance of different versions of deep belief networks with SVMs, decision trees and random forests. Second, we compare the classification capabilities of protein coding and non-coding RNAs. Third, we study the influence of feature selection on the classification performance. RESULTS: As a result, we find that deep belief networks perform at least competitively to other state-of-the-art classifiers. Second, data from non-coding RNAs perform better than coding RNAs across a number of different classification methods. This demonstrates the equivalence of predictive information as captured by non-coding RNAs compared to protein coding RNAs, conventionally used in computational diagnostics tasks. Third, we find that feature selection has in general a negative effect on the classification performance which means that unfiltered data with all features give the best classification results. CONCLUSIONS: Our study is the first to use ncRNAs beyond miRNAs for the computational classification of cancer and for performing a direct comparison of the classification capabilities of protein coding RNAs and non-coding RNAs.

Subjects

Lung Neoplasms/classification; Lung Neoplasms/genetics; RNA, Messenger/metabolism; RNA, Untranslated/genetics; Computational Biology/methods; Decision Trees; Humans; Lung Neoplasms/pathology; Machine Learning; MicroRNAs/genetics; Neural Networks, Computer; RNA, Messenger/genetics; Sequence Analysis, RNA/methods

8.

samExploreR: exploring reproducibility and robustness of RNA-seq results based on SAM files.

Stupnikov, Alexey; Tripathi, Shailesh; de Matos Simoes, Ricardo; McArt, Darragh; Salto-Tellez, Manuel; Glazko, Galina; Dehmer, Matthias; Emmert-Streib, Frank.

Bioinformatics ; 32(21): 3345-3347, 2016 11 01.

Article in English | MEDLINE | ID: mdl-27402900

ABSTRACT

MOTIVATION: Data from RNA-seq experiments provide us with many new possibilities to gain insights into biological and disease mechanisms of cellular functioning. However, the reproducibility and robustness of RNA-seq data analysis results is often unclear. This is in part attributed to the two counteracting goals of (i) a cost-efficient and (ii) an optimal experimental design, leading to a compromise, e.g. in the sequencing depth of experiments. RESULTS: We introduce an R package called samExploreR that allows the subsampling (m out of n bootstrapping) of short reads based on SAM files, facilitating the investigation of sequencing-depth-related questions for the experimental design. Overall, this provides a systematic way for exploring the reproducibility and robustness of general RNA-seq studies. We exemplify the usage of samExploreR by studying the influence of the sequencing depth and the annotation on the identification of differentially expressed genes. AVAILABILITY AND IMPLEMENTATION: samExploreR is available as an R package from Bioconductor. CONTACT: v@bio-complexity.com. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
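The idea of read-level subsampling can be sketched in Python as follows. This is not the samExploreR API (which is an R package); the function names are hypothetical, reads are represented only by the gene they were assigned to, and drawing without replacement is an assumption of this sketch.

```python
import random
from collections import Counter


def subsample_reads(read_genes, fraction, seed=None):
    """Draw m = fraction * n reads at random to simulate a shallower run.

    `read_genes` is a list with one gene label per aligned read.
    Assumption: reads are drawn without replacement.
    """
    rng = random.Random(seed)
    m = int(len(read_genes) * fraction)
    return rng.sample(read_genes, m)


def gene_counts(read_genes):
    """Count reads per gene."""
    return Counter(read_genes)


if __name__ == "__main__":
    # Hypothetical run: 80 reads on gene g1, 20 reads on gene g2.
    reads = ["g1"] * 80 + ["g2"] * 20
    for fraction in (0.75, 0.5, 0.25):
        print(fraction, dict(gene_counts(subsample_reads(reads, fraction, seed=1))))
```

Repeating the draw with different seeds gives a spread of count tables at each simulated depth, which is the kind of input needed to check how stable a downstream result is.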

Subjects

RNA/genetics; Sequence Analysis, RNA; Reproducibility of Results; Research Design; Software

9.

Effects of subsampling on characteristics of RNA-seq data from triple-negative breast cancer patients.

Stupnikov, Alexey; Glazko, Galina V; Emmert-Streib, Frank.

Chin J Cancer ; 34(10): 427-38, 2015 Aug 08.

Article in English | MEDLINE | ID: mdl-26253000

ABSTRACT

BACKGROUND: Data from RNA-seq experiments provide a wealth of information about the transcriptome of an organism. However, the analysis of such data is very demanding. In this study, we aimed to establish robust analysis procedures that can be used in clinical practice. METHODS: We studied RNA-seq data from triple-negative breast cancer patients. Specifically, we investigated the subsampling of RNA-seq data. RESULTS: The main results of our investigations are as follows: (1) the subsampling of RNA-seq data gave biologically realistic simulations of sequencing experiments with smaller sequencing depth but not direct scaling of count matrices; (2) the saturation of results required an average sequencing depth larger than 32 million reads and an individual sequencing depth larger than 46 million reads; and (3) for an abrogated feature selection, higher moments of the distribution of all expressed genes had a higher sensitivity for signal detection than the corresponding mean values. CONCLUSIONS: Our results reveal important characteristics of RNA-seq data that must be understood before one can apply such an approach to translational medicine.

Subjects

Gene Expression Profiling; RNA; Triple Negative Breast Neoplasms; High-Throughput Nucleotide Sequencing; Humans; Transcriptome

10.

ITAS: Integrated Transcript Annotation for Small RNA.

Stupnikov, Alexey; Bezuglov, Vitaly; Skakov, Ivan; Shtratnikova, Victoria; Pilsner, J Richard; Suvorov, Alexander; Sergeyev, Oleg.

Noncoding RNA ; 8(3), 2022 May 02.

Article in English | MEDLINE | ID: mdl-35645337

ABSTRACT

Transcriptomics analysis of various small RNA (sRNA) biotypes is a new and rapidly developing field. Annotations for microRNAs, tRNAs, piRNAs and rRNAs contain information on transcript sequences and loci that is vital for downstream analyses. Several databases have been established to provide this type of data for specific RNA biotypes. However, these sources often contain data in different formats, which makes the bulk analysis of several sRNA biotypes in a single pipeline challenging. Information on some transcripts may be incomplete or conflicting with other entries. To overcome these challenges, we introduce ITAS, or Integrated Transcript Annotation for Small RNA, a filtered, corrected and integrated transcript annotation containing information on several types of small RNAs, including tRNA-derived small RNA, for several species (Homo sapiens, Rattus norvegicus, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans). ITAS is presented in a format applicable for the vast majority of bioinformatic transcriptomics analysis, and it was tested in several case studies for human-derived data against existing alternative databases.

11.

Hobotnica: exploring molecular signature quality.

Stupnikov, Alexey; Sizykh, Alexey; Budkina, Anna; Favorov, Alexander; Afsari, Bahman; Wheelan, Sarah; Marchionni, Luigi; Medvedeva, Yulia.

F1000Res ; 10: 1260, 2021.

Article in English | MEDLINE | ID: mdl-36204675

ABSTRACT

A Molecular Features Set (MFS) is a result of a vast diversity of bioinformatics pipelines. The lack of a "gold standard" for most experimental data modalities makes it difficult to provide a valid estimation of a particular MFS's quality. Yet, this goal can partially be achieved by analyzing inner-sample Distance Matrices (DM) and their power to distinguish between phenotypes. The quality of a DM can be assessed by summarizing its power to quantify the differences of inner-phenotype and outer-phenotype distances. This estimation of the DM quality can be construed as a measure of the MFS's quality. Here we propose Hobotnica, an approach to estimate MFSs' quality by their ability to stratify data and to assign them significance scores that allow for collating various signatures and comparing their quality for contrasting groups.
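The comparison of inner-phenotype and outer-phenotype distances can be illustrated with a simple rank-based score. The sketch below is a stand-in chosen for illustration and is not the published Hobotnica statistic: it reports the fraction of (within-group, between-group) distance pairs in which the between-group distance is the larger one, with ties counted as one half. The function name and the toy data are hypothetical.

```python
from itertools import combinations


def separation_score(dist, labels):
    """Rank-based separation of phenotypes in a distance matrix.

    `dist` is a symmetric matrix (list of lists) of sample-to-sample
    distances and `labels` gives one phenotype label per sample.
    Returns a value in [0, 1]: 1.0 means every between-group distance
    exceeds every within-group distance, 0.5 means no separation.
    """
    within, between = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = within if labels[i] == labels[j] else between
        target.append(dist[i][j])
    if not within or not between:
        raise ValueError("need both within-group and between-group pairs")
    wins = sum((b > w) + 0.5 * (b == w) for w in within for b in between)
    return wins / (len(within) * len(between))


if __name__ == "__main__":
    labels = ["tumor", "tumor", "normal", "normal"]
    dist = [
        [0, 1, 5, 5],
        [1, 0, 5, 5],
        [5, 5, 0, 1],
        [5, 5, 1, 0],
    ]
    print(separation_score(dist, labels))
```

A significance score of the kind mentioned in the abstract could be obtained by recomputing such a score on randomly permuted labels, although the exact procedure used by Hobotnica is not described here.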

Subjects

Computational Biology; Phenotype

12.

Impact of Variable RNA-Sequencing Depth on Gene Expression Signatures and Target Compound Robustness: Case Study Examining Brain Tumor (Glioma) Disease Progression.

Stupnikov, Alexey; O'Reilly, Paul G; McInerney, Caitriona E; Roddy, Aideen C; Dunne, Philip D; Gilmore, Alan; Ellis, Hayley P; Flannery, Tom; Healy, Estelle; McIntosh, Stuart A; Savage, Kienan; Kurian, Kathreena M; Emmert-Streib, Frank; Prise, Kevin M; Salto-Tellez, Manuel; McArt, Darragh G.

JCO Precis Oncol ; 2, 2018 Sep 13.

Article in English | MEDLINE | ID: mdl-30324181

ABSTRACT

PURPOSE: Gene expression profiling can uncover biologic mechanisms underlying disease and is important in drug development. RNA sequencing (RNA-seq) is routinely used to assess gene expression, but costs remain high. Sample multiplexing reduces RNA-seq costs; however, multiplexed samples have lower cDNA sequencing depth, which can hinder accurate differential gene expression detection. The impact of sequencing depth alteration on RNA-seq-based downstream analyses such as gene expression connectivity mapping is not known, where this method is used to identify potential therapeutic compounds for repurposing. METHODS: In this study, published RNA-seq profiles from patients with brain tumor (glioma) were assembled into two disease progression gene signature contrasts for astrocytoma. Available treatments for glioma have limited effectiveness, rendering this a disease of poor clinical outcome. Gene signatures were subsampled to simulate sequencing alterations and analyzed in connectivity mapping to investigate target compound robustness. RESULTS: Data loss to gene signatures led to the loss, gain, and consistent identification of significant connections. The most accurate gene signature contrast with consistent patient gene expression profiles was more resilient to data loss and identified robust target compounds. Target compounds lost included candidate compounds of potential clinical utility in glioma (eg, suramin, dasatinib). Lost connections may have been linked to low-abundance genes in the gene signature that closely characterized the disease phenotype. Consistently identified connections may have been related to highly expressed abundant genes that were ever-present in gene signatures, despite data reductions. Potential noise surrounding findings included false-positive connections that were gained as a result of gene signature modification with data loss. CONCLUSION: Findings highlight the necessity for gene signature accuracy for connectivity mapping, which should improve the clinical utility of future target compound discoveries.
