Search | BVS (Virtual Health Library) Infectious and Parasitic Diseases

1.

Matrix factorization and transfer learning uncover regulatory biology across multiple single-cell ATAC-seq data sets.

Erbe, Rossin; Kessler, Michael D; Favorov, Alexander V; Easwaran, Hariharan; Gaykalova, Daria A; Fertig, Elana J.

Nucleic Acids Res ; 48(12): e68, 2020 07 09.

Article in English | MEDLINE | ID: mdl-32392348

ABSTRACT

While the methods available for single-cell ATAC-seq analysis are well optimized for clustering cell types, the question of how to integrate multiple scATAC-seq data sets and/or sequencing modalities is still open. We present an analysis framework that enables such integration across scATAC-seq data sets by applying the CoGAPS Matrix Factorization algorithm and the projectR transfer learning program to identify common regulatory patterns across scATAC-seq data sets. We additionally integrate our analysis with scRNA-seq data to identify orthogonal evidence for transcriptional regulators predicted by scATAC-seq analysis. Using publicly available scATAC-seq data, we find patterns that accurately characterize cell types both within and across data sets. Furthermore, we demonstrate that these patterns are both consistent with current biological understanding and reflective of novel regulatory biology.

Subjects

Algorithms, Chromatin Immunoprecipitation Sequencing/methods, Gene Expression Profiling/methods, Single-Cell Analysis/methods, Animals, Chromatin/genetics, Datasets as Topic, Humans, Machine Learning
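
The transfer-learning step mentioned in the abstract of record 1 (relating a new data set to patterns learned elsewhere) can be illustrated with a small least-squares sketch. This is not projectR's API or implementation: the function names `solve` and `project`, the loadings matrix, and the sample are invented for illustration, and ordinary least squares is an assumed stand-in for the projection method.

```python
def solve(M, b):
    """Solve the square system M x = b by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for j in range(c, n + 1):
                aug[r][j] -= f * aug[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def project(loadings, sample):
    """Least-squares pattern weights w minimizing ||loadings @ w - sample||."""
    k = len(loadings[0])
    # Normal equations: (A^T A) w = A^T s
    AtA = [[sum(row[i] * row[j] for row in loadings) for j in range(k)]
           for i in range(k)]
    Ats = [sum(row[i] * s for row, s in zip(loadings, sample)) for i in range(k)]
    return solve(AtA, Ats)


# Hypothetical feature loadings (5 features x 2 patterns) learned on one data set.
loadings = [[1, 0], [2, 0], [0, 1], [0, 3], [1, 1]]
# Hypothetical new sample built as 2 * pattern 1 + 0.5 * pattern 2.
sample = [2.0, 4.0, 0.5, 1.5, 2.5]
weights = project(loadings, sample)
print(weights)
```

The recovered weights indicate how strongly each previously learned pattern is present in the new sample, which is the sense in which patterns are transferred across data sets in this sketch.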

2.

Allele-specific nonstationarity in evolution of influenza A virus surface proteins.

Popova, Anfisa V; Safina, Ksenia R; Ptushenko, Vasily V; Stolyarova, Anastasia V; Favorov, Alexander V; Neverov, Alexey D; Bazykin, Georgii A.

Proc Natl Acad Sci U S A ; 116(42): 21104-21112, 2019 10 15.

Article in English | MEDLINE | ID: mdl-31578251

ABSTRACT

Influenza A virus (IAV) is a major public health problem and a pandemic threat. Its evolution is largely driven by diversifying positive selection so that relative fitness of different amino acid variants changes with time due to changes in herd immunity or genomic context, and novel amino acid variants attain fitness advantage. Here, we hypothesize that diversifying selection also has another manifestation: the fitness associated with a particular amino acid variant should decline with time since its origin, as the herd immunity adapts to it. By tracing the evolution of antigenic sites at IAV surface proteins, we show that an amino acid variant becomes progressively more likely to be replaced by another variant with time since its origin, a phenomenon we call "senescence." Senescence is particularly pronounced at experimentally validated antigenic sites, implying that it is largely driven by host immunity. By contrast, at internal sites, existing variants become more favorable with time, probably due to arising contingent mutations at other epistatically interacting sites. Our findings reveal a previously undescribed facet of adaptive evolution and suggest approaches for prediction of evolutionary dynamics of pathogens.

Subjects

Amino Acids/genetics, Influenza A virus/genetics, Membrane Proteins/genetics, Viral Proteins/genetics, Alleles, Amino Acids/immunology, Viral Antigens/genetics, Viral Antigens/immunology, Molecular Evolution, Genetic Variation/genetics, Genetic Variation/immunology, Influenza A virus/immunology, Membrane Proteins/immunology, Pandemics, Viral Proteins/immunology

3.

Enter the Matrix: Factorization Uncovers Knowledge from Omics.

Stein-O'Brien, Genevieve L; Arora, Raman; Culhane, Aedin C; Favorov, Alexander V; Garmire, Lana X; Greene, Casey S; Goff, Loyal A; Li, Yifeng; Ngom, Aloune; Ochs, Michael F; Xu, Yanxun; Fertig, Elana J.

Trends Genet ; 34(10): 790-805, 2018 10.

Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge, answering questions from high-dimensional data that we have not yet thought to ask.

Subjects

Statistical Data Interpretation, Genomics/statistics & numerical data, Proteomics/statistics & numerical data, Algorithms, Humans, Systems Biology/statistics & numerical data
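
To make the idea of matrix factorization in record 3 concrete, here is a minimal sketch that factors a small non-negative matrix into two low-rank factors. It uses non-negative matrix factorization with Lee-Seung multiplicative updates as one representative MF technique; that choice, the function names, and the toy matrix are assumptions made for illustration and are not taken from the review.

```python
import random


def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    return [list(col) for col in zip(*X)]


def frobenius_error(D, A, P):
    """Frobenius norm of D - A @ P."""
    R = matmul(A, P)
    return sum((d - r) ** 2
               for drow, rrow in zip(D, R)
               for d, r in zip(drow, rrow)) ** 0.5


def nmf(D, k, n_iter=200, seed=0, eps=1e-9):
    """Approximate non-negative D (features x samples) as A @ P with k patterns."""
    rng = random.Random(seed)
    n, m = len(D), len(D[0])
    A = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    P = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        # P <- P * (A^T D) / (A^T A P), element-wise
        At = transpose(A)
        num = matmul(At, D)
        den = matmul(matmul(At, A), P)
        P = [[P[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # A <- A * (D P^T) / (A P P^T), element-wise
        Pt = transpose(P)
        num = matmul(D, Pt)
        den = matmul(A, matmul(P, Pt))
        A = [[A[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return A, P


# Toy data: 5 features x 4 samples generated from two hidden patterns.
D = [[1, 2, 0, 0],
     [2, 4, 0, 0],
     [0, 0, 1, 2],
     [0, 0, 3, 6],
     [1, 2, 1, 2]]
A, P = nmf(D, k=2)
print(round(frobenius_error(D, A, P), 4))
```

Rows of `P` play the role of sample-level patterns and columns of `A` the feature weights for each pattern; interpreting those factors is the step the review emphasizes.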

4.

Inferring causal molecular networks: empirical assessment through a community-based effort.

Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K; Carlin, Daniel E; Zhang, Yang; Sokolov, Artem; Paull, Evan O; Wong, Chris K; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V; Favorov, Alexander V; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W; Long, Byron L; Noren, David P; Bisberg, Alexander J; Mills, Gordon B; Gray, Joe W; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A; Fertig, Elana J; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M; Spellman, Paul T; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach.

Nat Methods ; 13(4): 310-8, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26901648

ABSTRACT

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

Subjects

Causality, Gene Regulatory Networks, Neoplasms/genetics, Protein Interaction Mapping/methods, Software, Systems Biology, Algorithms, Computational Biology, Computer Simulation, Gene Expression Profiling, Humans, Biological Models, Signal Transduction, Cultured Tumor Cells

5.

Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer.

Afsari, Bahman; Guo, Theresa; Considine, Michael; Florea, Liliana; Kagohara, Luciane T; Stein-O'Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Zambo, Kristina D; Ha, Patrick K; Geman, Donald; Ochs, Michael F; Califano, Joseph A; Gaykalova, Daria A; Favorov, Alexander V; Fertig, Elana J.

Bioinformatics ; 34(11): 1859-1867, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29342249

ABSTRACT

Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subjects

Alternative Splicing, Neoplasms/genetics, Protein Isoforms/genetics, RNA Sequence Analysis/methods, Software, Computational Biology/methods, Neoplastic Gene Expression Regulation, Head and Neck Neoplasms/genetics, Humans, Genetic Models
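
The abstract of record 5 describes a rank-based statistic that compares within-condition variability of junction expression profiles. The sketch below illustrates that general idea with a mean pairwise Kendall-tau distance per condition. It is a simplified illustration and not the GSReg implementation: the function names and junction counts are hypothetical, and the significance testing a real analysis needs is omitted.

```python
from itertools import combinations


def kendall_tau_distance(x, y):
    """Fraction of junction pairs whose relative order differs between two profiles."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


def rank_variability(samples):
    """Mean pairwise Kendall-tau distance among the samples of one condition."""
    dists = [kendall_tau_distance(a, b) for a, b in combinations(samples, 2)]
    return sum(dists) / len(dists)


# Hypothetical junction counts for one gene (rows = samples, columns = junctions).
normal = [[10, 20, 30, 40], [12, 22, 33, 41], [9, 19, 31, 45]]
tumor = [[10, 20, 30, 40], [40, 10, 30, 20], [20, 40, 10, 30]]

print(rank_variability(normal), rank_variability(tumor))
```

Because only the ordering of junctions within each sample matters, the statistic is insensitive to overall expression level; the normal samples above share one ordering, while the tumor samples disagree with each other.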

6.

StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A.

Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.

Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Gene Expression Regulation, Genomics/methods, DNA Sequence Analysis/methods, Software, Algorithms, Chromatin Immunoprecipitation/methods, Epigenomics/methods, Human Genome, Humans
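
The notion of kernel correlation among neighboring positions, mentioned in the abstract of record 6, can be sketched by smoothing two continuous tracks with a Gaussian kernel before correlating them. This is only a toy illustration under that assumption; it is not StereoGene's algorithm, and the function names, kernel width, and tracks are hypothetical.

```python
import math


def gaussian_smooth(track, sigma):
    """Convolve a per-position track with a Gaussian kernel truncated at 3 sigma."""
    radius = int(3 * sigma)
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    n = len(track)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = i + k - radius
            if 0 <= j < n:
                acc += w * track[j]
        out.append(acc / total)
    return out


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def kernel_correlation(f, g, sigma):
    """Correlate two tracks after spreading each signal over nearby positions."""
    return pearson(gaussian_smooth(f, sigma), gaussian_smooth(g, sigma))


# Two hypothetical tracks with single peaks two positions apart.
f = [0.0] * 30
g = [0.0] * 30
f[10] = 1.0
g[12] = 1.0

print(pearson(f, g), kernel_correlation(f, g, sigma=2.0))
```

Position-by-position correlation treats the two nearby peaks as unrelated, whereas the smoothed version registers them as colocalized, which is why a kernel is useful for features that are adjacent rather than exactly overlapping.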

7.

PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF.

Stein-O'Brien, Genevieve L; Carey, Jacob L; Lee, Wai Shing; Considine, Michael; Favorov, Alexander V; Flam, Emily; Guo, Theresa; Li, Sijia; Marchionni, Luigi; Sherman, Thomas; Sivy, Shawn; Gaykalova, Daria A; McKay, Ronald D; Ochs, Michael F; Colantuoni, Carlo; Fertig, Elana J.

Bioinformatics ; 33(12): 1892-1894, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28174896

ABSTRACT

SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. AVAILABILITY AND IMPLEMENTATION: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. CONTACT: gsteinobrien@jhmi.edu or ccolantu@jhmi.edu or ejfertig@jhmi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subjects

Algorithms, Gene Expression Profiling/methods, Software, Bayes Theorem, Biomarkers, Humans, RNA Sequence Analysis/methods

8.

Genetic risk factors for myocardial infarction more clearly manifest for early age of first onset.

Titov, Boris V; Osmak, German J; Matveeva, Natalia A; Kukava, Nino G; Shakhnovich, Roman M; Favorov, Alexander V; Ruda, Mikhail Ya; Favorova, Olga O.

Mol Biol Rep ; 44(4): 315-321, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28685248

ABSTRACT

Epidemiological genetics established that heritability in determining the risk of myocardial infarction (MI) is substantially greater when MI occurs early in life. However, the genetic architecture of early-onset and late-onset MI was not compared. We analyzed genotype frequencies of SNPs in/near 20 genes whose protein products are involved in the pathogenesis of atherosclerosis in two groups of Russian patients with MI: the first group included patients with age of first MI onset <60 years (N = 230) and the second group with onset ≥60 years (N = 174). The control group of corresponding ethnicity consisted of 193 unrelated volunteers without cardiovascular diseases (93 individuals were over 60 years). We found that in the group of patients with age of onset <60 years, SNPs FGB rs1800788*T, TGFB1 rs1982073*T/T, ENOS rs2070744*C and CRP rs1130864*T/T were associated with risk of MI, whereas in patients with age of onset ≥60 years, only TGFB1 rs1982073*T/T was associated with risk of MI. Using APSampler software, we found composite markers associated with MI only in patients with early onset: FGB rs1800788*T + TGFB1 rs1982073*T; FGB rs1800788*T + LPL rs328*C + IL4 rs2243250*C; FGB rs1800788*T + ENOS rs2070744*C (Fisher p values of 1.4 × 10⁻⁶ to 2.2 × 10⁻⁵; the permutation p values of 1.1 × 10⁻⁵ to 3.0 × 10⁻⁴; ORs = 2.67-2.54). Alleles included in the combinations were associated with MI less significantly and with lower ORs than the combinations themselves. The result showed a substantially greater contribution of the genetic component in the development of MI if it occurs early in life, and demonstrated the usefulness of genetic testing for young people.

Subjects

Atherosclerosis/genetics, Myocardial Infarction/genetics, Adult, Age Factors, Aged, Aged 80 and over, Alleles, Biomarkers/blood, Female, Gene Frequency/genetics, Genetic Association Studies, Genetic Predisposition to Disease/genetics, Humans, Male, Middle Aged, Myocardial Infarction/epidemiology, Single Nucleotide Polymorphism/genetics, Risk Factors, Russia
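
The abstract of record 8 reports Fisher p values and odds ratios for carriers of a marker among patients versus controls. The sketch below shows how such quantities are computed from a 2x2 table. The carrier counts are hypothetical (only the group sizes 230 and 193 come from the abstract), the function names are invented, and neither APSampler nor the permutation test is reproduced.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(k):
        # Hypergeometric probability of k in the top-left cell, margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one.
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


def odds_ratio(a, b, c, d):
    """Sample odds ratio (a * d) / (b * c) for the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


# Hypothetical counts: marker carriers / non-carriers among 230 patients
# and 193 controls. These are NOT the study's data.
patients_carrier, patients_other = 60, 170
controls_carrier, controls_other = 25, 168

table = (patients_carrier, patients_other, controls_carrier, controls_other)
print(odds_ratio(*table), fisher_exact_two_sided(*table))
```

An odds ratio above 1 with a small p-value would suggest that carriers are over-represented among patients; testing many marker combinations, as in the study, additionally calls for a multiplicity-aware procedure such as the permutation test the authors mention.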

9.

Natural variation of gene models in Drosophila melanogaster.

Kurmangaliyev, Yerbol Z; Favorov, Alexander V; Osman, Noha M; Lehmann, Kjong-Van; Campo, Daniel; Salomon, Matthew P; Tower, John; Gelfand, Mikhail S; Nuzhdin, Sergey V.

BMC Genomics ; 16: 198, 2015 Mar 17.

Article in English | MEDLINE | ID: mdl-25888292

ABSTRACT

BACKGROUND: Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. RESULTS: Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. CONCLUSIONS: Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila.

Subjects

Drosophila melanogaster/genetics, Genetic Variation, Genetic Models, Alleles, Allelic Imbalance, Alternative Splicing, Animals, Exons, Gene Expression Profiling, Genotype, Open Reading Frames, Single Nucleotide Polymorphism, Quantitative Trait Loci, RNA Splice Sites, Transcriptome

10.

Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction.

Parker, Hilary S; Leek, Jeffrey T; Favorov, Alexander V; Considine, Michael; Xia, Xiaoxin; Chavan, Sameer; Chung, Christine H; Fertig, Elana J.

Bioinformatics ; 30(19): 2757-63, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24907368

ABSTRACT

MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry the potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori. RESULTS: Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set. AVAILABILITY AND IMPLEMENTATION: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.

Subjects

Algorithms, Genomics/methods, Head and Neck Neoplasms/genetics, Papillomavirus Infections/diagnosis, Artifacts, Computational Biology/methods, Head and Neck Neoplasms/virology, Humans, Statistical Models, Reproducibility of Results, Software

11.

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A.

Nucleic Acids Res ; 40(12): e93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Subjects

Gene Expression Regulation, Transcriptional Regulatory Elements, DNA Sequence Analysis, Algorithms, Animals, Body Patterning/genetics, Drosophila/embryology, Drosophila/genetics, Drosophila/metabolism, Genetic Enhancer Elements, Developmental Gene Expression Regulation, Muscles/metabolism, Position-Specific Scoring Matrices, Software
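
Record 11 builds on positional weight matrices as the input for locating binding sites. As a minimal illustration of that building block only, the sketch below converts base counts into log-odds scores and scans a sequence for high-scoring windows. The motif counts, sequence, threshold, and function names are hypothetical, and CORECLUST's model of binding-site arrangement is not reproduced here.

```python
import math


def pwm_from_counts(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2 odds scores against a uniform background."""
    pwm = []
    for col in counts:
        total = sum(col.get(b, 0) for b in "ACGT") + 4 * pseudocount
        pwm.append({
            b: math.log2(((col.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at or above the threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        score = sum(pwm[j][sequence[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, score))
    return hits


# Hypothetical counts for a 4-bp motif with consensus TATA (8 aligned sites).
counts = [{"T": 8}, {"A": 8}, {"T": 8}, {"A": 8}]
pwm = pwm_from_counts(counts)
hits = scan("GGTATAGG", pwm, threshold=4.0)
print(hits)
```

A CRM predictor would take hits like these for several matrices and then reason about their relative positions; that second stage is where the conserved "grammar" described in the abstract comes in.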

12.

Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces.

Deshpande, Atul; Loth, Melanie; Sidiropoulos, Dimitrios N; Zhang, Shuming; Yuan, Long; Bell, Alexander T F; Zhu, Qingfeng; Ho, Won Jin; Santa-Maria, Cesar; Gilkes, Daniele M; Williams, Stephen R; Uytingco, Cedric R; Chew, Jennifer; Hartnett, Andrej; Bent, Zachary W; Favorov, Alexander V; Popel, Aleksander S; Yarchoan, Mark; Kiemen, Ashley; Wu, Pei-Hsun; Fujikura, Kohei; Wirtz, Denis; Wood, Laura D; Zheng, Lei; Jaffee, Elizabeth M; Anders, Robert A; Danilova, Ludmila; Stein-O'Brien, Genevieve; Kagohara, Luciane T; Fertig, Elana J.

Cell Syst ; 14(4): 285-301.e4, 2023 04 19.

Article in English | MEDLINE | ID: mdl-37080163

ABSTRACT

Recent advances in spatial transcriptomics (ST) enable gene expression measurements from a tissue sample while retaining its spatial context. This technology enables unprecedented in situ resolution of the regulatory pathways that underlie the heterogeneity in the tumor as well as the tumor microenvironment (TME). The direct characterization of cellular co-localization with spatial technologies facilitates quantification of the molecular changes resulting from direct cell-cell interaction, as it occurs in tumor-immune interactions. We present SpaceMarkers, a bioinformatics algorithm to infer molecular changes from cell-cell interactions from latent space analysis of ST data. We apply this approach to infer the molecular changes from tumor-immune interactions in Visium spatial transcriptomics data of metastasis, invasive and precursor lesions, and immunotherapy treatment. Transfer learning in matched scRNA-seq data enabled further quantification of the specific cell types in which SpaceMarkers are enriched. Altogether, SpaceMarkers can identify the location and context-specific molecular interactions within the TME from ST data.

Subjects

Algorithms, Tumor Microenvironment, Cell Communication, Computational Biology, Gene Expression Profiling

13.

Investigation of the Role of PUFA Metabolism in Breast Cancer Using a Rank-Based Random Forest Algorithm.

Guryleva, Mariia V; Penzar, Dmitry D; Chistyakov, Dmitry V; Mironov, Andrey A; Favorov, Alexander V; Sergeeva, Marina G.

Cancers (Basel) ; 14(19), 2022 Sep 25.

Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research because PUFAs function as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, which are important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in cancerogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on the specified genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in PUFA metabolism gene expression; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.

14.

Deconvolution of B cell receptor repertoire in multiple sclerosis patients revealed a delay in tBreg maturation.

Lomakin, Yakov A; Zvyagin, Ivan V; Ovchinnikova, Leyla A; Kabilov, Marsel R; Staroverov, Dmitriy B; Mikelov, Artem; Tupikin, Alexey E; Zakharova, Maria Y; Bykova, Nadezda A; Mukhina, Vera S; Favorov, Alexander V; Ivanova, Maria; Simaniv, Taras; Rubtsov, Yury P; Chudakov, Dmitriy M; Zakharova, Maria N; Illarioshkin, Sergey N; Belogurov, Alexey A; Gabibov, Alexander G.

Front Immunol ; 13: 803229, 2022.

Article in English | MEDLINE | ID: mdl-36052064

ABSTRACT

Background: B lymphocytes play a pivotal regulatory role in the development of the immune response. It was previously shown that deficiency in B regulatory cells (Bregs) or a decrease in their anti-inflammatory activity can lead to immunological dysfunctions. However, the exact mechanisms of Breg development and functioning are only partially resolved. For instance, little is known about the structure of their B cell receptor (BCR) repertoires in autoimmune disorders, including multiple sclerosis (MS), a severe neuroinflammatory disease with a yet unknown etiology. Here, we elucidate specific properties of B regulatory cells in MS. Methods: We performed a prospective study of the transitional Breg (tBreg) subpopulations with the CD19+CD24highCD38high phenotype from MS patients and healthy donors by (i) measuring their content during two diverging courses of relapsing-remitting MS: benign multiple sclerosis (BMS) and highly active multiple sclerosis (HAMS); (ii) analyzing BCR repertoires of circulating B cells by high-throughput sequencing; and (iii) measuring the percentage of CD27+ cells in tBregs. Results: The tBregs from HAMS patients carry the heavy chain with a lower number of hypermutations than tBregs from healthy donors. The percentage of transitional CD24highCD38high B cells was elevated, whereas the frequency of differentiated CD27+ cells in this transitional B cell subset was decreased in the MS patients as compared with healthy donors. Conclusions: Impaired maturation of regulatory B cells is associated with MS progression.

Subjects

Regulatory B-Lymphocytes, Multiple Sclerosis, Humans, Interleukin-10, Prospective Studies, B-Cell Antigen Receptors

15.

CoGAPS: an R/C++ package to identify patterns and biological process activity in transcriptomic data.

Fertig, Elana J; Ding, Jie; Favorov, Alexander V; Parmigiani, Giovanni; Ochs, Michael F.

Bioinformatics ; 26(21): 2792-3, 2010 Nov 01.

Article in English | MEDLINE | ID: mdl-20810601

ABSTRACT

SUMMARY: Coordinated Gene Activity in Pattern Sets (CoGAPS) provides an integrated package for isolating gene expression driven by a biological process, enhancing inference of biological processes from transcriptomic data. CoGAPS improves on other enrichment measurement methods by combining a Markov chain Monte Carlo (MCMC) matrix factorization algorithm (GAPS) with a threshold-independent statistic inferring activity on gene sets. The software is provided as open source C++ code built on top of JAGS software with an R interface. AVAILABILITY: The R package CoGAPS and the C++ package GAPS-JAGS are provided open source under the GNU Lesser Public License (GLPL) with a users manual containing installation and operating instructions. CoGAPS is available through Bioconductor and depends on the rjags package available through CRAN to interface CoGAPS with GAPS-JAGS. URL: http://www.cancerbiostats.onc.jhmi.edu/cogaps.cfm .

Subjects

Gene Expression, Genomics/methods, Oligonucleotide Array Sequence Analysis/methods, Software, Computational Biology/methods, Gene Expression Profiling, Markov Chains

16.

Spatial correlation statistics enable transcriptome-wide characterization of RNA structure binding.

Busa, Veronica F; Favorov, Alexander V; Fertig, Elana J; Leung, Anthony K L.

Cell Rep Methods ; 1(6): 100088, 2021 10 25.

Article in English | MEDLINE | ID: mdl-35474897

ABSTRACT

Molecular interactions at identical transcriptomic locations or at proximal but non-overlapping sites can mediate RNA modification and regulation, necessitating tools to uncover these spatial relationships. We present nearBynding, a flexible algorithm and software pipeline that models spatial correlation between transcriptome-wide tracks from diverse data types. nearBynding can process and correlate interval as well as continuous data and incorporate experimentally derived or in silico predicted transcriptomic tracks. nearBynding offers visualization functions for its statistics to identify colocalizations and adjacent features. We demonstrate the application of nearBynding to correlate RNA-binding protein (RBP) binding preferences with other RBPs, RNA structure, or RNA modification. By cross-correlating RBP binding and RNA structure data, we demonstrate that nearBynding recapitulates known RBP binding to structural motifs and provides biological insights into RBP binding preference of G-quadruplexes. nearBynding is available as an R/Bioconductor package and can run on a personal computer, making correlation of transcriptomic features broadly accessible.

Subjects

RNA-Binding Proteins , Transcriptome , Transcriptome/genetics , RNA-Binding Proteins/genetics , Binding Sites/genetics , RNA/genetics , Protein Binding

17.

Newly Identified Members of FGFR1 Splice Variants Engage in Cross-talk with AXL/AKT Axis in Salivary Adenoid Cystic Carcinoma.

Humtsoe, Joseph O; Kim, Hyun-Su; Leonard, Brandon; Ling, Shizhang; Keam, Bhumsuk; Marchionni, Luigi; Afsari, Bahman; Considine, Michael; Favorov, Alexander V; Fertig, Elana J; Kang, Hyunseok; Ha, Patrick K.

Cancer Res ; 81(4): 1001-1013, 2021 02 15.

Article in English | MEDLINE | ID: mdl-33408119

ABSTRACT

Adenoid cystic carcinoma (ACC) is the second most common malignancy of the salivary gland. Although characterized as an indolent tumor, ACC often leads to incurable metastatic disease. Patients with ACC respond poorly to currently available therapeutic drugs and factors contributing to the limited response remain unknown. Determining the role of molecular alterations frequently occurring in ACC may clarify ACC tumorigenesis and advance the development of effective treatment strategies. Applying Splice Expression Variant Analysis and outlier statistics on RNA sequencing of primary ACC tumors and matched normal salivary gland tissues, we identified multiple alternative splicing events (ASE) of genes specific to ACC. In ACC cells and patient-derived xenografts, FGFR1 was a uniquely expressed ASE. Detailed PCR analysis identified three novel, truncated, intracellular domain-lacking FGFR1 variants (FGFR1v). Cloning and expression analysis suggest that the three FGFR1v are cell surface proteins, that expression of FGFR1v augmented pAKT activity, and that cells became more resistant to pharmacologic FGFR1 inhibitor. FGFR1v-induced AKT activation was associated with AXL function, and inhibition of AXL activity in FGFR1v knockdown cells led to enhanced cytotoxicity in ACC. Moreover, cell killing effect was increased by dual inhibition of AXL and FGFR1 in ACC cells. This study demonstrates that these previously undescribed FGFR1v cooperate with AXL and desensitize cells to FGFR1 inhibitor, which supports further investigation into combined FGFR1 and AXL inhibition as an effective ACC therapy.
SIGNIFICANCE: This study identifies several FGFR1 variants that function through the AXL/AKT signaling pathway independent of FGF/FGFR1, desensitizing cells to FGFR1 inhibitor, suggestive of a potential resistance mechanism in ACC.

Subjects

Adenoid Cystic Carcinoma/genetics , Fibroblast Growth Factor Receptor Type 1/genetics , Fibroblast Growth Factor Receptor Type 1/metabolism , Salivary Gland Neoplasms/genetics , Animals , Adenoid Cystic Carcinoma/metabolism , Adenoid Cystic Carcinoma/pathology , Tumor Cell Line , Female , Neoplastic Gene Expression Regulation , Humans , Mice , Inbred NOD Mice , Transgenic Mice , Protein Isoforms/genetics , Protein Isoforms/isolation & purification , Protein Isoforms/metabolism , Proto-Oncogene Proteins/genetics , Proto-Oncogene Proteins/metabolism , Proto-Oncogene Proteins c-akt/genetics , Proto-Oncogene Proteins c-akt/metabolism , Receptor Cross-Talk/physiology , Receptor Protein-Tyrosine Kinases/genetics , Receptor Protein-Tyrosine Kinases/metabolism , Fibroblast Growth Factor Receptor Type 1/isolation & purification , Salivary Gland Neoplasms/metabolism , Salivary Gland Neoplasms/pathology , Salivary Glands/metabolism , Salivary Glands/pathology , Signal Transduction/genetics , Axl Receptor Tyrosine Kinase

18.

Landscape of allele-specific transcription factor binding in the human genome.

Abramov, Sergey; Boytsov, Alexandr; Bykova, Daria; Penzar, Dmitry D; Yevshin, Ivan; Kolmykov, Semyon K; Fridman, Marina V; Favorov, Alexander V; Vorontsov, Ilya E; Baulin, Eugene; Kolpakov, Fedor; Makeev, Vsevolod J; Kulakovskiy, Ivan V.

Nat Commun ; 12(1): 2751, 2021 05 12.

Article in English | MEDLINE | ID: mdl-33980847

ABSTRACT

Sequence variants in gene regulatory regions alter gene expression and contribute to phenotypes of individual cells and the whole organism, including disease susceptibility and progression. Single-nucleotide variants in enhancers or promoters may affect gene transcription by altering transcription factor binding sites. Differential transcription factor binding in heterozygous genomic loci provides a natural source of information on such regulatory variants. We present a novel approach to call the allele-specific transcription factor binding events at single-nucleotide variants in ChIP-Seq data, taking into account the joint contribution of aneuploidy and local copy number variation, that is estimated directly from variant calls. We have conducted a meta-analysis of more than 7 thousand ChIP-Seq experiments and assembled the database of allele-specific binding events listing more than half a million entries at nearly 270 thousand single-nucleotide polymorphisms for several hundred human transcription factors and cell types. These polymorphisms are enriched for associations with phenotypes of medical relevance and often overlap eQTLs, making candidates for causality by linking variants with molecular mechanisms. Specifically, there is a special class of switching sites, where different transcription factors preferably bind alternative alleles, thus revealing allele-specific rewiring of molecular circuitry.

Subjects

Alleles , Human Genome , Nucleic Acid Regulatory Sequences/genetics , Transcription Factors/metabolism , Chromatin/metabolism , Genetic Databases , Gene Dosage , Gene Expression Regulation/genetics , Genome-Wide Association Study , Humans , Nucleotide Motifs , Phenotype , Single Nucleotide Polymorphism , Protein Binding , Quantitative Trait Loci

19.

Motif discovery and motif finding from genome-mapped DNase footprint data.

Kulakovskiy, Ivan V; Favorov, Alexander V; Makeev, Vsevolod J.

Bioinformatics ; 25(18): 2318-25, 2009 Sep 15.

Article in English | MEDLINE | ID: mdl-19605419

ABSTRACT

MOTIVATION: Footprint data is an important source of information on transcription factor recognition motifs. However, a footprinting fragment can contain no sequences similar to known protein recognition sites. Inspection of genome fragments nearby can help to identify missing site positions. RESULTS: Genome fragments containing footprints were supplied to a pipeline that constructed a position weight matrix (PWM) for different motif lengths and selected the optimal PWM. Fragments were aligned with the SeSiMCMC sampler and a new heuristic algorithm, Bigfoot. Footprints with missing hits were found for approximately 50% of factors. Adding only 2 bp on both sides of a footprinting fragment recovered most hits. We automatically constructed motifs for 41 Drosophila factors. New motifs can recognize footprints with a greater sensitivity at the same false positive rate than existing models. Also we discuss possible overfitting of constructed motifs. AVAILABILITY: Software and the collection of regulatory motifs are freely available at http://line.imb.ac.ru/DMMPMM.

Subjects

Chromosome Mapping/methods , Computational Biology/methods , Deoxyribonucleases/chemistry , Algorithms , Animals , Base Sequence , Drosophila/genetics , Molecular Sequence Data , Software

20.

Extracellular Vesicles Released by Tumor Endothelial Cells Spread Immunosuppressive and Transforming Signals Through Various Recipient Cells.

Lopatina, Tatiana; Favaro, Enrica; Danilova, Ludmila; Fertig, Elana J; Favorov, Alexander V; Kagohara, Luciane T; Martone, Tiziana; Bussolati, Benedetta; Romagnoli, Renato; Albera, Roberto; Pecorari, Giancarlo; Brizzi, Maria Felice; Camussi, Giovanni; Gaykalova, Daria A.

Front Cell Dev Biol ; 8: 698, 2020.

Article in English | MEDLINE | ID: mdl-33015029

ABSTRACT

Head and neck squamous cell carcinoma (HNSCC) has a high recurrence and metastatic rate with an unknown mechanism of cancer spread. Tumor inflammation is one of the most critical processes in cancer onset, growth, and metastasis. We hypothesize that the release of extracellular vesicles (EVs) by tumor endothelial cells (TECs) induces reprogramming of immune cells as well as stromal cells to create an immunosuppressive microenvironment that favors tumor spread. We call this mechanism non-metastatic contagious carcinogenesis. Extracellular vesicles were collected from primary HNSCC-derived endothelial cells (TEC-EV) and were used for stimulation of peripheral blood mononuclear cells (PBMCs) and primary adipose mesenchymal stem cells (ASCs). Regulation of ASC gene expression was investigated by RNA sequencing and protein array. PBMC, stimulated with TEC-EV, were analyzed by enzyme-linked immunosorbent assay and fluorescence-activated cell sorting. We validated in vitro the effects of TEC-EV on ASCs or PBMC by measuring invasion, adhesion, and proliferation. We found and confirmed that TEC-EV were able to change ASC inflammatory gene expression signature within 24-48 h. TEC-EV were also able to enhance the secretion of TGF-β1 and IL-10 by PBMC and to increase T regulatory cell (Treg) expansion. TEC-EV carry specific proteins and RNAs that are responsible for Treg differentiation and immune suppression. ASCs and PBMC, treated with TEC-EV, enhanced proliferation, adhesion of tumor cells, and their invasion. These data indicate that TEC-EV exhibit a mechanism of non-metastatic contagious carcinogenesis that regulates tumor microenvironment and reprograms immune cells to sustain tumor growth and progression.
