Search | BVS España Search Portal

1.

HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors.

Vorontsov, Ilya E; Eliseeva, Irina A; Zinkevich, Arsenii; Nikonov, Mikhail; Abramov, Sergey; Boytsov, Alexandr; Kamenets, Vasily; Kasianova, Alexandra; Kolmykov, Semyon; Yevshin, Ivan S; Favorov, Alexander; Medvedeva, Yulia A; Jolma, Arttu; Kolpakov, Fedor; Makeev, Vsevolod J; Kulakovskiy, Ivan V.

Nucleic Acids Res ; 52(D1): D154-D163, 2024 Jan 05.

Article in English | MEDLINE | ID: mdl-37971293

ABSTRACT

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.

Subject(s)

Databases, Genetic , Gene Expression Regulation , Protein Interaction Domains and Motifs , Transcription Factors , Animals , Humans , Mice , Binding Sites/genetics , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Protein Interaction Domains and Motifs/genetics
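
The position weight matrices mentioned in the HOCOMOCO abstract above score a candidate binding site by summing one weight per position. The following is a minimal sketch of that idea, assuming a made-up 4-position count matrix (not a HOCOMOCO motif), a uniform background of 0.25 and a pseudocount of 1; the function names are illustrative and do not come from the HOCOMOCO software.

```python
import math

# Hypothetical nucleotide counts at 4 motif positions (not a real HOCOMOCO motif).
COUNTS = [
    {"A": 8, "C": 0, "G": 1, "T": 1},
    {"A": 0, "C": 9, "G": 1, "T": 0},
    {"A": 1, "C": 0, "G": 9, "T": 0},
    {"A": 0, "C": 1, "G": 0, "T": 9},
]


def counts_to_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position counts to log2-odds weights (uniform background assumed)."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def score(pwm, site):
    """Sum the weights of the bases observed at each motif position."""
    return sum(column[base] for column, base in zip(pwm, site))


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring window on the given strand."""
    width = len(pwm)
    windows = [
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    ]
    return max(windows)


pwm = counts_to_pwm(COUNTS)
print(best_hit(pwm, "TTACGTAA"))
```

In this toy sequence the top-scoring window is the consensus ACGT at offset 2. A real scan would also score the reverse complement and compare scores against a threshold; both are omitted here.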

2.

Recounting the FANTOM CAGE-Associated Transcriptome.

Imada, Eddie Luidy; Sanchez, Diego Fernando; Collado-Torres, Leonardo; Wilks, Christopher; Matam, Tejasvi; Dinalankara, Wikum; Stupnikov, Aleksey; Lobo-Pereira, Francisco; Yip, Chi-Wai; Yasuzawa, Kayoko; Kondo, Naoto; Itoh, Masayoshi; Suzuki, Harukazu; Kasukawa, Takeya; Hon, Chung-Chau; de Hoon, Michiel J L; Shin, Jay W; Carninci, Piero; Jaffe, Andrew E; Leek, Jeffrey T; Favorov, Alexander; Franco, Gloria R; Langmead, Ben; Marchionni, Luigi.

Genome Res ; 30(7): 1073-1081, 2020 07.

Article in English | MEDLINE | ID: mdl-32079618

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value of the expression of several enhancer lncRNAs in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNA by the FANTOM6 Consortium. In conclusion, comprising over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate functions and biological roles of both known coding genes and novel lncRNAs.

Subject(s)

Transcriptome , Databases, Genetic , Enhancer Elements, Genetic , Gene Expression Profiling , Genome, Human , Humans , Neoplasms/genetics , Organ Specificity , Prognosis , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism

3.

Functional annotation of human long noncoding RNAs via molecular phenotyping.

Ramilowski, Jordan A; Yip, Chi Wai; Agrawal, Saumya; Chang, Jen-Chien; Ciani, Yari; Kulakovskiy, Ivan V; Mendez, Mickaël; Ooi, Jasmine Li Ching; Ouyang, John F; Parkinson, Nick; Petri, Andreas; Roos, Leonie; Severin, Jessica; Yasuzawa, Kayoko; Abugessaisa, Imad; Akalin, Altuna; Antonov, Ivan V; Arner, Erik; Bonetti, Alessandro; Bono, Hidemasa; Borsari, Beatrice; Brombacher, Frank; Cameron, Christopher JF; Cannistraci, Carlo Vittorio; Cardenas, Ryan; Cardon, Melissa; Chang, Howard; Dostie, Josée; Ducoli, Luca; Favorov, Alexander; Fort, Alexandre; Garrido, Diego; Gil, Noa; Gimenez, Juliette; Guler, Reto; Handoko, Lusy; Harshbarger, Jayson; Hasegawa, Akira; Hasegawa, Yuki; Hashimoto, Kosuke; Hayatsu, Norihito; Heutink, Peter; Hirose, Tetsuro; Imada, Eddie L; Itoh, Masayoshi; Kaczkowski, Bogumil; Kanhere, Aditi; Kawabata, Emily; Kawaji, Hideya; Kawashima, Tsugumi.

Genome Res ; 30(7): 1060-1072, 2020 07.

Article in English | MEDLINE | ID: mdl-32718982

ABSTRACT

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.

Subject(s)

RNA, Long Noncoding/physiology , Cell Growth Processes/genetics , Cell Movement/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Humans , KCNQ Potassium Channels/metabolism , Molecular Sequence Annotation , Oligonucleotides, Antisense , RNA, Long Noncoding/antagonists & inhibitors , RNA, Long Noncoding/metabolism , RNA, Small Interfering

4.

Matrix factorization and transfer learning uncover regulatory biology across multiple single-cell ATAC-seq data sets.

Erbe, Rossin; Kessler, Michael D; Favorov, Alexander V; Easwaran, Hariharan; Gaykalova, Daria A; Fertig, Elana J.

Nucleic Acids Res ; 48(12): e68, 2020 07 09.

Article in English | MEDLINE | ID: mdl-32392348

ABSTRACT

While the methods available for single-cell ATAC-seq analysis are well optimized for clustering cell types, the question of how to integrate multiple scATAC-seq data sets and/or sequencing modalities is still open. We present an analysis framework that enables such integration across scATAC-seq data sets by applying the CoGAPS Matrix Factorization algorithm and the projectR transfer learning program to identify common regulatory patterns across scATAC-seq data sets. We additionally integrate our analysis with scRNA-seq data to identify orthogonal evidence for transcriptional regulators predicted by scATAC-seq analysis. Using publicly available scATAC-seq data, we find patterns that accurately characterize cell types both within and across data sets. Furthermore, we demonstrate that these patterns are both consistent with current biological understanding and reflective of novel regulatory biology.

Subject(s)

Algorithms , Chromatin Immunoprecipitation Sequencing/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Animals , Chromatin/genetics , Datasets as Topic , Humans , Machine Learning

5.

Allele-specific nonstationarity in evolution of influenza A virus surface proteins.

Popova, Anfisa V; Safina, Ksenia R; Ptushenko, Vasily V; Stolyarova, Anastasia V; Favorov, Alexander V; Neverov, Alexey D; Bazykin, Georgii A.

Proc Natl Acad Sci U S A ; 116(42): 21104-21112, 2019 10 15.

Article in English | MEDLINE | ID: mdl-31578251

ABSTRACT

Influenza A virus (IAV) is a major public health problem and a pandemic threat. Its evolution is largely driven by diversifying positive selection so that relative fitness of different amino acid variants changes with time due to changes in herd immunity or genomic context, and novel amino acid variants attain fitness advantage. Here, we hypothesize that diversifying selection also has another manifestation: the fitness associated with a particular amino acid variant should decline with time since its origin, as the herd immunity adapts to it. By tracing the evolution of antigenic sites at IAV surface proteins, we show that an amino acid variant becomes progressively more likely to become replaced by another variant with time since its origin, a phenomenon we call "senescence." Senescence is particularly pronounced at experimentally validated antigenic sites, implying that it is largely driven by host immunity. By contrast, at internal sites, existing variants become more favorable with time, probably due to arising contingent mutations at other epistatically interacting sites. Our findings reveal a previously undescribed facet of adaptive evolution and suggest approaches for prediction of evolutionary dynamics of pathogens.

Subject(s)

Amino Acids/genetics , Influenza A virus/genetics , Membrane Proteins/genetics , Viral Proteins/genetics , Alleles , Amino Acids/immunology , Antigens, Viral/genetics , Antigens, Viral/immunology , Evolution, Molecular , Genetic Variation/genetics , Genetic Variation/immunology , Influenza A virus/immunology , Membrane Proteins/immunology , Pandemics , Viral Proteins/immunology

6.

Enter the Matrix: Factorization Uncovers Knowledge from Omics.

Stein-O'Brien, Genevieve L; Arora, Raman; Culhane, Aedin C; Favorov, Alexander V; Garmire, Lana X; Greene, Casey S; Goff, Loyal A; Li, Yifeng; Ngom, Aloune; Ochs, Michael F; Xu, Yanxun; Fertig, Elana J.

Trends Genet ; 34(10): 790-805, 2018 10.

Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.

Subject(s)

Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
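
The review abstract above describes matrix factorization as a way to recover low-dimensional structure from high-dimensional omics data. As one illustration, here is a minimal sketch of non-negative matrix factorization with the Lee and Seung multiplicative updates, written in plain Python. The toy matrix, the rank, the iteration count and the small `eps` guard are all assumptions chosen for the example; the review itself covers many other factorization methods.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of v - w*h."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor non-negative v (n x m) into w (n x rank) and h (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy "genes x samples" matrix built from two hypothetical expression patterns.
V = [
    [5.0, 5.0, 0.0, 0.0],
    [4.0, 4.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 3.0],
    [0.0, 0.0, 6.0, 6.0],
    [2.0, 2.0, 2.0, 2.0],
]
W, H = nmf(V, rank=2)
print(frobenius_error(V, W, H))
```

Each row of `H` can be read as a pattern across samples and each column of `W` as the gene weights for that pattern. Multiplicative updates keep every entry non-negative, but they only find a local optimum, so results depend on the random start.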

7.

Coloc-stats: a unified web interface to perform colocalization analysis of genomic features.

Simovski, Boris; Kanduri, Chakravarthi; Gundersen, Sveinung; Titov, Dmytro; Domanska, Diana; Bock, Christoph; Bossini-Castillo, Lara; Chikina, Maria; Favorov, Alexander; Layer, Ryan M; Mironov, Andrey A; Quinlan, Aaron R; Sheffield, Nathan C; Trynka, Gosia; Sandve, Geir K.

Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.

Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.

Subject(s)

Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface
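
The Coloc-stats abstract above refers to colocalization measures such as overlap between region-sets. A minimal sketch of two such measures (overlap in base pairs and the Jaccard index) for half-open intervals on a single chromosome follows. The coordinates are invented and the functions are not part of Coloc-stats; a full analysis would also need a null model to assess significance.

```python
def merge(regions):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(regions):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered(regions):
    """Number of base pairs covered by a region-set."""
    return sum(end - start for start, end in merge(regions))


def overlap_bp(a, b):
    """Number of base pairs covered by both region-sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        total += max(0, min(a[i][1], b[j][1]) - max(a[i][0], b[j][0]))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def jaccard(a, b):
    """Shared base pairs divided by base pairs covered by either set."""
    inter = overlap_bp(a, b)
    union = covered(a) + covered(b) - inter
    return inter / union if union else 0.0


# Hypothetical peak sets on one chromosome.
peaks_a = [(100, 200), (300, 400), (350, 450)]
peaks_b = [(150, 250), (400, 500)]
print(overlap_bp(peaks_a, peaks_b), jaccard(peaks_a, peaks_b))
```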

8.

Inferring causal molecular networks: empirical assessment through a community-based effort.

Hill, Steven M; Heiser, Laura M; Cokelaer, Thomas; Unger, Michael; Nesser, Nicole K; Carlin, Daniel E; Zhang, Yang; Sokolov, Artem; Paull, Evan O; Wong, Chris K; Graim, Kiley; Bivol, Adrian; Wang, Haizhou; Zhu, Fan; Afsari, Bahman; Danilova, Ludmila V; Favorov, Alexander V; Lee, Wai Shing; Taylor, Dane; Hu, Chenyue W; Long, Byron L; Noren, David P; Bisberg, Alexander J; Mills, Gordon B; Gray, Joe W; Kellen, Michael; Norman, Thea; Friend, Stephen; Qutub, Amina A; Fertig, Elana J; Guan, Yuanfang; Song, Mingzhou; Stuart, Joshua M; Spellman, Paul T; Koeppl, Heinz; Stolovitzky, Gustavo; Saez-Rodriguez, Julio; Mukherjee, Sach.

Nat Methods ; 13(4): 310-8, 2016 Apr.

Article in English | MEDLINE | ID: mdl-26901648

ABSTRACT

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.

Subject(s)

Causality , Gene Regulatory Networks , Neoplasms/genetics , Protein Interaction Mapping/methods , Software , Systems Biology , Algorithms , Computational Biology , Computer Simulation , Gene Expression Profiling , Humans , Models, Biological , Signal Transduction , Tumor Cells, Cultured

9.

Splice Expression Variation Analysis (SEVA) for inter-tumor heterogeneity of gene isoform usage in cancer.

Afsari, Bahman; Guo, Theresa; Considine, Michael; Florea, Liliana; Kagohara, Luciane T; Stein-O'Brien, Genevieve L; Kelley, Dylan; Flam, Emily; Zambo, Kristina D; Ha, Patrick K; Geman, Donald; Ochs, Michael F; Califano, Joseph A; Gaykalova, Daria A; Favorov, Alexander V; Fertig, Elana J.

Bioinformatics ; 34(11): 1859-1867, 2018 06 01.

Article in English | MEDLINE | ID: mdl-29342249

ABSTRACT

Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and we benchmark SEVA's performance against EBSeq, DiffSplice and rMATS, which model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

Subject(s)

Alternative Splicing , Neoplasms/genetics , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Software , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Head and Neck Neoplasms/genetics , Humans , Models, Genetic
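
The SEVA abstract above describes a rank-based statistic that compares the variability of junction expression profiles within one condition to the variability within another. The sketch below illustrates that general idea only: it uses the Kendall tau distance between pairs of samples as an assumed stand-in for the dissimilarity, and averages it within each group. It is not the statistic implemented in GSReg, it includes no significance test, and the expression values are invented.

```python
from itertools import combinations


def kendall_tau_distance(x, y):
    """Fraction of feature pairs whose relative order differs between profiles x and y."""
    pairs = list(combinations(range(len(x)), 2))
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return discordant / len(pairs)


def mean_pairwise_distance(profiles):
    """Average rank-based dissimilarity over all pairs of samples in one condition."""
    dists = [kendall_tau_distance(p, q) for p, q in combinations(profiles, 2)]
    return sum(dists) / len(dists)


# Hypothetical junction expression for one gene (rows: samples, columns: junctions).
normal = [[10, 5, 1], [12, 6, 2], [9, 4, 1]]
tumor = [[10, 5, 1], [1, 8, 12], [6, 11, 2]]

print(mean_pairwise_distance(normal), mean_pairwise_distance(tumor))
```

In this toy data every normal sample ranks the three junctions in the same order, so the within-group distance is 0, while the tumor samples disagree on the ordering and give a larger value. Working on ranks makes the comparison insensitive to sample-wide scaling of expression.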

10.

Discovery and development of differentially methylated regions in human papillomavirus-related oropharyngeal squamous cell carcinoma.

Ren, Shuling; Gaykalova, Daria; Wang, Jennifer; Guo, Theresa; Danilova, Ludmila; Favorov, Alexander; Fertig, Elana; Bishop, Justin; Khan, Zubair; Flam, Emily; Wysocki, Piotr T; DeJong, Peter; Ando, Mizuo; Liu, Chao; Sakai, Akihiro; Fukusumi, Takahito; Haft, Sunny; Sadat, Sayed; Califano, Joseph A.

Int J Cancer ; 143(10): 2425-2436, 2018 11 15.

Article in English | MEDLINE | ID: mdl-30070359

ABSTRACT

Human papillomavirus (HPV)-related oropharyngeal squamous cell carcinoma (OPSCC) exhibits a different composition of epigenetic alterations. In this study, we identified differentially methylated regions (DMRs) with potential utility in screening for HPV-positive OPSCC. Genome-wide DNA methylation was measured using methyl-CpG binding domain protein-enriched genome sequencing (MBD-seq) in 50 HPV-positive OPSCC tissues and 25 normal tissues. Fifty-one DMRs were defined with maximal methylation specificity to cancer samples. The Cancer Genome Atlas (TCGA) methylation array data were used to evaluate the performance of the proposed candidates. Supervised hierarchical clustering of 51 DMRs found that HPV-positive OPSCC had significantly higher DNA methylation levels compared to normal samples and non-HPV-related head and neck squamous cell carcinoma (HNSCC). The methylation levels of all top 20 DNA methylation biomarkers in HPV-positive OPSCC were significantly higher than those in normal samples. Further confirmation using quantitative methylation specific PCR (QMSP) in an independent set of 24 HPV-related OPSCCs and 22 controls showed that 16 of the 20 candidates had significantly higher methylation levels in HPV-positive OPSCC samples compared with controls. One candidate, OR6S1, had a sensitivity of 100%, while 17 candidates (KCNA3, EMBP1, CCDC181, DPP4, ITGA4, BEND4, ELMO1, SFMBT2, C1QL3, MIR129-2, NID2, HOXB4, ZNF439, ZNF93, VSTM2B, ZNF137P and ZNF773) had specificities of 100%. The prediction accuracy of the 20 candidates ranged from 56.2% to 99.8% by receiver operating characteristic analysis. We have defined 20 highly specific DMRs in HPV-related OPSCC, which can potentially be applied to molecular-based detection tests and improve disease management.

Subject(s)

DNA Methylation , Oropharyngeal Neoplasms/genetics , Oropharyngeal Neoplasms/virology , Papillomavirus Infections/genetics , Squamous Cell Carcinoma of Head and Neck/genetics , Squamous Cell Carcinoma of Head and Neck/virology , Biomarkers, Tumor/genetics , Case-Control Studies , Cohort Studies , Epigenesis, Genetic , Female , Humans , Male , Middle Aged , Oropharyngeal Neoplasms/pathology , Papillomaviridae/isolation & purification , Papillomavirus Infections/pathology , Squamous Cell Carcinoma of Head and Neck/pathology
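
The abstract above reports sensitivity and specificity for each candidate marker. As a reminder of how those two quantities are derived from case and control measurements, here is a minimal sketch. The methylation values and the 0.6 threshold are invented for illustration and are not taken from the study.

```python
def confusion(case_values, control_values, threshold):
    """Count true/false positives and negatives when calling value >= threshold positive."""
    tp = sum(v >= threshold for v in case_values)
    fn = len(case_values) - tp
    fp = sum(v >= threshold for v in control_values)
    tn = len(control_values) - fp
    return tp, fn, fp, tn


def sensitivity(tp, fn):
    """Fraction of cases correctly called positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of controls correctly called negative."""
    return tn / (tn + fp)


# Hypothetical methylation levels of one marker (not data from the paper).
cases = [0.9, 0.7, 0.4, 0.8]
controls = [0.1, 0.2, 0.05, 0.5]

tp, fn, fp, tn = confusion(cases, controls, threshold=0.6)
print(sensitivity(tp, fn), specificity(tn, fp))
```

A marker with 100% specificity, as reported for 17 candidates, corresponds to `fp == 0` at the chosen threshold; sweeping the threshold and plotting sensitivity against 1 minus specificity gives the receiver operating characteristic curve.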

11.

StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data.

Stavrovskaya, Elena D; Niranjan, Tejasvi; Fertig, Elana J; Wheelan, Sarah J; Favorov, Alexander V; Mironov, Andrey A.

Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.

Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomic features with similar genome-wide distributions are generally hypothesized to be functionally related; for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Gene Expression Regulation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Chromatin Immunoprecipitation/methods , Epigenomics/methods , Genome, Human , Humans
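
The StereoGene abstract above says that correlations are computed among neighboring genomic positions using kernel correlation. The sketch below shows why a kernel helps, under strong simplifying assumptions: two binned tracks, a truncated Gaussian kernel, and Pearson correlation between one track and the smoothed other track. This is an illustrative approximation, not StereoGene's actual estimator, and the tracks, `sigma` and `radius` are invented.

```python
import math


def gaussian_kernel(sigma, radius):
    """Normalized Gaussian weights for offsets -radius..radius (in bins)."""
    weights = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(track, kernel):
    """Spread each bin's signal over its neighbors according to the kernel."""
    radius = len(kernel) // 2
    out = []
    for i in range(len(track)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - radius
            if 0 <= j < len(track):
                acc += w * track[j]
        out.append(acc)
    return out


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def kernel_correlation(f, g, sigma=2.0, radius=6):
    """Correlate f with a kernel-smoothed g so that nearby signals count as related."""
    return pearson(f, smooth(g, gaussian_kernel(sigma, radius)))


# Two hypothetical binned tracks whose peaks are offset by two bins.
f = [0.0] * 40
g = [0.0] * 40
f[10] = f[30] = 1.0
g[12] = g[32] = 1.0

print(pearson(f, g), kernel_correlation(f, g))
```

Because the peaks never fall in the same bin, the position-by-position Pearson correlation is slightly negative, while the kernel version is positive: the kernel lets signal in neighboring bins contribute to the correlation.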

12.

PatternMarkers & GWCoGAPS for novel data-driven biomarkers via whole transcriptome NMF.

Stein-O'Brien, Genevieve L; Carey, Jacob L; Lee, Wai Shing; Considine, Michael; Favorov, Alexander V; Flam, Emily; Guo, Theresa; Li, Sijia; Marchionni, Luigi; Sherman, Thomas; Sivy, Shawn; Gaykalova, Daria A; McKay, Ronald D; Ochs, Michael F; Colantuoni, Carlo; Fertig, Elana J.

Bioinformatics ; 33(12): 1892-1894, 2017 Jun 15.

Article in English | MEDLINE | ID: mdl-28174896

ABSTRACT

SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. AVAILABILITY AND IMPLEMENTATION: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. CONTACT: gsteinobrien@jhmi.edu or ccolantu@jhmi.edu or ejfertig@jhmi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

Subject(s)

Algorithms , Gene Expression Profiling/methods , Software , Bayes Theorem , Biomarkers , Humans , Sequence Analysis, RNA/methods

13.

Genetic risk factors for myocardial infarction more clearly manifest for early age of first onset.

Titov, Boris V; Osmak, German J; Matveeva, Natalia A; Kukava, Nino G; Shakhnovich, Roman M; Favorov, Alexander V; Ruda, Mikhail Ya; Favorova, Olga O.

Mol Biol Rep ; 44(4): 315-321, 2017 Aug.

Article in English | MEDLINE | ID: mdl-28685248

ABSTRACT

Epidemiological genetics established that heritability in determining the risk of myocardial infarction (MI) is substantially greater when MI occurs early in life. However, the genetic architecture of early-onset and late-onset MI was not compared. We analyzed genotype frequencies of SNPs in/near 20 genes whose protein products are involved in the pathogenesis of atherosclerosis in two groups of Russian patients with MI: the first group included patients with age of first MI onset <60 years (N = 230) and the second group with onset ≥60 years (N = 174). The control group of corresponding ethnicity consisted of 193 unrelated volunteers without cardiovascular diseases (93 individuals were over 60 years). We found that in the group of patients with age of onset <60 years, SNPs FGB rs1800788*T, TGFB1 rs1982073*T/T, ENOS rs2070744*C and CRP rs1130864*T/T were associated with risk of MI, whereas in patients with age of onset ≥60 years, only TGFB1 rs1982073*T/T was associated with risk of MI. Using APSampler software, we found composite markers associated with MI only in patients with early onset: FGB rs1800788*T + TGFB1 rs1982073*T; FGB rs1800788*T + LPL rs328*C + IL4 rs2243250*C; FGB rs1800788*T + ENOS rs2070744*C (Fisher p values of 1.4 × 10⁻⁶ to 2.2 × 10⁻⁵; the permutation p values of 1.1 × 10⁻⁵ to 3.0 × 10⁻⁴; ORs = 2.67-2.54). Alleles included in the combinations were associated with MI less significantly and with lower ORs than the combinations themselves. The result showed a substantially greater contribution of the genetic component in the development of MI if it occurs early in life, and demonstrated the usefulness of genetic testing for young people.

Subject(s)

Atherosclerosis/genetics , Myocardial Infarction/genetics , Adult , Age Factors , Aged , Aged, 80 and over , Alleles , Biomarkers/blood , Female , Gene Frequency/genetics , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Humans , Male , Middle Aged , Myocardial Infarction/epidemiology , Polymorphism, Single Nucleotide/genetics , Risk Factors , Russia
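
The abstract above reports Fisher p values and odds ratios for allele combinations. The sketch below shows how both are computed from a 2x2 carrier-by-status table, using only the standard library. The table counts are invented for illustration and are not the study's data; the two-sided p value sums the probabilities of all tables that are no more likely than the observed one, which is one common convention.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]]; undefined if b or c is zero."""
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    col2 = n - col1

    def weight(k):
        # Unnormalized hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(col2, row1 - k)

    lowest, highest = max(0, row1 - col2), min(row1, col1)
    observed = weight(a)
    # Integer weights avoid floating-point ties when comparing table probabilities.
    extreme = sum(weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed)
    return extreme / comb(n, row1)


# Hypothetical counts: rows = cases/controls, columns = carriers/non-carriers.
table = (8, 2, 1, 5)
print(odds_ratio(*table), fisher_exact_two_sided(*table))
```

For this toy table the odds ratio is 20 and the p value is 280/8008, about 0.035. The permutation p values mentioned in the abstract address multiple testing across many candidate combinations, which this sketch does not attempt.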

14.

Natural variation of gene models in Drosophila melanogaster.

Kurmangaliyev, Yerbol Z; Favorov, Alexander V; Osman, Noha M; Lehmann, Kjong-Van; Campo, Daniel; Salomon, Matthew P; Tower, John; Gelfand, Mikhail S; Nuzhdin, Sergey V.

BMC Genomics ; 16: 198, 2015 Mar 17.

Article in English | MEDLINE | ID: mdl-25888292

ABSTRACT

BACKGROUND: Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. RESULTS: Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. CONCLUSIONS: Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila.

Subject(s)

Drosophila melanogaster/genetics , Genetic Variation , Models, Genetic , Alleles , Allelic Imbalance , Alternative Splicing , Animals , Exons , Gene Expression Profiling , Genotype , Open Reading Frames , Polymorphism, Single Nucleotide , Quantitative Trait Loci , RNA Splice Sites , Transcriptome

15.

Preserving biological heterogeneity with a permuted surrogate variable analysis for genomics batch correction.

Parker, Hilary S; Leek, Jeffrey T; Favorov, Alexander V; Considine, Michael; Xia, Xiaoxin; Chavan, Sameer; Chung, Christine H; Fertig, Elana J.

Bioinformatics ; 30(19): 2757-63, 2014 Oct.

Article in English | MEDLINE | ID: mdl-24907368

ABSTRACT

MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori. RESULTS: Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set. AVAILABILITY AND IMPLEMENTATION: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.

Subject(s)

Algorithms, Genomics/methods, Head and Neck Neoplasms/genetics, Papillomavirus Infections/diagnosis, Artifacts, Computational Biology/methods, Head and Neck Neoplasms/virology, Humans, Statistical Models, Reproducibility of Results, Software

16.

CORECLUST: identification of the conserved CRM grammar together with prediction of gene regulation.

Nikulova, Anna A; Favorov, Alexander V; Sutormin, Roman A; Makeev, Vsevolod J; Mironov, Andrey A.

Nucleic Acids Res ; 40(12): e93, 2012 Jul.

Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may subsequently be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST performs better at identifying CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.

Subject(s)

Gene Expression Regulation, Transcriptional Regulatory Elements, DNA Sequence Analysis, Algorithms, Animals, Body Patterning/genetics, Drosophila/embryology, Drosophila/genetics, Drosophila/metabolism, Genetic Enhancer Elements, Developmental Gene Expression Regulation, Muscles/metabolism, Position-Specific Scoring Matrices, Software

17.

Exploring massive, genome scale datasets with the GenometriCorr package.

Favorov, Alexander; Mularoni, Loris; Cope, Leslie M; Medvedeva, Yulia; Mironov, Andrey A; Makeev, Vsevolod J; Wheelan, Sarah J.

PLoS Comput Biol ; 8(5): e1002529, 2012 May.

Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data, and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and, instead of running analyses with predefined metrics, computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the SourceForge repository. The package is pending submission to Bioconductor.

Subject(s)

Genetic Databases, Genomics/methods, Information Storage and Retrieval, Genetic Models, Statistical Models, Software, Animals, Chromosomes, Epigenomics, Genetic Loci, Genome, Humans, Internet, Transfer RNA/genetics, Nonparametric Statistics, User-Computer Interface

18.

Uncovering the spatial landscape of molecular interactions within the tumor microenvironment through latent spaces.

Deshpande, Atul; Loth, Melanie; Sidiropoulos, Dimitrios N; Zhang, Shuming; Yuan, Long; Bell, Alexander T F; Zhu, Qingfeng; Ho, Won Jin; Santa-Maria, Cesar; Gilkes, Daniele M; Williams, Stephen R; Uytingco, Cedric R; Chew, Jennifer; Hartnett, Andrej; Bent, Zachary W; Favorov, Alexander V; Popel, Aleksander S; Yarchoan, Mark; Kiemen, Ashley; Wu, Pei-Hsun; Fujikura, Kohei; Wirtz, Denis; Wood, Laura D; Zheng, Lei; Jaffee, Elizabeth M; Anders, Robert A; Danilova, Ludmila; Stein-O'Brien, Genevieve; Kagohara, Luciane T; Fertig, Elana J.

Cell Syst ; 14(4): 285-301.e4, 2023 Apr 19.

Article in English | MEDLINE | ID: mdl-37080163

ABSTRACT

Recent advances in spatial transcriptomics (ST) enable gene expression measurements from a tissue sample while retaining its spatial context. This technology enables unprecedented in situ resolution of the regulatory pathways that underlie the heterogeneity in the tumor as well as the tumor microenvironment (TME). The direct characterization of cellular co-localization with spatial technologies facilitates quantification of the molecular changes resulting from direct cell-cell interaction, as occurs in tumor-immune interactions. We present SpaceMarkers, a bioinformatics algorithm to infer molecular changes from cell-cell interactions from latent space analysis of ST data. We apply this approach to infer the molecular changes from tumor-immune interactions in Visium spatial transcriptomics data of metastasis, invasive and precursor lesions, and immunotherapy treatment. Transfer learning in matched scRNA-seq data enabled further quantification of the specific cell types in which SpaceMarkers are enriched. Altogether, SpaceMarkers can identify the location and context-specific molecular interactions within the TME from ST data.

Subject(s)

Algorithms, Tumor Microenvironment, Cell Communication, Computational Biology, Gene Expression Profiling

19.

Investigation of the Role of PUFA Metabolism in Breast Cancer Using a Rank-Based Random Forest Algorithm.

Guryleva, Mariia V; Penzar, Dmitry D; Chistyakov, Dmitry V; Mironov, Andrey A; Favorov, Alexander V; Sergeeva, Marina G.

Cancers (Basel) ; 14(19), 2022 Sep 25.

Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research because PUFAs function as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, the so-called oxylipins, which are important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of the most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in carcinogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on the specified genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in the expression of PUFA metabolism genes; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.

20.

Deconvolution of B cell receptor repertoire in multiple sclerosis patients revealed a delay in tBreg maturation.

Lomakin, Yakov A; Zvyagin, Ivan V; Ovchinnikova, Leyla A; Kabilov, Marsel R; Staroverov, Dmitriy B; Mikelov, Artem; Tupikin, Alexey E; Zakharova, Maria Y; Bykova, Nadezda A; Mukhina, Vera S; Favorov, Alexander V; Ivanova, Maria; Simaniv, Taras; Rubtsov, Yury P; Chudakov, Dmitriy M; Zakharova, Maria N; Illarioshkin, Sergey N; Belogurov, Alexey A; Gabibov, Alexander G.

Front Immunol ; 13: 803229, 2022.

Article in English | MEDLINE | ID: mdl-36052064

ABSTRACT

Background: B lymphocytes play a pivotal regulatory role in the development of the immune response. It was previously shown that deficiency in B regulatory cells (Bregs) or a decrease in their anti-inflammatory activity can lead to immunological dysfunctions. However, the exact mechanisms of Breg development and functioning are only partially resolved. For instance, little is known about the structure of their B cell receptor (BCR) repertoires in autoimmune disorders, including multiple sclerosis (MS), a severe neuroinflammatory disease with a yet unknown etiology. Here, we elucidate specific properties of B regulatory cells in MS. Methods: We performed a prospective study of the transitional Breg (tBreg) subpopulations with the CD19+CD24highCD38high phenotype from MS patients and healthy donors by (i) measuring their content during two diverging courses of relapsing-remitting MS: benign multiple sclerosis (BMS) and highly active multiple sclerosis (HAMS); (ii) analyzing BCR repertoires of circulating B cells by high-throughput sequencing; and (iii) measuring the percentage of CD27+ cells in tBregs. Results: The tBregs from HAMS patients carry heavy chains with fewer hypermutations than tBregs from healthy donors. The percentage of transitional CD24highCD38high B cells is elevated, whereas the frequency of differentiated CD27+ cells in this transitional B cell subset is decreased in the MS patients as compared with healthy donors. Conclusions: Impaired maturation of regulatory B cells is associated with MS progression.

Subject(s)

Regulatory B-Lymphocytes, Multiple Sclerosis, Humans, Interleukin-10, Prospective Studies, B-Cell Antigen Receptors
