1.
Nucleic Acids Res ; 52(D1): D154-D163, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37971293

ABSTRACT

We present a major update of the HOCOMOCO collection that provides DNA binding specificity patterns of 949 human transcription factors and 720 mouse orthologs. To make this release, we performed motif discovery in peak sets that originated from 14 183 ChIP-Seq experiments and reads from 2554 HT-SELEX experiments yielding more than 400 thousand candidate motifs. The candidate motifs were annotated according to their similarity to known motifs and the hierarchy of DNA-binding domains of the respective transcription factors. Next, the motifs underwent human expert curation to stratify distinct motif subtypes and remove non-informative patterns and common artifacts. Finally, the curated subset of 100 thousand motifs was supplied to the automated benchmarking to select the best-performing motifs for each transcription factor. The resulting HOCOMOCO v12 core collection contains 1443 verified position weight matrices, including distinct subtypes of DNA binding motifs for particular transcription factors. In addition to the core collection, HOCOMOCO v12 provides motif sets optimized for the recognition of binding sites in vivo and in vitro, and for annotation of regulatory sequence variants. HOCOMOCO is available at https://hocomoco12.autosome.org and https://hocomoco.autosome.org.


Subjects
Databases, Genetic , Gene Expression Regulation , Protein Interaction Domains and Motifs , Transcription Factors , Animals , Humans , Mice , Binding Sites/genetics , Nucleotide Motifs , Transcription Factors/genetics , Transcription Factors/metabolism , Internet , Protein Interaction Domains and Motifs/genetics
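
As an illustration of how a position weight matrix such as those described in the abstract above is commonly applied, the following minimal sketch scans a DNA sequence and reports its best-scoring window. The matrix values, the example sequence, and the function names are invented for this example; they are not taken from HOCOMOCO, whose file formats and score thresholds are not described in the abstract. Only the forward strand is scanned.

```python
# Hypothetical 4-column log-odds matrix; the numbers are illustrative only
# and do not come from any HOCOMOCO motif.
PWM = [
    {"A": 1.2, "C": -0.8, "G": -0.5, "T": -1.0},
    {"A": -1.0, "C": 1.5, "G": -0.7, "T": -0.9},
    {"A": -0.6, "C": -1.1, "G": 1.3, "T": -0.4},
    {"A": -0.9, "C": -0.3, "G": -1.2, "T": 1.4},
]


def score_window(window, pwm):
    """Sum the per-position weights for one window of the same width as the matrix."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(sequence, pwm):
    """Return (score, start) of the highest-scoring forward-strand window."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    hits = [
        (score_window(sequence[start:start + width], pwm), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


score, position = best_hit("TTACGTAA", PWM)
```

A real scan would also score the reverse complement and compare each score against a motif-specific threshold.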
2.
Genome Res ; 30(7): 1073-1081, 2020 07.
Article in English | MEDLINE | ID: mdl-32079618

ABSTRACT

Long noncoding RNAs (lncRNAs) have emerged as key coordinators of biological and cellular processes. Characterizing lncRNA expression across cells and tissues is key to understanding their role in determining phenotypes, including human diseases. We present here FC-R2, a comprehensive expression atlas across a broadly defined human transcriptome, inclusive of over 109,000 coding and noncoding genes, as described in the FANTOM CAGE-Associated Transcriptome (FANTOM-CAT) study. This atlas greatly extends the gene annotation used in the original recount2 resource. We demonstrate the utility of the FC-R2 atlas by reproducing key findings from published large studies and by generating new results across normal and diseased human samples. In particular, we (a) identify tissue-specific transcription profiles for distinct classes of coding and noncoding genes, (b) perform differential expression analysis across thirteen cancer types, identifying novel noncoding genes potentially involved in tumor pathogenesis and progression, and (c) confirm the prognostic value for several enhancer lncRNAs expression in cancer. Our resource is instrumental for the systematic molecular characterization of lncRNA by the FANTOM6 Consortium. In conclusion, comprised of over 70,000 samples, the FC-R2 atlas will empower other researchers to investigate functions and biological roles of both known coding genes and novel lncRNAs.


Subjects
Transcriptome , Databases, Genetic , Enhancer Elements, Genetic , Gene Expression Profiling , Genome, Human , Humans , Neoplasms/genetics , Organ Specificity , Prognosis , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , RNA, Messenger/metabolism
3.
Genome Res ; 30(7): 1060-1072, 2020 07.
Article in English | MEDLINE | ID: mdl-32718982

ABSTRACT

Long noncoding RNAs (lncRNAs) constitute the majority of transcripts in the mammalian genomes, and yet, their functions remain largely unknown. As part of the FANTOM6 project, we systematically knocked down the expression of 285 lncRNAs in human dermal fibroblasts and quantified cellular growth, morphological changes, and transcriptomic responses using Capped Analysis of Gene Expression (CAGE). Antisense oligonucleotides targeting the same lncRNAs exhibited global concordance, and the molecular phenotype, measured by CAGE, recapitulated the observed cellular phenotypes while providing additional insights on the affected genes and pathways. Here, we disseminate the largest-to-date lncRNA knockdown data set with molecular phenotyping (over 1000 CAGE deep-sequencing libraries) for further exploration and highlight functional roles for ZNF213-AS1 and lnc-KHDC3L-2.


Subjects
RNA, Long Noncoding/physiology , Cell Growth Processes/genetics , Cell Movement/genetics , Fibroblasts/cytology , Fibroblasts/metabolism , Humans , KCNQ Potassium Channels/metabolism , Molecular Sequence Annotation , Oligonucleotides, Antisense , RNA, Long Noncoding/antagonists & inhibitors , RNA, Long Noncoding/metabolism , RNA, Small Interfering
4.
Nucleic Acids Res ; 48(12): e68, 2020 07 09.
Article in English | MEDLINE | ID: mdl-32392348

ABSTRACT

While the methods available for single-cell ATAC-seq analysis are well optimized for clustering cell types, the question of how to integrate multiple scATAC-seq data sets and/or sequencing modalities is still open. We present an analysis framework that enables such integration across scATAC-seq data sets by applying the CoGAPS Matrix Factorization algorithm and the projectR transfer learning program to identify common regulatory patterns across scATAC-seq data sets. We additionally integrate our analysis with scRNA-seq data to identify orthogonal evidence for transcriptional regulators predicted by scATAC-seq analysis. Using publicly available scATAC-seq data, we find patterns that accurately characterize cell types both within and across data sets. Furthermore, we demonstrate that these patterns are both consistent with current biological understanding and reflective of novel regulatory biology.


Subjects
Algorithms , Chromatin Immunoprecipitation Sequencing/methods , Gene Expression Profiling/methods , Single-Cell Analysis/methods , Animals , Chromatin/genetics , Datasets as Topic , Humans , Machine Learning
5.
Proc Natl Acad Sci U S A ; 116(42): 21104-21112, 2019 10 15.
Article in English | MEDLINE | ID: mdl-31578251

ABSTRACT

Influenza A virus (IAV) is a major public health problem and a pandemic threat. Its evolution is largely driven by diversifying positive selection so that relative fitness of different amino acid variants changes with time due to changes in herd immunity or genomic context, and novel amino acid variants attain fitness advantage. Here, we hypothesize that diversifying selection also has another manifestation: the fitness associated with a particular amino acid variant should decline with time since its origin, as the herd immunity adapts to it. By tracing the evolution of antigenic sites at IAV surface proteins, we show that an amino acid variant becomes progressively more likely to be replaced by another variant with time since its origin, a phenomenon we call "senescence." Senescence is particularly pronounced at experimentally validated antigenic sites, implying that it is largely driven by host immunity. By contrast, at internal sites, existing variants become more favorable with time, probably due to arising contingent mutations at other epistatically interacting sites. Our findings reveal a previously undescribed facet of adaptive evolution and suggest approaches for prediction of evolutionary dynamics of pathogens.


Subjects
Amino Acids/genetics , Influenza A virus/genetics , Membrane Proteins/genetics , Viral Proteins/genetics , Alleles , Amino Acids/immunology , Antigens, Viral/genetics , Antigens, Viral/immunology , Evolution, Molecular , Genetic Variation/genetics , Genetic Variation/immunology , Influenza A virus/immunology , Membrane Proteins/immunology , Pandemics , Viral Proteins/immunology
6.
Trends Genet ; 34(10): 790-805, 2018 10.
Article in English | MEDLINE | ID: mdl-30143323

ABSTRACT

Omics data contain signals from the molecular, physical, and kinetic inter- and intracellular interactions that control biological systems. Matrix factorization (MF) techniques can reveal low-dimensional structure from high-dimensional data that reflect these interactions. These techniques can uncover new biological knowledge from diverse high-throughput omics data in applications ranging from pathway discovery to timecourse analysis. We review exemplary applications of MF for systems-level analyses. We discuss appropriate applications of these methods, their limitations, and focus on the analysis of results to facilitate optimal biological interpretation. The inference of biologically relevant features with MF enables discovery from high-throughput data beyond the limits of current biological knowledge - answering questions from high-dimensional data that we have not yet thought to ask.


Subjects
Data Interpretation, Statistical , Genomics/statistics & numerical data , Proteomics/statistics & numerical data , Algorithms , Humans , Systems Biology/statistics & numerical data
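
To make the idea of recovering low-dimensional structure by matrix factorization concrete, here is a minimal sketch of non-negative matrix factorization using the standard Lee-Seung multiplicative updates. This is one textbook algorithm chosen for brevity, not a method attributed to the review above, and it is not the Bayesian MCMC approach used by tools such as CoGAPS. The toy matrix `V`, the rank, and all function names are assumptions made for this example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum(
        (v[i][j] - wh[i][j]) ** 2 for i in range(len(v)) for j in range(len(v[0]))
    )


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Toy "expression" matrix with two obvious blocks (rows = genes, columns = samples).
V = [
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 5.0, 4.0],
    [1.0, 0.0, 4.0, 5.0],
]
W, H = nmf(V, rank=2)
```

In this framing the rows of `H` play the role of patterns across samples and the columns of `W` give each gene's weight in each pattern; interpreting those weights is the step the review emphasizes.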
7.
Nucleic Acids Res ; 46(W1): W186-W193, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29873782

ABSTRACT

Functional genomics assays produce sets of genomic regions as one of their main outputs. To biologically interpret such region-sets, researchers often use colocalization analysis, where the statistical significance of colocalization (overlap, spatial proximity) between two or more region-sets is tested. Existing colocalization analysis tools vary in the statistical methodology and analysis approaches, thus potentially providing different conclusions for the same research question. As the findings of colocalization analysis are often the basis for follow-up experiments, it is helpful to use several tools in parallel and to compare the results. We developed the Coloc-stats web service to facilitate such analyses. Coloc-stats provides a unified interface to perform colocalization analysis across various analytical methods and method-specific options (e.g. colocalization measures, resolution, null models). Coloc-stats helps the user to find a method that supports their experimental requirements and allows for a straightforward comparison across methods. Coloc-stats is implemented as a web server with a graphical user interface that assists users with configuring their colocalization analyses. Coloc-stats is freely available at https://hyperbrowser.uio.no/coloc-stats/.


Subjects
Genomics/methods , Software , Chromatin Immunoprecipitation , GATA1 Transcription Factor/metabolism , Internet , Sequence Analysis, DNA , User-Computer Interface
8.
Nat Methods ; 13(4): 310-8, 2016 Apr.
Article in English | MEDLINE | ID: mdl-26901648

ABSTRACT

It remains unclear whether causal, rather than merely correlational, relationships in molecular networks can be inferred in complex biological settings. Here we describe the HPN-DREAM network inference challenge, which focused on learning causal influences in signaling networks. We used phosphoprotein data from cancer cell lines as well as in silico data from a nonlinear dynamical model. Using the phosphoprotein data, we scored more than 2,000 networks submitted by challenge participants. The networks spanned 32 biological contexts and were scored in terms of causal validity with respect to unseen interventional data. A number of approaches were effective, and incorporating known biology was generally advantageous. Additional sub-challenges considered time-course prediction and visualization. Our results suggest that learning causal relationships may be feasible in complex settings such as disease states. Furthermore, our scoring approach provides a practical way to empirically assess inferred molecular networks in a causal sense.


Subjects
Causality , Gene Regulatory Networks , Neoplasms/genetics , Protein Interaction Mapping/methods , Software , Systems Biology , Algorithms , Computational Biology , Computer Simulation , Gene Expression Profiling , Humans , Models, Biological , Signal Transduction , Tumor Cells, Cultured
9.
Bioinformatics ; 34(11): 1859-1867, 2018 06 01.
Article in English | MEDLINE | ID: mdl-29342249

ABSTRACT

Motivation: Current bioinformatics methods to detect changes in gene isoform usage in distinct phenotypes compare the relative expected isoform usage in phenotypes. These statistics model differences in isoform usage in normal tissues, which have stable regulation of gene splicing. Pathological conditions, such as cancer, can have broken regulation of splicing that increases the heterogeneity of the expression of splice variants. Inferring events with such differential heterogeneity in gene isoform usage requires new statistical approaches. Results: We introduce Splice Expression Variability Analysis (SEVA) to model increased heterogeneity of splice variant usage between conditions (e.g. tumor and normal samples). SEVA uses a rank-based multivariate statistic that compares the variability of junction expression profiles within one condition to the variability within another. Simulated data show that SEVA is unique in modeling heterogeneity of gene isoform usage, and benchmark SEVA's performance against EBSeq, DiffSplice and rMATS that model differential isoform usage instead of heterogeneity. We confirm the accuracy of SEVA in identifying known splice variants in head and neck cancer and perform cross-study validation of novel splice variants. A novel comparison of splice variant heterogeneity between subtypes of head and neck cancer demonstrated unanticipated similarity between the heterogeneity of gene isoform usage in HPV-positive and HPV-negative subtypes and anticipated increased heterogeneity among HPV-negative samples with mutations in genes that regulate the splice variant machinery. These results show that SEVA accurately models differential heterogeneity of gene isoform usage from RNA-seq data. Availability and implementation: SEVA is implemented in the R/Bioconductor package GSReg. Contact: bahman@jhu.edu or favorov@sensi.org or ejfertig@jhmi.edu. Supplementary information: Supplementary data are available at Bioinformatics online.


Subjects
Alternative Splicing , Neoplasms/genetics , Protein Isoforms/genetics , Sequence Analysis, RNA/methods , Software , Computational Biology/methods , Gene Expression Regulation, Neoplastic , Head and Neck Neoplasms/genetics , Humans , Models, Genetic
10.
Int J Cancer ; 143(10): 2425-2436, 2018 11 15.
Article in English | MEDLINE | ID: mdl-30070359

ABSTRACT

Human papillomavirus (HPV)-related oropharyngeal squamous cell carcinoma (OPSCC) exhibits a different composition of epigenetic alterations. In this study, we identified differentially methylated regions (DMRs) with potential utility in screening for HPV-positive OPSCC. Genome-wide DNA methylation was measured using methyl-CpG binding domain protein-enriched genome sequencing (MBD-seq) in 50 HPV-positive OPSCC tissues and 25 normal tissues. Fifty-one DMRs were defined with maximal methylation specificity to cancer samples. The Cancer Genome Atlas (TCGA) methylation array data was used to evaluate the performance of the proposed candidates. Supervised hierarchical clustering of 51 DMRs found that HPV-positive OPSCC had significantly higher DNA methylation levels compared to normal samples and non-HPV-related head and neck squamous cell carcinoma (HNSCC). The methylation levels of all top 20 DNA methylation biomarkers in HPV-positive OPSCC were significantly higher than those in normal samples. Further confirmation using quantitative methylation specific PCR (QMSP) in an independent set of 24 HPV-related OPSCCs and 22 controls showed that 16 of the 20 candidates had significantly higher methylation levels in HPV-positive OPSCC samples compared with controls. One candidate, OR6S1, had a sensitivity of 100%, while 17 candidates (KCNA3, EMBP1, CCDC181, DPP4, ITGA4, BEND4, ELMO1, SFMBT2, C1QL3, MIR129-2, NID2, HOXB4, ZNF439, ZNF93, VSTM2B, ZNF137P and ZNF773) had specificities of 100%. The prediction accuracy of the 20 candidates ranged from 56.2% to 99.8% by receiver operating characteristic analysis. We have defined 20 highly specific DMRs in HPV-related OPSCC, which can potentially be applied to molecular-based detection tests and improve disease management.


Subjects
DNA Methylation , Oropharyngeal Neoplasms/genetics , Oropharyngeal Neoplasms/virology , Papillomavirus Infections/genetics , Squamous Cell Carcinoma of Head and Neck/genetics , Squamous Cell Carcinoma of Head and Neck/virology , Biomarkers, Tumor/genetics , Case-Control Studies , Cohort Studies , Epigenesis, Genetic , Female , Humans , Male , Middle Aged , Oropharyngeal Neoplasms/pathology , Papillomaviridae/isolation & purification , Papillomavirus Infections/pathology , Squamous Cell Carcinoma of Head and Neck/pathology
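
The sensitivity and specificity figures quoted in the abstract above follow from a simple count of correct calls in cases and controls. The sketch below shows that arithmetic for a single marker. The methylation values, the 0.5 cutoff, and the function name are hypothetical and chosen only for illustration; the study's actual measurements and thresholds are not given in the abstract.

```python
def sensitivity_specificity(case_values, control_values, cutoff):
    """Call a sample positive when its methylation level is at or above the cutoff."""
    if not case_values or not control_values:
        raise ValueError("both groups must contain at least one sample")
    true_positives = sum(value >= cutoff for value in case_values)
    true_negatives = sum(value < cutoff for value in control_values)
    sensitivity = true_positives / len(case_values)
    specificity = true_negatives / len(control_values)
    return sensitivity, specificity


# Hypothetical methylation levels (fraction methylated) for one candidate region.
tumor_levels = [0.82, 0.64, 0.91, 0.35, 0.77, 0.58]
control_levels = [0.05, 0.12, 0.41, 0.08, 0.02]

sens, spec = sensitivity_specificity(tumor_levels, control_levels, cutoff=0.5)
```

With these made-up numbers every control falls below the cutoff, so the marker is fully specific while missing one of the six tumors.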
11.
Bioinformatics ; 33(20): 3158-3165, 2017 Oct 15.
Article in English | MEDLINE | ID: mdl-29028265

ABSTRACT

MOTIVATION: Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related, for example, colocalization of histones and transcription start sites indicates chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. RESULTS: Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to a reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. AVAILABILITY AND IMPLEMENTATION: The StereoGene C++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. CONTACT: favorov@sensi.org. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Gene Expression Regulation , Genomics/methods , Sequence Analysis, DNA/methods , Software , Algorithms , Chromatin Immunoprecipitation/methods , Epigenomics/methods , Genome, Human , Humans
12.
Bioinformatics ; 33(12): 1892-1894, 2017 Jun 15.
Article in English | MEDLINE | ID: mdl-28174896

ABSTRACT

SUMMARY: Non-negative Matrix Factorization (NMF) algorithms associate gene expression with biological processes (e.g. time-course dynamics or disease subtypes). Compared with univariate associations, the relative weights of NMF solutions can obscure biomarkers. Therefore, we developed a novel patternMarkers statistic to extract genes for biological validation and enhanced visualization of NMF results. Finding novel and unbiased gene markers with patternMarkers requires whole-genome data. Therefore, we also developed Genome-Wide CoGAPS Analysis in Parallel Sets (GWCoGAPS), the first robust whole genome Bayesian NMF using the sparse, MCMC algorithm, CoGAPS. Additionally, a manual version of the GWCoGAPS algorithm contains analytic and visualization tools including patternMatcher, a Shiny web application. The decomposition in the manual pipeline can be replaced with any NMF algorithm, for further generalization of the software. Using these tools, we find granular brain-region and cell-type specific signatures with corresponding biomarkers in GTEx data, illustrating GWCoGAPS and patternMarkers ascertainment of data-driven biomarkers from whole-genome data. AVAILABILITY AND IMPLEMENTATION: PatternMarkers & GWCoGAPS are in the CoGAPS Bioconductor package (3.5) under the GPL license. CONTACT: gsteinobrien@jhmi.edu or ccolantu@jhmi.edu or ejfertig@jhmi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subjects
Algorithms , Gene Expression Profiling/methods , Software , Bayes Theorem , Biomarkers , Humans , Sequence Analysis, RNA/methods
13.
Mol Biol Rep ; 44(4): 315-321, 2017 Aug.
Article in English | MEDLINE | ID: mdl-28685248

ABSTRACT

Epidemiological genetics established that heritability in determining the risk of myocardial infarction (MI) is substantially greater when MI occurs early in life. However, the genetic architecture of early-onset and late-onset MI was not compared. We analyzed genotype frequencies of SNPs in/near 20 genes whose protein products are involved in the pathogenesis of atherosclerosis in two groups of Russian patients with MI: the first group included patients with age of first MI onset <60 years (N = 230) and the second group with onset ≥60 years (N = 174). The control group of corresponding ethnicity consisted of 193 unrelated volunteers without cardiovascular diseases (93 individuals were over 60 years). We found that in the group of patients with age of onset <60 years, SNPs FGB rs1800788*T, TGFB1 rs1982073*T/T, ENOS rs2070744*C and CRP rs1130864*T/T were associated with risk of MI, whereas in patients with age of onset ≥60 years, only TGFB1 rs1982073*T/T was associated with risk of MI. Using APSampler software, we found composite markers associated with MI only in patients with early onset: FGB rs1800788*T + TGFB1 rs1982073*T; FGB rs1800788*T + LPL rs328*C + IL4 rs2243250*C; FGB rs1800788*T + ENOS rs2070744*C (Fisher p values of 1.4 × 10^-6 to 2.2 × 10^-5; the permutation p values of 1.1 × 10^-5 to 3.0 × 10^-4; ORs = 2.67-2.54). Alleles included in the combinations were associated with MI less significantly and with lower ORs than the combinations themselves. The result showed a substantially greater contribution of the genetic component in the development of MI if it occurs early in life, and demonstrated the usefulness of genetic testing for young people.


Subjects
Atherosclerosis/genetics , Myocardial Infarction/genetics , Adult , Age Factors , Aged , Aged, 80 and over , Alleles , Biomarkers/blood , Female , Gene Frequency/genetics , Genetic Association Studies , Genetic Predisposition to Disease/genetics , Humans , Male , Middle Aged , Myocardial Infarction/epidemiology , Polymorphism, Single Nucleotide/genetics , Risk Factors , Russia
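
The odds ratios and Fisher p values reported in the abstract above are computed from 2x2 tables of marker carriers versus non-carriers in patients and controls. The following minimal sketch shows that calculation using only the standard library. The carrier counts are hypothetical (only the group sizes 230 and 193 come from the abstract), the function names are invented for this example, and the permutation p values and the APSampler search itself are not reproduced.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Odds ratio for the table [[a, b], [c, d]] (rows: cases, controls)."""
    if b == 0 or c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p value: sum of all tables no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    observed = table_probability(a)
    return sum(
        p
        for p in (table_probability(x) for x in range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)  # small tolerance for floating-point ties
    )


# Hypothetical counts: carriers / non-carriers of a composite marker.
cases_with, cases_without = 60, 170        # 230 early-onset patients
controls_with, controls_without = 25, 168  # 193 controls

estimate = odds_ratio(cases_with, cases_without, controls_with, controls_without)
p_value = fisher_exact_two_sided(cases_with, cases_without, controls_with, controls_without)
```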
14.
BMC Genomics ; 16: 198, 2015 Mar 17.
Article in English | MEDLINE | ID: mdl-25888292

ABSTRACT

BACKGROUND: Variation within splicing regulatory sequences often leads to differences in gene models among individuals within a species. Two alleles of the same gene may express transcripts with different exon/intron structures and consequently produce functionally different proteins. Matching genomic and transcriptomic data allows us to identify putative regulatory variants associated with changes in splicing patterns. RESULTS: Here we analyzed natural variation of splicing patterns in the transcriptomes of 81 natural strains of Drosophila melanogaster with known genotypes. We identified dozens of genotype-specific splicing patterns associated with putative cis-splicing quantitative trait loci (sQTL). The majority of changes can be explained by mutations in splice sites. Allelic-imbalance in splicing patterns confirmed that the majority are regulated mainly by cis-genetic effects. Remarkably, allele-specific splicing changes often lead to qualitative changes in gene models, yielding many isoforms not previously annotated. The observed alterations are typically outside protein-coding regions or affect only very short protein segments. CONCLUSIONS: Overall, the sets of gene models appear to be flexible within D. melanogaster populations. The observed variation in splicing patterns is predicted to have limited effects on the encoded protein sequences. To our knowledge, this is the first sQTL mapping study in Drosophila.


Subjects
Drosophila melanogaster/genetics , Genetic Variation , Models, Genetic , Alleles , Allelic Imbalance , Alternative Splicing , Animals , Exons , Gene Expression Profiling , Genotype , Open Reading Frames , Polymorphism, Single Nucleotide , Quantitative Trait Loci , RNA Splice Sites , Transcriptome
15.
Bioinformatics ; 30(19): 2757-63, 2014 Oct.
Article in English | MEDLINE | ID: mdl-24907368

ABSTRACT

MOTIVATION: Sample source, procurement process and other technical variations introduce batch effects into genomics data. Algorithms to remove these artifacts enhance differences between known biological covariates, but also carry potential concern of removing intragroup biological heterogeneity and thus any personalized genomic signatures. As a result, accurate identification of novel subtypes from batch-corrected genomics data is challenging using standard algorithms designed to remove batch effects for class comparison analyses. Nor can batch effects be corrected reliably in future applications of genomics-based clinical tests, in which the biological groups are by definition unknown a priori. RESULTS: Therefore, we assess the extent to which various batch correction algorithms remove true biological heterogeneity. We also introduce an algorithm, permuted-SVA (pSVA), using a new statistical model that is blind to biological covariates to correct for technical artifacts while retaining biological heterogeneity in genomic data. This algorithm facilitated accurate subtype identification in head and neck cancer from gene expression data in both formalin-fixed and frozen samples. When applied to predict Human Papillomavirus (HPV) status, pSVA improved cross-study validation even if the sample batches were highly confounded with HPV status in the training set. AVAILABILITY AND IMPLEMENTATION: All analyses were performed using R version 2.15.0. The code and data used to generate the results of this manuscript are available from https://sourceforge.net/projects/psva.


Subjects
Algorithms , Genomics/methods , Head and Neck Neoplasms/genetics , Papillomavirus Infections/diagnosis , Artifacts , Computational Biology/methods , Head and Neck Neoplasms/virology , Humans , Models, Statistical , Reproducibility of Results , Software
16.
Nucleic Acids Res ; 40(12): e93, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method CORECLUST (COnservative REgulatory CLUster STructure) that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may be consequently used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.


Subjects
Gene Expression Regulation , Regulatory Elements, Transcriptional , Sequence Analysis, DNA , Algorithms , Animals , Body Patterning/genetics , Drosophila/embryology , Drosophila/genetics , Drosophila/metabolism , Enhancer Elements, Genetic , Gene Expression Regulation, Developmental , Muscles/metabolism , Position-Specific Scoring Matrices , Software
17.
Cancer Res Commun ; 2024 Oct 29.
Article in English | MEDLINE | ID: mdl-39470352

ABSTRACT

BACKGROUND: Aberrant alternative splicing can generate neoantigens, which can themselves stimulate immune responses and surveillance. Previous methods for quantifying splicing-derived neoantigens are limited by independent references and potential batch effects. RESULTS: Here, we introduce SpliceMutr, a bioinformatics approach and pipeline for identifying splicing derived neoantigens from tumor and normal data. SpliceMutr facilitates the identification of tumor-specific antigenic splice variants, predicts MHC-binding affinity, and estimates splicing antigenicity scores per gene. By applying this tool to genomic data from The Cancer Genome Atlas (TCGA), we generate splicing-derived neoantigens and neoantigenicity scores per sample and across all cancer types and find numerous correlations between splicing antigenicity and well-established biomarkers of anti-tumor immunity. Notably, carriers of mutations within splicing machinery genes have higher splicing antigenicity, which provides support for our approach. Further analysis of splicing antigenicity in cohorts of melanoma patients treated with mono- or combined immune checkpoint inhibition suggests that the abundance of splicing antigens is reduced post-treatment from baseline in patients who progress. We also observe increased splicing antigenicity in responders to immunotherapy, which may relate to an increased capacity to mount an immune response to splicing-derived antigens. CONCLUSIONS: We find the splicing antigenicity to be higher in tumor samples when compared to normal, that mutations in the splicing machinery result in increased splicing antigenicity in some cancers, and higher splicing antigenicity is associated with positive response to immune checkpoint inhibitor therapies. Further, this new computational pipeline provides novel analytical capabilities for splicing antigenicity and is openly available for further immuno-oncology analysis.

18.
PLoS Comput Biol ; 8(5): e1002529, 2012 May.
Article in English | MEDLINE | ID: mdl-22693437

ABSTRACT

We have created a statistically grounded tool for determining the correlation of genome-wide data with other datasets or known biological features, intended to guide biological exploration of high-dimensional datasets, rather than providing immediate answers. The software enables several biologically motivated approaches to these data and here we describe the rationale and implementation for each approach. Our models and statistics are implemented in an R package that efficiently calculates the spatial correlation between two sets of genomic intervals (data and/or annotated features), for use as a metric of functional interaction. The software handles any type of pointwise or interval data and instead of running analyses with predefined metrics, it computes the significance and direction of several types of spatial association; this is intended to suggest potentially relevant relationships between the datasets. AVAILABILITY AND IMPLEMENTATION: The package, GenometriCorr, can be freely downloaded at http://genometricorr.sourceforge.net/. Installation guidelines and examples are available from the sourceforge repository. The package is pending submission to Bioconductor.


Subjects
Databases, Genetic , Genomics/methods , Information Storage and Retrieval , Models, Genetic , Models, Statistical , Software , Animals , Chromosomes , Epigenomics , Genetic Loci , Genome , Humans , Internet , RNA, Transfer/genetics , Statistics, Nonparametric , User-Computer Interface
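
One of the simplest ways to quantify spatial association between two sets of genomic intervals, as discussed in the abstract above, is the Jaccard index: the number of bases covered by both sets divided by the number covered by either. The sketch below computes it for intervals on a single chromosome. It illustrates the general idea only; it is written in Python rather than R, does not reproduce the GenometriCorr package or its API, and omits the significance testing the abstract describes. The half-open coordinate convention and the example intervals are assumptions.

```python
def merge(intervals):
    """Merge overlapping or touching half-open intervals [start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered_length(intervals):
    """Number of bases covered by at least one interval."""
    return sum(end - start for start, end in merge(intervals))


def jaccard(set_a, set_b):
    """Bases covered by both sets divided by bases covered by either set."""
    length_a = covered_length(set_a)
    length_b = covered_length(set_b)
    union = covered_length(list(set_a) + list(set_b))
    intersection = length_a + length_b - union
    return intersection / union if union else 0.0


# Hypothetical intervals on one chromosome.
peaks = [(0, 10), (20, 30)]
annotations = [(5, 15), (25, 35)]
overlap_score = jaccard(peaks, annotations)
```

Assessing whether such a score is larger than expected by chance requires a null model, for example repeated random placement of one interval set, which is the part a dedicated package handles.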
19.
Cell Syst ; 14(4): 285-301.e4, 2023 04 19.
Article in English | MEDLINE | ID: mdl-37080163

ABSTRACT

Recent advances in spatial transcriptomics (ST) enable gene expression measurements from a tissue sample while retaining its spatial context. This technology enables unprecedented in situ resolution of the regulatory pathways that underlie the heterogeneity in the tumor as well as the tumor microenvironment (TME). The direct characterization of cellular co-localization with spatial technologies facilitates quantification of the molecular changes resulting from direct cell-cell interaction, as it occurs in tumor-immune interactions. We present SpaceMarkers, a bioinformatics algorithm to infer molecular changes from cell-cell interactions from latent space analysis of ST data. We apply this approach to infer the molecular changes from tumor-immune interactions in Visium spatial transcriptomics data of metastasis, invasive and precursor lesions, and immunotherapy treatment. Further transfer learning in matched scRNA-seq data enabled further quantification of the specific cell types in which SpaceMarkers are enriched. Altogether, SpaceMarkers can identify the location and context-specific molecular interactions within the TME from ST data.


Subjects
Algorithms , Tumor Microenvironment , Cell Communication , Computational Biology , Gene Expression Profiling
20.
Cancers (Basel) ; 14(19), 2022 Sep 25.
Article in English | MEDLINE | ID: mdl-36230586

ABSTRACT

Polyunsaturated fatty acid (PUFA) metabolism is currently a focus in cancer research due to PUFAs functioning as structural components of the membrane matrix, as fuel sources for energy production, and as sources of secondary messengers, so-called oxylipins, important players in inflammatory processes. Although breast cancer (BC) is the leading cause of cancer death among women worldwide, no systematic study of PUFA metabolism as a system of interrelated processes in this disease has been carried out. Here, we implemented a Boruta-based feature selection algorithm to determine the list of most important PUFA metabolism genes altered in breast cancer tissues compared with normal tissues. A rank-based Random Forest (RF) model was built on the selected gene list (33 genes) and applied to predict the cancer phenotype to ascertain the PUFA genes involved in carcinogenesis. It showed high performance in dichotomous classification (balanced accuracy of 0.94, ROC AUC 0.99). We also retrieved a list of the important PUFA genes (46 genes) that differed between breast cancer molecular subtypes. The balanced accuracy of the classification model built on the specified genes was 0.82, while the ROC AUC for the sensitivity analysis was 0.85. Specific patterns of PUFA metabolic changes were obtained for each molecular subtype of breast cancer. These results show evidence that (1) PUFA metabolism genes are critical for the pathogenesis of breast cancer; (2) BC subtypes differ in the expression of PUFA metabolism genes; and (3) the lists of genes selected in the models are enriched with genes involved in the metabolism of signaling lipids.
