Results 1 - 20 of 93

1.
Cell ; 153(3): 575-89, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23622242

ABSTRACT

Adenosine deaminases acting on RNA (ADARs) are involved in RNA editing that converts adenosine residues to inosine specifically in double-stranded RNAs. In this study, we investigated the interaction of the RNA editing mechanism with the RNA interference (RNAi) machinery and found that ADAR1 forms a complex with Dicer through direct protein-protein interaction. Most importantly, ADAR1 increases the maximum rate (Vmax) of pre-microRNA (miRNA) cleavage by Dicer and facilitates loading of miRNA onto RNA-induced silencing complexes, identifying a new role of ADAR1 in miRNA processing and RNAi mechanisms. ADAR1 differentiates its functions in RNA editing and RNAi by the formation of either ADAR1/ADAR1 homodimer or Dicer/ADAR1 heterodimer complexes, respectively. As expected, the expression of miRNAs is globally inhibited in ADAR1(-/-) mouse embryos, which, in turn, alters the expression of their target genes and might contribute to their embryonic lethal phenotype.
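The abstract's kinetic claim is that ADAR1 raises Dicer's maximum cleavage rate (Vmax). As a minimal sketch, assuming simple Michaelis-Menten kinetics (the abstract reports only the Vmax change; the model choice and the unitless Km and Vmax values below are illustrative assumptions, not data from the study), raising Vmax with Km held fixed scales the initial rate by the same factor at every substrate concentration:

```python
def michaelis_menten_rate(substrate: float, vmax: float, km: float) -> float:
    """Initial velocity v = Vmax * [S] / (Km + [S]) under Michaelis-Menten kinetics."""
    if substrate < 0 or vmax < 0 or km <= 0:
        raise ValueError("expected [S] >= 0, Vmax >= 0 and Km > 0")
    return vmax * substrate / (km + substrate)


# Hypothetical, unitless parameters chosen only for illustration.
KM = 1.0
for s in (0.5, 1.0, 10.0):
    baseline = michaelis_menten_rate(s, vmax=1.0, km=KM)
    doubled = michaelis_menten_rate(s, vmax=2.0, km=KM)  # Vmax doubled, Km unchanged
    print(f"[S]={s}: v={baseline:.3f} -> {doubled:.3f}")
```

At [S] = Km the rate is exactly half of Vmax, which is one way to keep the two parameters apart when reading kinetic results like this one.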


Subject(s)
Adenosine Deaminase/metabolism , DEAD-box RNA Helicases/metabolism , RNA Interference , RNA Processing, Post-Transcriptional , Ribonuclease III/metabolism , Adenosine Deaminase/chemistry , Adenosine Deaminase/genetics , Animals , Base Sequence , DEAD-box RNA Helicases/chemistry , Dimerization , Embryo, Mammalian/metabolism , HEK293 Cells , HeLa Cells , Humans , Mice , MicroRNAs/metabolism , Molecular Sequence Data , Protein Interaction Domains and Motifs , RNA, Small Interfering/metabolism , RNA-Binding Proteins , Ribonuclease III/chemistry , Up-Regulation
2.
Anal Chem ; 95(19): 7779-7787, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37141575

ABSTRACT

The cascade of immune responses involves activation of diverse immune cells and release of large amounts of cytokines, which leads either to normal, balanced inflammation or to hyperinflammatory responses and even organ damage by sepsis. Conventional diagnosis of immunological disorders based on multiple cytokines in the blood serum has variable accuracy, and it is difficult to distinguish normal inflammation from sepsis. Herein, we present an approach to detect immunological disorders through rapid, ultrahigh-multiplex analysis of T cells using single-cell multiplex in situ tagging (scMIST) technology. scMIST permits simultaneous detection of 46 markers and cytokines from single cells without the assistance of special instruments. A cecal ligation and puncture sepsis model was built to supply T cells from two groups of mice: those that survived the surgery and those that died after 1 day. The scMIST assays captured the T cell features and their dynamics over the course of recovery. Compared with cytokines in the peripheral blood, T cell markers show different dynamics and cytokine levels. We applied a random forest machine learning model to single T cells from the two groups of mice. After training, the model was able to predict the group of each mouse through T cell classification and majority rule with 94% accuracy. Our approach pioneers the direction of single-cell omics and could be widely applicable to human diseases.
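The abstract describes a two-level decision: a classifier labels each T cell, then each mouse receives the label held by the majority of its cells. A minimal sketch of that majority-rule step follows, assuming per-cell predictions are already available from some classifier (the random forest itself is not reproduced); the group labels "survived"/"died", the function names, and the toy data are all hypothetical, and the tie-handling rule is our own assumption since the abstract does not specify one.

```python
from collections import Counter
from typing import Dict, List


def classify_mouse(cell_predictions: List[str]) -> str:
    """Return the group label predicted for most of a mouse's single T cells."""
    if not cell_predictions:
        raise ValueError("need at least one per-cell prediction")
    ranked = Counter(cell_predictions).most_common()
    # The abstract does not say how ties are resolved; here they are left undetermined.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "undetermined"
    return ranked[0][0]


def mouse_level_accuracy(per_cell: Dict[str, List[str]], truth: Dict[str, str]) -> float:
    """Fraction of mice whose majority-vote label matches their true group."""
    correct = sum(classify_mouse(per_cell[mouse]) == label for mouse, label in truth.items())
    return correct / len(truth)


# Toy per-cell predictions for three hypothetical mice (not data from the study).
per_cell = {
    "mouse_a": ["survived", "survived", "died"],
    "mouse_b": ["died", "died", "died", "survived"],
    "mouse_c": ["survived", "died"],
}
truth = {"mouse_a": "survived", "mouse_b": "died", "mouse_c": "died"}
print(mouse_level_accuracy(per_cell, truth))
```

Voting over many cells lets individual per-cell errors cancel out, which is why mouse-level accuracy can exceed per-cell accuracy.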


Subject(s)
Immune System Diseases , Sepsis , Humans , Mice , Animals , Cytokines , Inflammation , T-Lymphocytes , Sepsis/diagnosis , Disease Models, Animal
3.
Bioinformatics ; 37(20): 3412-3420, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34014317

ABSTRACT

MOTIVATION: Access to large-scale genomics and transcriptomics data from various tissues and cell lines has allowed the discovery of widespread alternative splicing events and alternative promoter usage in mammals. Between human and mouse, gene-level orthology is currently annotated for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms. RESULTS: Here, we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping first at the exon and then at the transcript level. ExTraMapper identifies more than 350k exon mappings, as well as 30k transcript mappings between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Further, it identifies exon fusions, splits and losses due to splice site mutations, and finds mappings between microexons that were previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels, suggesting a more accurate mapping of human and mouse transcripts. We also applied the method to detect conserved exon and transcript pairs between the human and rhesus macaque genomes to show that ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs. AVAILABILITY AND IMPLEMENTATION: The source code and the results are available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Bioinformatics ; 37(15): 2112-2120, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33538820

ABSTRACT

MOTIVATION: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationships, which previous informatics methods often fail to capture, especially in data-scarce scenarios. RESULTS: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture a global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory element prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformer model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationships within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned for many other sequence analysis tasks. AVAILABILITY AND IMPLEMENTATION: The source code and the pretrained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
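Before a sequence reaches the transformer, DNABERT represents it as overlapping k-mer tokens (the original models were trained with k from 3 to 6). A minimal sketch of that tokenization step follows; the function name is hypothetical, this is not code from the DNABERT repository, and a BERT-style model would additionally wrap the tokens with special tokens and map them to vocabulary IDs.

```python
from typing import List


def kmer_tokenize(sequence: str, k: int = 6) -> List[str]:
    """Split a DNA sequence into overlapping k-mers with stride 1."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = sequence.upper()
    # A sequence of length L yields L - k + 1 overlapping k-mers (none if L < k).
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


tokens = kmer_tokenize("ATGCGTA", k=3)
print(tokens)  # ['ATG', 'TGC', 'GCG', 'CGT', 'GTA']
print(4 ** 6)  # 4096 possible 6-mers over the alphabet {A, C, G, T}
```

Overlapping k-mers give every nucleotide a context of k bases in each token, at the cost of a vocabulary that grows as 4^k.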

5.
Brief Bioinform ; 18(2): 260-269, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26944083

ABSTRACT

Given that the majority of multi-exon genes generate diverse functional products, it is important to evaluate expression at the isoform level. Previous studies have demonstrated strong gene-level correlations between RNA sequencing (RNA-seq) and microarray platforms, but have not studied their concordance at the isoform level. We performed transcript abundance estimation on raw RNA-seq and exon-array expression profiles available for common glioblastoma multiforme samples from The Cancer Genome Atlas using different analysis pipelines, and compared both the isoform- and gene-level expression estimates between programs and platforms. The results showed better concordance between RNA-seq/exon-array and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) platforms for fold change estimates than for raw abundance estimates, suggesting that fold change normalization against a control is an important step for integrating expression data across platforms. Based on RT-qPCR validations, eXpress and Multi-Mapping Bayesian Gene eXpression (MMBGX) programs achieved the best performance for RNA-seq and exon-array platforms, respectively, for deriving the isoform-level fold change values. While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for the subset of transcripts that are highly variable across the samples. eXpress appears to be most successful in discriminating lowly expressed transcripts, but IsoformEx and RSEM correlate more strongly with MMBGX for highly expressed transcripts. The results also reinforce how potentially important isoform-level expression changes can be masked by gene-level estimates, and demonstrate that exon arrays yield comparable results to RNA-seq for evaluating isoform-level expression changes.
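The abstract's key observation is that fold changes relative to a control agree across platforms better than raw abundances do. A minimal sketch of why, assuming (as a simplification) that two platforms differ only by a multiplicative scale factor: that factor cancels in the ratio, so the log2 fold change is identical on both. The function name, the optional pseudocount, and the abundance values are hypothetical illustrations, not values from the study.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 0.0) -> float:
    """log2((treated + pseudocount) / (control + pseudocount))."""
    num, den = treated + pseudocount, control + pseudocount
    if num <= 0 or den <= 0:
        raise ValueError("abundances plus pseudocount must be positive")
    return math.log2(num / den)


# Hypothetical abundances for one transcript on two platforms whose scales differ 10-fold.
platform_a = log2_fold_change(treated=30.0, control=15.0)
platform_b = log2_fold_change(treated=300.0, control=150.0)
print(platform_a, platform_b)  # 1.0 1.0
```

Real platform differences are not purely multiplicative (probe affinity, mappability and background vary by transcript), so the agreement is only approximate in practice; a nonzero pseudocount also breaks exact cancellation.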


Subject(s)
Algorithms , Bayes Theorem , Exons , Gene Expression Profiling , Humans , Oligonucleotide Array Sequence Analysis , Protein Isoforms , RNA , Sequence Analysis, RNA
6.
Genome Res ; 24(6): 1039-50, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24676094

ABSTRACT

Mapping genome-wide data to human subtelomeres has been problematic due to the incomplete assembly and challenges of low-copy repetitive DNA elements. Here, we provide updated human subtelomere sequence assemblies that were extended by filling telomere-adjacent gaps using clone-based resources. A bioinformatic pipeline incorporating multiread mapping for annotation of the updated assemblies using short-read data sets was developed and implemented. Annotation of subtelomeric sequence features as well as mapping of CTCF and cohesin binding sites using ChIP-seq data sets from multiple human cell types confirmed that CTCF and cohesin bind within 3 kb of the start of terminal repeat tracts at many, but not all, subtelomeres. CTCF and cohesin co-occupancy were also enriched near internal telomere-like sequence (ITS) islands and the nonterminal boundaries of subtelomere repeat elements (SREs) in transformed lymphoblastoid cell lines (LCLs) and human embryonic stem cell (ES) lines, but were not significantly enriched in the primary fibroblast IMR90 cell line. Subtelomeric CTCF and cohesin sites predicted by ChIP-seq using our bioinformatics pipeline (but not predicted when only uniquely mapping reads were considered) were consistently validated by ChIP-qPCR. The colocalized CTCF and cohesin sites in SRE regions are candidates for mediating long-range chromatin interactions in the transcript-rich SRE region. A public browser for the integrated display of short-read sequence-based annotations relative to key subtelomere features such as the start of each terminal repeat tract, SRE identity and organization, and subtelomeric gene models was established.


Subject(s)
Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , Genome, Human , Repressor Proteins/genetics , Telomere/genetics , Terminal Repeat Sequences , Base Sequence , CCCTC-Binding Factor , Cell Line , Embryonic Stem Cells/metabolism , Fibroblasts/metabolism , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Data , Protein Binding , Repressor Proteins/metabolism , Cohesins
7.
Proc Natl Acad Sci U S A ; 111(1): 291-6, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24368849

ABSTRACT

Glioblastoma multiforme (GBM) and the mesenchymal GBM subtype in particular are highly malignant tumors that frequently exhibit regions of severe hypoxia and necrosis. Because these features correlate with poor prognosis, we investigated microRNAs whose expression might regulate hypoxic GBM cell survival and growth. We determined that the expression of microRNA-218 (miR-218) is decreased significantly in highly necrotic mesenchymal GBM, and orthotopic tumor studies revealed that reduced miR-218 levels confer GBM resistance to chemotherapy. Importantly, miR-218 targets multiple components of receptor tyrosine kinase (RTK) signaling pathways, and miR-218 repression increases the abundance and activity of multiple RTK effectors. This elevated RTK signaling also promotes the activation of hypoxia-inducible factor (HIF), most notably HIF2α. We further show that RTK-mediated HIF2α regulation is JNK dependent, via jun proto-oncogene. Collectively, our results identify an miR-218-RTK-HIF2α signaling axis that promotes GBM cell survival and tumor angiogenesis, particularly in necrotic mesenchymal tumors.


Subject(s)
Basic Helix-Loop-Helix Transcription Factors/metabolism , Brain Neoplasms/metabolism , Glioblastoma/metabolism , Mesoderm/metabolism , MicroRNAs/metabolism , Receptor Protein-Tyrosine Kinases/metabolism , Adult , Aged , Aged, 80 and over , Animals , Antineoplastic Agents/pharmacology , Cell Survival , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Hypoxia , Mice , Mice, Nude , Middle Aged , Necrosis , Neoplasm Transplantation , Neovascularization, Pathologic , Oligonucleotide Array Sequence Analysis , Proto-Oncogene Mas , Signal Transduction , Young Adult
8.
EMBO J ; 31(21): 4165-78, 2012 Nov 05.
Article in English | MEDLINE | ID: mdl-23010778

ABSTRACT

The contribution of human subtelomeric DNA and chromatin organization to telomere integrity and chromosome end protection is not yet understood in molecular detail. Here, we show by ChIP-Seq that most human subtelomeres contain a CTCF- and cohesin-binding site within ∼1-2 kb of the TTAGGG repeat tract and adjacent to a CpG island implicated in TERRA transcription control. ChIP-Seq also revealed that RNA polymerase II (RNAPII) was enriched at sites adjacent to the CTCF sites and extending towards the telomere repeat tracts. Mutation of CTCF-binding sites in plasmid-borne promoters reduced transcriptional activity in an orientation-dependent manner. Depletion of CTCF by shRNA led to a decrease in TERRA transcription and a loss of cohesin and RNAPII binding to the subtelomeres. Depletion of either CTCF or the cohesin subunit Rad21 caused telomere-induced DNA damage foci (TIF) formation, and destabilized TRF1 and TRF2 binding to the TTAGGG-proximal subtelomere DNA. These findings indicate that CTCF and cohesin are integral components of most human subtelomeres, and important for the regulation of TERRA transcription and telomere end protection.


Subject(s)
Cell Cycle Proteins/metabolism , Chromatin/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA-Binding Proteins/genetics , Gene Expression Regulation , Repressor Proteins/metabolism , Telomere/genetics , Transcription Factors/genetics , Transcription, Genetic , CCCTC-Binding Factor , Cell Cycle Proteins/genetics , Cells, Cultured , Chromatin Immunoprecipitation , Chromosomal Proteins, Non-Histone/genetics , CpG Islands/genetics , Electrophoretic Mobility Shift Assay , Fluorescent Antibody Technique , Humans , Luciferases/metabolism , Neoplasms/genetics , Neoplasms/pathology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism , Promoter Regions, Genetic/genetics , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Repressor Proteins/genetics , Reverse Transcriptase Polymerase Chain Reaction , Cohesins
9.
Genome Res ; 23(9): 1446-61, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23796952

ABSTRACT

The functional roles of SNPs within the 8q24 gene desert in the cancer phenotype are not yet well understood. Here, we report that CCAT2, a novel long noncoding RNA transcript (lncRNA) encompassing the rs6983267 SNP, is highly overexpressed in microsatellite-stable colorectal cancer and promotes tumor growth, metastasis, and chromosomal instability. We demonstrate that MYC, miR-17-5p, and miR-20a are up-regulated by CCAT2 through TCF7L2-mediated transcriptional regulation. We further identify the physical interaction between CCAT2 and TCF7L2 resulting in an enhancement of WNT signaling activity. We show that CCAT2 is itself a WNT downstream target, which suggests the existence of a feedback loop. Finally, we demonstrate that the SNP status affects CCAT2 expression and the risk allele G produces more CCAT2 transcript. Our results support a new mechanism of MYC and WNT regulation by the novel lncRNA CCAT2 in colorectal cancer pathogenesis, and provide an alternative explanation of the SNP-conferred cancer risk.


Subject(s)
Chromosomal Instability , Chromosomes, Human, Pair 8/genetics , Colonic Neoplasms/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Case-Control Studies , Cell Line, Tumor , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasm Metastasis/genetics , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Transcription Factor 7-Like 1 Protein/genetics , Transcription Factor 7-Like 1 Protein/metabolism , Transcription, Genetic , Wnt Signaling Pathway
10.
J Virol ; 89(1): 799-810, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25355877

ABSTRACT

Although monocytes and macrophages are targets of HIV-1-mediated immunopathology, the impact of high viremia on activation-induced monocyte apoptosis relative to monocyte and macrophage activation changes remains undetermined. In this study, we determined constitutive and oxidative stress-induced monocyte apoptosis in uninfected and HIV+ individuals across a spectrum of viral loads (n = 35; range, 2,243 to 1,355,998 HIV-1 RNA copies/ml) and CD4 counts (range, 26 to 801 cells/mm³). Both constitutive apoptosis and oxidative stress-induced apoptosis were positively associated with viral load and negatively associated with CD4 count, with an elevation in apoptosis occurring in patients with more than 40,000 (4.6 log) copies/ml. As expected, expression of Rb1 and interferon-stimulated genes (ISGs), plasma soluble CD163 (sCD163) concentration, and the proportion of CD14++ CD16+ intermediate monocytes were elevated in viremic patients compared to uninfected controls. Although CD14++ CD16+ frequencies, sCD14, sCD163, and most ISG expression were not directly associated with a change in apoptosis, sCD14 and ISG expression showed an association with increasing viral load. Multivariable analysis of clinical values and monocyte gene expression identified changes in IFI27, IFITM2, Rb1, and Bcl2 expression as determinants of constitutive apoptosis (P = 3.77 × 10⁻⁵; adjusted R² = 0.5983), while changes in viral load, IFITM2, Rb1, and Bax expression were determinants of oxidative stress-induced apoptosis (P = 5.59 × 10⁻⁵; adjusted R² = 0.5996). Our data demonstrate differential activation states in monocytes between levels of viremia, in association with differences in apoptosis that may contribute to greater monocyte turnover with high viremia.
IMPORTANCE: This study characterized differential monocyte activation, apoptosis, and apoptosis-related gene expression in low- versus high-level viremic HIV-1 patients, suggesting a shift in apoptosis regulation that may be associated with disease state. Using single and multivariable analysis of monocyte activation parameters and gene expression, we supported the hypothesis that monocyte apoptosis in HIV disease is a reflection of viremia and activation state with contributions from gene expression changes within the ISG and Bcl2 gene families. Understanding monocyte apoptosis response may inform HIV immunopathogenesis, retention of infected macrophages, and monocyte turnover in low- or high-viral-load states.
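Two quantities in this abstract are easy to check by hand: the "4.6 log" threshold is log10 of 40,000 copies/ml, and adjusted R² penalizes the ordinary R² for the number of predictors p given n observations, via 1 - (1 - R²)(n - 1)/(n - p - 1). A minimal sketch of both, using the standard textbook formula; the R², n and p passed to the function below are hypothetical inputs, not the study's unadjusted values, which the abstract does not report.

```python
import math


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1) for n observations, p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# The abstract's viremia threshold: 40,000 copies/ml expressed on a log10 scale.
print(round(math.log10(40_000), 1))  # 4.6

# Hypothetical inputs (not from the study): R^2 = 0.5 with 35 observations and 4 predictors.
print(round(adjusted_r2(0.5, n=35, p=4), 4))  # 0.4333
```

Because the penalty grows with p, adjusted R² can fall when a predictor that adds little explanatory power is included, which makes it a fairer comparison between multivariable models of different sizes.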


Subject(s)
Apoptosis , HIV Infections/immunology , HIV Infections/virology , HIV-1/immunology , Monocytes/immunology , Viral Load , Adult , Aged, 80 and over , Chronic Disease , Female , Gene Expression Profiling , Humans , Male , Middle Aged , Monocytes/physiology , Proto-Oncogene Proteins c-bcl-2/biosynthesis , Retinoblastoma Protein/biosynthesis , Young Adult
11.
Nucleic Acids Res ; 42(8): e64, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24503249

ABSTRACT

Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive gene signatures on one analytical platform and then transfer them to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript variants, assigns the molecular subtype of GBM patients with 92% accuracy. This classifier was translated into an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assays and have implications for developing robust diagnostic assays for cancer patient stratification.


Subject(s)
Brain Neoplasms/classification , Gene Expression Profiling/methods , Glioblastoma/classification , Protein Isoforms/genetics , Adult , Aged , Algorithms , Brain Neoplasms/genetics , Brain Neoplasms/mortality , Female , Glioblastoma/genetics , Glioblastoma/mortality , Humans , Male , Middle Aged , Prognosis , Protein Isoforms/metabolism , Reverse Transcriptase Polymerase Chain Reaction
12.
BMC Genomics ; 16 Suppl 11: S3, 2015.
Article in English | MEDLINE | ID: mdl-26576613

ABSTRACT

BACKGROUND: Many supervised learning algorithms have been applied to derive gene signatures for patient stratification from gene expression data. However, transferring multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods (Equal-width binning, Equal-frequency binning, and k-means clustering) for accurately classifying the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on isoform-level gene expression profiles from the exon-array platform and tested on the corresponding profiles from RNA-seq data. RESULTS: We applied an integrated machine learning framework that involves three sequential steps: feature selection, data discretization, and classification. For models trained and tested on exon-array data, adding the data discretization step led to robust and accurate predictive models with fewer variables in the final models. For models trained on exon-array data and tested on RNA-seq data, adding the data discretization step dramatically improved classification accuracy, with Equal-frequency binning showing the greatest improvement: accuracies above 90% for all models whose features were chosen by Random Forest-based feature selection. Overall, the SVM classifier coupled with Equal-frequency binning achieved the best accuracy (> 95%). Without data discretization, however, at most 73.6% accuracy was achieved. CONCLUSIONS: The classification algorithms, trained and tested on data from the same platform, yielded similar accuracies in predicting the four GBM subgroups. However, when dealing with cross-platform data, from exon-array to RNA-seq, the classifiers yielded stable models with the highest classification accuracies on data transformed by Equal-frequency binning.
The approach presented here is generally applicable to other cancer types for classification and identification of molecular subgroups by integrating data across different gene expression platforms.
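Two of the three discretization methods compared here are simple enough to sketch directly (k-means discretization is not shown). Below is a minimal, hypothetical implementation in pure Python; the function names and the six expression values are illustrations, not the study's code or data. Equal-width binning splits the observed range into intervals of equal length, so a single outlier can crowd most samples into one bin; equal-frequency binning assigns bins by rank, so each bin holds roughly the same number of samples.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign each value to one of n_bins intervals of equal width spanning [min, max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], n_bins: int) -> List[int]:
    """Assign bins by rank so each bin holds (roughly) the same number of values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        # Tied values are ordered by position (stable sort), so ties may split across bins.
        bins[idx] = rank * n_bins // len(values)
    return bins


# Hypothetical expression values for one isoform across six samples (skewed by outliers).
expr = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
print(equal_width_bins(expr, 3))      # [0, 0, 0, 0, 1, 2]
print(equal_frequency_bins(expr, 3))  # [0, 0, 1, 1, 2, 2]

# Equal-frequency bins depend only on ranks, so a monotone rescaling leaves them unchanged.
rescaled = [10.0 * v + 5.0 for v in expr]
assert equal_frequency_bins(rescaled, 3) == equal_frequency_bins(expr, 3)
```

Because equal-frequency bins depend only on within-platform rank order, any strictly increasing difference in scale between exon-array and RNA-seq values leaves them unchanged. This is our reading of why the method could transfer well across platforms, not an explanation given in the abstract.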


Subject(s)
Computational Biology/methods , Gene Expression Profiling/methods , Glioblastoma/classification , Glioblastoma/genetics , Machine Learning , RNA Isoforms/genetics , Algorithms , Cluster Analysis , Humans
13.
Proc Natl Acad Sci U S A ; 109(22): 8646-51, 2012 May 29.
Article in English | MEDLINE | ID: mdl-22586128

ABSTRACT

A genome-wide association study of papillary thyroid carcinoma (PTC) pinpointed two independent SNPs (rs944289 and rs965513) located in regions containing no annotated genes (14q13.3 and 9q22.33, respectively). Here, we describe a unique long intergenic noncoding RNA gene (lincRNA) named Papillary Thyroid Carcinoma Susceptibility Candidate 3 (PTCSC3), located 3.2 kb downstream of rs944289 at 14q13.3, whose expression is strictly thyroid specific. By quantitative PCR, PTCSC3 expression was strongly down-regulated (P = 2.84 × 10⁻¹⁴) in thyroid tumor tissue of 46 PTC patients, and the risk allele (T) was associated with the strongest suppression (genotype [TT] (n = 21) vs. [CT] (n = 19), P = 0.004). In adjacent unaffected thyroid tissue, the genotype [TT] was associated with up-regulation of PTCSC3 ([TT] (n = 21) vs. [CT] (n = 19), P = 0.034). The SNP rs944289 was located in a binding site for the CCAAT/enhancer binding proteins (C/EBP) α and β. The risk allele destroyed the binding site in silico. Both C/EBPα and C/EBPβ activated the PTCSC3 promoter in reporter assays (P = 0.0009 and P = 0.0014, respectively), and the risk allele reduced the activation compared with the nonrisk allele (C) (P = 0.026 and P = 0.048, respectively). Restoration of PTCSC3 expression in PTC cell lines (TPC-1 and BCPAP) inhibited cell growth (P = 0.002 and P = 0.019, respectively) and affected the expression of genes involved in DNA replication, recombination and repair, cellular movement, tumor morphology, and cell death. Our data suggest that SNP rs944289 predisposes to PTC through a previously uncharacterized long intergenic noncoding RNA gene (PTCSC3) that has the characteristics of a tumor suppressor.


Subject(s)
Carcinoma, Papillary/genetics , Polymorphism, Single Nucleotide , RNA, Untranslated/genetics , Thyroid Neoplasms/genetics , Animals , Binding Sites/genetics , Blotting, Northern , CCAAT-Enhancer-Binding Protein-beta/metabolism , COS Cells , Carcinoma, Papillary/pathology , Cell Line, Tumor , Cell Proliferation , Chlorocebus aethiops , Chromosomes, Human, Pair 14/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor , Genetic Predisposition to Disease/genetics , Genotype , HEK293 Cells , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Thyroid Neoplasms/pathology
14.
Genome Res ; 21(8): 1260-72, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21712398

ABSTRACT

Despite our growing knowledge that many mammalian genes generate multiple transcript variants that may encode functionally distinct protein isoforms, the transcriptomes of various tissues and their developmental stages are poorly defined. Identifying the transcriptome and its regulation in a cell or tissue is key to deciphering the cell- or tissue-specific functions of a gene. We built a genome-wide inventory of noncoding and protein-coding transcripts (transcriptomes), their promoters (promoteromes) and histone modification states (epigenomes) for developing and adult cerebella using an integrative massively parallel sequencing and bioinformatics approach. The data consist of 61,525 (12,796 novel) distinct mRNAs transcribed from 29,589 (4792 novel) promoters corresponding to 15,669 protein-coding and 7624 noncoding genes. Importantly, our results show that the transcript variants of a gene are predominantly generated by alternative transcriptional rather than splicing mechanisms, highlighting alternative promoters and transcriptional terminations as major sources of transcriptome diversity. Moreover, H3K4me3, and not H3K27me3, defined the use of alternative promoters, and we identified a combinatorial role of H3K4me3 and H3K27me3 in regulating the expression of transcripts, including transcript variants of a gene during development. We observed a strong bias of both H3K4me3 and H3K27me3 for CpG-rich promoters and an exponential relationship between their enrichment and corresponding transcript expression. Furthermore, the majority of genes associated with neurological diseases expressed multiple transcripts through alternative promoters, and we demonstrated aberrant use of alternative promoters in medulloblastoma, a cancer arising in the cerebellum. The transcriptomes of developing and adult cerebella presented in this study emphasize the importance of analyzing gene regulation and function at the isoform level.


Subject(s)
Alternative Splicing , Cerebellum/growth & development , Transcription, Genetic , Transcriptome , Animals , Cerebellar Neoplasms/genetics , Cerebellar Neoplasms/metabolism , Cerebellum/metabolism , Computational Biology , Epigenesis, Genetic , Gene Expression Regulation, Developmental , Genome , Medulloblastoma/genetics , Medulloblastoma/metabolism , Mice , Mice, Inbred Strains , Promoter Regions, Genetic , RNA, Messenger/metabolism
15.
ArXiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38410647

ABSTRACT

Effective DNA embedding remains crucial in genomic analysis, particularly in scenarios lacking labeled data for model fine-tuning, despite significant advances in genome foundation models. A prime example is metagenomics binning, a critical process in microbiome research that aims to group DNA sequences by species from a complex mixture of DNA sequences derived from potentially thousands of distinct, often uncharacterized species. To address the lack of effective DNA embedding models, we introduce DNABERT-S, a genome foundation model that specializes in creating species-aware DNA embeddings. To encourage effective embeddings of error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C2LR) strategy. Empirical results on 18 diverse datasets showed DNABERT-S's remarkable performance. With just 2-shot training, it outperforms the top baseline's 10-shot species classification performance, while doubling the Adjusted Rand Index (ARI) in species clustering and substantially increasing the number of correctly identified species in metagenomics binning. The code, data, and pre-trained model are publicly available at https://github.com/Zhihan1996/DNABERT_S.

16.
BMC Bioinformatics ; 14: 262, 2013 Aug 27.
Article in English | MEDLINE | ID: mdl-23981227

ABSTRACT

BACKGROUND: RNA-seq, a massively parallel sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. Comparative analysis of these data requires appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions. RESULTS: We developed a novel nonparametric empirical Bayesian approach (NPEBseq) to model RNA-seq data. The prior distribution of the Bayesian model is estimated empirically from the data without any parametric assumption, which is why the method is "nonparametric". Based on this model, we propose a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential exon usage from RNA-seq data. Evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets, and comparison with three popular methods, showed improved results for experiments with or without biological replicates. CONCLUSIONS: NPEBseq can successfully detect differential expression between conditions not only at the gene level but also at the exon level from RNA-seq datasets. In addition, NPEBseq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and an R package are available at http://bioinformatics.wistar.upenn.edu/NPEBseq.


Subject(s)
Bayes Theorem , Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing , RNA/analysis , RNA/genetics , Sequence Alignment , Statistics, Nonparametric
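The NPEBseq abstract says the prior distribution is estimated empirically from the data without a parametric form. One textbook way to do that for count data, shown here only as an illustration and not as NPEBseq's actual estimator or its differential-expression test, is the Kiefer-Wolfowitz nonparametric maximum likelihood estimate: assume each count is Poisson with an unknown rate, place the prior on a fixed grid of rates, and fit the grid weights by EM. The grid, the iteration count and the read counts below are hypothetical choices.

```python
import math


def poisson_pmf(x, lam):
    # Evaluated in log space to avoid overflow for large read counts.
    return math.exp(x * math.log(lam) - lam - math.lgamma(x + 1))


def estimate_prior(counts, grid, iters=500):
    """EM for a discrete prior on `grid` (nonparametric MLE of the prior)."""
    k = len(grid)
    weights = [1.0 / k] * k
    # lik[i][j] = P(count_i | rate grid[j]); fixed across EM iterations.
    lik = [[poisson_pmf(x, lam) for lam in grid] for x in counts]
    for _ in range(iters):
        totals = [0.0] * k
        for row in lik:
            joint = [w * p for w, p in zip(weights, row)]
            z = sum(joint)
            for j in range(k):
                totals[j] += joint[j] / z  # posterior responsibility of rate j
        weights = [t / len(counts) for t in totals]
    return weights


def posterior_mean(x, grid, weights):
    """E[rate | count x] under the estimated prior."""
    joint = [w * poisson_pmf(x, lam) for w, lam in zip(weights, grid)]
    return sum(j * lam for j, lam in zip(joint, grid)) / sum(joint)


counts = [0, 1, 2, 2, 3, 5, 8, 13, 21]   # hypothetical read counts
grid = [0.5 * i for i in range(1, 61)]   # candidate rates 0.5 .. 30
prior = estimate_prior(counts, grid)
for x in sorted(set(counts)):
    print(x, round(posterior_mean(x, grid, prior), 2))
```

Because the prior is learned from all counts jointly, each posterior mean is pulled toward the rates where the estimated prior places its mass, which is the shrinkage behaviour that empirical Bayes methods rely on.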
17.
J Virol ; 86(10): 5752-62, 2012 May.
Article in English | MEDLINE | ID: mdl-22419807

ABSTRACT

LANA is essential for tethering the Kaposi's sarcoma-associated herpesvirus (KSHV) genome to metaphase chromosomes and for modulating host-cell gene expression, but its binding sites in the host chromosome remain unknown. Here, we use LANA-specific chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to identify LANA binding sites in the viral and host-cell genomes of BCBL1, a latently infected pleural effusion lymphoma cell line. LANA bound with high occupancy to the KSHV genome terminal repeats (TR) and to a few minor binding sites in the KSHV genome, including the LANA promoter region. We identified 256 putative LANA binding-site peaks with P < 0.01 that overlapped in two independent ChIP-Seq experiments. We validated several of the high-occupancy binding sites by conventional ChIP assays and quantitative PCR. Candidate cellular LANA binding motifs were identified and assayed in vitro for binding to purified recombinant LANA protein, but they bound with low affinity compared to the viral TR binding site. More than half of the LANA binding sites (170/256) could be mapped to within 2.5 kb of a cellular gene transcript. Pathway and Gene Ontology (GO) analyses revealed that LANA binds to genes within the p53 and tumor necrosis factor (TNF) regulatory network. Further analysis revealed partial overlap of LANA and STAT1 binding sites in several gamma interferon (IFN-γ)-regulated genes. We show that ectopic expression of LANA can downmodulate IFN-γ-mediated activation of a subset of genes, including the TAP1 peptide transporter and proteasome subunit beta type 9 (PSMB9), both of which are required for class I antigen presentation. Our data provide a potential mechanism through which LANA may regulate several host cell pathways by direct binding to gene regulatory elements.


Subject(s)
Antigens, Viral/metabolism , Chromosomes, Human/virology , Herpesvirus 8, Human/metabolism , Nuclear Proteins/metabolism , Sarcoma, Kaposi/genetics , Sarcoma, Kaposi/virology , Antigens, Viral/chemistry , Antigens, Viral/genetics , Base Sequence , Binding Sites , Cell Line , Chromosomes, Human/chemistry , Gene Expression Regulation , Gene Targeting , Herpesvirus 8, Human/chemistry , Herpesvirus 8, Human/genetics , Host-Pathogen Interactions , Humans , Molecular Sequence Data , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Protein Binding
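The LANA peak-filtering steps (keep peaks with P < 0.01 that overlap between two independent ChIP-Seq replicates, then count how many lie within 2.5 kb of a gene transcript) can be sketched with simple interval arithmetic. This is a hypothetical illustration: the `Interval` type, the helper names and all coordinates are invented, the overlap rule is one simple choice, and the authors' actual peak-calling pipeline is not described in the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive (half-open coordinates)


def overlaps(a, b):
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def gap(a, b):
    """Distance in bp (0 if touching or overlapping, None across chromosomes)."""
    if a.chrom != b.chrom:
        return None
    return max(0, b.start - a.end, a.start - b.end)


def reproducible_peaks(rep1, rep2, p_cutoff=0.01):
    """Peaks from rep1 with P < cutoff that overlap a P < cutoff peak in rep2."""
    passing2 = [iv for iv, p in rep2 if p < p_cutoff]
    return [iv for iv, p in rep1
            if p < p_cutoff and any(overlaps(iv, other) for other in passing2)]


def near_any(peak, transcripts, window=2500):
    """True if the peak lies within `window` bp of any transcript."""
    for tx in transcripts:
        d = gap(peak, tx)
        if d is not None and d <= window:
            return True
    return False


# Hypothetical (peak, P-value) calls from two replicates, plus transcripts.
rep1 = [(Interval("chr1", 1000, 1200), 0.001),
        (Interval("chr1", 50000, 50300), 0.005),
        (Interval("chr2", 700, 900), 0.2)]
rep2 = [(Interval("chr1", 1100, 1300), 0.002),
        (Interval("chr1", 50200, 50500), 0.004),
        (Interval("chr2", 650, 850), 0.001)]
transcripts = [Interval("chr1", 3000, 9000), Interval("chr1", 80000, 90000)]

peaks = reproducible_peaks(rep1, rep2)
near = [pk for pk in peaks if near_any(pk, transcripts)]
print(len(peaks), len(near))  # 2 1
```

For scale, the reported 170 of 256 peaks corresponds to about 66%.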
18.
Nucleic Acids Res ; 39(Database issue): D92-7, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097880

ABSTRACT

MPromDb (Mammalian Promoter Database) is a curated database that annotates gene promoters identified from ChIP-seq results, with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 different data sets that include six human cell types and 10 distinct mouse cell types/tissues. The updated MPromDb version consists of computationally predicted (novel) and known active RNAP-II promoters (42,893 human and 48,366 mouse promoters) from various data sets freely available in the NCBI GEO database. We found that 36% of protein-coding genes in the human genome and 40% in the mouse genome have alternative promoters, and that ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we examined other evidence: GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database by gene ID/symbol or by specific tissue/cell type, and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated the GBrowse genome browser with MPromDb to visualize ChIP-seq profiles and display the annotations. The current release of MPromDb can be accessed at http://bioinformatics.wistar.upenn.edu/MPromDb/.


Subject(s)
Chromatin Immunoprecipitation , Databases, Nucleic Acid , Promoter Regions, Genetic , Animals , Computer Graphics , Humans , Mice , RNA Polymerase II/metabolism , Sequence Analysis, DNA , Systems Integration
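MPromDb lets users filter promoters as CpG or NonCpG, but the abstract does not define the classification. A widely used rule is the Gardiner-Garden and Frommer (1987) CpG island criterion: length of at least 200 bp, GC content above 50%, and an observed/expected CpG ratio above 0.6, where observed/expected = (CpG count × length) / (C count × G count). The sketch below assumes that rule purely for illustration; MPromDb's own definition may differ, and the function names are hypothetical.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    s = seq.upper()
    n = len(s)
    c, g = s.count("C"), s.count("G")
    cpg = s.count("CG")  # "CG" cannot overlap itself, so this counts every CpG
    gc_frac = (c + g) / n if n else 0.0
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def is_cpg_island_like(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Gardiner-Garden & Frommer style test (thresholds are parameters)."""
    gc_frac, obs_exp = cpg_stats(seq)
    return len(seq) >= min_len and gc_frac > min_gc and obs_exp > min_obs_exp


print(is_cpg_island_like("CCGG" * 50))             # True: CpG-rich
print(is_cpg_island_like("C" * 100 + "G" * 100))   # False: GC-rich but CpG-poor
```

The second example shows why the observed/expected ratio matters: high GC content alone does not make a region CpG-rich.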
19.
Nucleic Acids Res ; 39(1): 190-201, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20843783

ABSTRACT

Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity of mammalian genomes. Identifying alternative promoters and annotating their activity in different tissues is one of the major challenges in understanding the transcriptional regulation of mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38,639 Pol-II promoters, including 12,270 novel promoters, for both protein-coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific and CpG poor, and only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II-bound promoter(s) of each annotated gene in a given tissue, we found that 37% of protein-coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts to characterize gene regulatory regions in mammalian genomes.


Subject(s)
Promoter Regions, Genetic , RNA Polymerase II/metabolism , Animals , Chromatin Immunoprecipitation , Chromosome Mapping , Genome , Mice , Sequence Analysis, DNA/standards , Transcription, Genetic
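The two summaries reported in the five-tissue Pol-II study, the share of genes using alternative promoters and the set of tissue-specific promoters, reduce to grouping promoter calls by gene and by promoter. A minimal sketch, assuming hypothetical (tissue, gene, promoter) calls and treating a promoter as tissue specific when it is bound in exactly one profiled tissue (the abstract does not state the exact criterion):

```python
from collections import defaultdict


def promoter_summary(calls):
    """Return (genes with >1 promoter, single-tissue promoters, gene count)."""
    promoters_per_gene = defaultdict(set)
    tissues_per_promoter = defaultdict(set)
    for tissue, gene, promoter in calls:
        promoters_per_gene[gene].add(promoter)
        tissues_per_promoter[promoter].add(tissue)
    alt_genes = sorted(g for g, ps in promoters_per_gene.items() if len(ps) > 1)
    specific = sorted(p for p, ts in tissues_per_promoter.items() if len(ts) == 1)
    return alt_genes, specific, len(promoters_per_gene)


# Hypothetical Pol-II-bound promoter calls: (tissue, gene, promoter_id).
calls = [
    ("brain", "GeneA", "pA1"), ("liver", "GeneA", "pA2"),
    ("brain", "GeneB", "pB1"), ("lung", "GeneB", "pB1"),
    ("spleen", "GeneC", "pC1"), ("kidney", "GeneC", "pC2"),
    ("kidney", "GeneC", "pC3"), ("liver", "GeneD", "pD1"),
]

alt_genes, specific, n_genes = promoter_summary(calls)
print(alt_genes, f"{len(alt_genes) / n_genes:.0%} of genes")  # ['GeneA', 'GeneC'] 50% of genes
print(specific)  # ['pA1', 'pA2', 'pC1', 'pC2', 'pC3', 'pD1']
```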
20.
BMC Genomics ; 13: 440, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938532

ABSTRACT

BACKGROUND: With over 1.3 billion people, India is estimated to contain three times more genetic diversity than Europe. Next-generation sequencing technologies have advanced the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. RESULTS: We identified over 3.4 million SNPs in this genome, including over 89,873 private variants. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup, which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variant that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reactions to warfarin indicated that the SAIF individual is not at risk of bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. CONCLUSIONS: This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and its variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population.


Subject(s)
Asian People/genetics , Chromosome Mapping , Genome, Human , Genome, Mitochondrial , Polymorphism, Single Nucleotide , Anticoagulants/adverse effects , DNA Copy Number Variations , Diabetes Mellitus/genetics , Diabetes Mellitus/prevention & control , Female , Genetic Predisposition to Disease , Genetic Variation , Haplotypes , Hemorrhage/chemically induced , Hemorrhage/genetics , Hemorrhage/prevention & control , Humans , Hypoglycemic Agents/therapeutic use , India , Metformin/therapeutic use , Middle Aged , Multiple Sclerosis/genetics , Multiple Sclerosis/prevention & control , Sequence Analysis, DNA , Warfarin/adverse effects
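The SNP-sharing and private-variant figures reported for the SAIF genome are set comparisons over variant calls. A minimal sketch, assuming each genome is a set of hypothetical (chromosome, position, alternate allele) keys and that "private" means absent from every compared genome (the abstract does not say which reference sets, such as dbSNP, were used):

```python
def shared_fraction(query, other):
    """Fraction of variants in `query` that also occur in `other`."""
    return len(query & other) / len(query) if query else 0.0


def private_variants(query, others):
    """Variants in `query` found in none of the `others`."""
    seen = set().union(*others)
    return query - seen


# Hypothetical variant keys: (chromosome, position, alternate allele).
saif = {("1", 100, "A"), ("1", 200, "G"), ("2", 300, "T"), ("3", 400, "C")}
genome_1 = {("1", 100, "A"), ("2", 300, "T"), ("5", 10, "G")}
genome_2 = {("1", 100, "A"), ("1", 200, "G"), ("7", 7, "T")}

for name, other in [("genome_1", genome_1), ("genome_2", genome_2)]:
    print(name, shared_fraction(saif, other))  # 0.5 for both
print(private_variants(saif, [genome_1, genome_2]))  # {('3', 400, 'C')}
```

Matching on the alternate allele as well as the position avoids counting two different changes at the same site as shared.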