Results 1 - 20 of 93
1.
Cell ; 153(3): 575-89, 2013 Apr 25.
Article in English | MEDLINE | ID: mdl-23622242

ABSTRACT

Adenosine deaminases acting on RNA (ADARs) are involved in RNA editing that converts adenosine residues to inosine specifically in double-stranded RNAs. In this study, we investigated the interaction of the RNA editing mechanism with the RNA interference (RNAi) machinery and found that ADAR1 forms a complex with Dicer through direct protein-protein interaction. Most importantly, ADAR1 increases the maximum rate (Vmax) of pre-microRNA (miRNA) cleavage by Dicer and facilitates loading of miRNA onto RNA-induced silencing complexes, identifying a new role of ADAR1 in miRNA processing and RNAi mechanisms. ADAR1 differentiates its functions in RNA editing and RNAi by the formation of either ADAR1/ADAR1 homodimer or Dicer/ADAR1 heterodimer complexes, respectively. As expected, the expression of miRNAs is globally inhibited in ADAR1(-/-) mouse embryos, which, in turn, alters the expression of their target genes and might contribute to their embryonic lethal phenotype.


Subjects
Adenosine Deaminase/metabolism , DEAD-box RNA Helicases/metabolism , RNA Interference , RNA Processing, Post-Transcriptional , Ribonuclease III/metabolism , Adenosine Deaminase/chemistry , Adenosine Deaminase/genetics , Animals , Base Sequence , DEAD-box RNA Helicases/chemistry , Dimerization , Embryo, Mammalian/metabolism , HEK293 Cells , HeLa Cells , Humans , Mice , MicroRNAs/metabolism , Molecular Sequence Data , Protein Interaction Domains and Motifs , RNA, Small Interfering/metabolism , RNA-Binding Proteins , Ribonuclease III/chemistry , Up-Regulation
2.
Anal Chem ; 95(19): 7779-7787, 2023 05 16.
Article in English | MEDLINE | ID: mdl-37141575

ABSTRACT

The cascade of immune responses involves activation of diverse immune cells and release of a large amount of cytokines, which leads to either normal, balanced inflammation or hyperinflammatory responses and even organ damage by sepsis. Conventional diagnosis of immunological disorders based on multiple cytokines in the blood serum has varied accuracy, and it is difficult to distinguish normal inflammation from sepsis. Herein, we present an approach to detect immunological disorders through rapid, ultrahigh-multiplex analysis of T cells using single-cell multiplex in situ tagging (scMIST) technology. scMIST permits simultaneous detection of 46 markers and cytokines from single cells without the assistance of special instruments. A cecal ligation and puncture sepsis model was built to supply T cells from two groups of mice that survived the surgery or died after 1 day. The scMIST assays have captured the T cell features and the dynamics over the course of recovery. Compared with cytokines in the peripheral blood, T cell markers show different dynamics and cytokine levels. We have applied a random forest machine learning model to single T cells from two groups of mice. Through training, the model has been able to predict the group of mice through T cell classification and majority rule with 94% accuracy. Our approach pioneers the direction of single-cell omics and could be widely applicable to human diseases.


Subjects
Immune System Diseases , Sepsis , Humans , Mice , Animals , Cytokines , Inflammation , T-Lymphocytes , Sepsis/diagnosis , Disease Models, Animal
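The abstract for this entry describes classifying single T cells and then assigning each mouse to a group by majority rule. A minimal Python sketch of that aggregation step is given here for illustration only. The function names, the mouse identifiers, and the labels "survived" and "died" are hypothetical, and the tie-breaking behavior is an assumption, since the abstract does not describe one.

```python
from collections import Counter
from typing import Dict, List


def majority_vote(cell_predictions: List[str]) -> str:
    """Return the most frequent per-cell label.

    Assumption: ties resolve to the label encountered first, which is
    how Counter.most_common orders equal counts.
    """
    if not cell_predictions:
        raise ValueError("need at least one per-cell prediction")
    return Counter(cell_predictions).most_common(1)[0][0]


def classify_mice(per_cell: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each (hypothetical) mouse ID to the majority label of its cells."""
    return {mouse: majority_vote(labels) for mouse, labels in per_cell.items()}


# Hypothetical per-cell predictions, e.g. the output of any single-cell classifier.
example = {
    "mouse_A": ["survived", "survived", "died", "survived"],
    "mouse_B": ["died", "died", "survived"],
}
predicted_groups = classify_mice(example)
```

The per-cell classifier itself (a random forest in the abstract) is deliberately left out; only the vote over its outputs is sketched.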
3.
Bioinformatics ; 37(20): 3412-3420, 2021 Oct 25.
Article in English | MEDLINE | ID: mdl-34014317

ABSTRACT

MOTIVATION: Access to large-scale genomics and transcriptomics data from various tissues and cell lines allowed the discovery of widespread alternative splicing events and alternative promoter usage in mammals. Between human and mouse, gene-level orthology is currently present for nearly 16k protein-coding genes spanning a diverse repertoire of over 200k total transcript isoforms. RESULTS: Here, we describe a novel method, ExTraMapper, which leverages sequence conservation between exons of a pair of organisms and identifies a fine-scale orthology mapping at the exon and then transcript level. ExTraMapper identifies more than 350k exon mappings, as well as 30k transcript mappings between human and mouse using only sequence and gene annotation information. We demonstrate that ExTraMapper identifies a larger number of exon and transcript mappings compared to previous methods. Further, it identifies exon fusions, splits and losses due to splice site mutations, and finds mappings between microexons that were previously missed. By reanalysis of RNA-seq data from 13 matched human and mouse tissues, we show that ExTraMapper improves the correlation of transcript-specific expression levels, suggesting a more accurate mapping of human and mouse transcripts. We also applied the method to detect conserved exon and transcript pairs between human and rhesus macaque genomes to highlight the point that ExTraMapper is applicable to any pair of organisms that have orthologous gene pairs. AVAILABILITY AND IMPLEMENTATION: The source code and the results are available at https://github.com/ay-lab/ExTraMapper and http://ay-lab-tools.lji.org/extramapper. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

4.
Bioinformatics ; 37(15): 2112-2120, 2021 Aug 09.
Article in English | MEDLINE | ID: mdl-33538820

ABSTRACT

MOTIVATION: Deciphering the language of non-coding DNA is one of the fundamental problems in genome research. Gene regulatory code is highly complex due to the existence of polysemy and distant semantic relationship, which previous informatics methods often fail to capture, especially in data-scarce scenarios. RESULTS: To address this challenge, we developed a novel pre-trained bidirectional encoder representation, named DNABERT, to capture global and transferable understanding of genomic DNA sequences based on upstream and downstream nucleotide contexts. We compared DNABERT to the most widely used programs for genome-wide regulatory elements prediction and demonstrate its ease of use, accuracy and efficiency. We show that the single pre-trained transformers model can simultaneously achieve state-of-the-art performance on prediction of promoters, splice sites and transcription factor binding sites, after easy fine-tuning using small task-specific labeled data. Further, DNABERT enables direct visualization of nucleotide-level importance and semantic relationship within input sequences for better interpretability and accurate identification of conserved sequence motifs and functional genetic variant candidates. Finally, we demonstrate that DNABERT pre-trained on the human genome can even be readily applied to other organisms with exceptional performance. We anticipate that the pre-trained DNABERT model can be fine-tuned to many other sequence analysis tasks. AVAILABILITY AND IMPLEMENTATION: The source code and the pre-trained and fine-tuned models for DNABERT are available at GitHub (https://github.com/jerryji1993/DNABERT). SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.

5.
Brief Bioinform ; 18(2): 260-269, 2017 03 01.
Article in English | MEDLINE | ID: mdl-26944083

ABSTRACT

Given that the majority of multi-exon genes generate diverse functional products, it is important to evaluate expression at the isoform level. Previous studies have demonstrated strong gene-level correlations between RNA sequencing (RNA-seq) and microarray platforms, but have not studied their concordance at the isoform level. We performed transcript abundance estimation on raw RNA-seq and exon-array expression profiles available for common glioblastoma multiforme samples from The Cancer Genome Atlas using different analysis pipelines, and compared both the isoform- and gene-level expression estimates between programs and platforms. The results showed better concordance between RNA-seq/exon-array and reverse transcription-quantitative polymerase chain reaction (RT-qPCR) platforms for fold change estimates than for raw abundance estimates, suggesting that fold change normalization against a control is an important step for integrating expression data across platforms. Based on RT-qPCR validations, eXpress and Multi-Mapping Bayesian Gene eXpression (MMBGX) programs achieved the best performance for RNA-seq and exon-array platforms, respectively, for deriving the isoform-level fold change values. While eXpress achieved the highest correlation with the RT-qPCR and exon-array (MMBGX) results overall, RSEM was more highly correlated with MMBGX for the subset of transcripts that are highly variable across the samples. eXpress appears to be most successful in discriminating lowly expressed transcripts, but IsoformEx and RSEM correlate more strongly with MMBGX for highly expressed transcripts. The results also reinforce how potentially important isoform-level expression changes can be masked by gene-level estimates, and demonstrate that exon arrays yield comparable results to RNA-seq for evaluating isoform-level expression changes.


Subjects
Algorithms , Bayes Theorem , Exons , Gene Expression Profiling , Humans , Oligonucleotide Array Sequence Analysis , Protein Isoforms , RNA , Sequence Analysis, RNA
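The abstract for this entry reports that fold change estimates agree across platforms better than raw abundance estimates do. The Python sketch below illustrates why normalizing against a control can remove a platform-specific scale. The function name, the pseudocount parameter, and all numbers are illustrative assumptions and are not taken from the study.

```python
import math


def log2_fold_change(sample: float, control: float, pseudocount: float = 1.0) -> float:
    """log2 ratio of sample to control abundance.

    The pseudocount is an assumption added here to avoid division by zero;
    the abstract does not say how zero abundances were handled.
    """
    return math.log2((sample + pseudocount) / (control + pseudocount))


# Hypothetical abundances of one isoform on two platforms with different scales.
platform_a = {"tumor": 100.0, "control": 25.0}  # e.g. count-like units
platform_b = {"tumor": 8.0, "control": 2.0}     # e.g. intensity-like units

# Raw values differ by more than tenfold, but the tumor/control ratio is 4 on both.
lfc_a = log2_fold_change(platform_a["tumor"], platform_a["control"], pseudocount=0.0)
lfc_b = log2_fold_change(platform_b["tumor"], platform_b["control"], pseudocount=0.0)
```

With a nonzero pseudocount the two values would no longer be exactly equal, so the choice of pseudocount matters most for lowly expressed transcripts.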
6.
Genome Res ; 24(6): 1039-50, 2014 Jun.
Article in English | MEDLINE | ID: mdl-24676094

ABSTRACT

Mapping genome-wide data to human subtelomeres has been problematic due to the incomplete assembly and challenges of low-copy repetitive DNA elements. Here, we provide updated human subtelomere sequence assemblies that were extended by filling telomere-adjacent gaps using clone-based resources. A bioinformatic pipeline incorporating multiread mapping for annotation of the updated assemblies using short-read data sets was developed and implemented. Annotation of subtelomeric sequence features as well as mapping of CTCF and cohesin binding sites using ChIP-seq data sets from multiple human cell types confirmed that CTCF and cohesin bind within 3 kb of the start of terminal repeat tracts at many, but not all, subtelomeres. CTCF and cohesin co-occupancy were also enriched near internal telomere-like sequence (ITS) islands and the nonterminal boundaries of subtelomere repeat elements (SREs) in transformed lymphoblastoid cell lines (LCLs) and human embryonic stem cell (ES) lines, but were not significantly enriched in the primary fibroblast IMR90 cell line. Subtelomeric CTCF and cohesin sites predicted by ChIP-seq using our bioinformatics pipeline (but not predicted when only uniquely mapping reads were considered) were consistently validated by ChIP-qPCR. The colocalized CTCF and cohesin sites in SRE regions are candidates for mediating long-range chromatin interactions in the transcript-rich SRE region. A public browser for the integrated display of short-read sequence-based annotations relative to key subtelomere features such as the start of each terminal repeat tract, SRE identity and organization, and subtelomeric gene models was established.


Subjects
Cell Cycle Proteins/genetics , Chromosomal Proteins, Non-Histone/genetics , Genome, Human , Repressor Proteins/genetics , Telomere/genetics , Terminal Repeat Sequences , Base Sequence , CCCTC-Binding Factor , Cell Line , Embryonic Stem Cells/metabolism , Fibroblasts/metabolism , Humans , Molecular Sequence Annotation/methods , Molecular Sequence Data , Protein Binding , Repressor Proteins/metabolism , Cohesins
7.
Proc Natl Acad Sci U S A ; 111(1): 291-6, 2014 Jan 07.
Article in English | MEDLINE | ID: mdl-24368849

ABSTRACT

Glioblastoma multiforme (GBM) and the mesenchymal GBM subtype in particular are highly malignant tumors that frequently exhibit regions of severe hypoxia and necrosis. Because these features correlate with poor prognosis, we investigated microRNAs whose expression might regulate hypoxic GBM cell survival and growth. We determined that the expression of microRNA-218 (miR-218) is decreased significantly in highly necrotic mesenchymal GBM, and orthotopic tumor studies revealed that reduced miR-218 levels confer GBM resistance to chemotherapy. Importantly, miR-218 targets multiple components of receptor tyrosine kinase (RTK) signaling pathways, and miR-218 repression increases the abundance and activity of multiple RTK effectors. This elevated RTK signaling also promotes the activation of hypoxia-inducible factor (HIF), most notably HIF2α. We further show that RTK-mediated HIF2α regulation is JNK dependent, via jun proto-oncogene. Collectively, our results identify an miR-218-RTK-HIF2α signaling axis that promotes GBM cell survival and tumor angiogenesis, particularly in necrotic mesenchymal tumors.


Subjects
Basic Helix-Loop-Helix Transcription Factors/metabolism , Brain Neoplasms/metabolism , Glioblastoma/metabolism , Mesoderm/metabolism , MicroRNAs/metabolism , Receptor Protein-Tyrosine Kinases/metabolism , Adult , Aged , Aged, 80 and over , Animals , Antineoplastic Agents/pharmacology , Cell Survival , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Humans , Hypoxia , Mice , Mice, Nude , Middle Aged , Necrosis , Neoplasm Transplantation , Neovascularization, Pathologic , Oligonucleotide Array Sequence Analysis , Proto-Oncogene Mas , Signal Transduction , Young Adult
8.
EMBO J ; 31(21): 4165-78, 2012 Nov 05.
Article in English | MEDLINE | ID: mdl-23010778

ABSTRACT

The contribution of human subtelomeric DNA and chromatin organization to telomere integrity and chromosome end protection is not yet understood in molecular detail. Here, we show by ChIP-Seq that most human subtelomeres contain a CTCF- and cohesin-binding site within ∼1-2 kb of the TTAGGG repeat tract and adjacent to CpG islands implicated in TERRA transcription control. ChIP-Seq also revealed that RNA polymerase II (RNAPII) was enriched at sites adjacent to the CTCF sites and extending towards the telomere repeat tracts. Mutation of CTCF-binding sites in plasmid-borne promoters reduced transcriptional activity in an orientation-dependent manner. Depletion of CTCF by shRNA led to a decrease in TERRA transcription, and a loss of cohesin and RNAPII binding to the subtelomeres. Depletion of either CTCF or cohesin subunit Rad21 caused telomere-induced DNA damage foci (TIF) formation, and destabilized TRF1 and TRF2 binding to the TTAGGG proximal subtelomere DNA. These findings indicate that CTCF and cohesin are integral components of most human subtelomeres, and important for the regulation of TERRA transcription and telomere end protection.


Subjects
Cell Cycle Proteins/metabolism , Chromatin/genetics , Chromosomal Proteins, Non-Histone/metabolism , DNA-Binding Proteins/genetics , Gene Expression Regulation , Repressor Proteins/metabolism , Telomere/genetics , Transcription Factors/genetics , Transcription, Genetic , CCCTC-Binding Factor , Cell Cycle Proteins/genetics , Cells, Cultured , Chromatin Immunoprecipitation , Chromosomal Proteins, Non-Histone/genetics , CpG Islands/genetics , Electrophoretic Mobility Shift Assay , Fluorescent Antibody Technique , Humans , Luciferases/metabolism , Neoplasms/genetics , Neoplasms/pathology , Nuclear Proteins/genetics , Nuclear Proteins/metabolism , Phosphoproteins/genetics , Phosphoproteins/metabolism , Promoter Regions, Genetic/genetics , RNA Polymerase II/genetics , RNA Polymerase II/metabolism , RNA, Messenger/genetics , Real-Time Polymerase Chain Reaction , Repressor Proteins/genetics , Reverse Transcriptase Polymerase Chain Reaction , Cohesins
9.
Genome Res ; 23(9): 1446-61, 2013 Sep.
Article in English | MEDLINE | ID: mdl-23796952

ABSTRACT

The functional roles of SNPs within the 8q24 gene desert in the cancer phenotype are not yet well understood. Here, we report that CCAT2, a novel long noncoding RNA transcript (lncRNA) encompassing the rs6983267 SNP, is highly overexpressed in microsatellite-stable colorectal cancer and promotes tumor growth, metastasis, and chromosomal instability. We demonstrate that MYC, miR-17-5p, and miR-20a are up-regulated by CCAT2 through TCF7L2-mediated transcriptional regulation. We further identify the physical interaction between CCAT2 and TCF7L2 resulting in an enhancement of WNT signaling activity. We show that CCAT2 is itself a WNT downstream target, which suggests the existence of a feedback loop. Finally, we demonstrate that the SNP status affects CCAT2 expression and the risk allele G produces more CCAT2 transcript. Our results support a new mechanism of MYC and WNT regulation by the novel lncRNA CCAT2 in colorectal cancer pathogenesis, and provide an alternative explanation of the SNP-conferred cancer risk.


Subjects
Chromosomal Instability , Chromosomes, Human, Pair 8/genetics , Colonic Neoplasms/genetics , RNA, Long Noncoding/genetics , RNA, Long Noncoding/metabolism , Animals , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Case-Control Studies , Cell Line, Tumor , Colonic Neoplasms/metabolism , Colonic Neoplasms/pathology , Female , Gene Expression Regulation, Neoplastic , Humans , Male , Mice , MicroRNAs/genetics , MicroRNAs/metabolism , Neoplasm Metastasis/genetics , Polymorphism, Single Nucleotide , Proto-Oncogene Proteins c-myc/genetics , Proto-Oncogene Proteins c-myc/metabolism , Transcription Factor 7-Like 1 Protein/genetics , Transcription Factor 7-Like 1 Protein/metabolism , Transcription, Genetic , Wnt Signaling Pathway
10.
J Virol ; 89(1): 799-810, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25355877

ABSTRACT

Although monocytes and macrophages are targets of HIV-1-mediated immunopathology, the impact of high viremia on activation-induced monocyte apoptosis relative to monocyte and macrophage activation changes remains undetermined. In this study, we determined constitutive and oxidative stress-induced monocyte apoptosis in uninfected and HIV⁺ individuals across a spectrum of viral loads (n = 35; range, 2,243 to 1,355,998 HIV-1 RNA copies/ml) and CD4 counts (range, 26 to 801 cells/mm³). Both constitutive apoptosis and oxidative stress-induced apoptosis were positively associated with viral load and negatively associated with CD4, with an elevation in apoptosis occurring in patients with more than 40,000 (4.6 log) copies/ml. As expected, expression of Rb1 and interferon-stimulated genes (ISGs), plasma soluble CD163 (sCD163) concentration, and the proportion of CD14⁺⁺CD16⁺ intermediate monocytes were elevated in viremic patients compared to those in uninfected controls. Although CD14⁺⁺CD16⁺ frequencies, sCD14, sCD163, and most ISG expression were not directly associated with a change in apoptosis, sCD14 and ISG expression showed an association with increasing viral load. Multivariable analysis of clinical values and monocyte gene expression identified changes in IFI27, IFITM2, Rb1, and Bcl2 expression as determinants of constitutive apoptosis (P = 3.77 × 10⁻⁵; adjusted R² = 0.5983), while changes in viral load, IFITM2, Rb1, and Bax expression were determinants of oxidative stress-induced apoptosis (P = 5.59 × 10⁻⁵; adjusted R² = 0.5996). Our data demonstrate differential activation states in monocytes between levels of viremia in association with differences in apoptosis that may contribute to greater monocyte turnover with high viremia.
IMPORTANCE: This study characterized differential monocyte activation, apoptosis, and apoptosis-related gene expression in low- versus high-level viremic HIV-1 patients, suggesting a shift in apoptosis regulation that may be associated with disease state. Using single and multivariable analysis of monocyte activation parameters and gene expression, we supported the hypothesis that monocyte apoptosis in HIV disease is a reflection of viremia and activation state with contributions from gene expression changes within the ISG and Bcl2 gene families. Understanding monocyte apoptosis response may inform HIV immunopathogenesis, retention of infected macrophages, and monocyte turnover in low- or high-viral-load states.


Subjects
Apoptosis , HIV Infections/immunology , HIV Infections/virology , HIV-1/immunology , Monocytes/immunology , Viral Load , Adult , Aged, 80 and over , Chronic Disease , Female , Gene Expression Profiling , Humans , Male , Middle Aged , Monocytes/physiology , Proto-Oncogene Proteins c-bcl-2/biosynthesis , Retinoblastoma Protein/biosynthesis , Young Adult
11.
Nucleic Acids Res ; 42(8): e64, 2014 Apr.
Article in English | MEDLINE | ID: mdl-24503249

ABSTRACT

Molecular stratification of tumors is essential for developing personalized therapies. Although patient stratification strategies have been successful, computational methods to accurately translate a gene signature from a high-throughput platform to a clinically adaptable low-dimensional platform are currently lacking. Here, we describe PIGExClass (platform-independent isoform-level gene-expression based classification-system), a novel computational approach to derive and then transfer gene signatures from one analytical platform to another. We applied PIGExClass to design a reverse transcriptase-quantitative polymerase chain reaction (RT-qPCR) based molecular-subtyping assay for glioblastoma multiforme (GBM), the most aggressive primary brain tumor. Unsupervised clustering of TCGA (The Cancer Genome Atlas Consortium) GBM samples, based on isoform-level gene-expression profiles, recaptured the four known molecular subgroups but switched the subtype for 19% of the samples, resulting in significant (P = 0.0103) survival differences among the refined subgroups. The PIGExClass-derived four-class classifier, which requires only 121 transcript variants, assigns GBM patients' molecular subtype with 92% accuracy. This classifier was translated to an RT-qPCR assay and validated in an independent cohort of 206 GBM samples. Our results demonstrate the efficacy of PIGExClass in the design of clinically adaptable molecular subtyping assays and have implications for developing robust diagnostic assays for cancer patient stratification.


Subjects
Brain Neoplasms/classification , Gene Expression Profiling/methods , Glioblastoma/classification , Protein Isoforms/genetics , Adult , Aged , Algorithms , Brain Neoplasms/genetics , Brain Neoplasms/mortality , Female , Glioblastoma/genetics , Glioblastoma/mortality , Humans , Male , Middle Aged , Prognosis , Protein Isoforms/metabolism , Reverse Transcriptase Polymerase Chain Reaction
12.
BMC Genomics ; 16 Suppl 11: S3, 2015.
Article in English | MEDLINE | ID: mdl-26576613

ABSTRACT

BACKGROUND: Many supervised learning algorithms have been applied in deriving gene signatures for patient stratification from gene expression data. However, transferring the multi-gene signatures from one analytical platform to another without loss of classification accuracy is a major challenge. Here, we compared three unsupervised data discretization methods (equal-width binning, equal-frequency binning, and k-means clustering) in accurately classifying the four known subtypes of glioblastoma multiforme (GBM) when the classification algorithms were trained on the isoform-level gene expression profiles from the exon-array platform and tested on the corresponding profiles from RNA-seq data. RESULTS: We applied an integrated machine learning framework that involves three sequential steps: feature selection, data discretization, and classification. For models trained and tested on exon-array data, the addition of the data discretization step led to robust and accurate predictive models with fewer variables in the final models. For models trained on exon-array data and tested on RNA-seq data, the addition of the data discretization step dramatically improved the classification accuracies, with equal-frequency binning showing the highest improvement, with more than 90% accuracy for all the models with features chosen by Random Forest based feature selection. Overall, the SVM classifier coupled with equal-frequency binning achieved the best accuracy (> 95%). Without data discretization, however, only 73.6% accuracy was achieved at most. CONCLUSIONS: The classification algorithms, trained and tested on data from the same platform, yielded similar accuracies in predicting the four GBM subgroups. However, when dealing with cross-platform data, from exon-array to RNA-seq, the classifiers yielded stable models with the highest classification accuracies on data transformed by equal-frequency binning. The approach presented here is generally applicable to other cancer types for classification and identification of molecular subgroups by integrating data across different gene expression platforms.


Subjects
Computational Biology/methods , Gene Expression Profiling/methods , Glioblastoma/classification , Glioblastoma/genetics , Machine Learning , RNA Isoforms/genetics , Algorithms , Cluster Analysis , Humans
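Two of the discretization methods named in this entry's abstract, equal-width and equal-frequency binning, can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the study's implementation: the function names, the rank-based treatment of ties, and the example values are all hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], k: int) -> List[int]:
    """Split the range [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # no spread: everything falls in one bin
    width = (hi - lo) / k
    # The maximum value would index bin k, so clamp it into the last bin.
    return [min(int((v - lo) / width), k - 1) for v in values]


def equal_frequency_bins(values: Sequence[float], k: int) -> List[int]:
    """Assign bins by rank so each bin holds roughly the same number of values.

    Assumption: tied values are ranked in input order, so ties may be split
    across neighboring bins.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, idx in enumerate(order):
        bins[idx] = rank * k // n
    return bins


# Hypothetical expression values with two large outliers.
expr = [1.0, 2.0, 3.0, 4.0, 100.0, 200.0]
width_bins = equal_width_bins(expr, 3)      # outliers stretch the intervals
freq_bins = equal_frequency_bins(expr, 3)   # two values per bin

# Rank-based bins are unchanged by a monotone rescaling of the data.
rescaled_bins = equal_frequency_bins([10 * v + 5 for v in expr], 3)
```

Because equal-frequency bins depend only on ranks, they do not change when values are rescaled monotonically, which is one plausible reason such a transform could help when moving between platforms; the abstract itself does not state the mechanism.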
13.
Proc Natl Acad Sci U S A ; 109(22): 8646-51, 2012 May 29.
Article in English | MEDLINE | ID: mdl-22586128

ABSTRACT

A genome-wide association study of papillary thyroid carcinoma (PTC) pinpointed two independent SNPs (rs944289 and rs965513) located in regions containing no annotated genes (14q13.3 and 9q22.33, respectively). Here, we describe a unique, long, intergenic, noncoding RNA gene (lincRNA) named Papillary Thyroid Carcinoma Susceptibility Candidate 3 (PTCSC3), which is located 3.2 kb downstream of rs944289 at 14q13.3 and whose expression is strictly thyroid specific. By quantitative PCR, PTCSC3 expression was strongly down-regulated (P = 2.84 × 10⁻¹⁴) in thyroid tumor tissue of 46 PTC patients and the risk allele (T) was associated with the strongest suppression (genotype [TT] (n = 21) vs. [CT] (n = 19), P = 0.004). In adjacent unaffected thyroid tissue, the genotype [TT] was associated with up-regulation of PTCSC3 ([TT] (n = 21) vs. [CT] (n = 19), P = 0.034). The SNP rs944289 was located in a binding site for the CCAAT/enhancer binding proteins (C/EBP) α and β. The risk allele destroyed the binding site in silico. Both C/EBPα and C/EBPβ activated the PTCSC3 promoter in reporter assays (P = 0.0009 and P = 0.0014, respectively) and the risk allele reduced the activation compared with the nonrisk allele (C) (P = 0.026 and P = 0.048, respectively). Restoration of PTCSC3 expression in PTC cell lines (TPC-1 and BCPAP) inhibited cell growth (P = 0.002 and P = 0.019, respectively) and affected the expression of genes involved in DNA replication, recombination and repair, cellular movement, tumor morphology, and cell death. Our data suggest that SNP rs944289 predisposes to PTC through a previously uncharacterized, long intergenic noncoding RNA gene (PTCSC3) that has the characteristics of a tumor suppressor.


Subjects
Carcinoma, Papillary/genetics , Polymorphism, Single Nucleotide , RNA, Untranslated/genetics , Thyroid Neoplasms/genetics , Animals , Binding Sites/genetics , Blotting, Northern , CCAAT-Enhancer-Binding Protein-beta/metabolism , COS Cells , Carcinoma, Papillary/pathology , Cell Line, Tumor , Cell Proliferation , Chlorocebus aethiops , Chromosomes, Human, Pair 14/genetics , Gene Expression Profiling , Gene Expression Regulation, Neoplastic , Genes, Tumor Suppressor , Genetic Predisposition to Disease/genetics , Genotype , HEK293 Cells , Humans , Molecular Sequence Data , Oligonucleotide Array Sequence Analysis , Reverse Transcriptase Polymerase Chain Reaction , Thyroid Neoplasms/pathology
14.
Genome Res ; 21(8): 1260-72, 2011 Aug.
Article in English | MEDLINE | ID: mdl-21712398

ABSTRACT

Despite our growing knowledge that many mammalian genes generate multiple transcript variants that may encode functionally distinct protein isoforms, the transcriptomes of various tissues and their developmental stages are poorly defined. Identifying the transcriptome and its regulation in a cell/tissue is the key to deciphering the cell/tissue-specific functions of a gene. We built a genome-wide inventory of noncoding and protein-coding transcripts (transcriptomes), their promoters (promoteromes) and histone modification states (epigenomes) for developing and adult cerebella using an integrative massively parallel sequencing and bioinformatics approach. The data consist of 61,525 (12,796 novel) distinct mRNAs transcribed by 29,589 (4792 novel) promoters corresponding to 15,669 protein-coding and 7624 noncoding genes. Importantly, our results show that the transcript variants from a gene are predominantly generated using alternative transcriptional rather than splicing mechanisms, highlighting alternative promoters and transcriptional terminations as major sources of transcriptome diversity. Moreover, H3K4me3, and not H3K27me3, defined the use of alternative promoters, and we identified a combinatorial role of H3K4me3 and H3K27me3 in regulating the expression of transcripts, including transcript variants of a gene during development. We observed a strong bias of both H3K4me3 and H3K27me3 for CpG-rich promoters and an exponential relationship between their enrichment and corresponding transcript expression. Furthermore, the majority of genes associated with neurological diseases expressed multiple transcripts through alternative promoters, and we demonstrated aberrant use of alternative promoters in medulloblastoma, a cancer arising in the cerebellum. The transcriptomes of developing and adult cerebella presented in this study emphasize the importance of analyzing gene regulation and function at the isoform level.


Subjects
Alternative Splicing , Cerebellum/growth & development , Transcription, Genetic , Transcriptome , Animals , Cerebellar Neoplasms/genetics , Cerebellar Neoplasms/metabolism , Cerebellum/metabolism , Computational Biology , Epigenesis, Genetic , Gene Expression Regulation, Developmental , Genome , Medulloblastoma/genetics , Medulloblastoma/metabolism , Mice , Mice, Inbred Strains , Promoter Regions, Genetic , RNA, Messenger/metabolism
15.
ArXiv ; 2024 Feb 15.
Article in English | MEDLINE | ID: mdl-38410647

ABSTRACT

Effective DNA embedding remains crucial in genomic analysis, particularly in scenarios lacking labeled data for model fine-tuning, despite the significant advancements in genome foundation models. A prime example is metagenomics binning, a critical process in microbiome research that aims to group DNA sequences by their species from a complex mixture of DNA sequences derived from potentially thousands of distinct, often uncharacterized species. To address the lack of effective DNA embedding models, we introduce DNABERT-S, a genome foundation model that specializes in creating species-aware DNA embeddings. To encourage effective embeddings of error-prone long-read DNA sequences, we introduce Manifold Instance Mixup (MI-Mix), a contrastive objective that mixes the hidden representations of DNA sequences at randomly selected layers and trains the model to recognize and differentiate these mixed proportions at the output layer. We further enhance it with the proposed Curriculum Contrastive Learning (C2LR) strategy. Empirical results on 18 diverse datasets showed DNABERT-S's remarkable performance. With just 2-shot training, it outperforms the top baseline's 10-shot species classification performance, while doubling the Adjusted Rand Index (ARI) in species clustering and substantially increasing the number of correctly identified species in metagenomics binning. The code, data, and pre-trained model are publicly available at https://github.com/Zhihan1996/DNABERT_S.

16.
BMC Bioinformatics ; 14: 262, 2013 Aug 27.
Article in English | MEDLINE | ID: mdl-23981227

ABSTRACT

BACKGROUND: RNA-seq, a massively parallel sequencing-based transcriptome profiling method, provides digital data in the form of aligned sequence read counts. The comparative analyses of the data require appropriate statistical methods to estimate the differential expression of transcript variants across different cell/tissue types and disease conditions. RESULTS: We developed a novel nonparametric empirical Bayesian-based approach (NPEBseq) to model the RNA-seq data. The prior distribution of the Bayesian model is empirically estimated from the data without any parametric assumption, and hence the method is "nonparametric" in nature. Based on this model, we proposed a method for detecting differentially expressed genes across different conditions. We also extended this method to detect differential usage of exons from RNA-seq data. The evaluation of NPEBseq on both simulated and publicly available RNA-seq datasets and comparison with three popular methods showed improved results for experiments with or without biological replicates. CONCLUSIONS: NPEBseq can successfully detect differential expression between different conditions not only at gene level but also at exon level from RNA-seq datasets. In addition, NPEBseq performs significantly better than current methods and can be applied to genome-wide RNA-seq datasets. Sample datasets and an R package are available at http://bioinformatics.wistar.upenn.edu/NPEBseq.


Subjects
Bayes Theorem , Computational Biology/methods , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Software , High-Throughput Nucleotide Sequencing , RNA/analysis , RNA/genetics , Sequence Alignment , Statistics, Nonparametric
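The abstract of entry 16 describes a prior that is estimated from the data without a parametric form. The sketch below illustrates that general idea only: it fits a discrete prior over a fixed grid of Poisson rates by expectation-maximization and then computes a posterior mean for an observed count. The Poisson likelihood, the grid, and the function names are assumptions made for this illustration; this is not the NPEBseq algorithm, whose details the abstract does not give.

```python
import math
from typing import List, Sequence


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson probability of count k at rate lam (lam must be > 0)."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def estimate_grid_prior(counts: Sequence[int],
                        grid: Sequence[float],
                        n_iter: int = 200) -> List[float]:
    """EM estimate of prior weights on a fixed grid of candidate rates."""
    likelihood = [[poisson_pmf(k, lam) for lam in grid] for k in counts]
    weights = [1.0 / len(grid)] * len(grid)  # start from a uniform prior
    for _ in range(n_iter):
        updated = [0.0] * len(grid)
        for row in likelihood:
            joint = [w * l for w, l in zip(weights, row)]
            total = sum(joint)
            for j, value in enumerate(joint):
                updated[j] += value / total  # posterior responsibility
        weights = [u / len(counts) for u in updated]
    return weights


def posterior_mean(k: int, grid: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Posterior mean rate for an observed count under the fitted prior."""
    joint = [w * poisson_pmf(k, lam) for lam, w in zip(grid, weights)]
    total = sum(joint)
    return sum(lam * p for lam, p in zip(grid, joint)) / total
```

For example, with hypothetical counts `[0, 1, 2, 3, 48, 50, 52]` and a grid of rates from 1 to 60, the fitted prior concentrates near the two groups of counts, and `posterior_mean` shrinks each observed count toward the nearer group.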
17.
J Virol ; 86(10): 5752-62, 2012 May.
Article in English | MEDLINE | ID: mdl-22419807

ABSTRACT

LANA is essential for tethering the Kaposi's sarcoma-associated herpesvirus (KSHV) genome to metaphase chromosomes and for modulating host-cell gene expression, but the binding sites in the host chromosome remain unknown. Here, we use LANA-specific chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-Seq) to identify LANA binding sites in the viral and host-cell genomes of a latently infected pleural effusion lymphoma cell line BCBL1. LANA bound with high occupancy to the KSHV genome terminal repeats (TR) and to a few minor binding sites in the KSHV genome, including the LANA promoter region. We identified 256 putative LANA binding site peaks with P < 0.01 and overlap in two independent ChIP-Seq experiments. We validated several of the high-occupancy binding sites by conventional ChIP assays and quantitative PCR. Candidate cellular LANA binding motifs were identified and assayed for binding to purified recombinant LANA protein in vitro, but they bound with low affinity compared to the viral TR binding site. More than half of the LANA binding sites (170/256) could be mapped to within 2.5 kb of a cellular gene transcript. Pathways and Gene Ontology (GO) analysis revealed that LANA binds to genes within the p53 and tumor necrosis factor (TNF) regulatory network. Further analysis revealed partial overlap of LANA and STAT1 binding sites in several gamma interferon (IFN-γ)-regulated genes. We show that ectopic expression of LANA can downmodulate IFN-γ-mediated activation of a subset of genes, including the TAP1 peptide transporter and proteasome subunit beta type 9 (PSMB9), both of which are required for class I antigen presentation. Our data provide a potential mechanism through which LANA may regulate several host cell pathways by direct binding to gene regulatory elements.


Subjects
Antigens, Viral/metabolism , Chromosomes, Human/virology , Herpesvirus 8, Human/metabolism , Nuclear Proteins/metabolism , Sarcoma, Kaposi/genetics , Sarcoma, Kaposi/virology , Antigens, Viral/chemistry , Antigens, Viral/genetics , Base Sequence , Binding Sites , Cell Line , Chromosomes, Human/chemistry , Gene Expression Regulation , Gene Targeting , Herpesvirus 8, Human/chemistry , Herpesvirus 8, Human/genetics , Host-Pathogen Interactions , Humans , Molecular Sequence Data , Nuclear Proteins/chemistry , Nuclear Proteins/genetics , Protein Binding
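Two filtering steps mentioned in the abstract of entry 17, keeping peaks that overlap between two independent ChIP-Seq experiments and asking whether a peak lies within 2.5 kb of a transcript, can be sketched with simple interval arithmetic. The code below is a minimal illustration with hypothetical names and half-open coordinates; the peak caller, the P-value filter, and the exact overlap criterion used by the authors are not stated in the abstract and are not reproduced here.

```python
from typing import List, NamedTuple, Optional, Sequence


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals on the same chromosome share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def reproducible_peaks(rep1: Sequence[Interval],
                       rep2: Sequence[Interval]) -> List[Interval]:
    """Peaks from replicate 1 that overlap any peak in replicate 2."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


def gap(a: Interval, b: Interval) -> Optional[int]:
    """Bases separating two intervals; 0 if they touch or overlap, None across chromosomes."""
    if a.chrom != b.chrom:
        return None
    return max(0, a.start - b.end, b.start - a.end)


def near_transcript(peak: Interval,
                    transcripts: Sequence[Interval],
                    max_dist: int = 2500) -> bool:
    """True if the peak lies within max_dist bases of any transcript."""
    for transcript in transcripts:
        distance = gap(peak, transcript)
        if distance is not None and distance <= max_dist:
            return True
    return False
```

The pairwise scan is quadratic in the number of peaks, which is acceptable for a few hundred peaks; genome-scale work would normally sort the intervals or use an interval tree.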
18.
Nucleic Acids Res ; 39(Database issue): D92-7, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21097880

ABSTRACT

MPromDb (Mammalian Promoter Database) is a curated database that strives to annotate gene promoters identified from ChIP-seq results with the goal of providing an integrated resource for mammalian transcriptional regulation and epigenetics. We analyzed 507 million uniquely aligned RNAP-II ChIP-seq reads from 26 different data sets that include six human cell types and 10 distinct mouse cell/tissue types. The updated MPromDb version consists of computationally predicted (novel) and known active RNAP-II promoters (42,893 human and 48,366 mouse promoters) from various data sets freely available in the NCBI GEO database. We found that 36% and 40% of protein-coding genes have alternative promoters in human and mouse genomes and ∼40% of promoters are tissue/cell specific. The identified RNAP-II promoters were annotated using various known and novel gene models. Additionally, for novel promoters we looked into other lines of evidence: GenBank mRNAs, spliced ESTs, CAGE promoter tags and mRNA-seq reads. Users can search the database based on gene ID/symbol, or by specific tissue/cell type and filter results based on any combination of tissue/cell specificity, Known/Novel, CpG/NonCpG, and protein-coding/non-coding gene promoters. We have also integrated the GBrowse genome browser with MPromDb for visualization of ChIP-seq profiles and to display the annotations. The current release of MPromDb can be accessed at http://bioinformatics.wistar.upenn.edu/MPromDb/.


Subjects
Chromatin Immunoprecipitation , Databases, Nucleic Acid , Promoter Regions, Genetic , Animals , Computer Graphics , Humans , Mice , RNA Polymerase II/metabolism , Sequence Analysis, DNA , Systems Integration
19.
Nucleic Acids Res ; 39(1): 190-201, 2011 Jan.
Article in English | MEDLINE | ID: mdl-20843783

ABSTRACT

Alternative promoters that are differentially used in various cellular contexts and tissue types add to the transcriptional complexity in mammalian genomes. Identification of alternative promoters and the annotation of their activity in different tissues are among the major challenges in understanding the transcriptional regulation of the mammalian genes and their isoforms. To determine the use of alternative promoters in different tissues, we performed ChIP-seq experiments using an antibody against RNA Pol-II, in five adult mouse tissues (brain, liver, lung, spleen and kidney). Our analysis identified 38 639 Pol-II promoters, including 12 270 novel promoters, for both protein coding and non-coding mouse genes. Of these, 6384 promoters are tissue specific, and these are CpG poor; we find that only 34% of the novel promoters are located in CpG-rich regions, suggesting that novel promoters are mostly tissue specific. By identifying the Pol-II bound promoter(s) of each annotated gene in a given tissue, we found that 37% of the protein coding genes use alternative promoters in the five mouse tissues. The promoter annotations and ChIP-seq data presented here will aid ongoing efforts of characterizing gene regulatory regions in mammalian genomes.


Subjects
Promoter Regions, Genetic , RNA Polymerase II/metabolism , Animals , Chromatin Immunoprecipitation , Chromosome Mapping , Genome , Mice , Sequence Analysis, DNA/standards , Transcription, Genetic
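The summary statistics in the abstract of entry 19 (the share of genes using alternative promoters, and the set of tissue-specific promoters) amount to simple grouping over promoter-binding records. The sketch below assumes a hypothetical record layout of `(gene, promoter_id, tissue)` and treats a promoter as tissue specific when it is bound in exactly one tissue; the abstract does not state the authors' precise definitions, so both choices are assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Record = Tuple[str, str, str]  # (gene, promoter_id, tissue), hypothetical layout


def alternative_promoter_fraction(records: Iterable[Record]) -> float:
    """Fraction of genes with more than one bound promoter across all tissues."""
    promoters_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for gene, promoter_id, _tissue in records:
        promoters_by_gene[gene].add(promoter_id)
    if not promoters_by_gene:
        return 0.0
    multi = sum(1 for p in promoters_by_gene.values() if len(p) > 1)
    return multi / len(promoters_by_gene)


def tissue_specific_promoters(records: Iterable[Record]) -> Set[str]:
    """Promoters bound in exactly one tissue (an assumed definition)."""
    tissues_by_promoter: Dict[str, Set[str]] = defaultdict(set)
    for _gene, promoter_id, tissue in records:
        tissues_by_promoter[promoter_id].add(tissue)
    return {p for p, tissues in tissues_by_promoter.items() if len(tissues) == 1}
```

On a toy input with three genes where only one gene has two distinct bound promoters, `alternative_promoter_fraction` returns one third.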
20.
BMC Genomics ; 13: 440, 2012 Aug 31.
Article in English | MEDLINE | ID: mdl-22938532

ABSTRACT

BACKGROUND: With over 1.3 billion people, India is estimated to contain three times more genetic diversity than does Europe. Next-generation sequencing technologies have facilitated the understanding of diversity by enabling whole genome sequencing at greater speed and lower cost. While genomes from people of European and Asian descent have been sequenced, only recently has a single male genome from the Indian subcontinent been published at sufficient depth and coverage. In this study we have sequenced and analyzed the genome of a South Asian Indian female (SAIF) from the Indian state of Kerala. RESULTS: We identified over 3.4 million SNPs in this genome including over 89,873 private variations. Comparison of the SAIF genome with several published personal genomes revealed that this individual shared ~50% of the SNPs with each of these genomes. Analysis of the SAIF mitochondrial genome showed that it was closely related to the U1 haplogroup which has been previously observed in Kerala. We assessed the SAIF genome for SNPs with health and disease consequences and found that the individual was at a higher risk for multiple sclerosis and a few other diseases. In analyzing SNPs that modulate drug response, we found a variation that predicts a favorable response to metformin, a drug used to treat diabetes. SNPs predictive of adverse reaction to warfarin indicated that the SAIF individual is not at risk for bleeding if treated with typical doses of warfarin. In addition, we report the presence of several additional SNPs of medical relevance. CONCLUSIONS: This is the first study to report the complete whole genome sequence of a female from the state of Kerala in India. The availability of this complete genome and variants will further aid studies aimed at understanding genetic diversity, identifying clinically relevant changes and assessing disease burden in the Indian population.


Subjects
Asian People/genetics , Chromosome Mapping , Genome, Human , Genome, Mitochondrial , Polymorphism, Single Nucleotide , Anticoagulants/adverse effects , DNA Copy Number Variations , Diabetes Mellitus/genetics , Diabetes Mellitus/prevention & control , Female , Genetic Predisposition to Disease , Genetic Variation , Haplotypes , Hemorrhage/chemically induced , Hemorrhage/genetics , Hemorrhage/prevention & control , Humans , Hypoglycemic Agents/therapeutic use , India , Metformin/therapeutic use , Middle Aged , Multiple Sclerosis/genetics , Multiple Sclerosis/prevention & control , Sequence Analysis, DNA , Warfarin/adverse effects