Search | BVS Violência e Saúde

Identification of the human DPR core promoter element using machine learning.

Vo Ngoc, Long; Huang, Cassidy Yunjing; Cassidy, California Jack; Medrano, Claudia; Kadonaga, James T.

Nature; 585(7825): 459-463, September 2020.

Article in English | MEDLINE | ID: mdl-32908305

ABSTRACT

The RNA polymerase II (Pol II) core promoter is the strategic site of convergence of the signals that lead to the initiation of DNA transcription [1-5], but the downstream core promoter in humans has been difficult to understand [1-3]. Here we analyse the human Pol II core promoter and use machine learning to generate predictive models for the downstream core promoter region (DPR) and the TATA box. We developed a method termed HARPE (high-throughput analysis of randomized promoter elements) to create hundreds of thousands of DPR (or TATA box) variants, each with known transcriptional strength. We then analysed the HARPE data by support vector regression (SVR) to provide comprehensive models for the sequence motifs, and found that the SVR-based approach is more effective than a consensus-based method for predicting transcriptional activity. These results show that the DPR is a functionally important core promoter element that is widely used in human promoters. Notably, there appears to be a duality between the DPR and the TATA box, as many promoters contain one or the other element. More broadly, these findings show that functional DNA motifs can be identified by machine learning analysis of a comprehensive set of sequence variants.

Subjects

Consensus Sequence/genetics; Gene Expression Regulation/genetics; Promoter Regions, Genetic/genetics; RNA Polymerase II/metabolism; Support Vector Machine; Transcription, Genetic; Base Sequence; Cells/metabolism; Computer Simulation; Datasets as Topic; HeLa Cells; High-Throughput Nucleotide Sequencing; Humans; Models, Genetic; Mutagenesis; TATA Box/genetics

The human initiator is a distinct and abundant element that is precisely positioned in focused core promoters.

Vo Ngoc, Long; Cassidy, California Jack; Huang, Cassidy Yunjing; Duttke, Sascha H C; Kadonaga, James T.

Genes Dev; 31(1): 6-11, 1 January 2017.

Article in English | MEDLINE | ID: mdl-28108474

ABSTRACT

DNA sequence signals in the core promoter, such as the initiator (Inr), direct transcription initiation by RNA polymerase II. Here we show that the human Inr has the consensus of BBCA(+1)BW at focused promoters in which transcription initiates at a single site or a narrow cluster of sites. The analysis of 7678 focused transcription start sites revealed 40% with a perfect match to the Inr and 16% with a single mismatch outside of the CA(+1) core. TATA-like sequences are underrepresented in Inr promoters. This consensus is a key component of the DNA sequence rules that specify transcription initiation in humans.
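
The matching rule described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes the standard IUPAC nucleotide codes (B = C, G, or T; W = A or T), treats the consensus as six positions running from -3 to +3 with the A at +1, and uses hypothetical function names (`inr_window`, `classify_inr`) chosen only for this example.

```python
# Illustrative sketch only: names and window convention are assumptions,
# not taken from the paper.

# Standard IUPAC codes needed for the consensus (B = not A, W = weak pair).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "B": "CGT", "W": "AT"}

# Positions -3, -2, -1, +1, +2, +3 of the consensus; the A is position +1.
INR_CONSENSUS = "BBCABW"
CORE_INDICES = (2, 3)  # the C at -1 and the A at +1


def inr_window(sequence, plus_one_index):
    """Return the six bases from -3 to +3 around a 0-based +1 index."""
    start = plus_one_index - 3
    if start < 0 or plus_one_index + 3 > len(sequence):
        raise ValueError("not enough flanking sequence around the +1 position")
    return sequence[start:plus_one_index + 3]


def classify_inr(window):
    """Classify a six-base window against the consensus.

    Returns "perfect", "single_mismatch_outside_core", or "no_match".
    """
    window = window.upper()
    if len(window) != len(INR_CONSENSUS):
        raise ValueError("window must be exactly six bases long")
    mismatches = [
        i for i, (base, code) in enumerate(zip(window, INR_CONSENSUS))
        if base not in IUPAC[code]
    ]
    if not mismatches:
        return "perfect"
    if len(mismatches) == 1 and mismatches[0] not in CORE_INDICES:
        return "single_mismatch_outside_core"
    return "no_match"


if __name__ == "__main__":
    # Made-up example sequence; the A at 0-based index 5 is taken as +1.
    example = "GGTCCATTGG"
    print(classify_inr(inr_window(example, 5)))
```

Under these assumptions, the two categories reported in the abstract (a perfect match, and a single mismatch outside the CA core) correspond to the first two return values. Any mismatch within the core, or more than one mismatch, is reported as no match.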

Subjects

Promoter Regions, Genetic/genetics; Transcription Initiation Site; Conserved Sequence/genetics; DNA Mutational Analysis; Humans; MCF-7 Cells; Mutation; Sequence Homology, Nucleic Acid; TATA Box/genetics

An automated proteogenomic method uses mass spectrometry to reveal novel genes in Zea mays.

Castellana, Natalie E; Shen, Zhouxin; He, Yupeng; Walley, Justin W; Cassidy, California Jack; Briggs, Steven P; Bafna, Vineet.

Mol Cell Proteomics; 13(1): 157-67, January 2014.

Article in English | MEDLINE | ID: mdl-24142994

ABSTRACT

New technologies in genomics and proteomics have influenced the emergence of proteogenomics, a field at the confluence of genomics, transcriptomics, and proteomics. First generation proteogenomic toolkits employ peptide mass spectrometry to identify novel protein coding regions. We extend first generation proteogenomic tools to achieve greater accuracy and enable the analysis of large, complex genomes. We apply our pipeline to Zea mays, which has a genome comparable in size to human. Our pipeline begins with the comparison of mass spectra to a putative translation of the genome. We select novel peptides, those that match a region of the genome that was not previously known to be protein coding, for grouping into refinement events. We present a novel, probabilistic framework for evaluating the accuracy of each event. Our calculated event probability, or eventProb, considers the number of supporting peptides and spectra, and the quality of each supporting peptide-spectrum match. Our pipeline predicts 165 novel protein-coding genes and proposes updated models for 741 additional genes.

Subjects

Genomics; Proteomics; Zea mays/genetics; Genome, Plant; Humans; Mass Spectrometry; Open Reading Frames
