Results 1 - 20 of 35
1.
Nature ; 599(7886): 684-691, 2021 Nov.
Article in English | MEDLINE | ID: mdl-34789882

ABSTRACT

The three-dimensional (3D) structure of chromatin is intrinsically associated with gene regulation and cell function [1-3]. Methods based on chromatin conformation capture have mapped chromatin structures in neuronal systems such as in vitro differentiated neurons, neurons isolated through fluorescence-activated cell sorting from cortical tissues pooled from different animals and from dissociated whole hippocampi [4-6]. However, changes in chromatin organization captured by imaging, such as the relocation of Bdnf away from the nuclear periphery after activation [7], are invisible with such approaches [8]. Here we developed immunoGAM, an extension of genome architecture mapping (GAM) [2,9], to map 3D chromatin topology genome-wide in specific brain cell types, without tissue disruption, from single animals. GAM is a ligation-free technology that maps genome topology by sequencing the DNA content from thin (about 220 nm) nuclear cryosections. Chromatin interactions are identified from the increased probability of co-segregation of contacting loci across a collection of nuclear slices. ImmunoGAM expands the scope of GAM to enable the selection of specific cell types using low cell numbers (approximately 1,000 cells) within a complex tissue and avoids tissue dissociation [2,10]. We report cell-type specialized 3D chromatin structures at multiple genomic scales that relate to patterns of gene expression. We discover extensive 'melting' of long genes when they are highly expressed and/or have high chromatin accessibility. The contacts most specific to neuron subtypes contain genes associated with specialized processes, such as addiction and synaptic plasticity, which harbour putative binding sites for neuronal transcription factors within accessible chromatin regions. Moreover, sensory receptor genes are preferentially found in heterochromatic compartments in brain cells, which establish strong contacts across tens of megabases. Our results demonstrate that highly specific chromatin conformations in brain cells are tightly related to gene regulation mechanisms and specialized functions.
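
The co-segregation idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the input format (one set of detected locus indices per nuclear slice), the function name `cosegregation` and the toy data are assumptions made for illustration. The sketch compares how often two loci are seen in the same slice with how often that would be expected if they segregated independently.

```python
from itertools import combinations


def cosegregation(slices, n_loci):
    """Return {(i, j): (observed, expected)} co-segregation frequencies.

    slices: list of sets, each holding the indices of the loci detected in
    one nuclear cryosection (hypothetical toy input format).
    """
    n = len(slices)
    # Detection frequency of each locus across all slices.
    freq = [sum(1 for s in slices if i in s) / n for i in range(n_loci)]
    result = {}
    for i, j in combinations(range(n_loci), 2):
        # Fraction of slices in which both loci were detected together.
        observed = sum(1 for s in slices if i in s and j in s) / n
        # Fraction expected if the two loci segregated independently.
        expected = freq[i] * freq[j]
        result[(i, j)] = (observed, expected)
    return result


# Toy data: loci 0 and 1 mostly appear together; locus 2 rarely joins them.
toy_slices = [{0, 1}, {0, 1, 2}, {2}, {0, 1}, set(), {2}, {0, 1}, {1}]
table = cosegregation(toy_slices, 3)
```

In this toy table, loci 0 and 1 co-segregate more often than independence would predict, which is the kind of signal the abstract describes as evidence of contact. Real analyses normalize for detection efficiency and genomic distance, which this sketch ignores.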


Subjects
Brain/cytology, Cells/classification, Chromatin Assembly and Disassembly, Chromatin/chemistry, Chromatin/genetics, Genes, Molecular Conformation, Animals, Binding Sites, Cells/metabolism, Chromatin/metabolism, Gene Expression Regulation, Male, Mice, Multigene Family/genetics, Neurons/classification, Neurons/metabolism, Nucleic Acid Denaturation, Transcription Factors/metabolism
2.
Nat Methods ; 20(7): 1037-1047, 2023 Jul.
Article in English | MEDLINE | ID: mdl-37336949

ABSTRACT

Technology for measuring 3D genome topology is increasingly important for studying gene regulation, for genome assembly and for mapping of genome rearrangements. Hi-C and other ligation-based methods have become routine but have specific biases. Here, we develop multiplex-GAM, a faster and more affordable version of genome architecture mapping (GAM), a ligation-free technique that maps chromatin contacts genome-wide. We perform a detailed comparison of multiplex-GAM and Hi-C using mouse embryonic stem cells. When examining the strongest contacts detected by either method, we find that only one-third of these are shared. The strongest contacts specifically found in GAM often involve 'active' regions, including many transcribed genes and super-enhancers, whereas in Hi-C they more often contain 'inactive' regions. Our work shows that active genomic regions are involved in extensive complex contacts that are currently underestimated in ligation-based approaches, and highlights the need for orthogonal advances in genome-wide contact mapping technologies.


Subjects
Chromatin, Genome, Animals, Mice, Chromatin/genetics, Chromosome Mapping/methods, Chromosomes, Genomics/methods
3.
Bioinformatics ; 40(Supplement_1): i11-i19, 2024 Jun 28.
Article in English | MEDLINE | ID: mdl-38940154

ABSTRACT

MOTIVATION: Wikipedia is a vital open educational resource in computational biology. The quality of computational biology coverage in English-language Wikipedia has improved steadily in recent years. However, there is an increasingly large 'knowledge gap' between computational biology resources in English-language Wikipedia and Wikipedias in non-English languages. Reducing this knowledge gap by providing educational resources in non-English languages would reduce language barriers which disadvantage non-native English speaking learners across multiple dimensions in computational biology. RESULTS: Here, we provide a comprehensive assessment of computational biology coverage in Spanish-language Wikipedia, the second most accessed Wikipedia worldwide. Using Spanish-language Wikipedia as a case study, we generate quantitative and qualitative data before and after a targeted educational event, specifically, a Spanish-focused student editing competition. Our data demonstrate how such events and activities can narrow the knowledge gap between English and non-English educational resources, by improving existing articles and creating new articles. Finally, based on our analysis, we suggest ways to prioritize future initiatives to improve open educational resources in other languages. AVAILABILITY AND IMPLEMENTATION: Scripts for data analysis are available at: https://github.com/ISCBWikiTeam/spanish.


Subjects
Computational Biology, Computational Biology/methods, Humans, Language, Internet
4.
Bioinformatics ; 38(Suppl 1): i19-i27, 2022 Jun 24.
Article in English | MEDLINE | ID: mdl-35758800

ABSTRACT

MOTIVATION: Wikipedia is one of the most important channels for the public communication of science and is frequently accessed as an educational resource in computational biology. Joint efforts between the International Society for Computational Biology (ISCB) and the Computational Biology taskforce of WikiProject Molecular Biology (a group of expert Wikipedia editors) have considerably improved computational biology representation on Wikipedia in recent years. However, there is still an urgent need for further improvement in quality, especially when compared to related scientific fields such as genetics and medicine. Facilitating involvement of members from ISCB Communities of Special Interest (COSIs) would improve a vital open education resource in computational biology, additionally allowing COSIs to provide a quality educational resource highly specific to their subfield. RESULTS: We generate a list of around 1500 English Wikipedia articles relating to computational biology and describe the development of a binary COSI-Article matrix, linking COSIs to relevant articles and thereby defining domain-specific open educational resources. Our analysis of the COSI-Article matrix data provides a quantitative assessment of computational biology representation on Wikipedia against other fields and at a COSI-specific level. Furthermore, we conducted similarity analysis and subsequent clustering of COSI-Article data to provide insight into potential relationships between COSIs. Finally, based on our analysis, we suggest courses of action to improve the quality of computational biology representation on Wikipedia.
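
A binary COSI-Article matrix and a pairwise similarity analysis of its rows might look like the sketch below. The abstract does not say which similarity measure or clustering method was used, so the Jaccard index here is an assumption, and the COSI labels, article titles and matrix entries are invented placeholders rather than data from the study.

```python
def jaccard(a, b):
    """Jaccard similarity of two equal-length binary rows."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Hypothetical binary matrix: rows are COSIs, columns are Wikipedia articles.
# A 1 means the article was judged relevant to that COSI.
articles = [
    "Sequence alignment",
    "Protein structure prediction",
    "Gene regulatory network",
    "Hidden Markov model",
]
matrix = {
    "COSI_A": [1, 0, 0, 1],
    "COSI_B": [1, 1, 0, 1],
    "COSI_C": [0, 0, 1, 0],
}

# Pairwise similarity between COSIs, keyed by an ordered pair of labels.
similarity = {
    (p, q): jaccard(matrix[p], matrix[q])
    for p in matrix
    for q in matrix
    if p < q
}
```

A similarity table of this kind could then be passed to any standard clustering routine to group COSIs with overlapping article sets.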


Subjects
Computational Biology, Cluster Analysis
5.
Bioinformatics ; 36(4): 1044-1051, 2020 Feb 15.
Article in English | MEDLINE | ID: mdl-31665223

ABSTRACT

MOTIVATION: De novo motif discovery algorithms find statistically over-represented sequence motifs that may function as transcription factor binding sites. Current methods often report large numbers of motifs, making it difficult to perform further analyses and experimental validation. The motif selection problem seeks to identify a minimal set of putative regulatory motifs that characterize sequences of interest (e.g. ChIP-Seq binding regions). RESULTS: In this study, the motif selection problem is mapped to variants of the set cover problem that are solved via tabu search and by relaxed integer linear programming (RILP). The algorithms are employed to analyze 349 ChIP-Seq experiments from the ENCODE project, yielding a small number of high-quality motifs that represent putative binding sites of primary factors and cofactors. Specifically, when compared with the motifs reported by Kheradpour and Kellis, the set cover-based algorithms produced motif sets covering 35% more peaks for 11 TFs and identified 4 more putative cofactors for 6 TFs. Moreover, a systematic evaluation using nested cross-validation revealed that the RILP algorithm selected fewer motifs and was able to cover 6% more peaks and 3% fewer background regions, which reduced the error rate by 7%. AVAILABILITY AND IMPLEMENTATION: The source code of the algorithms and all the datasets are available at https://github.com/YichaoOU/Set_cover_tools. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
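
To make the set cover framing concrete, the sketch below uses the classic greedy heuristic: repeatedly pick the motif that covers the most peaks not yet covered. This is deliberately not the tabu search or RILP method reported in the abstract, and it ignores background regions. The function name `greedy_motif_cover`, the motif names and the peak IDs are hypothetical.

```python
def greedy_motif_cover(motif_hits, peaks):
    """Greedily pick motifs until no candidate covers a remaining peak.

    motif_hits: dict mapping a motif name to the set of peak IDs it occurs in.
    peaks: set of peak IDs that should be covered.
    Returns (chosen motif names in pick order, peaks left uncovered).
    """
    uncovered = set(peaks)
    chosen = []
    if not motif_hits:
        return chosen, uncovered
    while uncovered:
        # Choose the motif covering the most still-uncovered peaks.
        best = max(motif_hits, key=lambda m: len(motif_hits[m] & uncovered))
        gained = motif_hits[best] & uncovered
        if not gained:
            break  # the remaining peaks contain none of the candidate motifs
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Toy example: peak 7 contains no candidate motif and stays uncovered.
toy_hits = {"M1": {1, 2, 3}, "M2": {3, 4}, "M3": {4, 5, 6}, "M4": {1, 6}}
selected, missed = greedy_motif_cover(toy_hits, {1, 2, 3, 4, 5, 6, 7})
```

Here two of the four candidate motifs suffice to cover every coverable peak, which mirrors the goal of reporting a small motif set instead of the full discovery output.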


Subjects
Software, Algorithms, Binding Sites, Chromatin Immunoprecipitation, Nucleotide Motifs, DNA Sequence Analysis, Transcription Factors
6.
BMC Cancer ; 21(1): 768, 2021 Jul 03.
Article in English | MEDLINE | ID: mdl-34215221

ABSTRACT

BACKGROUND: The heterogeneous subtypes and stages of epithelial ovarian cancer (EOC) differ in their biological features, invasiveness, and response to chemotherapy, but the transcriptional regulators causing their differences remain nebulous. METHODS: In this study, we compared high-grade serous ovarian cancers (HGSOCs) to low malignant potential or serous borderline tumors (SBTs). Our aim was to discover new regulatory factors causing distinct biological properties of HGSOCs and SBTs. RESULTS: In a discovery dataset, we identified 11 differentially expressed genes (DEGs) between SBTs and HGSOCs. Their expression correctly classified 95% of 267 validation samples. Two of the DEGs, TMEM30B and TSPAN1, were significantly associated with worse overall survival in patients with HGSOC. We also identified 17 DEGs that distinguished stage II vs. III HGSOC. In these two DEG promoter sets, we identified significant enrichment of predicted transcription factor binding sites, including those of RARA, FOXF1, BHLHE41, and PITX1. Using published ChIP-seq data acquired from multiple non-ovarian cell types, we showed additional regulatory factors, including AP2-gamma/TFAP2C, FOXA1, and BHLHE40, bound at the majority of DEG promoters. Several of the factors are known to cooperate with and predict the presence of nuclear hormone receptor estrogen receptor alpha (ER-alpha). We experimentally confirmed ER-alpha and PITX1 presence at the DEGs by performing ChIP-seq analysis using the ovarian cancer cell line PEO4. Finally, RNA-seq analysis identified recurrent gene fusion events in our EOC tumor set. Some of these fusions were significantly associated with survival in HGSOC patients; however, the fusion genes are not regulated by the transcription factors identified for the DEGs. CONCLUSIONS: These data implicate an estrogen-responsive regulatory network in the differential gene expression between ovarian cancer subtypes and stages, which includes PITX1. Importantly, the transcription factors associated with our DEG promoters are known to form the MegaTrans complex in breast cancer. This is the first study to implicate the MegaTrans complex in contributing to the distinct biological trajectories of malignant and indolent ovarian cancer subtypes.


Subjects
Ovarian Epithelial Carcinoma/genetics, Estrogen Receptor alpha/metabolism, Neoplastic Gene Expression Regulation/genetics, Gene Regulatory Networks/genetics, Paired Box Transcription Factors/metabolism, Ovarian Epithelial Carcinoma/pathology, Female, Humans
7.
PLoS Comput Biol ; 14(2): e1005772, 2018 Feb.
Article in English | MEDLINE | ID: mdl-29390004

ABSTRACT

Bioinformatics is recognized as part of the essential knowledge base of numerous career paths in biomedical research and healthcare. However, there is little agreement in the field over what that knowledge entails or how best to provide it. These disagreements are compounded by the wide range of populations in need of bioinformatics training, with divergent prior backgrounds and intended application areas. The Curriculum Task Force of the International Society for Computational Biology (ISCB) Education Committee has sought to provide a framework for training needs and curricula in terms of a set of bioinformatics core competencies that cut across many user personas and training programs. The initial competencies developed based on surveys of employers and training programs have since been refined through a multiyear process of community engagement. This report describes the current status of the competencies and presents a series of use cases illustrating how they are being applied in diverse training contexts. These use cases are intended to demonstrate how others can make use of the competencies and engage in the process of their continuing refinement and application. The report concludes with a consideration of remaining challenges and future plans.


Subjects
Computational Biology/education, Curriculum, Graduate Education, Systems Biology/education, Advisory Committees, Africa, Algorithms, Genetic Predisposition to Disease, Illinois, New South Wales, Ohio, Pennsylvania, Software, Surveys and Questionnaires, United Kingdom, Universities
8.
Clin Proteomics ; 14: 10, 2017.
Article in English | MEDLINE | ID: mdl-28360826

ABSTRACT

BACKGROUND: Recent epidemiological studies indicate that only 30-50% of undiagnosed type 2 diabetes mellitus (T2DM) patients are identified using glycated hemoglobin (HbA1c) and elevated fasting plasma glucose (FPG) levels. Thus, novel biomarkers for early diagnosis and prognosis are urgently needed for providing early and personalized treatment. METHODS: Here, we studied the glycation degrees of 27 glycation sites representing nine plasma proteins in 48 newly diagnosed male T2DM patients and 48 non-diabetic men matched for age (range 35-65 years). Samples were digested with trypsin and enriched for glycated peptides using boronic acid affinity chromatography. Quantification relied on mass spectrometry (multiple reaction monitoring) using isotope-labelled peptides as internal standard. RESULTS: The combination of glycated lysine-141 of haptoglobin (HP K141) and HbA1c provided a sensitivity of 94%, a specificity of 98%, and an accuracy of 96% to identify T2DM. A set of 15 features considering three glycation sites in human serum albumin, HP K141, and 11 routine laboratory measures of T2DM, metabolic syndrome, obesity, inflammation, and insulin resistance provided a sensitivity of 98%, a specificity of 100%, and an accuracy of 99% for newly diagnosed T2DM patients. CONCLUSIONS: Our studies demonstrated the great potential of glycation sites in plasma proteins providing an additional diagnostic tool for T2DM and elucidating that the combination of these sites with HbA1c and FPG could improve the diagnosis of T2DM.
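
For readers less familiar with the three reported measures, the sketch below shows how sensitivity, specificity and accuracy follow from a confusion matrix. The abstract does not report the underlying counts, so the counts used here (45 of 48 patients and 47 of 48 controls classified correctly) are purely illustrative values chosen because they round to the figures quoted for HP K141 combined with HbA1c. The study's actual counts and validation scheme may differ.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true cases that were detected
    specificity = tn / (tn + fp)  # share of non-cases correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all calls that were right
    return sensitivity, specificity, accuracy


# Illustrative counts only (assumed, not taken from the paper):
# 48 patients with 3 missed, 48 controls with 1 false positive.
sens, spec, acc = diagnostic_metrics(tp=45, fn=3, tn=47, fp=1)
```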

9.
BMC Plant Biol ; 16(1): 229, 2016 Oct 21.
Article in English | MEDLINE | ID: mdl-27769192

ABSTRACT

BACKGROUND: Hydroxyproline-rich glycoproteins (HRGPs) constitute a plant cell wall protein superfamily that functions in diverse aspects of growth and development. This superfamily contains three members: the highly glycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). Chimeric and hybrid HRGPs, however, also exist. A bioinformatics approach is employed here to identify and classify AGPs, EXTs, PRPs, chimeric HRGPs, and hybrid HRGPs from the proteins predicted by the completed genome sequence of poplar (Populus trichocarpa). This bioinformatics approach is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs with a newly revised and improved BIO OHIO 2.0 program. Proteins detected by the program are subsequently analyzed to identify the following: 1) repeating amino acid sequences, 2) signal peptide sequences, 3) glycosylphosphatidylinositol lipid anchor addition sequences, and 4) similar HRGPs using the Basic Local Alignment Search Tool (BLAST). RESULTS: The program was used to identify and classify 271 HRGPs from poplar including 162 AGPs, 60 EXTs, and 49 PRPs, which are each divided into various classes. This is in contrast to a previous analysis of the Arabidopsis proteome which identified 162 HRGPs consisting of 85 AGPs, 59 EXTs, and 18 PRPs. Poplar was observed to have fewer classical EXTs, to have more fasciclin-like AGPs, plastocyanin AGPs and AG peptides, and to contain a novel class of PRPs referred to as the proline-rich peptides. CONCLUSIONS: The newly revised and improved BIO OHIO 2.0 bioinformatics program was used to identify and classify the inventory of HRGPs in poplar in order to facilitate and guide basic and applied research on plant cell walls. The newly identified poplar HRGPs can now be examined to determine their respective structural and functional roles, including their possible applications in the areas of plant biofuels and natural products for medicinal or industrial uses. Additionally, other plants whose genomes are sequenced can now be examined in a similar way using this bioinformatics program, which will provide insight into the evolution of the HRGP family in the plant kingdom.


Subjects
Glycoproteins/genetics, Plant Proteins/genetics, Populus/genetics, Amino Acid Sequence, Computational Biology, Glycoproteins/analysis, Glycoproteins/chemistry, Glycoproteins/metabolism, Hydroxyproline/metabolism, Plant Proteins/analysis, Plant Proteins/chemistry, Plant Proteins/metabolism, Populus/metabolism
13.
Nucleic Acids Res ; 39(Database issue): D1118-22, 2011 Jan.
Article in English | MEDLINE | ID: mdl-21059685

ABSTRACT

The Arabidopsis Gene Regulatory Information Server (AGRIS; http://arabidopsis.med.ohio-state.edu/) provides a comprehensive resource for gene regulatory studies in the model plant Arabidopsis thaliana. Three interlinked databases, AtTFDB, AtcisDB and AtRegNet, furnish comprehensive and updated information on transcription factors (TFs), predicted and experimentally verified cis-regulatory elements (CREs) and their interactions, respectively. In addition to significant contributions in the identification of the entire set of TF-DNA interactions, which are the key to understanding the gene regulatory networks that govern Arabidopsis gene expression, tools recently incorporated into AGRIS include the complete set of words of length 5-15 present in the Arabidopsis genome and the integration of AtRegNet with visualization tools, such as the recently developed ReIN application. All the information in AGRIS is publicly available and downloadable upon registration.


Subjects
Arabidopsis/genetics, Genetic Databases, Plant Gene Expression Regulation, Gene Regulatory Networks, Genetic Promoter Regions, Transcription Factors/metabolism
14.
Nucleic Acids Res ; 39(6): 2175-87, 2011 Mar.
Article in English | MEDLINE | ID: mdl-21071415

ABSTRACT

Eukaryotic core promoters are often characterized by the presence of consensus motifs such as the TATA box or initiator elements, which attract and direct the transcriptional machinery to the transcription start site. However, many human promoters have none of the known core promoter motifs, suggesting that undiscovered promoter motifs exist in the genome. We previously identified a mutation in the human Ankyrin-1 (ANK-1) promoter that causes the disease ankyrin-deficient Hereditary Spherocytosis (HS). Although the ANK-1 promoter is CpG rich, no discernible basal promoter elements had been identified. We showed that the HS mutation disrupted the binding of the transcription factor TFIID, the major component of the pre-initiation complex. We hypothesized that the mutation identified a candidate promoter element with a more widespread role in gene regulation. We examined 17,181 human promoters for the experimentally validated binding site, called the TFIID localization sequence (DLS), and found three times as many promoters containing DLS as TATA motifs. Mutational analyses of DLS sequences confirmed their functional significance, as did the addition of a DLS site to a minimal Sp1 promoter. Our results demonstrate that novel promoter elements can be identified on a genome-wide scale through observations of regulatory disruptions that cause human disease.


Subjects
Ankyrins/genetics, Mutation, Genetic Promoter Regions, Hereditary Spherocytosis/genetics, Transcription Factor TFIID/metabolism, Base Sequence, Binding Sites, Consensus Sequence, Human Genome, Humans, K562 Cells, Transcription Initiation Site
16.
Plant Physiol ; 153(2): 485-513, 2010 Jun.
Article in English | MEDLINE | ID: mdl-20395450

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: hyperglycosylated arabinogalactan proteins (AGPs), moderately glycosylated extensins (EXTs), and lightly glycosylated proline-rich proteins (PRPs). Hybrid and chimeric versions of HRGP molecules also exist. In order to "mine" genomic databases for HRGPs and to facilitate and guide research in the field, the BIO OHIO software program was developed that identifies and classifies AGPs, EXTs, PRPs, hybrid HRGPs, and chimeric HRGPs from proteins predicted from DNA sequence data. This bioinformatics program is based on searching for biased amino acid compositions and for particular protein motifs associated with known HRGPs. HRGPs identified by the program are subsequently analyzed to elucidate the following: (1) repeating amino acid sequences, (2) signal peptide and glycosylphosphatidylinositol lipid anchor addition sequences, (3) similar HRGPs via Basic Local Alignment Search Tool, (4) expression patterns of their genes, (5) other HRGPs, glycosyl transferase, prolyl 4-hydroxylase, and peroxidase genes coexpressed with their genes, and (6) gene structure and whether genetic mutants exist in their genes. The program was used to identify and classify 166 HRGPs from Arabidopsis (Arabidopsis thaliana) as follows: 85 AGPs (including classical AGPs, lysine-rich AGPs, arabinogalactan peptides, fasciclin-like AGPs, plastocyanin AGPs, and other chimeric AGPs), 59 EXTs (including SP(5) EXTs, SP(5)/SP(4) EXTs, SP(4) EXTs, SP(4)/SP(3) EXTs, a SP(3) EXT, "short" EXTs, leucine-rich repeat-EXTs, proline-rich extensin-like receptor kinases, and other chimeric EXTs), 18 PRPs (including PRPs and chimeric PRPs), and AGP/EXT hybrid HRGPs.
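
The two search criteria named in this abstract, biased amino acid composition and characteristic motifs, can be sketched as follows. This is not the BIO OHIO program: the function names, the cutoffs (at least 50% Pro/Ala/Ser/Thr for a candidate AGP, at least two Ser-Pro-Pro-Pro repeats for a candidate EXT), the order in which the rules are applied and the toy sequences are all assumptions chosen for illustration.

```python
def past_fraction(protein):
    """Fraction of residues that are Pro, Ala, Ser or Thr (non-empty input)."""
    return sum(protein.count(aa) for aa in "PAST") / len(protein)


def classify(protein, past_cutoff=0.5, min_sp_repeats=2):
    """Very rough HRGP triage using assumed, illustrative thresholds."""
    # Motif rule: count non-overlapping SPPP runs. An SPPPP or SPPPPP
    # repeat also contains SPPP, so longer repeats are counted as well.
    if protein.count("SPPP") >= min_sp_repeats:
        return "candidate EXT"
    # Composition rule: strongly biased toward Pro, Ala, Ser and Thr.
    if past_fraction(protein) >= past_cutoff:
        return "candidate AGP"
    return "unclassified"


# Toy sequences, invented for this example.
calls = {
    "ext_like": classify("SPPPPYSPPPPK"),
    "agp_like": classify("APAPSTAPAG"),
    "other": classify("MKLVGGEDQL"),
}
```

A fuller implementation would add the PRP class, chimeric and hybrid cases, and the downstream checks listed in the abstract, such as signal peptide and GPI anchor prediction.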


Subjects
Computational Biology/methods, Glycoproteins/chemistry, Glycoproteins/classification, Plant Proteins/chemistry, Plant Proteins/classification, Amino Acid Motifs, Amino Acid Sequence, Arabidopsis/metabolism, Data Mining, Protein Databases, Plant Genes, Molecular Sequence Data, Protein Sequence Analysis, Software
17.
BMC Bioinformatics ; 11 Suppl 12: S6, 2010 Dec 21.
Article in English | MEDLINE | ID: mdl-21210985

ABSTRACT

BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. METHODS: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. RESULTS: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. CONCLUSION: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
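
The two worker steps described in METHODS, enumerating DNA words and scoring them against a Markov chain model, can be sketched on a single machine as below. This is not WordSeeker's code: the first-order model, the single-process design and the function names are assumptions, and the abstract does not state the model order or the scoring statistic that the toolkit uses.

```python
from collections import Counter


def count_words(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def score_words(seq, k):
    """Return {word: (observed, expected)} under an order-1 Markov model.

    The model is estimated from seq itself: the first base uses the base
    frequency, and each following base uses the transition frequency.
    """
    base_counts = count_words(seq, 1)
    pair_counts = count_words(seq, 2)
    # Number of transitions leaving each base (the last base has none).
    starts = Counter(seq[:-1])
    n_positions = len(seq) - k + 1
    scores = {}
    for word, observed in count_words(seq, k).items():
        prob = base_counts[word[0]] / len(seq)
        for a, b in zip(word, word[1:]):
            prob *= pair_counts[a + b] / starts[a]
        # Expected count is the word probability times the number of windows.
        scores[word] = (observed, prob * n_positions)
    return scores


demo = score_words("ACACAC", 2)
```

Words whose observed count greatly exceeds the expected count are the over-represented candidates. A distributed version would split the word space across workers, as the abstract describes, and merge the resulting tables.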


Subjects
DNA/chemistry, Genomics/methods, Nucleic Acid Regulatory Sequences, Software, Algorithms, Arabidopsis/genetics, Plant Genome, Markov Chains, DNA Sequence Analysis
19.
Methods Mol Biol ; 2149: 463-481, 2020.
Article in English | MEDLINE | ID: mdl-32617951

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall proteins that function in diverse aspects of plant growth and development. This superfamily consists of three members: arabinogalactan-proteins (AGPs), extensins (EXTs), and proline-rich proteins (PRPs). Hybrid and chimeric HRGPs also exist. A bioinformatic software program, BIO OHIO 2.0, was developed to expedite the genome-wide identification and classification of AGPs, EXTs, and PRPs based on characteristic HRGP motifs and biased amino acid compositions. This chapter explains the principles of identifying HRGPs and provides a stepwise tutorial for using the BIO OHIO 2.0 program with genomic/proteomic data. Here, as an example, the genome/proteome of the common bean (Phaseolus vulgaris) is analyzed using the BIO OHIO 2.0 program to identify and characterize its set of HRGPs.


Subjects
Computational Biology/methods, Glycoproteins/chemistry, Glycoproteins/classification, Plant Proteins/classification, Software, Plant Genome, Glycoproteins/genetics, Mucoproteins/chemistry, Mucoproteins/classification, Mucoproteins/genetics, Phaseolus/chemistry, Phaseolus/genetics, Plant Proteins/chemistry, Plant Proteins/genetics, Proline-Rich Protein Domains, Proteome/analysis, Protein Sequence Analysis/methods
20.
Plants (Basel) ; 9(12), 2020 Dec 10.
Article in English | MEDLINE | ID: mdl-33322028

ABSTRACT

Hydroxyproline-rich glycoproteins (HRGPs) are a superfamily of plant cell wall structural proteins that function in various aspects of plant growth and development, including pollen tube growth. We have previously characterized protein sequence signatures for three family members in the HRGP superfamily: the hyperglycosylated arabinogalactan-proteins (AGPs), the moderately glycosylated extensins (EXTs), and the lightly glycosylated proline-rich proteins (PRPs). However, the mechanism of pollen-specific HRGP gene expression remains unexplored. To this end, we developed an integrative analysis pipeline combining RNA-seq gene expression and promoter sequences to identify cis-regulatory motifs responsible for pollen-specific expression of HRGP genes in Arabidopsis thaliana. Specifically, we mined the public RNA-seq datasets and identified 13 pollen-specific HRGP genes. Ensemble motif discovery identified 15 conserved promoter elements between A. thaliana and A. lyrata. Motif scanning revealed two pollen-related transcription factors: GATA12 and the brassinosteroid (BR) signaling pathway regulator BZR1. Finally, we performed a regression analysis and demonstrated that the 15 motifs provided a good model of HRGP gene expression in pollen (R = 0.61). In conclusion, we performed the first integrative analysis of cis-regulatory motifs in pollen-specific HRGP genes, revealing important insights into transcriptional regulation in pollen tissue.
