Results 1 - 20 of 78
1.
Nucleic Acids Res ; 50(W1): W670-W676, 2022 07 05.
Article in English | MEDLINE | ID: mdl-35544234

ABSTRACT

RSAT (Regulatory Sequence Analysis Tools) enables the detection and the analysis of cis-regulatory elements in genomic sequences. This software suite performs (i) de novo motif discovery (including from genome-wide datasets like ChIP-seq/ATAC-seq), (ii) scanning of genomic sequences with known motifs, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations and (v) comparative genomics. RSAT comprises 50 tools. Six public Web servers (including a teaching server) are offered to meet the needs of different biological communities. RSAT philosophy and originality are: (i) a multi-modal access depending on the user needs, through web forms, command-line for local installation and programmatic web services, (ii) a support for virtually any genome (animals, bacteria, plants, totalling over 10 000 genomes directly accessible). Since the 2018 NAR Web Software Issue, we have developed a large REST API, extended the support for additional genomes and external motif collections, enhanced some tools and Web forms, and developed a novel tool that builds or refines gene regulatory networks using motif scanning (network-interactions). The RSAT website provides extensive documentation, tutorials and published protocols. RSAT code is under open-source license and now hosted in GitHub. RSAT is available at http://www.rsat.eu/.


Subjects
Genomics; Transcription Factors; Animals; Transcription Factors/genetics; Genomics/methods; Software; Sequence Analysis, DNA/methods; Gene Regulatory Networks
2.
Nucleic Acids Res ; 50(D1): D765-D770, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34634797

ABSTRACT

The COVID-19 pandemic has seen unprecedented use of SARS-CoV-2 genome sequencing for epidemiological tracking and identification of emerging variants. Understanding the potential impact of these variants on the infectivity of the virus and the efficacy of emerging therapeutics and vaccines has become a cornerstone of the fight against the disease. To support the maximal use of genomic information for SARS-CoV-2 research, we launched the Ensembl COVID-19 browser, making SARS-CoV-2 the first virus to be encompassed within the Ensembl platform. This resource incorporates a new Ensembl gene set, multiple variant sets, and annotation from several relevant resources aligned to the reference SARS-CoV-2 assembly. Since the first release in May 2020, the content has been regularly updated using our new rapid release workflow, and tools such as the Ensembl Variant Effect Predictor have been integrated. The Ensembl COVID-19 browser is freely available at https://covid-19.ensembl.org.


Subjects
COVID-19/virology; Databases, Genetic; SARS-CoV-2/genetics; Web Browser; Coronaviridae/genetics; Genetic Variation; Genome, Viral; Humans; Molecular Sequence Annotation
3.
Nucleic Acids Res ; 50(D1): D996-D1003, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34791415

ABSTRACT

Ensembl Genomes (https://www.ensemblgenomes.org) provides access to non-vertebrate genomes and analysis complementing vertebrate resources developed by the Ensembl project (https://www.ensembl.org). The two resources collectively present genome annotation through a consistent set of interfaces spanning the tree of life presenting genome sequence, annotation, variation, transcriptomic data and comparative analysis. Here, we present our largest increase in plant, metazoan and fungal genomes since the project's inception creating one of the world's most comprehensive genomic resources and describe our efforts to reduce genome redundancy in our Bacteria portal. We detail our new efforts in gene annotation, our emerging support for pangenome analysis, our efforts to accelerate data dissemination through the Ensembl Rapid Release resource and our new AlphaFold visualization. Finally, we present details of our future plans including updates on our integration with Ensembl, and how we plan to improve our support for the microbial research community. Software and data are made available without restriction via our website, online tools platform and programmatic interfaces (available under an Apache 2.0 license). Data updates are synchronised with Ensembl's release cycle.


Subjects
Databases, Genetic; Genomics; Internet; Software; Animals; Computational Biology; Genome, Bacterial/genetics; Genome, Fungal/genetics; Genome, Plant/genetics; Plants/classification; Plants/genetics; Vertebrates/classification; Vertebrates/genetics
4.
Nucleic Acids Res ; 49(D1): D1452-D1463, 2021 01 08.
Article in English | MEDLINE | ID: mdl-33170273

ABSTRACT

Gramene (http://www.gramene.org), a knowledgebase founded on comparative functional analyses of genomic and pathway data for model plants and major crops, supports agricultural researchers worldwide. The resource is committed to open access and reproducible science based on the FAIR data principles. Since the last NAR update, we made nine releases; doubled the genome portal's content; expanded curated genes, pathways and expression sets; and implemented the Domain Informational Vocabulary Extraction (DIVE) algorithm for extracting gene function information from publications. The current release, #63 (October 2020), hosts 93 reference genomes, with over 3.9 million genes in 122 947 families with orthologous and paralogous classifications. Plant Reactome portrays pathway networks using a combination of manual biocuration in rice (320 reference pathways) and orthology-based projections to 106 species. The Reactome platform facilitates comparison between reference and projected pathways, gene expression analyses and overlays of gene-gene interactions. Gramene integrates ontology-based protein structure-function annotation; information on genetic, epigenetic, expression, and phenotypic diversity; and gene functional annotations extracted from plant-focused journals using DIVE. We train plant researchers in biocuration of genes and pathways; host curated maize gene structures as tracks in the maize genome browser; and integrate curated rice genes and pathways in the Plant Reactome.


Subjects
Databases, Genetic; Gene Expression Regulation, Plant; Genome, Plant; Genomics/methods; Plant Proteins/genetics; Plants/genetics; Crops, Agricultural; DNA Transposable Elements; Gene Duplication; Gene Ontology; Gene Regulatory Networks; Internet; Knowledge Bases; Metabolic Networks and Pathways; Molecular Sequence Annotation; Oryza/genetics; Oryza/metabolism; Plant Proteins/metabolism; Plants/classification; Plants/metabolism; Polyploidy; Protein Interaction Mapping; Software; Zea mays/genetics; Zea mays/metabolism
5.
Mol Ecol ; 31(20): 5285-5306, 2022 10.
Article in English | MEDLINE | ID: mdl-35976181

ABSTRACT

Natural populations are characterized by abundant genetic diversity driven by a range of different types of mutation. The tractability of sequencing complete genomes has allowed new insights into the variable composition of genomes, summarized as a species pan-genome. These analyses demonstrate that many genes are absent from the first reference genomes, whose analysis dominated the initial years of the genomic era. Our field now turns towards understanding the functional consequence of these highly variable genomes. Here, we analysed weighted gene coexpression networks from leaf transcriptome data for drought response in the purple false brome Brachypodium distachyon and the differential expression of genes putatively involved in adaptation to this stressor. We specifically asked whether genes with variable "occupancy" in the pan-genome - genes which are either present in all studied genotypes or missing in some genotypes - show different distributions among coexpression modules. Coexpression analysis united genes expressed in drought-stressed plants into nine modules covering 72 hub genes (87 hub isoforms), and genes expressed under controlled water conditions into 13 modules, covering 190 hub genes (251 hub isoforms). We find that low occupancy pan-genes are under-represented among several modules, while other modules are over-enriched for low-occupancy pan-genes. We also provide new insight into the regulation of drought response in B. distachyon, specifically identifying one module with an apparent role in primary metabolism that is strongly responsive to drought. Our work shows the power of integrating pan-genomic analysis with transcriptomic data using factorial experiments to understand the functional genomics of environmental response.


Subjects
Brachypodium; Brachypodium/genetics; Droughts; Genes, Plant; Transcriptome/genetics; Water
6.
Plant Physiol ; 185(3): 1242-1258, 2021 04 02.
Article in English | MEDLINE | ID: mdl-33744946

ABSTRACT

The identification of functional elements encoded in plant genomes is necessary to understand gene regulation. Although much attention has been paid to model species like Arabidopsis (Arabidopsis thaliana), little is known about regulatory motifs in other plants. Here, we describe a bottom-up approach for de novo motif discovery using peach (Prunus persica) as an example. These predictions require pre-computed gene clusters grouped by their expression similarity. After optimizing the boundaries of proximal promoter regions, two motif discovery algorithms from RSAT::Plants (http://plants.rsat.eu) were tested (oligo and dyad analysis). Overall, 18 out of 45 co-expressed modules were enriched in motifs typical of well-known transcription factor (TF) families (bHLH, bZip, BZR, CAMTA, DOF, E2FE, AP2-ERF, Myb-like, NAC, TCP, and WRKY) and a few uncharacterized motifs. Our results indicate that small modules and a promoter window of [-500 bp, +200 bp] relative to the transcription start site (TSS) maximize the number of motifs found and reduce low-complexity signals in peach. The distribution of discovered regulatory sites was unbalanced, as they accumulated around the TSS. This approach was benchmarked by testing two different expression-based clustering algorithms (network-based and hierarchical) and, as control, genes grouped for harboring ChIPseq peaks of the same Arabidopsis TF. The method was also verified on maize (Zea mays), a species with a large genome. In summary, this article presents a glimpse of the peach regulatory components at genome scale and provides a general protocol that can be applied to other species. A Docker software container is released to facilitate the reproduction of these analyses.


Subjects
Promoter Regions, Genetic/genetics; Prunus persica/genetics; Algorithms; Arabidopsis/genetics; Computational Biology; Gene Expression Regulation, Plant/genetics; Gene Expression Regulation, Plant/physiology; Multigene Family/genetics; Multigene Family/physiology; Plant Proteins/genetics; Plant Proteins/metabolism; Transcription Factors/genetics; Transcription Factors/metabolism
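
Illustrative note on entry 6: the promoter window of [-500 bp, +200 bp] relative to the TSS can be turned into genomic coordinates roughly as follows. This is only a sketch under stated assumptions. The function name `promoter_window`, the 1-based inclusive coordinates and the choice to treat the TSS itself as offset 0 are hypothetical conventions chosen for the example; they are not taken from the article or from RSAT.

```python
def promoter_window(tss, strand, upstream=500, downstream=200):
    """Return (start, end) of a promoter window around a TSS.

    Assumptions (hypothetical, for illustration only):
    - coordinates are 1-based and inclusive;
    - the TSS itself is offset 0;
    - "upstream" follows the direction of transcription, so on the
      minus strand it lies at higher genomic coordinates.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        start, end = tss - downstream, tss + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip at the chromosome start; the chromosome end is not known here.
    return max(1, start), end


# Example with made-up TSS positions
print(promoter_window(1000, "+"))
print(promoter_window(1000, "-"))
```

The strand check matters because the same offsets map to mirrored coordinates on the minus strand; forgetting it would place the window downstream of the gene start.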
7.
Nucleic Acids Res ; 48(D1): D689-D695, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31598706

ABSTRACT

Ensembl Genomes (http://www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, complementing the resources for vertebrate genomics developed in the context of the Ensembl project (http://www.ensembl.org). Together, the two resources provide a consistent set of interfaces to genomic data across the tree of life, including reference genome sequence, gene models, transcriptional data, genetic variation and comparative analysis. Data may be accessed via our website, online tools platform and programmatic interfaces, with updates made four times per year (in synchrony with Ensembl). Here, we provide an overview of Ensembl Genomes, with a focus on recent developments. These include the continued growth, more robust and reproducible sets of orthologues and paralogues, and enriched views of gene expression and gene function in plants. Finally, we report on our continued deeper integration with the Ensembl project, which forms a key part of our future strategy for dealing with the increasing quantity of available genome-scale data across the tree of life.


Subjects
Computational Biology/methods; Databases, Genetic; Genetic Variation; Genome, Bacterial; Genome, Fungal; Genome, Plant; Algorithms; Animals; Caenorhabditis elegans/genetics; Genomics; Internet; Molecular Sequence Annotation; Phenotype; Plants/genetics; Reference Values; Software; User-Computer Interface
8.
Nucleic Acids Res ; 46(W1): W209-W214, 2018 07 02.
Article in English | MEDLINE | ID: mdl-29722874

ABSTRACT

RSAT (Regulatory Sequence Analysis Tools) is a suite of modular tools for the detection and the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, including from genome-wide datasets like ChIP-seq/ATAC-seq, (ii) motif scanning, (iii) motif analysis (quality assessment, comparisons and clustering), (iv) analysis of regulatory variations, (v) comparative genomics. Six public servers jointly support 10 000 genomes from all kingdoms. Six novel or refactored programs have been added since the 2015 NAR Web Software Issue, including updated programs to analyse regulatory variants (retrieve-variation-seq, variation-scan, convert-variations), along with tools to extract sequences from a list of coordinates (retrieve-seq-bed), to select motifs from motif collections (retrieve-matrix), and to extract orthologs based on Ensembl Compara (get-orthologs-compara). Three use cases illustrate the integration of new and refactored tools to the suite. This Anniversary update gives a 20-year perspective on the software suite. RSAT is well-documented and available through Web sites, SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services, virtual machines and stand-alone programs at http://www.rsat.eu/.


Subjects
Regulatory Sequences, Nucleic Acid; Software; Genetic Variation; Genomics/history; High-Throughput Nucleotide Sequencing/history; History, 20th Century; History, 21st Century; Internet; Nucleotide Motifs; Software/history
9.
BMC Plant Biol ; 19(1): 113, 2019 Mar 25.
Article in English | MEDLINE | ID: mdl-30909882

ABSTRACT

BACKGROUND: In winter barley plants, vernalization and photoperiod cues have to be integrated to promote flowering. Plant development and expression of different flowering promoter (HvVRN1, HvCO2, PPD-H1, HvFT1, HvFT3) and repressor (HvVRN2, HvCO9 and HvOS2) genes were evaluated in two winter barley varieties under: (1) natural increasing photoperiod, without vernalization, and (2) short day conditions in three insufficient vernalization treatments. These challenging conditions were chosen to capture non-optimal and natural responses, representative of those experienced in the Mediterranean area. RESULTS: In the absence of vernalization and under increasing photoperiods, HvVRN2 expression increased with day-length, mainly between 12 and 13 h photoperiods in our latitudes. The flowering promoter gene in short days, HvFT3, was only expressed after receiving induction of cold or plant age, which was associated with low transcript levels of HvVRN2 and HvOS2. Under the sub-optimal conditions here described, great differences in development were found between the two winter barley varieties used in the study. Delayed development in 'Barberousse' was associated with increased expression levels of HvOS2. Novel variation for HvCO9 and HvOS2 is reported and might explain such differences. CONCLUSIONS: The balance between the expression of flowering promoters and repressor genes regulates the promotion towards flowering or the maintenance of the vegetative state. HvOS2, an ortholog of FLC, appears as a strong candidate to mediate the vernalization response of barley. Natural variation found would help to exploit the plasticity in development to obtain better-adapted varieties for current and future climate conditions.


Subjects
Flowers/physiology; Hordeum/physiology; Plant Proteins/genetics; Flowers/genetics; Gene Expression Regulation, Plant; Hordeum/genetics; Photoperiod; Polymorphism, Single Nucleotide; Repressor Proteins/genetics; Spain
10.
Mol Ecol ; 28(8): 1994-2012, 2019 04.
Article in English | MEDLINE | ID: mdl-30614595

ABSTRACT

Landraces are local populations of crop plants adapted to a particular environment. Extant landraces are surviving genetic archives, keeping signatures of the selection processes experienced by them until settling in their current niches. This study intends to establish relationships between genetic diversity of barley (Hordeum vulgare L.) landraces collected in Spain and the climate of their collection sites. A high-resolution climatic data set (5 × 5 km spatial, 1-day temporal grid) was computed from over 2,000 temperature and 7,000 precipitation stations across peninsular Spain. This data set, spanning the period 1981-2010, was used to derive agroclimatic variables meaningful for cereal production at the collection sites of 135 barley landraces. Variables summarize temperature, precipitation, evapotranspiration, potential vernalization and frost probability at different times of the year and time scales (season and month). SNP genotyping of the landraces was carried out combining Illumina Infinium assays and genotyping-by-sequencing, yielding 9,920 biallelic markers (7,479 with position on the barley reference genome). The association of these SNPs with agroclimatic variables was analysed at two levels of genetic diversity, with and without taking into account population structure. The whole data sets and analysis pipelines are documented and available at https://eead-csic-compbio.github.io/barley-agroclimatic-association. We found differential adaptation of the germplasm groups identified to be dominated by reactions to cold temperature and late-season frost occurrence, as well as to water availability. Several significant associations pointing at specific adaptations to agroclimatic features related to temperature and water availability were observed, and candidate genes underlying some of the main regions are proposed.


Subjects
Adaptation, Physiological/genetics; Climate; Hordeum/genetics; Selection, Genetic/genetics; Environment; Europe; Genetic Variation/genetics; Genome, Plant/genetics; Genotype; Hordeum/growth & development; Microsatellite Repeats/genetics; Phenotype; Seasons; Spain
12.
New Phytol ; 218(4): 1631-1644, 2018 06.
Article in English | MEDLINE | ID: mdl-29206296

ABSTRACT

Few pan-genomic studies have been conducted in plants, and none of them have focused on the intraspecific diversity and evolution of their plastid genomes. We address this issue in Brachypodium distachyon and its close relatives B. stacei and B. hybridum, for which a large genomic data set has been compiled. We analyze inter- and intraspecific plastid comparative genomics and phylogenomic relationships within a family-wide framework. Major indel differences were detected between Brachypodium plastomes. Within B. distachyon, we detected two main lineages, a mostly Extremely Delayed Flowering (EDF+) clade and a mostly Spanish (S+) - Turkish (T+) clade, plus nine chloroplast capture and two plastid DNA (ptDNA) introgression and micro-recombination events. Early Oligocene (30.9 million yr ago (Ma)) and Late Miocene (10.1 Ma) divergence times were inferred for the respective stem and crown nodes of Brachypodium and a very recent Mid-Pleistocene (0.9 Ma) time for the B. distachyon split. Flowering time variation is a main factor driving rapid intraspecific divergence in B. distachyon, although it is counterbalanced by repeated introgression between previously isolated lineages. Swapping of plastomes between the three different genomic groups, EDF+, T+, S+, probably resulted from random backcrossing followed by stabilization through selection pressure.


Subjects
Brachypodium/classification; Brachypodium/genetics; Ecotype; Flowers/physiology; Genome, Plastid; Genomics; Phylogeny; Recombination, Genetic/genetics; Base Sequence; Evolution, Molecular; Genes, Plant; Genetic Variation; Geography; Haplotypes/genetics; Mediterranean Region; Time Factors
13.
Nucleic Acids Res ; 43(W1): W50-6, 2015 Jul 01.
Article in English | MEDLINE | ID: mdl-25904632

ABSTRACT

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/.


Subjects
Regulatory Elements, Transcriptional; Software; Binding Sites; Genetic Variation; Genomics; Humans; Internet; Nucleotide Motifs; Transcription Factors/metabolism
14.
J Proteome Res ; 15(8): 2510-24, 2016 08 05.
Article in English | MEDLINE | ID: mdl-27321140

ABSTRACT

In the present study we have used label-free shotgun proteomic analysis to examine the effects of Fe deficiency on the protein profiles of highly pure sugar beet root plasma membrane (PM) preparations and detergent-resistant membranes (DRMs), the latter as an approach to study microdomains. Altogether, 545 proteins were detected, with 52 and 68 of them changing significantly with Fe deficiency in PM and DRM, respectively. Functional categorization of these proteins showed that signaling and general and vesicle-related transport accounted for approximately 50% of the differences in both PM and DRM, indicating that from a qualitative point of view changes induced by Fe deficiency are similar in both preparations. Results indicate that Fe deficiency has an impact in phosphorylation processes at the PM level and highlight the involvement of signaling proteins, especially those from the 14-3-3 family. Lipid profiling revealed Fe-deficiency-induced decreases in phosphatidic acid derivatives, which may impair vesicle formation, in agreement with the decreases measured in proteins related to intracellular trafficking and secretion. The modifications induced by Fe deficiency in the relative enrichment of proteins in DRMs revealed the existence of a group of cytoplasmic proteins that appears to be more attached to the PM in conditions of Fe deficiency.


Subjects
Beta vulgaris/chemistry; Cell Membrane/chemistry; Iron Deficiencies; Membrane Microdomains/chemistry; Plant Proteins/analysis; Proteomics/methods; Cell Membrane/metabolism; Lipids/analysis; Membrane Microdomains/metabolism; Phosphatidic Acids; Phosphorylation; Plant Roots/chemistry
15.
Microbiology (Reading) ; 162(3): 552-563, 2016 Mar.
Article in English | MEDLINE | ID: mdl-26813656

ABSTRACT

In Gram-negative bacteria, tyrosine phosphorylation has been shown to play a role in the control of exopolysaccharide (EPS) production. This study demonstrated that the chromosomal ORF SMc02309 from Sinorhizobium meliloti 2011 encodes a protein with significant sequence similarity to low molecular mass protein-tyrosine phosphatases (LMW-PTPs), such as the Escherichia coli Wzb. Unlike other well-characterized EPS biosynthesis gene clusters, which contain neighbouring LMW-PTPs and kinase, the S. meliloti succinoglycan (EPS I) gene cluster located on megaplasmid pSymB does not encode a phosphatase. Biochemical assays revealed that the SMc02309 protein hydrolyses p-nitrophenyl phosphate (p-NPP) with kinetic parameters similar to other bacterial LMW-PTPs. Furthermore, we show evidence that SMc02309 is not the LMW-PTP of the bacterial tyrosine-kinase (BY-kinase) ExoP. Nevertheless, ExoN, a UDP-glucose pyrophosphorylase involved in the first stages of EPS I biosynthesis, is phosphorylated at tyrosine residues and constitutes an endogenous substrate of the SMc02309 protein. Additionally, we show that the UDP-glucose pyrophosphorylase activity is modulated by SMc02309-mediated tyrosine dephosphorylation. Moreover, a mutation in the SMc02309 gene decreases EPS I production and delays nodulation on Medicago sativa roots.


Subjects
Polysaccharides, Bacterial/biosynthesis; Protein Tyrosine Phosphatases/metabolism; Sinorhizobium meliloti/enzymology; Sinorhizobium meliloti/metabolism; UTP-Glucose-1-Phosphate Uridylyltransferase/metabolism; Medicago sativa/microbiology; Plant Root Nodulation; Plant Roots/microbiology
16.
Bioinformatics ; 30(2): 258-65, 2014 Jan 15.
Article in English | MEDLINE | ID: mdl-24234003

ABSTRACT

MOTIVATION: Traditional and high-throughput techniques for determining transcription factor (TF) binding specificities are generating large volumes of data of uneven quality, which are scattered across individual databases. RESULTS: FootprintDB integrates some of the most comprehensive freely available libraries of curated DNA binding sites and systematically annotates the binding interfaces of the corresponding TFs. The first release contains 2422 unique TF sequences, 10 112 DNA binding sites and 3662 DNA motifs. A survey of the included data sources, organisms and TF families was performed together with proprietary database TRANSFAC, finding that footprintDB has a similar coverage of multicellular organisms, while also containing bacterial regulatory data. A search engine has been designed that drives the prediction of DNA motifs for input TFs, or conversely of TF sequences that might recognize input regulatory sequences, by comparison with database entries. Such predictions can also be extended to a single proteome chosen by the user, and results are ranked in terms of interface similarity. Benchmark experiments with bacterial, plant and human data were performed to measure the predictive power of footprintDB searches, which were able to correctly recover 10, 55 and 90% of the tested sequences, respectively. Correctly predicted TFs had a higher interface similarity than the average, confirming its diagnostic value. AVAILABILITY AND IMPLEMENTATION: Web site implemented in PHP, Perl, MySQL and Apache. Freely available from http://floresta.eead.csic.es/footprintdb.


Subjects
Databases, Genetic; Gene Expression Regulation; Nucleotide Motifs/genetics; Regulatory Sequences, Nucleic Acid/genetics; Software; Transcription Factors/metabolism; Animals; Arabidopsis/genetics; Bacillus subtilis/genetics; Binding Sites; Drosophila melanogaster/genetics; Escherichia coli K12/genetics; Humans; Molecular Sequence Annotation; Protein Binding; Proteome/analysis; Transcription Factors/genetics
18.
Nucleic Acids Res ; 41(3): 1438-49, 2013 Feb 01.
Article in English | MEDLINE | ID: mdl-23268451

ABSTRACT

Sequence alignment of proteins and nucleic acids is a routine task in bioinformatics. Although the comparison of complete peptides, genes or genomes can be undertaken with a great variety of tools, the alignment of short DNA sequences and motifs entails pitfalls that have not been fully addressed yet. Here we confront the structural superposition of transcription factors with the sequence alignment of their recognized cis elements. Our goals are (i) to test TFcompare (http://floresta.eead.csic.es/tfcompare), a structural alignment method for protein-DNA complexes; (ii) to benchmark the pairwise alignment of regulatory elements; (iii) to define the confidence limits and the twilight zone of such alignments and (iv) to evaluate the relevance of these thresholds with elements obtained experimentally. We find that the structure of cis elements and protein-DNA interfaces is significantly more conserved than their sequence and measure how this correlates with alignment errors when only sequence information is considered. Our results confirm that DNA motifs in the form of matrices produce better alignments than individual sequences. Finally, we report that empirical and theoretically derived twilight thresholds are useful for estimating the natural plasticity of regulatory sequences, and hence for filtering out unreliable alignments.


Subjects
DNA/chemistry; Regulatory Elements, Transcriptional; Sequence Alignment/methods; Transcription Factors/chemistry; Binding Sites; DNA/metabolism; Nucleotide Motifs; Position-Specific Scoring Matrices; Transcription Factors/metabolism
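
Illustrative note on entry 18: the abstract contrasts DNA motifs "in the form of matrices" with individual sequences. A minimal sketch of how a set of pre-aligned binding sites of equal length might be summarized as a position count matrix is given below. The function name `count_matrix` and the example sites are made up for illustration; this is not the TFcompare method, only the generic idea of a position-specific count matrix.

```python
from collections import Counter

ALPHABET = "ACGT"


def count_matrix(sites):
    """Build a position count matrix from aligned, equal-length DNA sites.

    Returns one dict per alignment column, mapping each base in
    ALPHABET to the number of sites carrying that base.
    Characters outside ALPHABET (e.g. N) are ignored.
    """
    if not sites:
        raise ValueError("at least one site is required")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same length")
    matrix = []
    for column in zip(*sites):  # iterate over alignment columns
        counts = Counter(base.upper() for base in column)
        matrix.append({base: counts.get(base, 0) for base in ALPHABET})
    return matrix


# Hypothetical aligned sites, not taken from any database
for position, column in enumerate(count_matrix(["ACGT", "ACGA", "ACCT"]), start=1):
    print(position, column)
```

A matrix of this kind keeps the per-position variability that a single consensus sequence discards, which is one plausible reason why matrix-based comparisons can be more robust than aligning individual sites.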
19.
BMC Genomics ; 15: 317, 2014 Apr 28.
Article in English | MEDLINE | ID: mdl-24773781

ABSTRACT

BACKGROUND: Using motif detection programs it is fairly straightforward to identify conserved cis-sequences in promoters of co-regulated genes. In contrast, the identification of the transcription factors (TFs) interacting with these cis-sequences is much more elaborate. To facilitate this, we explore the possibility of using several bioinformatic and experimental approaches for TF identification. This starts with the selection of co-regulated gene sets and leads first to the prediction and then to the experimental validation of TFs interacting with cis-sequences conserved in the promoters of these co-regulated genes. RESULTS: Using the PathoPlant database, 32 up-regulated gene groups were identified with microarray data for drought-responsive gene expression from Arabidopsis thaliana. Application of the binding site estimation suite of tools (BEST) discovered 179 conserved sequence motifs within the corresponding promoters. Using the STAMP web-server, 49 sequence motifs were classified into 7 motif families for which similarities with known cis-regulatory sequences were identified. All motifs were subjected to a footprintDB analysis to predict interacting DNA binding domains from plant TF families. Predictions were confirmed by using a yeast-one-hybrid approach to select interacting TFs belonging to the predicted TF families. TF-DNA interactions were further experimentally validated in yeast and with a Physcomitrella patens transient expression system, leading to the discovery of several novel TF-DNA interactions. CONCLUSIONS: The present work demonstrates the successful integration of several bioinformatic resources with experimental approaches to predict and validate TFs interacting with conserved sequence motifs in co-regulated genes.


Subjects
Computational Biology; Gene Expression Regulation, Plant; Transcription Factors/metabolism; Arabidopsis/genetics; Arabidopsis/physiology; Droughts; Genes, Plant
20.
BMC Genomics ; 14: 772, 2013 Nov 09.
Article in English | MEDLINE | ID: mdl-24206529

ABSTRACT

BACKGROUND: Intrinsically disordered proteins, found in all living organisms, are essential for basic cellular functions and complement the function of ordered proteins. It has been shown that protein disorder is linked to the G + C content of the genome. Furthermore, recent investigations have suggested that the evolutionary dynamics of the plant nucleus adds disordered segments to open reading frames alike, and these segments are not necessarily conserved among orthologous genes. RESULTS: In the present work the distribution of intrinsically disordered proteins along the chromosomes of several representative plants was analyzed. The reported results support a non-random distribution of disordered proteins along the chromosomes of Arabidopsis thaliana and Oryza sativa, two model eudicot and monocot plant species, respectively. In fact, for most chromosomes positive correlations between the frequency of disordered segments of 30+ amino acids and both recombination rates and G + C content were observed. CONCLUSIONS: These analyses demonstrate that the presence of disordered segments among plant proteins is associated with the rates of genetic recombination of their encoding genes. Altogether, these findings suggest that high recombination rates, as well as chromosomal rearrangements, could induce disordered segments in proteins during evolution.


Subjects
Amino Acids/genetics; Evolution, Molecular; Plant Proteins/genetics; Recombination, Genetic; Arabidopsis/genetics; Base Composition/genetics; Computational Biology; Open Reading Frames; Oryza/genetics; Phylogeny; Plant Proteins/chemistry; Proteome
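
Illustrative note on entry 20: the abstract reports positive correlations between the frequency of disordered segments and both recombination rate and G + C content. The two quantities involved can be sketched as follows. The function names `gc_content` and `pearson` are hypothetical, the article does not state which correlation coefficient was used (Pearson is assumed here), and no data from the study are reproduced.

```python
import math


def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / total


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Made-up per-window values, for illustration only
gc_per_window = [gc_content(s) for s in ["ATATGC", "GGCATA", "GGCCGA"]]
disorder_freq = [0.10, 0.20, 0.35]
print(pearson(gc_per_window, disorder_freq))
```

In practice the per-window values would come from chromosome windows or genes, and a rank-based coefficient may be preferable when the relationship is monotonic but not linear.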