Results 1 - 18 of 18
1.
Nucleic Acids Res ; 52(D1): D174-D182, 2024 Jan 05.
Article in English | MEDLINE | ID: mdl-37962376

ABSTRACT

JASPAR (https://jaspar.elixir.no/) is a widely-used open-access database presenting manually curated high-quality and non-redundant DNA-binding profiles for transcription factors (TFs) across taxa. In this 10th release and 20th-anniversary update, the CORE collection has expanded with 329 new profiles. We updated three existing profiles and provided orthogonal support for 72 profiles from the previous release's UNVALIDATED collection. Altogether, the JASPAR 2024 update provides a 20% increase in CORE profiles from the previous release. A trimming algorithm enhanced profiles by removing low information content flanking base pairs, which were likely uninformative (within the capacity of the PFM models) for TFBS predictions and modelling TF-DNA interactions. This release includes enhanced metadata, featuring a refined classification for plant TFs' structural DNA-binding domains. The new JASPAR collections prompt updates to the genomic tracks of predicted TF binding sites (TFBSs) in 8 organisms, with human and mouse tracks available as native tracks in the UCSC Genome browser. All data are available through the JASPAR web interface and programmatically through its API and the updated Bioconductor and pyJASPAR packages. Finally, a new TFBS extraction tool enables users to retrieve predicted JASPAR TFBSs intersecting their genomic regions of interest.


Subject(s)
Databases, Genetic; Protein Binding; Transcription Factors; Animals; Humans; Mice; Databases, Genetic/standards; Databases, Genetic/trends; Transcription Factors/genetics; Transcription Factors/metabolism; Plants/genetics
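
The flank-trimming step mentioned in the JASPAR 2024 abstract above (entry 1) can be illustrated with a minimal sketch. It assumes a uniform background, per-column information content measured in bits, and an arbitrary 0.5-bit cutoff; the function names, the cutoff and the toy matrix are hypothetical and are not taken from the JASPAR implementation.

```python
import math


def information_content(column):
    """Information content (bits) of one PFM column of A, C, G, T counts.

    Assumes a uniform background, so the maximum is 2 bits.
    """
    total = sum(column)
    if total == 0:
        return 0.0  # an empty column carries no information
    ic = 2.0
    for count in column:
        if count > 0:
            p = count / total
            ic += p * math.log2(p)
    return ic


def trim_flanks(pfm, min_ic=0.5):
    """Drop leading and trailing columns whose information content is below min_ic.

    pfm is a list of columns, each column a list of four counts [A, C, G, T].
    The 0.5-bit default is an illustrative assumption, not a published value.
    """
    ics = [information_content(col) for col in pfm]
    start = 0
    while start < len(pfm) and ics[start] < min_ic:
        start += 1
    end = len(pfm)
    while end > start and ics[end - 1] < min_ic:
        end -= 1
    return pfm[start:end]


# Hypothetical toy profile: two informative core columns with weak flanks.
toy_pfm = [
    [5, 5, 5, 5],    # uniform flank
    [20, 0, 0, 0],   # fully conserved A
    [0, 10, 10, 0],  # C or G
    [6, 5, 4, 5],    # near-uniform flank
]
trimmed = trim_flanks(toy_pfm)
```

Only the flanks are removed in this sketch; a low-information column inside the core is kept, which matches the abstract's description of trimming flanking base pairs.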
2.
Dis Model Mech ; 16(8), 2023 08 01.
Article in English | MEDLINE | ID: mdl-37529920

ABSTRACT

In the past decades, the zebrafish has become a disease model with increasing popularity owing to its advantages that include fast development, easy genetic manipulation, simplicity for imaging, and sharing conserved disease-associated genes and pathways with those of human. In parallel, studies of disease mechanisms are increasingly focusing on non-coding mutations, which require genome annotation maps of regulatory elements, such as enhancers and promoters. In line with this, genomic resources for zebrafish research are expanding, producing a variety of genomic data that help in defining regulatory elements and their conservation between zebrafish and humans. Here, we discuss recent developments in generating functional annotation maps for regulatory elements of the zebrafish genome and how this can be applied to human diseases. We highlight community-driven developments, such as DANIO-CODE, in generating a centralised and standardised catalogue of zebrafish genomics data and functional annotations; consider the advantages and limitations of current annotation maps; and offer considerations for interpreting and integrating existing maps with comparative genomics tools. We also discuss the need for developing standardised genomics protocols and bioinformatic pipelines and provide suggestions for the development of analysis and visualisation tools that will integrate various multiomic bulk sequencing data together with fast-expanding data on single-cell methods, such as single-cell assay for transposase-accessible chromatin with sequencing. Such integration tools are essential to exploit the multiomic chromatin characterisation offered by bulk genomics together with the cell-type resolution offered by emerging single-cell methods. Together, these advances will build an expansive toolkit for interrogating the mechanisms of human disease in zebrafish.


Subject(s)
Genomics; Zebrafish; Animals; Humans; Zebrafish/genetics; Genomics/methods; Genome; Chromatin; Regeneration/genetics
3.
Nat Commun ; 14(1): 2784, 2023 05 15.
Article in English | MEDLINE | ID: mdl-37188674

ABSTRACT

DNA methylation variations are prevalent in human obesity but evidence of a causative role in disease pathogenesis is limited. Here, we combine epigenome-wide association and integrative genomics to investigate the impact of adipocyte DNA methylation variations in human obesity. We discover extensive DNA methylation changes that are robustly associated with obesity (N = 190 samples, 691 loci in subcutaneous and 173 loci in visceral adipocytes, P < 1 × 10⁻⁷). We connect obesity-associated methylation variations to transcriptomic changes at >500 target genes, and identify putative methylation-transcription factor interactions. Through Mendelian Randomisation, we infer causal effects of methylation on obesity and obesity-induced metabolic disturbances at 59 independent loci. Targeted methylation sequencing, CRISPR-activation and gene silencing in adipocytes further identify regional methylation variations, underlying regulatory elements and novel cellular metabolic effects. Our results indicate DNA methylation is an important determinant of human obesity and its metabolic complications, and reveal mechanisms through which altered methylation may impact adipocyte functions.


Subject(s)
DNA Methylation; Diabetes Mellitus; Humans; Adipocytes/metabolism; Obesity/metabolism; Diabetes Mellitus/metabolism; Genomics; Epigenesis, Genetic
4.
Nat Genet ; 54(7): 1037-1050, 2022 07.
Article in English | MEDLINE | ID: mdl-35789323

ABSTRACT

Zebrafish, a popular organism for studying embryonic development and for modeling human diseases, has so far lacked a systematic functional annotation program akin to those in other animal models. To address this, we formed the international DANIO-CODE consortium and created a central repository to store and process zebrafish developmental functional genomic data. Our data coordination center ( https://danio-code.zfin.org ) combines a total of 1,802 sets of unpublished and re-analyzed published genomic data, which we used to improve existing annotations and show its utility in experimental design. We identified over 140,000 cis-regulatory elements throughout development, including classes with distinct features dependent on their activity in time and space. We delineated the distinct distance topology and chromatin features between regulatory elements active during zygotic genome activation and those active during organogenesis. Finally, we matched regulatory elements and epigenomic landscapes between zebrafish and mouse and predicted functional relationships between them beyond sequence similarity, thus extending the utility of zebrafish developmental genomics to mammals.


Subject(s)
Databases, Genetic; Gene Expression Regulation, Developmental; Genome; Genomics; Regulatory Sequences, Nucleic Acid; Zebrafish Proteins; Zebrafish; Animals; Chromatin/genetics; Genome/genetics; Humans; Mice; Molecular Sequence Annotation; Organogenesis/genetics; Regulatory Sequences, Nucleic Acid/genetics; Zebrafish/embryology; Zebrafish/genetics; Zebrafish Proteins/genetics
5.
Microbiol Spectr ; 10(2): e0243421, 2022 04 27.
Article in English | MEDLINE | ID: mdl-35377231

ABSTRACT

Streptomyces rimosus ATCC 10970 is the parental strain of industrial strains used for the commercial production of the important antibiotic oxytetracycline. Because it is an actinobacterium with a large linear chromosome containing numerous long repeat regions, high GC content, and a single giant linear plasmid (GLP), its genome is challenging to assemble. Here, we apply a hybrid sequencing approach relying on the combination of short- and long-read next-generation sequencing platforms and whole-genome restriction analysis by using pulsed-field gel electrophoresis (PFGE) to produce a high-quality reference genome for this biotechnologically important bacterium. By using PFGE to separate and isolate plasmid DNA from chromosomal DNA, we successfully sequenced the GLP using Nanopore data alone. Using this approach, we compared the sequence of GLP in the parent strain ATCC 10970 with those found in two semi-industrial progenitor strains, R6-500 and M4018. Sequencing of the GLP of these three S. rimosus strains shed light on several rearrangements accompanied by transposase genes, suggesting that transposases play an important role in plasmid and genome plasticity in S. rimosus. The polished annotation of secondary metabolite biosynthetic pathways compared to metabolite analysis in the ATCC 10970 strain also refined our knowledge of the secondary metabolite arsenal of these strains. The proposed methodology is highly applicable to a variety of sequencing projects, as evidenced by the reliable assemblies obtained.
IMPORTANCE: The genomes of Streptomyces species are difficult to assemble due to long repeats, extrachromosomal elements (giant linear plasmids [GLPs]), rearrangements, and high GC content. To improve the quality of the S. rimosus ATCC 10970 genome, producer of oxytetracycline, we validated the assembly of GLPs by applying a new approach to combine pulsed-field gel electrophoresis separation and GLP isolation and sequenced the isolated GLP with Oxford Nanopore technology. By examining the sequenced plasmids of ATCC 10970 and two industrial progenitor strains, R6-500 and M4018, we identified large GLP rearrangements. Analysis of the assembled plasmid sequences shed light on the role of transposases in genome plasticity of this species. The new methodological approach developed for Nanopore sequencing is highly applicable to a variety of sequencing projects. In addition, we present the annotated reference genome sequence of ATCC 10970 with a detailed analysis of the biosynthetic gene clusters.


Subject(s)
Nanopore Sequencing; Oxytetracycline; Streptomyces rimosus; Genome, Bacterial; High-Throughput Nucleotide Sequencing/methods; Oxytetracycline/metabolism; Plasmids/genetics; Streptomyces rimosus/genetics; Streptomyces rimosus/metabolism; Transposases/genetics; Transposases/metabolism
6.
Nucleic Acids Res ; 50(D1): D165-D173, 2022 01 07.
Article in English | MEDLINE | ID: mdl-34850907

ABSTRACT

JASPAR (http://jaspar.genereg.net/) is an open-access database containing manually curated, non-redundant transcription factor (TF) binding profiles for TFs across six taxonomic groups. In this 9th release, we expanded the CORE collection with 341 new profiles (148 for plants, 101 for vertebrates, 85 for urochordates, and 7 for insects), which corresponds to a 19% expansion over the previous release. We added 298 new profiles to the Unvalidated collection when no orthogonal evidence was found in the literature. All the profiles were clustered to provide familial binding profiles for each taxonomic group. Moreover, we revised the structural classification of DNA binding domains to consider plant-specific TFs. This release introduces word clouds to represent the scientific knowledge associated with each TF. We updated the genome tracks of TFBSs predicted with JASPAR profiles in eight organisms; the human and mouse TFBS predictions can be visualized as native tracks in the UCSC Genome Browser. Finally, we provide a new tool to perform JASPAR TFBS enrichment analysis in user-provided genomic regions. All the data is accessible through the JASPAR website, its associated RESTful API, the R/Bioconductor data package, and a new Python package, pyJASPAR, that facilitates serverless access to the data.


Subject(s)
Databases, Genetic; Genomics/classification; Software; Transcription Factors/genetics; Animals; Binding Sites/genetics; Computational Biology; Genome/genetics; Humans; Mice; Plants/genetics; Protein Binding/genetics; Transcription Factors/classification; Vertebrates/genetics
7.
Nucleic Acids Res ; 48(D1): D87-D92, 2020 01 08.
Article in English | MEDLINE | ID: mdl-31701148

ABSTRACT

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) for TFs across multiple species in six taxonomic groups. In this 8th release of JASPAR, the CORE collection has been expanded with 245 new PFMs (169 for vertebrates, 42 for plants, 17 for nematodes, 10 for insects, and 7 for fungi), and 156 PFMs were updated (125 for vertebrates, 28 for plants and 3 for insects). These new profiles represent an 18% expansion compared to the previous release. JASPAR 2020 comes with a novel collection of unvalidated TF-binding profiles for which our curators did not find orthogonal supporting evidence in the literature. This collection has a dedicated web form to engage the community in the curation of unvalidated TF-binding profiles. Moreover, we created a Q&A forum to ease the communication between the user community and JASPAR curators. Finally, we updated the genomic tracks, inference tool, and TF-binding profile similarity clusters. All the data is available through the JASPAR website, its associated RESTful API, and through the JASPAR2020 R/Bioconductor package.


Subject(s)
Binding Sites; Computational Biology; Databases, Genetic; Software; Transcription Factors; Animals; Genomics/methods; Protein Binding; Transcription Factors/metabolism; User-Computer Interface; Web Browser
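
The JASPAR 2020 abstract above (entry 7) describes binding profiles stored as position frequency matrices (PFMs). As a rough illustration of how such a matrix is commonly used to score candidate binding sites, the sketch below converts counts to log2-odds scores and scans the forward strand of a sequence. The pseudocount of 1, the uniform 0.25 background, the function names and the toy motif are all assumptions made for this example; this is not the scoring scheme of any JASPAR tool.

```python
import math

BASES = "ACGT"


def pfm_to_pwm(pfm, pseudocount=1.0, background=0.25):
    """Convert columns of [A, C, G, T] counts into log2-odds scores.

    The pseudocount avoids log(0) for bases never observed in a column.
    """
    pwm = []
    for column in pfm:
        total = sum(column) + 4 * pseudocount
        pwm.append(
            [math.log2(((count + pseudocount) / total) / background) for count in column]
        )
    return pwm


def best_hit(pwm, sequence):
    """Return (score, offset) of the highest-scoring forward-strand window.

    Returns None when the sequence is shorter than the motif. Characters
    other than A, C, G, T raise ValueError in this simple sketch.
    """
    width = len(pwm)
    best = None
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][BASES.index(base)] for i, base in enumerate(window))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


# Hypothetical three-column profile that strongly prefers "TGA".
toy_pfm = [
    [0, 0, 0, 10],
    [0, 0, 10, 0],
    [10, 0, 0, 0],
]
toy_pwm = pfm_to_pwm(toy_pfm)
hit = best_hit(toy_pwm, "CCTGACC")
```

A real scan would also score the reverse complement and apply a score threshold; both are left out to keep the sketch short.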
8.
Food Technol Biotechnol ; 56(2): 270-277, 2018 Jun.
Article in English | MEDLINE | ID: mdl-30228802

ABSTRACT

Three metagenomic libraries were constructed using surface sediment samples from the northern Adriatic Sea. Two of the samples were taken from a highly polluted and an unpolluted site respectively. The third sample from a polluted site had been enriched using crude oil. The results of the metagenome analyses were incorporated in the REDPET relational database (http://redpet.bioinfo.pbf.hr/REDPET), which was generated using the previously developed MEGGASENSE platform. The database includes taxonomic data to allow the assessment of the biodiversity of metagenomic libraries and a general functional analysis of genes using hidden Markov model (HMM) profiles based on the KEGG database. A set of 22 specialised HMM profiles was developed to detect putative genes for hydrocarbon-degrading enzymes. Use of these profiles showed that the metagenomic library generated after selection on crude oil had enriched genes for aerobic n-alkane degradation. The use of this system for bioprospecting was exemplified using potential alkB and almA genes from this library.

10.
Nucleic Acids Res ; 46(D1): D260-D266, 2018 01 04.
Article in English | MEDLINE | ID: mdl-29140473

ABSTRACT

JASPAR (http://jaspar.genereg.net) is an open-access database of curated, non-redundant transcription factor (TF)-binding profiles stored as position frequency matrices (PFMs) and TF flexible models (TFFMs) for TFs across multiple species in six taxonomic groups. In the 2018 release of JASPAR, the CORE collection has been expanded with 322 new PFMs (60 for vertebrates and 262 for plants) and 33 PFMs were updated (24 for vertebrates, 8 for plants and 1 for insects). These new profiles represent a 30% expansion compared to the 2016 release. In addition, we have introduced 316 TFFMs (95 for vertebrates, 218 for plants and 3 for insects). This release incorporates clusters of similar PFMs in each taxon and each TF class per taxon. The JASPAR 2018 CORE vertebrate collection of PFMs was used to predict TF-binding sites in the human genome. The predictions are made available to the scientific community through a UCSC Genome Browser track data hub. Finally, this update comes with a new web framework with an interactive and responsive user-interface, along with new features. All the underlying data can be retrieved programmatically using a RESTful API and through the JASPAR 2018 R/Bioconductor package.


Subject(s)
Databases, Genetic; Transcription Factors/metabolism; Animals; Binding Sites/genetics; Genomics; Humans; Internet; Plants/genetics; Plants/metabolism; Position-Specific Scoring Matrices; Protein Binding/genetics; User-Computer Interface; Vertebrates/genetics; Vertebrates/metabolism
11.
Syst Appl Microbiol ; 38(3): 189-97, 2015 May.
Article in English | MEDLINE | ID: mdl-25857844

ABSTRACT

Samples were collected from sea sediments at seven sites in the northern Adriatic Sea that included six sites next to industrial complexes and one from a tourist site (recreational beach). The samples were assayed for alkanes and polycyclic aromatic hydrocarbons. The composition of the hydrocarbon samples suggested that industrial pollution was present in most cases. A sample from one site was also grown aerobically under crude oil enrichment in order to evaluate the response of indigenous bacterial populations to crude oil exposure. Analysis of 16S rRNA gene sequences showed varying microbial biodiversity depending on the level of pollution--ranging from low (200 detected genera) to high (1000+ genera) biodiversity, with lowest biodiversity observed in polluted samples. This indicated that there was considerable biodiversity in all sediment samples but it was severely restricted after exposure to crude oil selection pressure. Phylogenetic analysis of putative alkB genes showed high evolutionary diversity of the enzymes in the samples and suggested great potential for bioremediation and bioprospecting. The first systematic analysis of bacterial communities from sediments of the northern Adriatic Sea is presented, and it will provide a baseline assessment that may serve as a reference point for ecosystem changes and hydrocarbon degrading potential--a potential that could soon gain importance due to plans for oil exploitation in the area.


Subject(s)
Bacteria/classification; Bacteria/isolation & purification; Biodiversity; Geologic Sediments/microbiology; Seawater/chemistry; Water Pollutants/analysis; Aerobiosis; Alkanes/analysis; Bacteria/genetics; Bacteria/growth & development; Bacterial Proteins/genetics; Cluster Analysis; DNA, Ribosomal/chemistry; DNA, Ribosomal/genetics; Hydrocarbons/analysis; Oceans and Seas; Phylogeny; Polycyclic Aromatic Hydrocarbons/analysis; RNA, Ribosomal, 16S/genetics; Sequence Analysis, DNA
12.
mBio ; 5(6): e01328, 2014 Nov 11.
Article in English | MEDLINE | ID: mdl-25389173

ABSTRACT

Antigenic or phenotypic variation is a widespread phenomenon of expression of variable surface protein coats on eukaryotic microbes. To clarify the mechanism behind mutually exclusive gene expression, we characterized the genetic properties of the surface antigen multigene family in the ciliate Paramecium tetraurelia and the epigenetic factors controlling expression and silencing. Genome analysis indicated that the multigene family consists of intrachromosomal and subtelomeric genes; both classes apparently derive from different gene duplication events: whole-genome and intrachromosomal duplication. Expression analysis provides evidence for telomere position effects, because only subtelomeric genes follow mutually exclusive transcription. Microarray analysis of cultures deficient in Rdr3, an RNA-dependent RNA polymerase, in comparison to serotype-pure wild-type cultures, shows cotranscription of a subset of subtelomeric genes, indicating that the telomere position effect is due to a selective occurrence of Rdr3-mediated silencing in subtelomeric regions. We present a model of surface antigen evolution by intrachromosomal gene duplication involving the maintenance of positive selection of structurally relevant regions. Further analysis of chromosome heterogeneity shows that alternative telomere addition regions clearly affect transcription of closely related genes. Consequently, chromosome fragmentation appears to be of crucial importance for surface antigen expression and evolution. Our data suggest that RNAi-mediated control of this genetic network by trans-acting RNAs allows rapid epigenetic adaptation by phenotypic variation in combination with long-term genetic adaptation by Darwinian evolution of antigen genes.
IMPORTANCE: Alternating surface protein structures have been described for almost all eukaryotic microbes, and a broad variety of functions have been described, such as virulence factors, adhesion molecules, and molecular camouflage. Mechanisms controlling gene expression of variable surface proteins therefore represent a powerful tool for rapid phenotypic variation across kingdoms in pathogenic as well as free-living eukaryotic microbes. However, the epigenetic mechanisms controlling synchronous expression and silencing of individual genes are hardly understood. Using the ciliate Paramecium tetraurelia as a (epi)genetic model, we showed that a subtelomeric gene position effect is associated with the selective occurrence of RNAi-mediated silencing of silent surface protein genes, suggesting small interfering RNA (siRNA)-mediated epigenetic cross talks between silent and active surface antigen genes. Our integrated genomic and molecular approach discloses the correlation between gene position effects and siRNA-mediated trans-silencing, thus providing two new parameters for regulation of mutually exclusive gene expression and the genomic organization of variant gene families.


Subject(s)
Antigenic Variation/genetics; Antigens, Surface/genetics; Gene Expression; Gene Silencing; Paramecium tetraurelia/genetics; RNA Interference; Telomere; Adaptation, Biological; Adaptation, Physiological; Evolution, Molecular; Gene Duplication; Gene Expression Profiling; Molecular Sequence Data; Multigene Family; Paramecium tetraurelia/immunology; RNA-Dependent RNA Polymerase/genetics; RNA-Dependent RNA Polymerase/metabolism; Sequence Analysis, DNA
13.
Genome Announc ; 2(4), 2014 Jul 17.
Article in English | MEDLINE | ID: mdl-25035320

ABSTRACT

The genome sequence of Streptomyces rimosus R6-500, an industrially improved strain which produces high titers of the important antibiotic oxytetracycline, is reported, as well as the genome sequences of two derivatives arising due to the genetic instability of the strain.

14.
J Ind Microbiol Biotechnol ; 41(2): 461-7, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24104398

ABSTRACT

Successful genome mining is dependent on accurate prediction of protein function from sequence. This often involves dividing protein families into functional subtypes (e.g., with different substrates). In many cases, there are only a small number of known functional subtypes, but in the case of the adenylation domains of nonribosomal peptide synthetases (NRPS), there are >500 known substrates. Latent semantic indexing (LSI) was originally developed for text processing but has also been used to assign proteins to families. Proteins are treated as "documents" and it is necessary to encode properties of the amino acid sequence as "terms" in order to construct a term-document matrix, which counts the terms in each document. This matrix is then processed to produce a document-concept matrix, where each protein is represented as a row vector. A standard measure of the closeness of vectors to each other (cosines of the angle between them) provides a measure of protein similarity. Previous work encoded proteins as oligopeptide terms, i.e. counted oligopeptides, but used no information regarding location of oligopeptides in the proteins. A novel tokenization method was developed to analyze information from multiple alignments. LSI successfully distinguished between two functional subtypes in five well-characterized families. Visualization of different "concept" dimensions allows exploration of the structure of protein families. LSI was also used to predict the amino acid substrate of adenylation domains of NRPS. Better results were obtained when selected residues from multiple alignments were used rather than the total sequence of the adenylation domains. Using ten residues from the substrate binding pocket performed better than using 34 residues within 8 Å of the active site. Prediction efficiency was somewhat better than that of the best published method using a support vector machine.


Subject(s)
Peptide Synthases/chemistry; Peptide Synthases/metabolism; Sequence Analysis, Protein/methods; Amino Acids/chemistry; Catalytic Domain; Peptide Synthases/classification; Sequence Alignment; Substrate Specificity
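
Two of the steps described in the abstract above (entry 14), counting oligopeptide terms per protein and comparing proteins by the cosine of the angle between their vectors, can be sketched as follows. This is only a partial illustration: the reduction to a document-concept matrix (typically done with a singular value decomposition) is omitted, so the cosine is computed on raw term counts. The overlapping tripeptide tokenization, the function names and the toy sequences are assumptions for the example and do not reproduce the alignment-based tokenization developed in the paper.

```python
import math
from collections import Counter


def oligopeptide_terms(protein, k=3):
    """Count overlapping oligopeptides of length k in one protein sequence."""
    return Counter(protein[i:i + k] for i in range(len(protein) - k + 1))


def document_term_matrix(proteins, k=3):
    """Build a count matrix with one row per protein and one column per term.

    This is the transpose of a term-document matrix; rows are used here so
    that each protein is directly available as a vector.
    """
    counts = [oligopeptide_terms(protein, k) for protein in proteins]
    vocabulary = sorted(set().union(*counts))
    matrix = [[count[term] for term in vocabulary] for count in counts]
    return vocabulary, matrix


def cosine(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical toy sequences: the first two differ only in the last residue.
toy_proteins = ["MKVLAAG", "MKVLAAC", "GGSPWWT"]
vocabulary, matrix = document_term_matrix(toy_proteins)
similar = cosine(matrix[0], matrix[1])
dissimilar = cosine(matrix[0], matrix[2])
```

In a full LSI pipeline the rows of `matrix` would first be projected onto a small number of concept dimensions, and the cosine would be taken between the projected vectors instead.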
15.
J Ind Microbiol Biotechnol ; 41(2): 211-7, 2014 Feb.
Article in English | MEDLINE | ID: mdl-24061567

ABSTRACT

Actinomycetes are a very important source of natural products for the pharmaceutical industry and other applications. Most of the strains belong to Streptomyces or related genera, partly because they are particularly amenable to growth in the laboratory and industrial fermenters. It is unlikely that chemical synthesis can fulfil the needs of the pharmaceutical industry for novel compounds so there is a continuing need to find novel natural products. An evolutionary perspective can help this process in several ways. Genome mining attempts to identify secondary metabolite biosynthetic clusters in DNA sequences, which are likely to produce interesting chemical entities. There are often technical problems in assembling the DNA sequences of large modular clusters in genome and metagenome projects, which can be overcome partially using information about the evolution of the domain sequences. Understanding the evolutionary mechanisms of modular clusters should allow simulation of evolutionary pathways in the laboratory to generate novel compounds.


Subject(s)
Actinobacteria/genetics; Biological Products/metabolism; Evolution, Molecular; Actinobacteria/metabolism; Secondary Metabolism/genetics; Sequence Analysis, DNA; Streptomyces/genetics; Streptomyces/metabolism
16.
Genome Announc ; 1(4), 2013 Aug 08.
Article in English | MEDLINE | ID: mdl-23929477

ABSTRACT

Streptomyces rapamycinicus strain NRRL 5491 produces the important drug rapamycin. It has a large genome of 12.7 Mb, of which over 3 Mb consists of 48 secondary metabolite biosynthesis clusters.

17.
BMC Genomics ; 14: 509, 2013 Jul 26.
Article in English | MEDLINE | ID: mdl-23889801

ABSTRACT

BACKGROUND: Contemporary coral reef research has firmly established that a genomic approach is urgently needed to better understand the effects of anthropogenic environmental stress and global climate change on coral holobiont interactions. Here we present KEGG orthology-based annotation of the complete genome sequence of the scleractinian coral Acropora digitifera and provide the first comprehensive view of the genome of a reef-building coral by applying advanced bioinformatics. DESCRIPTION: Sequences from the KEGG database of protein function were used to construct hidden Markov models. These models were used to search the predicted proteome of A. digitifera to establish complete genomic annotation. The annotated dataset is published in ZoophyteBase, an open access format with different options for searching the data. A particularly useful feature is the ability to use a Google-like search engine that links query words to protein attributes. We present features of the annotation that underpin the molecular structure of key processes of coral physiology that include (1) regulatory proteins of symbiosis, (2) planula and early developmental proteins, (3) neural messengers, receptors and sensory proteins, (4) calcification and Ca²⁺-signalling proteins, (5) plant-derived proteins, (6) proteins of nitrogen metabolism, (7) DNA repair proteins, (8) stress response proteins, (9) antioxidant and redox-protective proteins, (10) proteins of cellular apoptosis, (11) microbial symbioses and pathogenicity proteins, (12) proteins of viral pathogenicity, (13) toxins and venom, (14) proteins of the chemical defensome and (15) coral epigenetics.
CONCLUSIONS: We advocate that providing annotation in an open-access searchable database available to the public domain will give an unprecedented foundation to interrogate the fundamental molecular structure and interactions of coral symbiosis and allow critical questions to be addressed at the genomic level based on combined aspects of evolutionary, developmental, metabolic, and environmental perspectives.


Subject(s)
Access to Information; Anthozoa/genetics; Data Mining; Databases, Genetic; Molecular Sequence Annotation/methods; Proteomics/methods; Sequence Homology, Nucleic Acid; Animals; Conservation of Natural Resources; Coral Reefs; Internet
18.
Appl Environ Microbiol ; 78(23): 8183-90, 2012 Dec.
Article in English | MEDLINE | ID: mdl-22983969

ABSTRACT

The high G+C content and large genome size make the sequencing and assembly of Streptomyces genomes more difficult than for other bacteria. Many pharmaceutically important natural products are synthesized by modular polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The analysis of such gene clusters is difficult if the genome sequence is not of the highest quality, because clusters can be distributed over several contigs, and sequencing errors can introduce apparent frameshifts into the large PKS and NRPS proteins. An additional problem is that the modular nature of the clusters results in the presence of imperfect repeats, which may cause assembly errors. The genome sequence of Streptomyces tsukubaensis NRRL18488 was scanned for potential PKS and NRPS modular clusters. A phylogenetic approach was used to identify multiple contigs belonging to the same cluster. Four PKS clusters and six NRPS clusters were identified. Contigs containing cluster sequences were analyzed in detail by using the ClustScan program, which suggested the order and orientation of the contigs. The sequencing of the appropriate PCR products confirmed the ordering and allowed the correction of apparent frameshifts resulting from sequencing errors. The product chemistry of such correctly assembled clusters could also be predicted. The analysis of one PKS cluster showed that it should produce a bafilomycin-like compound, and reverse transcription (RT)-PCR was used to show that the cluster was transcribed.


Subject(s)
Multigene Family; Peptide Synthases/genetics; Polyketide Synthases/genetics; Streptomyces/enzymology; Streptomyces/genetics; DNA, Bacterial/chemistry; DNA, Bacterial/genetics; Genome, Bacterial; Molecular Sequence Data; Sequence Analysis, DNA