Results 1 - 13 of 13
2.
Nat Protoc ; 18(1): 208-238, 2023 01.
Article in English | MEDLINE | ID: mdl-36376589

ABSTRACT

Uncultivated Bacteria and Archaea account for the vast majority of species on Earth, but obtaining their genomes directly from the environment, using shotgun sequencing, has only become possible recently. To realize the hope of capturing Earth's microbial genetic complement and to facilitate the investigation of the functional roles of specific lineages in a given ecosystem, technologies that accelerate the recovery of high-quality genomes are necessary. We present a series of analysis steps and data products for the extraction of high-quality metagenome-assembled genomes (MAGs) from microbiomes using the U.S. Department of Energy Systems Biology Knowledgebase (KBase) platform ( http://www.kbase.us/ ). Overall, these steps take about a day to obtain extracted genomes when starting from smaller environmental shotgun read libraries, or up to about a week from larger libraries. In KBase, the process is end-to-end, allowing a user to go from the initial sequencing reads all the way through to MAGs, which can then be analyzed with other KBase capabilities such as phylogenetic placement, functional assignment, metabolic modeling, pangenome functional profiling, RNA-Seq and others. While portions of such capabilities are available individually from other resources, the combination of the intuitive usability, data interoperability and integration of tools in a freely available computational resource makes KBase a powerful platform for obtaining MAGs from microbiomes. While this workflow offers tools for each of the key steps in the genome extraction process, it also provides a scaffold that can be easily extended with additional MAG recovery and analysis tools, via the KBase software development kit (SDK).


Subject(s)
Metagenome, Microbiota, Phylogeny, Bacterial Genome, Microbiota/genetics, Bacteria/genetics, Metagenomics
3.
Proc Natl Acad Sci U S A ; 115(10): E2477-E2486, 2018 03 06.
Article in English | MEDLINE | ID: mdl-29463761

ABSTRACT

Polypedilum vanderplanki is a striking and unique example of an insect that can survive almost complete desiccation. Its genome and a set of dehydration-rehydration transcriptomes, together with the genome of Polypedilum nubifer (a congeneric desiccation-sensitive midge), were recently released. Here, using published and newly generated datasets reflecting detailed transcriptome changes during anhydrobiosis, as well as a developmental series, we show that the TCTAGAA DNA motif, which closely resembles the binding motif of the Drosophila melanogaster heat shock transcription activator (Hsf), is significantly enriched in the promoter regions of desiccation-induced genes in P. vanderplanki, such as genes encoding late embryogenesis abundant (LEA) proteins, thioredoxins, or trehalose metabolism-related genes, but not in P. nubifer. Unlike P. nubifer, P. vanderplanki has double TCTAGAA sites upstream of the Hsf gene itself, which is probably responsible for the stronger activation of Hsf in P. vanderplanki during desiccation compared with P. nubifer. To confirm the role of Hsf in desiccation-induced gene activation, we used the Pv11 cell line, derived from P. vanderplanki embryo. After preincubation with trehalose, Pv11 cells can enter anhydrobiosis and survive desiccation. We showed that Hsf knockdown suppresses trehalose-induced activation of multiple predicted Hsf targets (including P. vanderplanki-specific LEA protein genes) and reduces the desiccation survival rate of Pv11 cells fivefold. Thus, cooption of the heat shock regulatory system has been an important evolutionary mechanism for adaptation to desiccation in P. vanderplanki.


Subject(s)
Chironomidae/physiology, Heat Shock Transcription Factors/metabolism, Insect Proteins/metabolism, Animals, Biological Evolution, Chironomidae/genetics, Dehydration, Female, Heat Shock Transcription Factors/genetics, Heat-Shock Response, Insect Proteins/genetics, Male, Physiological Stress
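
The promoter motif search described in the abstract above can be illustrated with a minimal sketch. It counts occurrences of TCTAGAA on both strands of a promoter sequence. The function names and the two promoter fragments are hypothetical and invented for illustration; the study's actual enrichment statistics are not reproduced here.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


def count_motif(promoter: str, motif: str = "TCTAGAA") -> int:
    """Count motif occurrences on both strands, allowing overlapping hits."""
    promoter = promoter.upper()
    # A hit on the reverse strand appears as the reverse complement
    # of the motif on the forward strand.
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return sum(
        promoter[i:i + width] in targets
        for i in range(len(promoter) - width + 1)
    )


# Hypothetical promoter fragments (assumption: not real P. vanderplanki data).
promoters = {
    "gene_a": "GGTCTAGAACCATTCTAGAGG",
    "gene_b": "ACGTACGTACGTACGTACGT",
}
for name, seq in promoters.items():
    print(name, count_motif(seq))
```

Comparing such counts between desiccation-induced genes and a background gene set would be the next step toward an enrichment test; that comparison is left out of this sketch.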
4.
RNA Biol ; 13(2): 232-42, 2016.
Article in English | MEDLINE | ID: mdl-26732206

ABSTRACT

Transcripts often harbor RNA elements, which regulate cell processes co- or post-transcriptionally. The functions of many regulatory RNA elements depend on their structure, so it is important to determine the structure as well as to scan genomes for structured elements. State-of-the-art ab initio approaches to predict structured RNAs rely on DNA sequence analysis. They use 2 major types of information inferred from a sequence: thermodynamic stability of an RNA structure and evolutionary footprints of base-pair interactions. In recent years, chemical probing of RNA has arisen as an alternative source of structural information. RNA probing experiments detect positions accessible to specific types of chemicals or enzymes, indicating their propensity to be in a paired or unpaired state. Several strategies exist to integrate probing data into RNA secondary structure prediction algorithms, and they substantially improve the prediction quality. However, whether and how probing data could contribute to detection of structured RNAs remains an open question. We previously developed the energy-based approach RNASurface to detect locally optimal structured RNA elements. Here, we integrate probing data into the RNASurface energy model using a general framework. We show that the use of experimental data allows for better discrimination of ncRNAs from other transcripts. Application of RNASurface to genome-wide analysis of the human transcriptome with PARS data identifies previously undetectable segments, with evidence of functionality for some of them.


Subject(s)
Nucleic Acid Conformation, RNA/genetics, DNA Sequence Analysis, Transcriptome/genetics, Algorithms, Human Genome, Humans, Molecular Sequence Annotation, RNA/chemistry
5.
BMC Genomics ; 16: 400, 2015 May 21.
Article in English | MEDLINE | ID: mdl-25994131

ABSTRACT

BACKGROUND: Pseudogymnoascus spp. is a broad group of fungal lineages in the family Pseudorotiaceae that includes P. destructans, an aggressive pathogen of bats. Although several lineages of P. spp. were shown to produce ascospores in culture, the vast majority of P. spp. show no evidence of sexual reproduction. P. spp. can tolerate a wide range of temperatures and salinities and can survive even in the permafrost layer. Adaptability of P. spp. to different environments is accompanied by extremely variable morphology and physiology. RESULTS: We sequenced genotypes of 14 strains of P. spp., 5 of which were extracted from permafrost, 1 from a cryopeg (a layer of unfrozen ground in permafrost), and 8 from temperate surface environments. All sequenced genotypes are haploid. Nucleotide diversity among these genomes is very high, with a typical evolutionary distance at synonymous sites dS ≈ 0.5, suggesting that the last common ancestor of these strains lived >50 Mya. The strains extracted from permafrost do not form a separate clade. Instead, each permafrost strain has close relatives from temperate environments. We observed a strictly clonal population structure with no conflicting topologies for ~99% of genome sequences. However, there are a number of short (~100-10,000 nt) genomic segments, with a total length of 67.6 Kb, which possess phylogenetic patterns strikingly different from the rest of the genome. The most remarkable case is the MAT-locus, which has 2 distinct alleles interspersed along the whole-genome phylogenetic tree. CONCLUSIONS: The predominantly clonal structure of genome sequences is consistent with the observations that sexual reproduction is rare in P. spp. A small number of regions with noncanonical phylogenies seem to arise from recombination events between derived lineages of P. spp., with the MAT-locus being transferred on multiple occasions. All sequenced strains have a heterothallic configuration of the MAT-locus.


Subject(s)
Ascomycota/physiology, Clonal Evolution, Fungal Genome, Ascomycota/classification, Ascomycota/genetics, Molecular Evolution, Phylogeny, Asexual Reproduction, DNA Sequence Analysis, Species Specificity
6.
Genome Biol Evol ; 6(6): 1437-47, 2014 May 14.
Article in English | MEDLINE | ID: mdl-24966225

ABSTRACT

Splice sites (SSs) are short sequences that are crucial for proper mRNA splicing in eukaryotic cells, and therefore can be expected to be shaped by strong selection. Nevertheless, in mammals and in other intron-rich organisms, many of the SSs often involve nonconsensus (Nc), rather than consensus (Cn), nucleotides, and beyond the two critical nucleotides, the SSs are not perfectly conserved between species. Here, we compare the SS sequences between primates, and between Drosophila fruit flies, to reveal the pattern of selection acting at SSs. Cn-to-Nc substitutions are less frequent, and Nc-to-Cn substitutions are more frequent, than neutrally expected, indicating, respectively, negative and positive selection. This selection is relatively weak (1 < |4Nes| < 4), and has a similar efficiency in primates and in Drosophila. Within some nucleotide positions, the positive selection in favor of Nc-to-Cn substitutions is weaker than the negative selection maintaining already established Cn nucleotides; this difference is due to site-specific negative selection favoring current Nc nucleotides. In general, however, the strength of negative selection protecting the Cn alleles is similar in magnitude to the strength of positive selection favoring replacement of Nc alleles, as expected under the simple nearly neutral turnover. In summary, although a fraction of the Nc nucleotides within SSs is maintained by selection, the abundance of deleterious nucleotides in this class suggests a substantial genome-wide drift load.


Subject(s)
Drosophila/genetics, Primates/genetics, RNA Splice Sites, Genetic Selection, Animals, Base Sequence, Genetic Drift, Humans, RNA Splicing
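
To give a feel for what a scaled selection coefficient in the range 1 < |4Nes| < 4 means for substitution rates, the sketch below evaluates the standard diffusion approximation for the substitution rate of a selected mutation relative to a neutral one, S / (1 - exp(-S)) with S = 4Nes. This is a textbook approximation used here as an assumption; it is not necessarily the estimator applied in the study, and the function name is hypothetical.

```python
import math


def relative_substitution_rate(scaled_s: float) -> float:
    """Substitution rate relative to neutral for scaled selection S = 4*Ne*s.

    Uses the diffusion approximation S / (1 - exp(-S)), which tends to 1
    as S approaches 0 (the neutral case).
    """
    if abs(scaled_s) < 1e-9:
        return 1.0
    # -expm1(-S) equals 1 - exp(-S) and is numerically stable for small S.
    return scaled_s / -math.expm1(-scaled_s)


# Positive S: favored substitutions; negative S: disfavored substitutions.
for s in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f"S = {s:+.1f}  rate relative to neutral = {relative_substitution_rate(s):.3f}")
```

Under this approximation, selection of the reported strength shifts substitution rates noticeably but does not fully block disfavored changes, which matches the abstract's description of the selection as relatively weak.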
7.
BMC Genomics ; 14: 745, 2013 Nov 01.
Article in English | MEDLINE | ID: mdl-24175918

ABSTRACT

BACKGROUND: Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in prokaryotes is one of the critical tasks of modern genomics. Bacteria from different taxonomic groups, whose lifestyles and natural environments are substantially different, possess highly diverged transcriptional regulatory networks. Comparative genomics approaches are useful for in silico reconstruction of bacterial regulons and networks operated by both transcription factors (TFs) and RNA regulatory elements (riboswitches). DESCRIPTION: RegPrecise (http://regprecise.lbl.gov) is a web resource for collection, visualization and analysis of transcriptional regulons reconstructed by comparative genomics. We significantly expanded a reference collection of manually curated regulons we introduced earlier. RegPrecise 3.0 provides access to inferred regulatory interactions organized by phylogenetic, structural and functional properties. Taxonomy-specific collections include 781 TF regulogs inferred in more than 160 genomes representing 14 taxonomic groups of Bacteria. TF-specific collections include regulogs for a selected subset of 40 TFs reconstructed across more than 30 taxonomic lineages. Novel collections of regulons operated by RNA regulatory elements (riboswitches) include nearly 400 regulogs inferred in 24 bacterial lineages. RegPrecise 3.0 provides four classifications of the reference regulons implemented as controlled vocabularies: 55 TF protein families; 43 RNA motif families; ~150 biological processes or metabolic pathways; and ~200 effectors or environmental signals. Genome-wide visualization of regulatory networks and metabolic pathways covered by the reference regulons is available for all studied genomes. A separate section of RegPrecise 3.0 contains draft regulatory networks in 640 genomes obtained by a conservative propagation of the reference regulons to closely related genomes.
CONCLUSIONS: RegPrecise 3.0 gives access to the transcriptional regulons reconstructed in bacterial genomes. Analytical capabilities include exploration of: regulon content, structure and function; TF binding site motifs; conservation and variations in genome-wide regulatory networks across all taxonomic groups of Bacteria. RegPrecise 3.0 was selected as a core resource on transcriptional regulation of the Department of Energy Systems Biology Knowledgebase, an emerging software and data environment designed to enable researchers to collaboratively generate, test and share new hypotheses about gene and protein functions, perform large-scale analyses, and model interactions in microbes, plants, and their communities.


Subject(s)
Bacteria/genetics, Genetic Databases, Bacterial Genome, Bacteria/classification, Gene Regulatory Networks/genetics, Internet, Metabolic Networks and Pathways/genetics, Transcription Factors/genetics, User-Computer Interface
8.
BMC Genomics ; 14: 476, 2013 Jul 15.
Article in English | MEDLINE | ID: mdl-23855885

ABSTRACT

BACKGROUND: Genlisea aurea (Lentibulariaceae) is a carnivorous plant with an unusually small genome (63.6 Mb), one of the smallest known among higher plants. Data on the genome sizes and the phylogeny of Genlisea suggest that this is a derived state within the genus. Thus, G. aurea is an excellent model organism for studying evolutionary mechanisms of genome contraction. RESULTS: Here we report sequencing and de novo draft assembly of the G. aurea genome. The assembly consists of 10,687 contigs with a total length of 43.4 Mb and includes 17,755 complete and partial protein-coding genes. Its comparison with the genome of Mimulus guttatus, another representative of the higher core Lamiales clade, reveals striking differences in gene content and length of non-coding regions. CONCLUSIONS: Genome contraction was a complex process, which involved gene loss and reduction of the lengths of introns and intergenic regions, but not intron loss. Gene loss is more frequent for genes that belong to multigenic families, indicating that genetic redundancy is an important prerequisite for genome size reduction.


Subject(s)
Genome Size, Plant Genome, Magnoliopsida/genetics, Biological Evolution, Comparative Genomic Hybridization, Intergenic DNA/genetics, Plant DNA/genetics, Introns, Molecular Sequence Annotation, Phylogeny, DNA Sequence Analysis, Transcriptome
9.
Hum Mol Genet ; 22(17): 3449-59, 2013 Sep 01.
Article in English | MEDLINE | ID: mdl-23640990

ABSTRACT

Proper splicing is often crucial for gene functioning and its disruption may be strongly deleterious. Nevertheless, even the canonical dinucleotides of the splice sites, which are essential for splicing, are often polymorphic. Here, we use data from The 1000 Genomes Project to study single-nucleotide polymorphisms (SNPs) in the canonical dinucleotides. Splice sites carrying SNPs are enriched in weakly expressed genes and in rarely used alternative splice sites. Genes with disrupted splice sites tend to have low selective constraint, and the splice sites disrupted by SNPs are less likely to be conserved in mouse. Furthermore, SNPs are enriched in splice sites whose effects on gene function are minor: splice sites located outside of protein-coding regions, in shorter exons, closer to the 3'-ends of proteins, and outside of functional protein domains. Most of these effects are more pronounced for high-frequency SNPs. Despite these trends, many of the polymorphic sites may still substantially affect the function of the corresponding genes. A number of the observed splice site-disrupting SNPs, including several high-frequency ones, were found among mutations described in OMIM.


Subject(s)
Human Genome, Single Nucleotide Polymorphism, RNA Splice Sites, Animals, Genetic Databases, Molecular Evolution, Genetic Variation, Genome, Humans, Mice, Protein Conformation, Proteins/chemistry, RNA Splicing, Sequence Alignment
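
The notion of a SNP in a canonical splice-site dinucleotide, as studied in the abstract above, can be made concrete with a small sketch. It assumes the major GT...AG intron class and an intron sequence given on the transcribed strand; the function name, the coordinates convention (0-based offset within the intron) and the example intron are hypothetical, and this is not the pipeline used in the study.

```python
def disrupts_canonical_dinucleotide(intron: str, offset: int, alt: str) -> bool:
    """Return True if substituting `alt` at 0-based `offset` in the intron
    changes the donor GT (first two bases) or the acceptor AG (last two)."""
    intron = intron.upper()
    if not (intron.startswith("GT") and intron.endswith("AG")):
        raise ValueError("intron does not have canonical GT...AG boundaries")
    canonical_positions = {0, 1, len(intron) - 2, len(intron) - 1}
    # A variant only disrupts the site if it falls on a canonical position
    # and actually changes the base found there.
    return offset in canonical_positions and alt.upper() != intron[offset]


# Hypothetical intron, invented for illustration only.
intron = "GTAAGTCCTTTTCAG"
print(disrupts_canonical_dinucleotide(intron, 0, "A"))   # donor G changed
print(disrupts_canonical_dinucleotide(intron, 5, "C"))   # interior position
```

A real analysis would also need strand handling and transcript annotation to locate intron boundaries in genomic coordinates, which this sketch deliberately omits.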
10.
Nucleic Acids Res ; 40(12): e93, 2012 Jul.
Article in English | MEDLINE | ID: mdl-22422836

ABSTRACT

Identification of transcriptional regulatory regions and tracing their internal organization are important for understanding the eukaryotic cell machinery. Cis-regulatory modules (CRMs) of higher eukaryotes are believed to possess a regulatory 'grammar', or preferred arrangement of binding sites, that is crucial for proper regulation and thus tends to be evolutionarily conserved. Here, we present a method, CORECLUST (COnservative REgulatory CLUster STructure), that predicts CRMs based on a set of positional weight matrices. Given regulatory regions of orthologous and/or co-regulated genes, CORECLUST constructs a CRM model by revealing the conserved rules that describe the relative location of binding sites. The constructed model may then be used for the genome-wide prediction of similar CRMs, and thus detection of co-regulated genes, and for the investigation of the regulatory grammar of the system. Compared with related methods, CORECLUST shows better performance at identification of CRMs conferring muscle-specific gene expression in vertebrates and early-developmental CRMs in Drosophila.


Subject(s)
Gene Expression Regulation, Transcriptional Regulatory Elements, DNA Sequence Analysis, Algorithms, Animals, Body Patterning/genetics, Drosophila/embryology, Drosophila/genetics, Drosophila/metabolism, Genetic Enhancer Elements, Developmental Gene Expression Regulation, Muscles/metabolism, Position-Specific Scoring Matrices, Software
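
The abstract above takes positional weight matrices as its input. As a minimal sketch of that building block only (not of CORECLUST itself, whose model of binding-site arrangement is not reproduced here), the code below converts a base-frequency matrix to log-odds scores and finds the best-scoring window in a sequence. The matrix values, the uniform background and all names are assumptions invented for illustration.

```python
import math

# Hypothetical 4-column base-frequency matrix with consensus ACGT.
FREQS = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform background base frequency


def to_log_odds(freqs, background=BACKGROUND):
    """Convert per-column base frequencies to log2-odds scores."""
    return [
        {base: math.log2(p / background) for base, p in column.items()}
        for column in freqs
    ]


def best_hit(seq, pwm):
    """Return (score, start) of the highest-scoring window in `seq`."""
    width = len(pwm)
    hits = [
        (sum(pwm[j][seq[i + j]] for j in range(width)), i)
        for i in range(len(seq) - width + 1)
    ]
    return max(hits)


score, start = best_hit("TTACGTTT", to_log_odds(FREQS))
print(f"best window starts at {start} with score {score:.2f}")
```

A CRM predictor would run several such matrices over a region and then reason about the relative positions of the hits; that second stage is where a method like the one described differs from plain motif scanning.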
11.
J Bioinform Comput Biol ; 4(5): 1033-56, 2006 Oct.
Article in English | MEDLINE | ID: mdl-17099940

ABSTRACT

Membrane proteins perform a number of crucial functions as transporters, receptors, and components of enzyme complexes. Identification of membrane proteins and prediction of their topology is thus an important part of genome annotation. We present here an overview of transmembrane segments in protein sequences, summarize data from large-scale genome studies, and report results of benchmarking of several popular internet servers.


Subject(s)
Algorithms, Cell Membrane/chemistry, Internet, Membrane Proteins/chemistry, Protein Sequence Analysis/methods, Software Validation, Amino Acid Sequence, Benchmarking, Molecular Sequence Data, Reproducibility of Results, Sensitivity and Specificity, Protein Sequence Analysis/standards
12.
In Silico Biol ; 3(1-2): 197-204, 2003.
Article in English | MEDLINE | ID: mdl-14524337

ABSTRACT

Transmembrane transport is an essential component of cell life. Many genes encoding known or putative transport proteins are found in bacterial genomes. In most cases their substrate specificity is not experimentally determined and is only approximately predicted by comparative genomic analysis. Even less is known about the 3D structure of transporters. Nevertheless, the published experimental data demonstrate that channel-forming residues determine the substrate specificity of secondary transporters, and analysis of these residues would provide a better understanding of the transport mechanism. We developed a simple computational method for identification of channel-forming residues in transporter sequences. It is based on the analysis of amino acid frequencies in bacterial secondary transporters. We applied this method to a variety of transmembrane proteins with resolved 3D structure. The predictions are in sufficiently good agreement with the real protein structure.


Subject(s)
Bacterial Proteins/chemistry, Ion Channels/chemistry, Ion Channels/physiology, Membrane Proteins/chemistry, ATP-Binding Cassette Transporters/chemistry, Algorithms, Amino Acid Sequence, Bacterial Proteins/physiology, Biological Transport, Membrane Proteins/physiology, Molecular Models, Theoretical Models, Peptide Fragments/chemistry, Protein Conformation, Protein Secondary Structure
13.
Proteins ; 51(1): 85-95, 2003 Apr 01.
Article in English | MEDLINE | ID: mdl-12596266

ABSTRACT

Aligned amino acid sequences of three functionally independent samples of transmembrane (TM) transport proteins have been analyzed. The concept of the TM-kernel is proposed as the most probable transmembrane region of a sequence. The average amino acid composition of TM-kernels differs from the published amino acid composition of transmembrane segments. TM-kernels contain more alanines and glycines, and fewer polar, charged, and aromatic residues, than non-TM-proteins. There are also differences between TM-kernels of bacterial and eukaryotic proteins. We have constructed amino acid substitution matrices for bacterial TM-kernels, named the BATMAS (BActerial Transmembrane MAtrix of Substitutions) series. In TM-kernels, polar and charged residues, as well as proline and tyrosine, are highly conserved, whereas there are more substitutions within the group of hydrophobic residues, in contrast to non-TM-proteins, which have fewer, relatively more conserved, hydrophobic residues. These results demonstrate that alignment of transmembrane proteins should be based on at least two amino acid substitution matrices, one for loops (e.g., the BLOSUM series) and one for TM-segments (the BATMAS series), and the choice of the TM-matrix should be different for eukaryotic and bacterial proteins.


Subject(s)
Bacterial Proteins/chemistry, Membrane Transport Proteins/chemistry, Sequence Alignment/methods, Protein Sequence Analysis, ATP-Binding Cassette Transporters/chemistry, Algorithms, Amino Acid Sequence, Amino Acid Substitution, Molecular Sequence Data
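
The recommendation in the abstract above, to score loops and TM-segments with different substitution matrices, can be sketched as follows. The example scores a gapless pairwise alignment and picks the matrix per column from a TM/loop mask. Both matrices are toy values over a three-letter alphabet, invented for illustration; they are NOT taken from BLOSUM or BATMAS, and all names are hypothetical.

```python
# Toy substitution scores keyed by alphabetically sorted residue pairs.
LOOP_MATRIX = {
    ("I", "I"): 4, ("I", "K"): -3, ("I", "L"): 2,
    ("K", "K"): 5, ("K", "L"): -2, ("L", "L"): 4,
}
# In the toy TM matrix, hydrophobic swaps (I/L) score as well as identities,
# while the charged residue K is strongly conserved.
TM_MATRIX = {
    ("I", "I"): 3, ("I", "K"): -5, ("I", "L"): 3,
    ("K", "K"): 8, ("K", "L"): -5, ("L", "L"): 3,
}


def pair_score(matrix, a, b):
    """Look up a symmetric substitution score."""
    return matrix[tuple(sorted((a, b)))]


def alignment_score(seq1, seq2, in_tm, loop_matrix=LOOP_MATRIX, tm_matrix=TM_MATRIX):
    """Score a gapless alignment, choosing the matrix per column."""
    if not (len(seq1) == len(seq2) == len(in_tm)):
        raise ValueError("sequences and TM mask must have equal length")
    total = 0
    for a, b, tm in zip(seq1, seq2, in_tm):
        matrix = tm_matrix if tm else loop_matrix
        total += pair_score(matrix, a, b)
    return total


# The middle two columns are marked as transmembrane.
print(alignment_score("KLIK", "KILK", [False, True, True, False]))
# The same alignment scored as if every column were a loop.
print(alignment_score("KLIK", "KILK", [False, False, False, False]))
```

The design choice illustrated is simply that the scoring function takes the region annotation as an input; a full aligner would also need gap handling and a way to predict the TM mask.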