Results 1 - 19 of 19
1.
Nat Commun ; 15(1): 3699, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38698035

ABSTRACT

In silico identification of viral anti-CRISPR proteins (Acrs) has relied largely on the guilt-by-association method using known Acrs or anti-CRISPR associated proteins (Acas) as the bait. However, the low number and limited spread of the characterized archaeal Acrs and Aca hinder our ability to identify Acrs using guilt-by-association. Here, based on the observation that the few characterized archaeal Acrs and Aca are transcribed immediately post viral infection, we hypothesize that these genes, and many other unidentified anti-defense genes (ADG), are under the control of conserved regulatory sequences including a strong promoter, which can be used to predict anti-defense genes in archaeal viruses. Using this consensus-sequence-based method, we identify 354 potential ADGs in 57 archaeal viruses and 6 metagenome-assembled genomes. Experimental validation identified a CRISPR subtype I-A inhibitor and the first virally encoded inhibitor of an archaeal toxin-antitoxin based immune system. We also identify regulatory proteins potentially akin to Acas that can facilitate further identification of ADGs combined with the guilt-by-association approach. These results demonstrate the potential of regulatory sequence analysis for extensive identification of ADGs in viruses of archaea and bacteria.
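The consensus-sequence search described in this abstract can be illustrated with a minimal sketch that scans upstream regions for an IUPAC-coded motif. The motif `TTTAWA` and the example sequence are hypothetical placeholders chosen for illustration; the abstract does not give the actual regulatory consensus, and this is not the authors' pipeline.

```python
import re

# IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}

def consensus_to_regex(consensus):
    """Compile an IUPAC consensus into a regex that also finds overlapping hits."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    # The lookahead lets finditer report matches that overlap each other.
    return re.compile("(?=(" + body + "))")

def find_motif(upstream, consensus):
    """Return 0-based start positions of the consensus in an upstream region."""
    pattern = consensus_to_regex(consensus)
    return [m.start() for m in pattern.finditer(upstream.upper())]

# Hypothetical upstream region and hypothetical motif, for illustration only.
hits = find_motif("GGTTTAAACCTTTATAGG", "TTTAWA")
```

Genes whose upstream regions carry such a hit would be the candidates for follow-up; a real analysis would also need a scoring scheme and a background model, which this sketch omits.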


Subject(s)
Archaea , Archaeal Viruses , Archaeal Viruses/genetics , Archaea/genetics , Archaea/virology , Archaea/immunology , Promoter Regions, Genetic/genetics , Clustered Regularly Interspaced Short Palindromic Repeats/genetics , Regulatory Sequences, Nucleic Acid/genetics , Viral Proteins/genetics , Archaeal Proteins/genetics , Archaeal Proteins/metabolism , Metagenome/genetics , CRISPR-Associated Proteins/genetics , CRISPR-Associated Proteins/metabolism , CRISPR-Cas Systems/genetics
2.
mBio ; 15(2): e0309223, 2024 Feb 14.
Article in English | MEDLINE | ID: mdl-38189270

ABSTRACT

The identification of microbial genes essential for survival as those with lethal knockout phenotype (LKP) is a common strategy for functional interrogation of genomes. However, interpretation of the LKP is complicated because a substantial fraction of the genes with this phenotype remains poorly functionally characterized. Furthermore, many genes can exhibit LKP not because their products perform essential cellular functions but because their knockout activates the toxicity of other genes (conditionally essential genes). We analyzed the sets of LKP genes for two archaea, Methanococcus maripaludis and Sulfolobus islandicus, using a variety of computational approaches aiming to differentiate between essential and conditionally essential genes and to predict at least a general function for as many of the proteins encoded by these genes as possible. This analysis allowed us to predict the functions of several LKP genes including previously uncharacterized subunit of the GINS protein complex with an essential function in genome replication and of the KEOPS complex that is responsible for an essential tRNA modification as well as GRP protease implicated in protein quality control. Additionally, several novel antitoxins (conditionally essential genes) were predicted, and this prediction was experimentally validated by showing that the deletion of these genes together with the adjacent genes apparently encoding the cognate toxins caused no growth defect. We applied principal component analysis based on sequence and comparative genomic features showing that this approach can separate essential genes from conditionally essential ones and used it to predict essential genes in other archaeal genomes. IMPORTANCE: Only a relatively small fraction of the genes in any bacterium or archaeon is essential for survival as demonstrated by the lethal effect of their disruption. The identification of essential genes and their functions is crucial for understanding fundamental cell biology.
However, many of the genes with a lethal knockout phenotype remain poorly functionally characterized, and furthermore, many genes can exhibit this phenotype not because their products perform essential cellular functions but because their knockout activates the toxicity of other genes. We applied state-of-the-art computational methods to predict the functions of a number of uncharacterized genes with the lethal knockout phenotype in two archaeal species and developed a computational approach to predict genes involved in essential functions. These findings advance the current understanding of key functionalities of archaeal cells.


Subject(s)
Archaea , Archaeal Proteins , Archaea/genetics , Archaea/metabolism , Genes, Essential , Genome, Archaeal , Genomics , Phenotype , Archaeal Proteins/genetics , Archaeal Proteins/metabolism
3.
Front Microbiol ; 14: 1291523, 2023.
Article in English | MEDLINE | ID: mdl-38029211

ABSTRACT

Genomes of bacteria and archaea contain a much larger fraction of unidirectional (serial) gene pairs than convergent or divergent gene pairs. Many of the unidirectional gene pairs have short overlaps of -4 nt and -1 nt. As shown previously, translation of the genes in overlapping unidirectional gene pairs is tightly coupled. Two alternative models for the fate of the post-termination ribosome predict either that overlaps or very short intergenic distances are essential for translational coupling or that the undissociated post-termination ribosome can scan through long intergenic regions, up to hundreds of nucleotides. We aimed to experimentally resolve the contradiction between the two models by analyzing three native gene pairs from the model archaeon Haloferax volcanii and three native pairs from Escherichia coli. A two reporter gene system was used to quantify the reinitiation frequency, and several stop codons in the upstream gene were introduced to increase the intergenic distances. For all six gene pairs from two species, an extremely strong dependence of the reinitiation efficiency on the intergenic distance was unequivocally demonstrated, such that even short intergenic distances of about 20 nt almost completely abolished translational coupling. Bioinformatic analysis of the intergenic distances in all unidirectional gene pairs in the genomes of H. volcanii and E. coli and in 1,695 prokaryotic species representative of 49 phyla showed that intergenic distances of -4 nt or -1 nt (= short gene overlaps of 4 nt or 1 nt) were by far most common in all these groups of archaea and bacteria. A small set of genes in E. coli, but not in H. volcanii, had intergenic distances of around +10 nt. Our experimental and bioinformatic analyses clearly show that translational coupling requires short gene overlaps, whereas scanning of intergenic regions by the post-termination ribosome occurs rarely, if at all. 
Short overlaps are enriched among genes that encode subunits of heteromeric complexes, and co-translational complex formation requiring precise subunit stoichiometry likely confers an evolutionary advantage that drove the formation and conservation of overlapping gene pairs during evolution.
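The intergenic-distance convention used in this abstract (an overlap of 4 nt is a distance of -4) can be made concrete with a short sketch. It assumes 1-based inclusive coordinates with start < end on both strands; the `Gene` type, function name, and example coordinates are illustrative assumptions, not taken from the paper.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive; start < end regardless of strand
    end: int
    strand: str  # "+" or "-"

def unidirectional_distances(genes):
    """Return (upstream, downstream, distance) for adjacent same-strand pairs.

    A negative distance is an overlap: -4 means the two genes share 4 nt.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    pairs = []
    for left, right in zip(ordered, ordered[1:]):
        if left.strand != right.strand:
            continue  # convergent or divergent pair, not unidirectional
        distance = right.start - left.end - 1
        if left.strand == "+":
            pairs.append((left.name, right.name, distance))
        else:
            # On the minus strand the gene at higher coordinates is upstream.
            pairs.append((right.name, left.name, distance))
    return pairs

# Toy annotation with made-up coordinates.
genes = [
    Gene("a", 1, 103, "+"),
    Gene("b", 100, 400, "+"),
    Gene("c", 420, 700, "+"),
    Gene("d", 800, 1000, "-"),
    Gene("e", 1000, 1300, "-"),
]
result = unidirectional_distances(genes)
```

Here the pair a/b overlaps by 4 nt (distance -4), b/c is separated by 19 nt, c/d is skipped because the strands differ, and e/d overlaps by 1 nt.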

4.
Proc Natl Acad Sci U S A ; 120(16): e2300154120, 2023 Apr 18.
Article in English | MEDLINE | ID: mdl-37036997

ABSTRACT

The evolution of genomes in all life forms involves two distinct, dynamic types of genomic changes: gene duplication (and loss) that shape families of paralogous genes and extension (and contraction) of low-complexity regions (LCR), which occurs through dynamics of short repeats in protein-coding genes. Although the roles of each of these types of events in genome evolution have been studied, their co-evolutionary dynamics is not thoroughly understood. Here, by analyzing a wide range of genomes from diverse bacteria and archaea, we show that LCR and paralogy represent two distinct routes of evolution that are inversely correlated. The emergence of LCR is a prominent evolutionary mechanism in fast evolving, young protein families, whereas paralogy dominates the comparatively slow evolution of old protein families. The analysis of multiple prokaryotic genomes shows that the formation of LCR is likely a widespread, transient evolutionary mechanism that temporally and locally affects also ancestral functions, but apparently, fades away with time, under mutational and selective pressures, yielding to gene paralogy. We propose that compensatory relationships between short-term and longer-term evolutionary mechanisms are universal in the evolution of life.


Subject(s)
Evolution, Molecular , Prokaryotic Cells , Phylogeny , Bacteria/genetics , Archaea/genetics
5.
Biol Direct ; 17(1): 22, 2022 Aug 30.
Article in English | MEDLINE | ID: mdl-36042479

ABSTRACT

BACKGROUND: Evolutionary rate is a key characteristic of gene families that is linked to the functional importance of the respective genes as well as specific biological functions of the proteins they encode. Accurate estimation of evolutionary rates is a challenging task that requires precise phylogenetic analysis. Here we present an easy to estimate protein family level measure of sequence variability based on alignment column homogeneity in multiple alignments of protein sequences from Clade-Specific Clusters of Orthologous Genes (csCOGs). RESULTS: We report genome-wide estimates of variability for 8 diverse groups of bacteria and archaea and investigate the connection between variability and various genomic and biological features. The variability estimates are based on homogeneity distributions across amino acid sequence alignments and can be obtained for multiple groups of genomes at minimal computational expense. About half of the variance in variability values can be explained by the analyzed features, with the greatest contribution coming from the extent of gene paralogy in the given csCOG. The correlation between variability and paralogy appears to originate, primarily, not from gene duplication, but from acquisition of distant paralogs and xenologs, introducing sequence variants that are more divergent than those that could have evolved in situ during the lifetime of the given group of organisms. Both high-variability and low-variability csCOGs were identified in all functional categories, but as expected, proteins encoded by integrated mobile elements as well as proteins involved in defense functions and cell motility are, on average, more variable than proteins with housekeeping functions. 
Additionally, using linear discriminant analysis, we found that variability and fraction of genomes carrying a given gene are the two variables that provide the best prediction of gene essentiality as compared to the results of transposon mutagenesis in Sulfolobus islandicus. CONCLUSIONS: Variability, a measure of sequence diversity within an alignment relative to the overall diversity within a group of organisms, offers a convenient proxy for evolutionary rate estimates and is informative with respect to prediction of functional properties of proteins. In particular, variability is a strong predictor of gene essentiality for the respective organisms and indicative of sub- or neofunctionalization of paralogs.


Subject(s)
Evolution, Molecular , Prokaryotic Cells , Gene Duplication , Phylogeny , Proteins , Sequence Alignment
6.
Biol Direct ; 17(1): 7, 2022 Mar 21.
Article in English | MEDLINE | ID: mdl-35313954

ABSTRACT

BACKGROUND: Bacteria and archaea produce an enormous diversity of modified peptides that are involved in various forms of inter-microbial conflicts or communication. A vast class of such peptides are Ribosomally synthesized, Post-translationally modified Peptides (RiPPs), and a major group of RiPPs are graspetides, so named after ATP-grasp ligases that catalyze the formation of lactam and lactone linkages in these peptides. The diversity of graspetides, the multiple proteins encoded in the respective Biosynthetic Gene Clusters (BGCs) and their evolution have not been studied in full detail. In this work, we attempt a comprehensive analysis of the graspetide-encoding BGCs and report a variety of novel graspetide groups as well as ancillary proteins implicated in graspetide biosynthesis and expression. RESULTS: We compiled a comprehensive, manually curated set of graspetides that includes 174 families including 115 new families with distinct patterns of amino acids implicated in macrocyclization and further modification, roughly tripling the known graspetide diversity. We derived signature motifs for the leader regions of graspetide precursors that could be used to facilitate graspetide prediction. Graspetide biosynthetic gene clusters and specific precursors were identified in bacterial divisions not previously known to encode RiPPs, in particular, the parasitic and symbiotic bacteria of the Candidate phyla radiation. We identified Bacteroides-specific biosynthetic gene clusters (BGC) that include a remarkable diversity of graspetides encoded in the same loci, which are predicted to be modified by the same ATP-grasp ligase. We studied in detail the evolution of the recently characterized chryseoviridin BGCs and showed that duplication and horizontal gene exchange both contribute to the diversification of the graspetides during evolution. CONCLUSIONS: We demonstrate previously unsuspected diversity of graspetide sequences, even those associated with closely related ATP-grasp enzymes.
Several previously unnoticed families of proteins associated with graspetide biosynthetic gene clusters are identified. The results of this work substantially expand the known diversity of RiPPs and can be harnessed to further advance approaches for their identification.


Subject(s)
Multigene Family , Peptides , Adenosine Triphosphate/chemistry , Adenosine Triphosphate/metabolism , Bacteria/genetics , Peptides/chemistry , Phylogeny , Protein Processing, Post-Translational
7.
Front Microbiol ; 12: 721392, 2021.
Article in English | MEDLINE | ID: mdl-34489912

ABSTRACT

Molecular mechanisms involved in biological conflicts and self vs nonself recognition in archaea remain poorly characterized. We apply phylogenomic analysis to identify a hypervariable gene module that is widespread among Thermococcales. These loci consist of an upstream gene coding for a large protein containing several immunoglobulin (Ig) domains and unique combinations of downstream genes, some of which also contain Ig domains. In the large Ig domain containing protein, the C-terminal Ig domain sequence is hypervariable, apparently as a result of recombination between genes from different Thermococcales. To reflect the hypervariability, we denote this gene module VARTIG (VARiable Thermococcales IG). The overall organization of the VARTIG modules is similar to the organization of Polymorphic Toxin Systems (PTS). Archaeal genomes outside Thermococcales encode a variety of Ig domain proteins, but no counterparts to VARTIG and no Ig domains with comparable levels of variability. The specific functions of VARTIG remain unknown but the identified features of this system imply three testable hypotheses: (i) involvement in inter-microbial conflicts analogous to PTS, (ii) role in innate immunity analogous to the vertebrate complement system, and (iii) function in self vs nonself discrimination analogous to the vertebrate Major Histocompatibility Complex. The latter two hypotheses seem to be of particular interest given the apparent analogy to vertebrate immunity.

8.
mBio ; 10(3), 2019 May 07.
Article in English | MEDLINE | ID: mdl-31064832

ABSTRACT

Numerous, diverse, highly variable defense and offense genetic systems are encoded in most bacterial genomes and are involved in various forms of conflict among competing microbes or their eukaryotic hosts. Here we focus on the offense and self-versus-nonself discrimination systems encoded by archaeal genomes that so far have remained largely uncharacterized and unannotated. Specifically, we analyze archaeal genomic loci encoding polymorphic and related toxin systems and ribosomally synthesized antimicrobial peptides. Using sensitive methods for sequence comparison and the "guilt by association" approach, we identified such systems in 141 archaeal genomes. These toxins can be classified into four major groups based on the structure of the components involved in the toxin delivery. The toxin domains are often shared between and within each system. We revisit halocin families and substantially expand the halocin C8 family, which was identified in diverse archaeal genomes and also certain bacteria. Finally, we employ features of protein sequences and genomic locus organization characteristic of archaeocins and polymorphic toxins to identify candidates for analogous but not necessarily homologous systems among uncharacterized protein families. This work confidently predicts that more than 1,600 archaeal proteins, currently annotated as "hypothetical" in public databases, are components of conflict and self-versus-nonself discrimination systems. IMPORTANCE: Diverse and highly variable systems involved in biological conflicts and self-versus-nonself discrimination are ubiquitous in bacteria but much less studied in archaea. We performed comprehensive comparative genomic analyses of the archaeal systems that share components with analogous bacterial systems and propose an approach to identify new systems that could be involved in these functions.
We predict polymorphic toxin systems in 141 archaeal genomes and identify new, archaea-specific toxin and immunity protein families. These systems are widely represented in archaea and are predicted to play major roles in interactions between species and in intermicrobial conflicts. This work is expected to stimulate experimental research to advance the understanding of poorly characterized major aspects of archaeal biology.


Subject(s)
Antimicrobial Cationic Peptides/genetics , Archaea/genetics , Archaeal Proteins/genetics , Genome, Archaeal , Toxins, Biological/genetics , Amino Acid Sequence , Bacterial Proteins/genetics , Evolution, Molecular , Genome, Bacterial , Genomics , Microbial Interactions
9.
FEMS Microbiol Lett ; 366(7), 2019 Apr 01.
Article in English | MEDLINE | ID: mdl-30993331

ABSTRACT

Screening of genomic and metagenomic databases for new variants of CRISPR-Cas systems increasingly results in the discovery of derived variants that do not seem to possess the interference capacity and are implicated in functions distinct from adaptive immunity. We describe an extremely derived putative class 1 CRISPR-Cas system that is present in many Halobacteria and consists of distant homologs of the Cas5 and Cas7 proteins along with an uncharacterized conserved protein and various nucleases. We hypothesize that, although this system lacks typical CRISPR effectors or a CRISPR array, it functions as an RNA-dependent defense mechanism that, unlike other derived CRISPR-Cas, utilizes alternative nucleases to cleave invader genomes.


Subject(s)
Archaeal Proteins/genetics , Genome, Archaeal , Halobacteriaceae/genetics , Archaeal Proteins/metabolism , CRISPR-Cas Systems , Clustered Regularly Interspaced Short Palindromic Repeats , Halobacteriaceae/classification , Halobacteriaceae/metabolism , Phylogeny
10.
Nat Commun ; 7: 10147, 2016 Jan 07.
Article in English | MEDLINE | ID: mdl-26738725

ABSTRACT

Toxoplasma gondii is among the most prevalent parasites worldwide, infecting many wild and domestic animals and causing zoonotic infections in humans. T. gondii differs substantially in its broad distribution from closely related parasites that typically have narrow, specialized host ranges. To elucidate the genetic basis for these differences, we compared the genomes of 62 globally distributed T. gondii isolates to several closely related coccidian parasites. Our findings reveal that tandem amplification and diversification of secretory pathogenesis determinants is the primary feature that distinguishes the closely related genomes of these biologically diverse parasites. We further show that the unusual population structure of T. gondii is characterized by clade-specific inheritance of large conserved haploblocks that are significantly enriched in tandemly clustered secretory pathogenesis determinants. The shared inheritance of these conserved haploblocks, which show a different ancestry than the genome as a whole, may thus influence transmission, host range and pathogenicity.


Subject(s)
Genome, Protozoan , Toxoplasma/genetics , Toxoplasma/pathogenicity , Conserved Sequence , DNA, Protozoan/genetics , Gene Expression Regulation/physiology , Phylogeny , Polymorphism, Single Nucleotide , Protozoan Proteins/genetics , Protozoan Proteins/metabolism , RNA, Messenger/genetics , RNA, Messenger/metabolism , Synteny , Virulence
11.
Nucleic Acids Res ; 43(Database issue): D1003-9, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25414324

ABSTRACT

The Arabidopsis Information Portal (https://www.araport.org) is a new online resource for plant biology research. It houses the Arabidopsis thaliana genome sequence and associated annotation. It was conceived as a framework that allows the research community to develop and release 'modules' that integrate, analyze and visualize Arabidopsis data that may reside at remote sites. The current implementation provides an indexed database of core genomic information. These data are made available through feature-rich web applications that provide search, data mining, and genome browser functionality, and also by bulk download and web services. Araport uses software from the InterMine and JBrowse projects to expose curated data from TAIR, GO, BAR, EBI, UniProt, PubMed and EPIC CoGe. The site also hosts 'science apps,' developed as prototypes for community modules that use dynamic web pages to present data obtained on-demand from third-party servers via RESTful web services. Designed for sustainability, the Arabidopsis Information Portal strategy exploits existing scientific computing infrastructure, adopts a practical mixture of data integration technologies and encourages collaborative enhancement of the resource by its user community.


Subject(s)
Arabidopsis/genetics , Databases, Genetic , Genome, Plant , Data Mining , Internet , Software
12.
Plant Cell Physiol ; 56(1): e1, 2015 Jan.
Article in English | MEDLINE | ID: mdl-25432968

ABSTRACT

Medicago truncatula, a close relative of alfalfa (Medicago sativa), is a model legume used for studying symbiotic nitrogen fixation, mycorrhizal interactions and legume genomics. J. Craig Venter Institute (JCVI; formerly TIGR) has been involved in M. truncatula genome sequencing and annotation since 2002 and has maintained a web-based resource providing data to the community for this entire period. The website (http://www.MedicagoGenome.org) has seen major updates in the past year, where it currently hosts the latest version of the genome (Mt4.0), associated data and legacy project information, presented to users via a rich set of open-source tools. A JBrowse-based genome browser interface exposes tracks for visualization. Mutant gene symbols originally assembled and curated by the Frugoli lab are now hosted at JCVI and tie into our community annotation interface, Medicago EuCAP (to be integrated soon with our implementation of WebApollo). Literature pertinent to M. truncatula is indexed and made searchable via the Textpresso search engine. The site also implements MedicMine, an instance of InterMine that offers interconnectivity with other plant 'mines' such as ThaleMine and PhytoMine, and other model organism databases (MODs). In addition to these new features, we continue to provide keyword- and locus identifier-based searches served via a Chado-backed Tripal Instance, a BLAST search interface and bulk downloads of data sets from the iPlant Data Store (iDS). Finally, we maintain an E-mail helpdesk, facilitated by a JIRA issue tracking system, where we receive and respond to questions about the website and requests for specific data sets from the community.


Subject(s)
Computational Biology , Databases, Genetic , Genome, Plant/genetics , Medicago truncatula/genetics , User-Computer Interface , Information Storage and Retrieval , Internet
13.
PLoS One ; 6(1): e15950, 2011 Jan 10.
Article in English | MEDLINE | ID: mdl-21264340

ABSTRACT

BACKGROUND: While the pneumococcal protein conjugate vaccines reduce the incidence of invasive pneumococcal disease (IPD), serotype replacement remains a major concern. Thus, serotype-independent protection with vaccines targeting virulence genes, such as PspA, has been pursued. PspA is comprised of diverse clades that arose through recombination. Therefore, multi-locus sequence typing (MLST)-defined clones could conceivably include strains from multiple PspA clades. As a result, a method is needed which can both monitor the long-term epidemiology of the pneumococcus among a large number of isolates, and analyze vaccine-candidate genes, such as pspA, for mutations and recombination events that could result in 'vaccine escape' strains. METHODOLOGY: We developed a resequencing array consisting of five conserved and six variable genes to characterize 72 pneumococcal strains. The phylogenetic analysis of the 11 concatenated genes was performed with the MrBayes program, the single nucleotide polymorphism (SNP) analysis with the DNA Sequence Polymorphism program (DnaSP), and the recombination event analysis with the recombination detection package (RDP). RESULTS: The phylogenetic analysis correlated with MLST, and identified clonal strains with unique PspA clades. The DnaSP analysis correlated with the serotype-specific diversity detected using MLST. Serotypes associated with more than one ST complex had a larger degree of sequence polymorphism than a serotype associated with one ST complex. The RDP analysis confirmed the high frequency of recombination events in the pspA gene. CONCLUSIONS: The phylogenetic tree correlated with MLST, and detected multiple PspA clades among clonal strains. The genetic diversity of the strains and the frequency of recombination events in the mosaic gene, pspA, were accurately assessed using the DnaSP and RDP programs, respectively.
These data provide proof-of-concept that resequencing arrays could play an important role within research and clinical laboratories in both monitoring the molecular epidemiology of the pneumococcus and detecting 'vaccine escape' strains among vaccine-candidate genes.


Subject(s)
Immune Evasion , Polymorphism, Single Nucleotide , Recombination, Genetic , Sequence Analysis, DNA , Streptococcus pneumoniae/genetics , Bacterial Proteins/genetics , Heat-Shock Proteins/genetics , Molecular Epidemiology , Phylogeny , Software , Streptococcus pneumoniae/immunology , Vaccines/pharmacology
14.
BMC Microbiol ; 9: 213, 2009 Oct 07.
Article in English | MEDLINE | ID: mdl-19811647

ABSTRACT

BACKGROUND: A low genetic diversity in Francisella tularensis has been documented. Current DNA based genotyping methods for typing F. tularensis offer a limited and varying degree of subspecies, clade and strain level discrimination power. Whole genome sequencing is the most accurate and reliable method to identify, type and determine phylogenetic relationships among strains of a species. However, lower cost typing schemes are necessary in order to enable typing of hundreds or even thousands of isolates. RESULTS: We have generated a high-resolution phylogenetic tree from 40 Francisella isolates, including 13 F. tularensis subspecies holarctica (type B) strains, 26 F. tularensis subsp. tularensis (type A) strains and a single F. novicida strain. The tree was generated from global multi-strain single nucleotide polymorphism (SNP) data collected using a set of six Affymetrix GeneChip resequencing arrays with the non-repetitive portion of LVS (type B) as the reference sequence complemented with unique sequences of SCHU S4 (type A). Global SNP based phylogenetic clustering was able to resolve all non-related strains. The phylogenetic tree was used to guide the selection of informative SNPs specific to major nodes in the tree for development of a genotyping assay for identification of F. tularensis subspecies and clades. We designed and validated an assay that uses these SNPs to accurately genotype 39 additional F. tularensis strains as type A (A1, A2, A1a or A1b) or type B (B1 or B2). CONCLUSION: Whole-genome SNP based clustering was shown to accurately identify SNPs for differentiation of F. tularensis subspecies and clades, emphasizing the potential power and utility of this methodology for selecting SNPs for typing of F. tularensis to the strain level. 
Additionally, whole genome sequence based SNP information gained from a representative population of strains may be used to perform evolutionary or phylogenetic comparisons of strains, or selection of unique strains for whole-genome sequencing projects.
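The idea of choosing SNPs that are specific to a node of the tree can be sketched as a simple filter: keep positions where every strain inside a clade shares one allele and no strain outside the clade carries it. The strain names, allele strings, and function name below are invented for illustration, and the abstract does not describe the authors' actual selection criteria at this level of detail.

```python
def clade_specific_snps(alleles, clade):
    """Return 0-based SNP positions that separate `clade` from all other strains.

    alleles: dict mapping strain name to a string of SNP calls (equal lengths).
    clade:   set of strain names forming the group of interest.
    """
    inside = [s for s in alleles if s in clade]
    outside = [s for s in alleles if s not in clade]
    length = len(next(iter(alleles.values())))
    informative = []
    for pos in range(length):
        in_calls = {alleles[s][pos] for s in inside}
        out_calls = {alleles[s][pos] for s in outside}
        # Informative: one shared allele inside, never seen outside.
        if len(in_calls) == 1 and not (in_calls & out_calls):
            informative.append(pos)
    return informative

# Made-up SNP matrix: four strains, five SNP positions.
toy = {
    "typeA_1": "ACGTA",
    "typeA_2": "ACGTC",
    "typeB_1": "GCTTA",
    "typeB_2": "GCTTA",
}
markers = clade_specific_snps(toy, {"typeA_1", "typeA_2"})
```

In this toy matrix, positions 0 and 2 distinguish the two hypothetical type A strains from the type B strains, while position 4 varies within type A and is rejected.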


Subject(s)
Comparative Genomic Hybridization/methods , Francisella tularensis/genetics , Genome, Bacterial , Phylogeny , Polymorphism, Single Nucleotide , Bacterial Typing Techniques , Cluster Analysis , Computational Biology , DNA, Bacterial/genetics , Francisella tularensis/classification , Polymerase Chain Reaction , Sequence Analysis, DNA
15.
Mol Biochem Parasitol ; 144(1): 1-9, 2005 Nov.
Article in English | MEDLINE | ID: mdl-16085323

ABSTRACT

Despite the significance of Plasmodium vivax as the most widespread human malaria parasite and a major public health problem, gene expression in this parasite is poorly understood. To accelerate gene discovery and facilitate the annotation phase of the P. vivax genome project, we have undertaken a transcriptome approach to study gene expression in the mixed blood stages of a P. vivax field isolate. Using a cDNA library constructed from purified blood stages, we have obtained single-pass sequences for approximately 21,500 expressed sequence tags (ESTs), the largest number of transcript tags obtained so far for this species. Cluster analysis revealed that the library is highly redundant, resulting in 5407 clusters. Clustered ESTs were searched against public protein databases for functional annotation, and more than one-third showed a significant match, the majority of these to Plasmodium falciparum proteins. The most abundant clusters corresponded to genes encoding ribosomal proteins and proteins involved in metabolism, consistent with the predominance of trophozoites in the field isolate sample. In spite of the scarcity of other parasite stages in the field isolate, we could identify genes that are expressed in rings, schizonts and gametocytes. This study should facilitate our understanding of gene expression in P. vivax asexual stages and provide valuable data for gene prediction and annotation of the P. vivax genome sequence.


Subject(s)
Genes, Protozoan , Plasmodium vivax/genetics , Animals , DNA, Complementary/genetics , DNA, Protozoan/genetics , Expressed Sequence Tags , Gene Library , Humans , Malaria, Vivax/parasitology , Molecular Sequence Data
16.
Bioinformatics ; 21(15): 3324-6, 2005 Aug 01.
Article in English | MEDLINE | ID: mdl-15919728

ABSTRACT

UNLABELLED: MeSHer uses a simple statistical approach to identify biological concepts in the form of Medical Subject Headings (MeSH terms) obtained from the PubMed database that are significantly overrepresented within the identified gene set relative to those associated with the overall collection of genes on the underlying DNA microarray platform. As a demonstration, we apply this approach to gene lists acquired from a published study of the effects of angiotensin II (Ang II) treatment on cardiac gene expression and demonstrate that this approach can aid in the interpretation of the resulting 'significant' gene set. AVAILABILITY: The software is available at http://www.tm4.org. SUPPLEMENTARY INFORMATION: Results from the analysis of significant genes from the published Ang II study.
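The abstract does not say which statistic MeSHer uses. One common way to test whether a term is overrepresented in a gene list relative to the full array is the upper tail of the hypergeometric distribution, sketched below; the function and parameter names are illustrative assumptions rather than part of the MeSHer software.

```python
from math import comb

def hypergeom_upper_tail(population, annotated, drawn, observed):
    """P(X >= observed) when `drawn` genes are sampled without replacement.

    population: genes on the array
    annotated:  genes on the array linked to the MeSH term
    drawn:      genes in the significant list
    observed:   genes in the significant list linked to the term
    """
    total = comb(population, drawn)
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(population - annotated, drawn - i)
        for i in range(observed, upper + 1)
    )
    return favourable / total

# Toy numbers: 10 genes on the array, 5 carry the term, and all 5 genes
# in the significant list carry it.
p_value = hypergeom_upper_tail(10, 5, 5, 5)
```

With these toy numbers only one of the 252 possible gene lists is this extreme, so the p-value is 1/252. A real analysis across many MeSH terms would also need a multiple-testing correction, which is left out here.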


Subject(s)
Database Management Systems; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Medical Subject Headings; Natural Language Processing; Oligonucleotide Array Sequence Analysis/methods; PubMed; Software; Artificial Intelligence; Biology/methods; Models, Genetic; Models, Statistical; Protein Interaction Mapping/methods; Vocabulary, Controlled
17.
Bioinformatics ; 19(5): 651-2, 2003 Mar 22.
Article in English | MEDLINE | ID: mdl-12651724

ABSTRACT

TGICL is a pipeline for analysis of large Expressed Sequence Tags (EST) and mRNA databases in which the sequences are first clustered based on pairwise sequence similarity, and then assembled by individual clusters (optionally with quality values) to produce longer, more complete consensus sequences. The system can run on multi-CPU architectures including SMP and PVM.
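
As a rough illustration of the first stage described above (grouping sequences by pairwise similarity before per-cluster assembly), the sketch below builds single-linkage clusters with a union-find structure. It is not TGICL's code: the `cluster_by_similarity` name, the `(id_a, id_b, identity)` hit format, and the `min_identity` threshold are assumptions made for the example, and the assembly step is not shown.

```python
def cluster_by_similarity(seq_ids, pairs, min_identity=0.95):
    """Group sequences linked, directly or transitively, by similar pairs.

    pairs: iterable of (id_a, id_b, identity) pairwise hits (hypothetical
    format); hits below min_identity are ignored.
    """
    parent = {s: s for s in seq_ids}

    def find(x):
        # Walk to the root, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity in pairs:
        if identity >= min_identity:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a

    clusters = {}
    for s in seq_ids:
        clusters.setdefault(find(s), []).append(s)
    # Largest clusters first; ties broken alphabetically for stable output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))
```

Because the clusters are disjoint, each one could then be handed to an assembler independently, which is what makes this kind of pipeline straightforward to spread across several CPUs.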


Subject(s)
Database Management Systems; Databases, Nucleic Acid; Expressed Sequence Tags; Gene Expression Profiling/methods; Information Storage and Retrieval/methods; Sequence Alignment/methods; Sequence Analysis, DNA/methods; Cluster Analysis; Gene Expression Regulation/genetics; Sequence Homology; Software
18.
Plant Physiol ; 131(2): 419-29, 2003 Feb.
Article in English | MEDLINE | ID: mdl-12586867

ABSTRACT

The cultivated potato (Solanum tuberosum) shares similar biology with other members of the Solanaceae, yet has features unique within the family, such as modified stems (stolons) that develop into edible tubers. To better understand potato biology, we have undertaken a survey of the potato transcriptome using expressed sequence tags (ESTs) from diverse tissues. A total of 61,940 ESTs were generated from aerial tissues, below-ground tissues, and tissues challenged with the late-blight pathogen (Phytophthora infestans). Clustering and assembly of these ESTs resulted in a total of 19,892 unique sequences with 8,741 tentative consensus sequences and 11,151 singleton ESTs. We were able to identify a putative function for 43.7% of these sequences. A number of sequences (48) were expressed throughout the libraries sampled, representing constitutively expressed sequences. Other sequences (13,068, 21%) were uniquely expressed and were detected only in a single library. Using hierarchical and k-means clustering of the EST sequences, we were able to correlate changes in gene expression with major physiological events in potato biology. Using pairwise comparisons of tuber-related tissues, we were able to associate genes with tuber initiation, dormancy, and sprouting. We were also able to identify a number of characterized as well as novel sequences that were unique to the incompatible interaction with the late-blight pathogen, thereby providing a foundation for further understanding the mechanism of resistance.


Subject(s)
Expressed Sequence Tags; Solanum tuberosum/genetics; Cluster Analysis; Gene Expression Regulation, Plant; Gene Library; Immunity, Innate/genetics; Solanum lycopersicum/genetics; Phytophthora/growth & development; Plant Diseases/genetics; Plant Diseases/microbiology; Plant Stems/genetics; Plant Stems/growth & development; Solanum tuberosum/growth & development; Solanum tuberosum/microbiology
19.
Genome Res ; 12(3): 493-502, 2002 Mar.
Article in English | MEDLINE | ID: mdl-11875039

ABSTRACT

Comparative genomics promises to rapidly accelerate the identification and functional classification of biologically important human genes. We developed the TIGR Orthologous Gene Alignment (TOGA) database to provide a cross-reference between fully and partially sequenced eukaryotic transcribed sequences. Starting with the assembled expressed sequence tag (EST) and gene sequences that comprise the 28 TIGR Gene Indices, we used high-stringency pair-wise sequence searches and a reflexive, transitive closure process to associate sequence-specific best hits, generating 32,652 tentative ortholog groups (TOGs). This has allowed us to identify putative orthologs and paralogs for known genes, as well as those that exist only as uncharacterized ESTs and to provide links to additional information including genome sequence and mapping data. TOGA provides an important new resource for the analysis of gene function in eukaryotes. In addition, an analysis of the most widely represented sequences can begin to provide insight into eukaryotic biological processes.
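
One plausible reading of the "reflexive, transitive closure" step is: keep only best hits that are reciprocal between two species, then merge sequences connected through chains of such hits. The sketch below follows that reading; it is an interpretation, not the TOGA implementation, and the `species_of` and `best_hit` structures and the function name are hypothetical.

```python
from collections import defaultdict, deque


def tentative_ortholog_groups(species_of, best_hit):
    """Group sequences connected by reciprocal best hits.

    species_of: dict mapping sequence id -> species name (assumed input).
    best_hit:   dict mapping sequence id -> {other_species: best-matching id}.
    """
    # Reflexive step: keep an edge q-t only if t's best hit in q's species is q.
    adjacency = defaultdict(set)
    for query, hits in best_hit.items():
        for target in hits.values():
            if best_hit.get(target, {}).get(species_of[query]) == query:
                adjacency[query].add(target)
                adjacency[target].add(query)

    # Transitive closure: collect connected components by breadth-first search.
    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        seen.add(start)
        queue, group = deque([start]), []
        while queue:
            node = queue.popleft()
            group.append(node)
            for neighbour in sorted(adjacency[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        groups.append(sorted(group))
    return groups
```

Sequences whose best hits are never reciprocated are left out of every group, which mirrors the high-stringency intent described in the abstract; a real pipeline would also need score and coverage thresholds that are not modelled here.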


Subject(s)
Eukaryotic Cells; Genes/genetics; Sequence Alignment/methods; Algorithms; Animals; Cattle; Computational Biology/methods; Consensus Sequence/genetics; Databases, Genetic; Eukaryotic Cells/chemistry; Eukaryotic Cells/metabolism; Genome, Human; Humans; Mice; Phylogeny; Rats; Sequence Homology, Nucleic Acid