Abstract
Search engines and retrieval systems are popular tools at a life science desktop. The manual inspection of hundreds of database entries that reflect a life science concept or fact is time-intensive daily work. Here, it is not the number of query results that matters, but their relevance. In this paper, we present the LAILAPS search engine for life science databases. The concept is to combine a novel feature model for relevance ranking, a machine learning approach to model user relevance profiles, ranking improvement by user feedback tracking, and an intuitive, slim web user interface that estimates relevance rank by tracking user interactions. Queries are formulated as simple keyword lists and are expanded by synonyms. Supporting a flexible text index and a simple data import format, LAILAPS can easily be used both as a search engine for comprehensive integrated life science databases and for small in-house project databases. With a set of features extracted from each database hit, in combination with user relevance preferences, a neural network predicts user-specific relevance scores. Using expert knowledge as training data for a predefined neural network, or using users' own relevance training sets, a reliable relevance ranking of database hits has been implemented. We describe the LAILAPS system, its concepts, benchmarks and use cases. LAILAPS is publicly available for SWISSPROT data at http://lailaps.ipk-gatersleben.de.
Subjects
Computational Biology/methods , Factual Databases , Search Engine/methods , Software , Information Storage and Retrieval , User-Computer Interface
Abstract
The metabolically versatile Gram-negative bacterium Pseudomonas aeruginosa inhabits terrestrial, aquatic, animal-, human-, and plant-host-associated environments and is an important causative agent of nosocomial infections, particularly in intensive-care units. The population genetics of P. aeruginosa was investigated by an approach that is generally applicable to the rapid, robust, and informative genotyping of bacteria. DNA, amplified from the bacterial colony by cycles of multiplex primer extension, is hybridized onto a microarray to yield an electronically portable binary multimarker genotype that represents the core genome by single nucleotide polymorphisms and the accessory genome by markers of genomic islets and islands. The 240 typed P. aeruginosa strains of diverse habitats and geographic origin segregated into two large nonoverlapping clusters and 45 isolated clonal complexes with few or no partners. The majority of strains belonged to few dominant clones widespread in disease and environmental habitats. The most frequent genotype was represented by the sequenced strain PA14. Core and accessory genome were found to be nonrandomly assembled in P. aeruginosa. Individual clones preferred a specific repertoire of accessory segments. Even the most promiscuous genomic island, pKLC102, had integrated preferentially into a subset of clones. Moreover, some physically distant loci of the core genome, including oriC, showed nonrandom associations of genotypes, whereas other segments in between were freely recombining. Thus, the P. aeruginosa genome is made up of clone-typical segments in core and accessory genome and of blocks in the core with unrestricted gene flow in the population.
Subjects
Pseudomonas aeruginosa/classification , ADP Ribose Transferases/genetics , Alleles , Bacterial Proteins/genetics , Bacterial Toxins/genetics , Bacterial Typing Techniques , Bacterial Genome , Genotype , Oligonucleotide Array Sequence Analysis , Pseudomonas aeruginosa/genetics , Pseudomonas aeruginosa/pathogenicity
Abstract
The metabolically versatile soil bacterium Pseudomonas putida has to cope with numerous abiotic stresses in its habitats. The stress responses of P. putida KT2440 to 4 °C, pH 4.5, 0.8 M urea, and 45 mM sodium benzoate were analyzed by determining the global mRNA expression profiles and screening for stress-intolerant nonauxotrophic Tn5 transposon mutants. In 392 regulated genes or operons, 36 gene regions were differentially expressed by more than 2.5-fold, and 32 genes in 23 operons were found to be indispensable for growth during exposure to one of the abiotic stresses. The transcriptomes of the responses to urea, benzoate, and 4 °C correlated positively with each other but negatively with the transcriptome of the mineral acid response. The CbrAB sensor kinase, the cysteine synthase CysM, PcnB and VacB, which control mRNA stability, and BipA, which exerts transcript-specific translational control, were essential to cope with cold stress. The cyo operon was required to cope with acid stress. A functional PhoP, PtsP, RelA/SpoT modulon, and adhesion protein LapA were necessary for growth in the presence of urea, and the outer membrane proteins OmlA and FepA and the phosphate transporter PstBACS were indispensable for growth in the presence of benzoate. A lipid A acyltransferase (PP0063) was a mandatory component of the stress responses to cold, mineral acid, and benzoate. Adaptation of the membrane barrier, uptake of phosphate, maintenance of the intracellular pH and redox status, and translational control of metabolism are key mechanisms of the response of P. putida to abiotic stresses.
Subjects
Bacterial Genome , Pseudomonas putida/genetics , Bacterial Outer Membrane Proteins/genetics , Bacterial Outer Membrane Proteins/metabolism , Chromosome Mapping , Bacterial Chromosomes , Gene Expression Profiling , Genomics/methods , Operon , Bacterial RNA/genetics , Messenger RNA/genetics , Genetic Transcription
Abstract
We have created an analysis pipeline called Sprockets, which can be used to classify proteins into various hierarchical "families", and build searchable models of these families. The construction of these families is based on data from Expressed Sequence Tags (ESTs) and Coding DNA Sequences (CDSs), making Sprockets clusters especially suitable for studying gene families in organisms for which the completely sequenced genome does not (yet) exist. The pipeline consists of two main parts: pair-wise analysis and grouping of sequences with Z-score statistics, followed by hierarchical splitting of clusters into alignable protein families. Various computational and statistical techniques applied in Sprockets allow it to act like a massive and selective multiple sequence alignment engine for combining individual sequence collections and related public sequences. The end result is a database of gene Hidden Markov Models, each related to the other by three levels of similarity: secondary structure, function and evolutionary origin. For a sample 20,000 EST set from Lactuca spp., Sprockets provided a 9% improvement in mapping of function to unknown sequences over traditional pair-wise search methods and InterPro mapping.
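The abstract does not spell out how the Z-score statistics are computed. A common way to attach a Z-score to a pairwise comparison is to score the real pair, score the same pair many times after shuffling one sequence, and express the real score in standard deviations above the shuffled mean. The sketch below illustrates that general idea only; the function names, the scoring parameters, the number of shuffles and the simple Smith-Waterman scorer are all assumptions made for illustration and are not taken from Sprockets.

```python
import random
import statistics


def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    Scoring values are illustrative assumptions, not Sprockets settings.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def alignment_z_score(a, b, n_shuffles=200, seed=0):
    """Z-score of the a/b alignment score against shuffled versions of b."""
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    letters = list(b)
    background = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys residue order
        background.append(local_alignment_score(a, "".join(letters)))
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    if sd == 0:
        return 0.0  # no spread in the background, Z-score undefined
    return (observed - mean) / sd


if __name__ == "__main__":
    seq = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQ"
    print(alignment_z_score(seq, seq))
```

Pairs whose Z-score exceeds a chosen cutoff could then be linked and grouped into clusters; the cutoff and the grouping rule would have to come from the pipeline's own documentation.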
Abstract
The compositional bias of the G+C, di- and tetranucleotide contents in the 6 181 862 bp Pseudomonas putida KT2440 genome was analysed in sliding windows of 4000 bp in steps of 1000 bp. The genome has a low GC skew (mean 0.066) between the leading and lagging strand. The values of GC contents (mean 61.6%) and of dinucleotide relative abundance exhibit skewed Gaussian distributions. The variance of tetranucleotide frequencies, which increases linearly with increasing GC content, shows two overlapping Gaussian distributions of genome sections with low (minor fraction) or high variance (major fraction). Eighty per cent of the chromosome shares similar GC contents and oligonucleotide bias, but 105 islands of 4000 bp or more show atypical GC contents and/or oligonucleotide signature. Almost all islands provide added value to the metabolic proficiency of P. putida as a saprophytic omnivore. Major features are the uptake and degradation of organic chemicals, ion transport and the synthesis and secretion of secondary metabolites. Other islands endow P. putida with determinants of resistance and defence or with constituents and appendages of the cell wall. A total of 29 islands carry the signature of mobile elements such as phage, transposons, insertion sequence (IS) elements and group II introns, indicating recent acquisition by horizontal gene transfer. The largest gene carries the most unusual sequence that encodes a multirepeat threonine-rich surface adhesion protein. Among the housekeeping genes, only genes of the translational apparatus were located in segments with an atypical signature, suggesting that the synthesis of ribosomal proteins is uncoupled from the rapidly changing translational demands of the cell by the separate utilization of tRNA pools.
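The sliding-window scan described above (4000 bp windows moved in 1000 bp steps) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_gc` is hypothetical, and the GC skew formula (G - C) / (G + C) is the commonly used definition, assumed here because the abstract does not state the exact formula.

```python
def sliding_gc(seq, window=4000, step=1000):
    """Return (start, gc_content, gc_skew) for each full window of seq.

    gc_content = (G + C) / window length
    gc_skew    = (G - C) / (G + C), assumed definition; 0.0 if no G or C
    """
    seq = seq.upper()
    results = []
    # Only full-length windows are reported; a trailing partial window is skipped.
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        gc_content = (g + c) / window
        gc_skew = (g - c) / (g + c) if (g + c) else 0.0
        results.append((start, gc_content, gc_skew))
    return results


if __name__ == "__main__":
    # Toy sequence with small windows so the output is easy to inspect.
    toy = "GGGC" + "ATAT" + "GCGC"
    for start, content, skew in sliding_gc(toy, window=4, step=4):
        print(start, content, skew)
```

Windows whose GC content or skew deviates strongly from the genome-wide distribution would be candidates for the atypical islands the abstract mentions; the threshold for "atypical" is not given in the text and would need to be chosen separately.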
Subjects
Bacterial DNA/genetics , Bacterial Genome , Pseudomonas putida/genetics , Base Composition , Bacterial Chromosomes/genetics , DNA Transposable Elements/genetics , Bacterial DNA/analysis , GC Rich Sequence , Bacterial Genes/genetics , Microsatellite Repeats/genetics , Open Reading Frames/genetics
Abstract
The genome rearrangements in sequential Pseudomonas aeruginosa clone K isolates from the airways of a patient with cystic fibrosis were determined by an integrated approach of mapping, sequencing and bioinformatics. Restriction mapping uncovered an 8.9 kb deletion of PAO sequence between phnAB and oprL in clone K, and two 106 kb insertions either adjacent to this deletion or several hundred kilobases away, close to the pilA locus. These 106 kb blocks of extra DNA also co-existed as the circular plasmid pKLK106 in several clone K isolates and were found to be closely related to plasmid pKLC102 in P. aeruginosa clone C isolates. The breakpoints of the deletion in clone K and the attB-attP sequences for the reversible integration of the plasmid in clones C and K were located within the 3' end of the lysine tRNA structural genes (att site). pKLK106 sequentially recombined with either of the two tRNA(Lys) genes in clone K isolates. The att site of the pilA hypervariable region has been utilized by clone C to target its plasmid pKLC102 into the chromosome; the att site of the phnAB-oprL region has been employed by strain PAO to incorporate a DNA block encoding pyocin, transposases and IS elements. The use of typical phage attachment sites by conjugative genetic elements could be one of the major mechanisms used by P. aeruginosa to generate the mosaic genome structure of blocks of species-, clone- and strain-specific DNA. The example described here demonstrates the potential impact of systematic genome analysis of sequential isolates from the same habitat on our understanding of the evolution of microbial genomes.
Subjects
Bronchi/microbiology , Molecular Evolution , Bacterial Genome , Plasmids/genetics , Pseudomonas aeruginosa/genetics , Genetic Recombination/genetics , Base Sequence , Southern Blotting/methods , Cystic Fibrosis/microbiology , Pulsed-Field Gel Electrophoresis/methods , Humans , Molecular Sequence Data , Pseudomonas Infections/microbiology , Pseudomonas aeruginosa/isolation & purification , Bacterial RNA/genetics , Lysine Transfer RNA/genetics , Restriction Mapping/methods , DNA Sequence Analysis
Abstract
As part of a collaborative project aimed at sequencing and functionally analysing the entire genome of Pseudomonas putida strain KT2440, a physical clone map was produced as an initial resource. To this end, a high-coverage cosmid library was arrayed and ordered by clone hybridizations. Restriction fragments generated by rare-cutting enzymes and plasmids containing the rrn operon and 23S rDNA of Pseudomonas aeruginosa were used as probes, and parts of the cosmids were end-sequenced. This provided the information necessary for merging and comparing the macro-restriction map, cosmid clone order and sequence information, thereby assuring co-linearity of the eventual sequence assembly with the actual genome. A tiling path of clones was selected from the shotgun clones used for sequencing for the production of DNA microarrays that represent the entire genome, including its non-coding portions.