Results 1 - 12 of 12
1.
Comput Math Methods Med ; 2021: 9969751, 2021.
Article in English | MEDLINE | ID: mdl-34122622

ABSTRACT

Genomic islands are associated with microbial adaptation and differ from the host genome in their sequence characteristics. Many methods have therefore been proposed to distinguish genomic islands from the rest of the genome by evaluating sequence composition. Many sequence features have been proposed, but many of them have not yet been applied to the identification of genomic islands. In this paper, we present a scheme to predict genomic islands using the chi-square test and the random forest algorithm. We extract seven kinds of sequence features and select the important ones with the chi-square test. The selected features are then input into the random forest to predict genomic islands. Three experiments and comparisons show that the proposed method achieves the best performance. This understanding can be useful for designing more powerful methods for genomic island prediction.
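The abstract does not say which seven feature types are used or how the chi-square test is applied, so the following is only a minimal sketch under stated assumptions: genome windows are described by dinucleotide counts (one plausible compositional feature), each window carries a 0/1 label (island or not), and features are ranked by a chi-square statistic comparing observed per-class totals with the totals expected if the feature were independent of the label. The names `kmer_counts`, `chi2_scores` and `select_top_k` and the toy windows are hypothetical. The selected columns would then be passed to a random forest classifier, which is not shown.

```python
from itertools import product


def kmer_counts(seq, k=2):
    """Count every DNA k-mer in seq; k-mers are listed in lexicographic ACGT order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing N or other symbols
            counts[kmer] += 1
    return [counts[kmer] for kmer in kmers]


def chi2_scores(X, y):
    """Chi-square statistic of each non-negative count feature against the labels."""
    n = len(y)
    class_sizes = {c: sum(1 for label in y if label == c) for c in set(y)}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c, size in class_sizes.items():
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = total * size / n  # expected total under independence
            if expected > 0:
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the column indices of the k highest-scoring features."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Toy windows invented for illustration: label 1 = island-like, 0 = host-like.
windows = ["ATATATATAT", "TATATATTAA", "GCGCGGCCGC", "CGCGCGGCGC"]
labels = [1, 1, 0, 0]
X = [kmer_counts(w) for w in windows]
selected = select_top_k(X, labels, k=4)
```

Ranking by a per-feature statistic keeps the selection step independent of the classifier, which is one reason filter methods like this are often paired with tree ensembles.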


Subjects
Genomic Islands; Genomics/methods; Algorithms; Chi-Square Distribution; Computational Biology; Databases, Genetic/statistics & numerical data; Genetics, Microbial/methods; Genetics, Microbial/statistics & numerical data; Genome, Bacterial; Genomics/statistics & numerical data; Models, Genetic
2.
PLoS Comput Biol ; 17(5): e1008920, 2021 05.
Article in English | MEDLINE | ID: mdl-33945539

ABSTRACT

Specialised metabolites from microbial sources are well-known for their wide range of biomedical applications, particularly as antibiotics. When mining paired genomic and metabolomic data sets for novel specialised metabolites, establishing links between Biosynthetic Gene Clusters (BGCs) and metabolites represents a promising way of finding such novel chemistry. However, due to the lack of detailed biosynthetic knowledge for the majority of predicted BGCs, and the large number of possible combinations, this is not a simple task. This problem is becoming ever more pressing with the increased availability of paired omics data sets. Current tools are not effective at identifying valid links automatically, and manual verification is a considerable bottleneck in natural product research. We demonstrate that using multiple link-scoring functions together makes it easier to prioritise true links relative to others. Based on standardising a commonly used score, we introduce a new, more effective score, and introduce a novel score using an Input-Output Kernel Regression approach. Finally, we present NPLinker, a software framework to link genomic and metabolomic data. Results are verified using publicly available data sets that include validated links.


Subjects
Genetics, Microbial/statistics & numerical data; Genomics/statistics & numerical data; Metabolomics/statistics & numerical data; Software; Biosynthetic Pathways/genetics; Computational Biology; Data Mining; Databases, Factual; Databases, Genetic; Genome, Microbial; Microbiological Phenomena; Multigene Family; Regression Analysis
3.
J Bioinform Comput Biol ; 8 Suppl 1: 17-32, 2010 Dec.
Article in English | MEDLINE | ID: mdl-21155017

ABSTRACT

Biomolecular sequences and structures of land, air and water species are being determined rapidly, and the data entries are unevenly distributed across organisms. As a result, BLAST homology searches frequently return undesirable entries from organisms living in different environments. To reduce irrelevant search results, a separate database for comparative genomics is urgently required. A comprehensive bioinformatics tool set and an integrated database, named Bioinformatics tools for Marine and Freshwater Genomics (BiMFG), have been constructed for comparative analyses among model species and underwater species. Novel matching techniques based on conserved motifs and/or secondary structure elements are designed for efficiently and effectively retrieving and aligning remote sequences through cross-species comparisons. This is especially helpful when the sequences under analysis have low similarity and unresolved structural information. In addition, the system provides core techniques of multiple sequence alignment, multiple secondary structure profile alignment and iteratively refined multiple structural alignment for biodiversity analysis and verification in marine and freshwater biology. The BiMFG web server is freely available for use at http://bimfg.cs.ntou.edu.tw/.


Subjects
Computational Biology; Databases, Genetic; Marine Biology/statistics & numerical data; Algorithms; Animals; Fishes/genetics; Fresh Water/microbiology; Genetics, Microbial/statistics & numerical data; Humans; Internet; Invertebrates/genetics; Models, Molecular; Protein Structure, Secondary; Seawater/microbiology; Sequence Alignment/statistics & numerical data; Structural Homology, Protein
4.
Methods Mol Biol ; 551: 287-304, 2009.
Article in English | MEDLINE | ID: mdl-19521881

ABSTRACT

The molecular epidemiology of infectious diseases uses a variety of techniques to assay the relatedness of disease-causing organisms to identify strains responsible for outbreaks or associated with particular phenotypes of interest (such as antibiotic resistance) and, it is hoped, provide insights into where and how these strains have emerged. The correct analysis of such data requires that we understand how the assayed variation accumulates. We discuss this with specific reference to three classes of methods: those based on gel electrophoresis of fragments generated by restriction enzymes or polymerase chain reaction (PCR), those based on microsatellites and other repeat elements, and raw sequence data from protein-coding genes. We also provide a simple example of how the likely origin of an apparently novel antibiotic-resistant strain may be identified and conclude with a discussion of some popular analysis packages and the more interesting prospects for the future in this rapidly developing field.


Subjects
Communicable Diseases/epidemiology; Communicable Diseases/microbiology; Data Interpretation, Statistical; Molecular Epidemiology/methods; Molecular Epidemiology/statistics & numerical data; Cluster Analysis; DNA/genetics; DNA/isolation & purification; Genetics, Microbial/methods; Genetics, Microbial/statistics & numerical data; Humans; Polymorphism, Single Nucleotide; Repetitive Sequences, Nucleic Acid; Sequence Analysis/methods; Sequence Analysis/statistics & numerical data; Software
5.
J Bioinform Comput Biol ; 7(3): 455-71, 2009 Jun.
Article in English | MEDLINE | ID: mdl-19507285

ABSTRACT

Metagenomics is an emerging methodology for the direct genomic analysis of a mixed community of uncultured microorganisms. Current analyses of metagenomic data rely largely on computational tools originally designed for microbial genomics projects. The challenge of assembling metagenomic sequences arises mainly from the short reads and the high species complexity of the community. Alternatively, individual (short) reads can be searched directly against databases of known genes (or proteins) to identify homologous sequences. The latter approach may have low sensitivity and specificity in identifying homologous sequences, which may further bias the subsequent diversity analysis. In this paper, we present a novel approach to metagenomic data analysis, called Metagenomic ORFome Assembly (MetaORFA). The computational framework consists of three steps. Each read from a metagenomics project is first annotated with putative open reading frames (ORFs) that likely encode proteins. Next, the predicted ORFs are assembled into a collection of peptides using an EULER assembly method. Finally, the assembled peptides (i.e. the ORFome) are used for database searching of homologs and subsequent diversity analysis. We applied the MetaORFA approach to several metagenomics datasets with low-coverage short reads. The results show that MetaORFA can produce long peptides even when the sequence coverage of reads is extremely low. Hence, ORFome assembly significantly increases the sensitivity of homology searching, and may improve the diversity analysis of metagenomic data. This improvement is especially useful for metagenomic projects in which genome assembly fails because of low sequence coverage.
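The first of the three steps (annotating each read with putative ORFs) can be illustrated with a simplified sketch. This is not the MetaORFA implementation: it assumes the standard genetic code, treats any stop-free stretch in any of the six reading frames as a putative ORF fragment, and uses a hypothetical minimum length `min_codons`. The EULER assembly and database search steps are not shown.

```python
# Standard genetic code, built from the usual TCAG-ordered compact string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate complete codons; unknown codons (e.g. containing N) become X."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def putative_orfs(read, min_codons=5):
    """Return (strand, frame, peptide) for every stop-free stretch of at least
    min_codons codons in the six reading frames of a read."""
    orfs = []
    for strand, seq in (("+", read), ("-", reverse_complement(read))):
        for frame in range(3):
            protein = translate(seq[frame:])
            for fragment in protein.split("*"):
                if len(fragment) >= min_codons:
                    orfs.append((strand, frame, fragment))
    return orfs


# Toy read invented for illustration.
for strand, frame, peptide in putative_orfs("ATGGCCAAATAA", min_codons=3):
    print(strand, frame, peptide)
```

Because short reads rarely contain both a start and a stop codon, the sketch does not require either; a real annotator would apply stricter criteria.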


Subjects
Genetics, Microbial/statistics & numerical data; Open Reading Frames; Sequence Analysis/statistics & numerical data; Algorithms; Amino Acid Sequence; Computational Biology; Databases, Genetic; Genomics/statistics & numerical data; Molecular Sequence Data; Polymorphism, Genetic; Seawater/virology; Sequence Alignment/statistics & numerical data; Sequence Analysis, Protein/statistics & numerical data; Viral Proteins/genetics
6.
J Bioinform Comput Biol ; 6(6): 1193-211, 2008 Dec.
Article in English | MEDLINE | ID: mdl-19090024

ABSTRACT

Short-insert shotgun sequencing approaches have been applied in recent years to environmental genomic libraries. In the case of complex multispecies microbial communities, there can be many sequence reads that are not incorporated into assemblies, and thus need to be annotated and accessible as single reads. Most existing annotation systems and genome databases accommodate assembled genomes containing contiguous gene-encoding sequences. Thus, a solution is required that can work effectively with environmental genomic annotation information to facilitate data analysis. The Environmental Genome Informational Utility System (EnGenIUS) is a comprehensive environmental genome (metagenome) research toolset that was specifically designed to accommodate the needs of large (> 250 K sequence reads) environmental genome sequencing efforts. The core EnGenIUS modules consist of a set of UNIX scripts and PHP programs used for data preprocessing, an annotation pipeline with accompanying analysis tools, two entity relational databases, and a graphical user interface. The annotation pipeline has a modular structure and can be customized to best fit input data set properties. The integrated entity relational databases store raw data and annotation analysis results. Access to the underlying databases and services is facilitated through a web-based graphical user interface. Users have the ability to browse, upload, download, and analyze preprocessed data, based on diverse search criteria. The EnGenIUS toolset was successfully tested using the Alvinella pompejana epibiont environmental genome data set, which comprises more than 300 K sequence reads. A fully browsable EnGenIUS portal is available at (http://ocean.dbi.udel.edu/) (access code: "guest"). The scope of this paper covers the implementation details and technical aspects of the EnGenIUS toolset.


Subjects
Environmental Microbiology; Genetics, Microbial/statistics & numerical data; Software; Computational Biology; Databases, Genetic/statistics & numerical data; Genomic Library; User-Computer Interface
7.
Math Biosci ; 215(1): 48-54, 2008 Sep.
Article in English | MEDLINE | ID: mdl-18590919

ABSTRACT

Current knowledge of microbial mutation rates has been accumulated largely by means of fluctuation experiments. A mathematical model describing the cell dynamics in a fluctuation experiment is indispensable to the estimation of mutation rates from such experiments. For almost six decades the model formulated by Lea and Coulson has dominated research and application, although the model formulated by Bartlett is generally believed to describe the cell dynamics more faithfully. The neglect of the Bartlett formulation was mainly due to mathematical difficulties. The present investigation overcomes some of these difficulties, thereby paving the way for the application of the Bartlett formulation in estimating mutation rates. Specifically, the article offers an algorithm for computing the distribution function of the number of mutants under the Bartlett formulation. The article also provides algorithms for computing point and interval estimates of mutation rates that are based on the maximum-likelihood principle. In addition, the article examines and compares the asymptotic behavior of the distributions of the number of mutants under the two formulations.
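The article's algorithm for the Bartlett formulation is not given in the abstract and is not reproduced here. As a point of reference only, the mutant-count distribution under the Lea-Coulson formulation (the model the abstract contrasts it with) can be computed with the well-known recursion p_0 = exp(-m), p_n = (m/n) * sum over j from 0 to n-1 of p_j / (n - j + 1), where m is the expected number of mutations per culture. The function name `lea_coulson_pmf` is hypothetical.

```python
import math


def lea_coulson_pmf(m, n_max):
    """Probabilities p_0..p_n_max of observing n mutants under the
    Lea-Coulson formulation, with m expected mutations per culture."""
    p = [math.exp(-m)]
    for n in range(1, n_max + 1):
        s = sum(p[j] / (n - j + 1) for j in range(n))
        p.append(m / n * s)
    return p


# Example: probability of at most 10 mutants when m = 1.
pmf = lea_coulson_pmf(1.0, 10)
cdf_at_10 = sum(pmf)
```

The recursion costs O(n_max^2) operations, which is usually acceptable for the culture sizes seen in fluctuation experiments.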


Subjects
Models, Genetic; Mutation; Algorithms; Genetics, Microbial/statistics & numerical data; Mathematics
8.
Int J Comput Biol Drug Des ; 1(1): 26-38, 2008.
Article in English | MEDLINE | ID: mdl-20054999

ABSTRACT

In the context of discovering new metabolic pathways, a full backtranslation of oligopeptides can be a promising approach. When studying complex environments in which the constituent microorganisms are unknown, it is also preferable to have all the complete nucleic acid sequences corresponding to an enzyme of interest. In this paper, we revisit the existing bioinformatics applications, which offer partial reverse translation solutions, and we compare two algorithms based on oligopeptide degeneracy that can efficiently compute a complete backtranslation of oligopeptides. Such algorithms are valuable for the discovery of new organisms, and we show their performance on simulated and real biological data sets.
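The two algorithms compared in the paper are not described in the abstract. To make the notions of degeneracy and complete backtranslation concrete, here is a brute-force sketch that assumes the standard genetic code: the degeneracy of an oligopeptide is the product of the number of codons for each residue, and the complete backtranslation is the set of all corresponding nucleotide sequences. The names `degeneracy` and `backtranslate` are hypothetical.

```python
from itertools import product

# Standard genetic code, built from the usual TCAG-ordered compact string.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each amino acid (and '*' for stop) to the list of codons encoding it.
CODONS_FOR = {}
for i, a in enumerate(BASES):
    for j, b in enumerate(BASES):
        for k, c in enumerate(BASES):
            CODONS_FOR.setdefault(AMINO[16 * i + 4 * j + k], []).append(a + b + c)


def degeneracy(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    count = 1
    for residue in peptide:
        count *= len(CODONS_FOR[residue])
    return count


def backtranslate(peptide):
    """Yield every DNA sequence encoding the peptide (exhaustive enumeration)."""
    for codons in product(*(CODONS_FOR[residue] for residue in peptide)):
        yield "".join(codons)


print(degeneracy("MF"), sorted(backtranslate("MF")))
```

The output grows exponentially with peptide length (a tripeptide of leucine, serine and arginine already has 216 encodings), which is why this enumeration is practical only for short oligopeptides.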


Subjects
Algorithms; Oligopeptides/chemistry; Oligopeptides/genetics; Amino Acid Sequence; Base Sequence; Computational Biology; Computer Simulation; DNA/genetics; Databases, Protein; Genetics, Microbial/statistics & numerical data; Models, Genetic; Protein Biosynthesis
9.
J Bioinform Comput Biol ; 5(4): 937-61, 2007 Aug.
Article in English | MEDLINE | ID: mdl-17787064

ABSTRACT

We study the problem of selecting control clones in DNA array hybridization experiments. The problem arises in the OFRG method for analyzing microbial communities. The OFRG method classifies rRNA gene clones using binary fingerprints created from a series of hybridization experiments, where each experiment consists of hybridizing a collection of arrayed clones with a single oligonucleotide probe. Each experiment produces analog signals, one for each clone, which then need to be classified, that is, converted into binary values 1 and 0 that represent hybridization and non-hybridization events. In addition to the sample rRNA gene clones, the array contains a number of control clones needed to calibrate the classification of the hybridization signals. These control clones must be selected with care to optimize the classification process. We formulate this as a combinatorial optimization problem called Balanced Covering. We prove that the problem is NP-hard, and we show some results on hardness of approximation. We propose approximation algorithms based on randomized rounding, and we show that, with high probability, our algorithms approximate the optimum solution well. The experimental results confirm that the algorithms find high-quality control clones. The algorithms have been implemented and are publicly available as part of the software package called CloneTools.
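The role of the control clones can be sketched as follows. The abstract does not specify the calibration rule, so this example assumes a deliberately simple one: for each probe, the threshold is the midpoint between the mean signal of controls known to hybridize and the mean signal of controls known not to. The function names and signal values are hypothetical, and the Balanced Covering selection of controls is not shown.

```python
def calibrate_threshold(positive_controls, negative_controls):
    """Midpoint between mean positive and mean negative control signals
    (an assumed rule, chosen for illustration)."""
    mean_pos = sum(positive_controls) / len(positive_controls)
    mean_neg = sum(negative_controls) / len(negative_controls)
    return (mean_pos + mean_neg) / 2


def binarize(signals, threshold):
    """Convert analog hybridization signals to 1 (hybridized) or 0 (not)."""
    return [1 if s >= threshold else 0 for s in signals]


def fingerprints(signals_per_probe, controls_per_probe):
    """Build one binary fingerprint per clone from a series of experiments.

    signals_per_probe[i][c] is the signal of clone c under probe i;
    controls_per_probe[i] is a (positive, negative) pair of control signal lists.
    """
    columns = [
        binarize(signals, calibrate_threshold(pos, neg))
        for signals, (pos, neg) in zip(signals_per_probe, controls_per_probe)
    ]
    # Transpose: one tuple of bits per clone, one bit per probe.
    return [tuple(bits) for bits in zip(*columns)]


# Invented signals for three clones and two probes.
signals = [[0.7, 0.2, 0.9], [0.1, 0.8, 0.6]]
controls = [([0.8, 1.0], [0.1, 0.3]), ([0.9, 0.7], [0.2, 0.2])]
print(fingerprints(signals, controls))
```

A threshold of this kind is only as good as the controls behind it, which is the motivation the abstract gives for choosing control clones carefully.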


Subjects
Control Groups; Numerical Analysis, Computer-Assisted; Oligonucleotide Array Sequence Analysis/methods; RNA Probes; Algorithms; Cluster Analysis; Gene Expression Profiling/methods; Gene Expression Profiling/statistics & numerical data; Genetics, Microbial/methods; Genetics, Microbial/statistics & numerical data; Models, Statistical; Nucleic Acid Hybridization; Pattern Recognition, Automated/methods; RNA Probes/analysis; RNA Probes/standards; Reproducibility of Results; Sample Size
10.
In Silico Biol ; 6(4): 281-306, 2006.
Article in English | MEDLINE | ID: mdl-16922692

ABSTRACT

The algorithm AMIGOS was developed to identify and characterise gene clusters conserved in microbial genomes. It is based on a categorisation of genes using a predefined set of gene functions (GFs). After all genes of a genome were categorised, distances between GFs were determined from their locations on a replicon and stored in genome-specific matrices. These matrices were used to identify GF clusters, such as those strictly conserved in 13 archaeal genomes, in 47 bacterial genomes, and in the combination of the two sets. Within the combined set of these 60 microbial genomes, there exist only two strictly conserved clusters, each harbouring two ribosomal genes, namely those for L4, L23 and L22, L29. To characterise less strictly conserved GF clusters, the content of genomes, i.e. the matrices, was analysed pairwise. Resulting clusters were merged into (meta-)clusters if their content overlapped. A scoring system named cons(CL) was developed. It quantifies the conservedness of cluster membership for individual GFs. For the genome of Escherichia coli it was shown that grouping cluster elements by cons(CL) values dissected the clusters into smaller sets. These sets frequently overlapped with known transcriptional units (TUs). This finding justifies the use of cons(CL) scores to predict TU membership of genes. In addition, cons(CL) values provide a sound basis for non-homologous gene annotation. Based on cons(CL) values, examples are given of conserved clusters that contain annotated genes together with single genes of unknown function.


Subjects
Algorithms; Genetics, Microbial/methods; Genomics/methods; Classification; Cluster Analysis; Conserved Sequence; Databases, Genetic; Escherichia coli/genetics; Genetics, Microbial/statistics & numerical data; Genome, Archaeal; Genome, Bacterial; Genomics/statistics & numerical data; Multigene Family; Operon
11.
Biometrics ; 58(2): 378-86, 2002 Jun.
Article in English | MEDLINE | ID: mdl-12071411

ABSTRACT

To understand the relevance of microbial communities to crop productivity, the rhizosphere soil microbial community must be identified and characterized. Characteristic profiles of the microbial communities are obtained by denaturing gradient gel electrophoresis (DGGE) of polymerase chain reaction (PCR) amplified 16S rDNA from soil-extracted DNA. These characteristic profiles, commonly called community DNA fingerprints, can be represented in the form of high-dimensional binary vectors. We address the problem of modeling and variable selection in high-dimensional multivariate binary data and present an application of our methodology in the context of a controlled agricultural experiment.


Subjects
DNA Fingerprinting/statistics & numerical data; Genetics, Microbial/statistics & numerical data; Agriculture; Biometry; Data Interpretation, Statistical; Ecosystem; Models, Statistical; Multivariate Analysis; Plants, Edible/growth & development; Soil Microbiology
12.
Bioinformatics ; 17 Suppl 1: S39-48, 2001.
Article in English | MEDLINE | ID: mdl-11472991

ABSTRACT

We propose two efficient heuristics for minimizing the number of oligonucleotide probes needed for analyzing populations of ribosomal RNA gene (rDNA) clones by hybridization experiments on DNA microarrays. Such analyses have applications in the study of microbial communities. Unlike in the classical SBH (sequencing by hybridization) procedure, where multiple probes are on a DNA chip, in our applications we perform a series of experiments, each one consisting of applying a single probe to a DNA microarray containing a large sample of rDNA sequences from the studied population. The overall cost of the analysis is thus roughly proportional to the number of experiments, underscoring the need for minimizing the number of probes. Our algorithms are based on two well-known optimization techniques, i.e. simulated annealing and Lagrangian relaxation, and our preliminary tests demonstrate that both algorithms are able to find satisfactory probe sets for real rDNA data.


Subjects
Algorithms; Genetics, Microbial/statistics & numerical data; Oligonucleotide Probes/genetics; Computational Biology; DNA Fingerprinting/statistics & numerical data; DNA, Ribosomal/genetics; Molecular Probe Techniques/statistics & numerical data; Oligonucleotide Array Sequence Analysis/statistics & numerical data