Results 1 - 7 of 7
1.
BMC Bioinformatics ; 11 Suppl 1: S16, 2010 Jan 18.
Article in English | MEDLINE | ID: mdl-20122187

ABSTRACT

BACKGROUND: The classification of protein sequences using string algorithms provides valuable insights for protein function prediction. Several methods, based on a variety of different patterns, have been previously proposed. Almost all string-based approaches discover patterns that are not "independent," and therefore the associated scores count the contribution of patterns that cover the same region of a sequence multiple times. RESULTS: In this paper we use a class of patterns, called irredundant, that is specifically designed to address this issue. Loosely speaking, the set of irredundant patterns is the smallest class of "independent" patterns that can describe all common patterns in two sequences; thus, they avoid overcounting. We present a novel discriminative method, called Irredundant Class, based on the statistics of irredundant patterns combined with the power of support vector machines. CONCLUSION: Tests on benchmark data show that Irredundant Class outperforms most of the string algorithms previously proposed, and it achieves results as good as current state-of-the-art methods. Moreover, the footprints of the most discriminative irredundant patterns can be used to guide the identification of functional regions in protein sequences.
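
The overcounting problem described above can be illustrated with a minimal Python sketch. It is not the Irredundant Class method and does not compute irredundant patterns; it only shows, on two made-up toy sequences, how a single shared region generates many overlapping common substrings. The function name `common_substrings` and the example sequences are assumptions introduced for illustration.

```python
def common_substrings(a: str, b: str) -> set[str]:
    """Return every distinct substring of `a` that also occurs in `b` (naive enumeration)."""
    subs_a = {a[i:j] for i in range(len(a)) for j in range(i + 1, len(a) + 1)}
    return {s for s in subs_a if s in b}


# Hypothetical toy sequences sharing exactly one region, "KVLA".
a = "MKVLAT"
b = "GGKVLAW"

shared = common_substrings(a, b)
longest = max(shared, key=len)

# The single shared region of length 4 yields 4 * 5 / 2 = 10 common substrings,
# so a score that sums over all of them counts the same region many times.
print(len(shared), longest)
```

A score built only on the longest pattern covering each region would count this region once, which is the intuition behind using a set of "independent" patterns.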


Subjects
Algorithms , Proteins/chemistry , Proteins/classification , Sequence Analysis, Protein/methods , Sequence Homology, Amino Acid
2.
Microbiome ; 6(1): 213, 2018 Nov 29.
Article in English | MEDLINE | ID: mdl-30497517

ABSTRACT

BACKGROUND: Even though human sweat is odorless, bacterial growth and decomposition of specific odor precursors in it are believed to give rise to body odor in humans. While mechanisms of odor generation have been widely studied in adults, little is known about teenagers and pre-pubescent children, who have a distinct sweat composition from immature apocrine and sebaceous glands but are arguably more susceptible to the social and psychological impact of malodor. RESULTS: We integrated information from whole microbiome analysis of multiple skin sites (underarm, neck, and head) and multiple time points (1 h and 8 h after bath), analyzing 180 samples in total to perform the largest metagenome-wide association study to date on malodor. Significant positive correlations were observed between odor intensity and the relative abundance of Staphylococcus hominis, Staphylococcus epidermidis, and Cutibacterium avidum, as well as a negative correlation with Acinetobacter schindleri and Cutibacterium species. Metabolic pathway analysis highlighted the association of isovaleric and acetic acid production (sour odor) from enriched S. epidermidis (teen underarm) and S. hominis (child neck) enzymes and sulfur production from Staphylococcus species (teen underarm) with odor intensity, in good agreement with observed odor characteristics in pre-pubescent children and teenagers. Experiments with cultures on human and artificial sweat confirmed the ability of S. hominis and S. epidermidis to independently produce malodor with distinct odor characteristics. CONCLUSIONS: These results showcase the power of skin metagenomics to study host-microbial co-metabolic interactions, identifying distinct pathways for odor generation from sweat in pre-pubescent children and teenagers and highlighting key enzymatic targets for intervention.


Subjects
Bacteria/classification , Metagenomics/methods , Odorants/analysis , Skin/microbiology , Sweat/microbiology , Acetic Acid/analysis , Acinetobacter/classification , Acinetobacter/isolation & purification , Adolescent , Axilla/microbiology , Bacteria/isolation & purification , Child , Female , Head/microbiology , Hemiterpenes , Humans , Male , Neck/microbiology , Pentanoic Acids/analysis , Propionibacteriaceae/classification , Propionibacteriaceae/isolation & purification , Puberty , Sequence Analysis, DNA , Skin/chemistry , Staphylococcus epidermidis/classification , Staphylococcus epidermidis/isolation & purification , Staphylococcus hominis/classification , Staphylococcus hominis/isolation & purification , Sulfur/analysis
3.
Gigascience ; 5: 2, 2016.
Article in English | MEDLINE | ID: mdl-26793302

ABSTRACT

BACKGROUND: Resolution of complex repeat structures and rearrangements in the assembly and analysis of large eukaryotic genomes is often aided by a combination of high-throughput sequencing and genome-mapping technologies (for example, optical restriction mapping). In particular, mapping technologies can generate sparse maps of large DNA fragments (150 kilobase pairs (kbp) to 2 Mbp) and thus provide a unique source of information for disambiguating complex rearrangements in cancer genomes. Despite their utility, combining high-throughput sequencing and mapping technologies has been challenging because of the lack of efficient and sensitive map-alignment algorithms for robustly aligning error-prone maps to sequences. RESULTS: We introduce a novel seed-and-extend glocal (short for global-local) alignment method, OPTIMA (and a sliding-window extension for overlap alignment, OPTIMA-Overlap), which is the first to create indexes for continuous-valued mapping data while accounting for mapping errors. We also present a novel statistical model, agnostic with respect to technology-dependent error rates, for conservatively evaluating the significance of alignments without relying on expensive permutation-based tests. CONCLUSIONS: We show that OPTIMA and OPTIMA-Overlap outperform other state-of-the-art approaches (1.6-2 times more sensitive) and are more efficient (170-200%) and precise in their alignments (nearly 99% precision). These advantages are independent of the quality of the data, suggesting that our indexing approach and statistical evaluation are robust, provide improved sensitivity and guarantee high precision.


Subjects
Algorithms , Chromosome Mapping/methods , Computational Biology/methods , Genomics/methods , Sequence Alignment/methods , Animals , Computer Simulation , Drosophila melanogaster/genetics , High-Throughput Nucleotide Sequencing , Humans , Reproducibility of Results
4.
Gigascience ; 4: 65, 2015.
Article in English | MEDLINE | ID: mdl-26719794

ABSTRACT

BACKGROUND: Next-generation sequencing (NGS) technologies have changed our understanding of the variability of the human genome. However, the identification of genome structural variations based on NGS approaches with read lengths of 35-300 bases remains a challenge. Single-molecule optical mapping technologies allow the analysis of DNA molecules of up to 2 Mb and as such are suitable for the identification of large-scale genome structural variations, and for de novo genome assemblies when combined with short-read NGS data. Here we present optical mapping data for two human genomes: the HapMap cell line GM12878 and the colorectal cancer cell line HCT116. FINDINGS: High molecular weight DNA was obtained by embedding GM12878 and HCT116 cells, respectively, in agarose plugs, followed by DNA extraction under mild conditions. Genomic DNA was digested with KpnI and 310,000 and 296,000 DNA molecules (≥ 150 kb and 10 restriction fragments), respectively, were analyzed per cell line using the Argus optical mapping system. Maps were aligned to the human reference by OPTIMA, a new glocal alignment method. Genome coverage of 6.8× and 5.7× was obtained, respectively; 2.9× and 1.7× more than the coverage obtained with previously available software. CONCLUSIONS: Optical mapping allows the resolution of large-scale structural variations of the genome, and the scaffold extension of NGS-based de novo assemblies. OPTIMA is an efficient new alignment method; our optical mapping data provide a resource for genome structure analyses of the human HapMap reference cell line GM12878, and the colorectal cancer cell line HCT116.


Subjects
Colorectal Neoplasms/genetics , DNA, Neoplasm , Genome, Human , Sequence Analysis, DNA/methods , Cell Line, Tumor , Chromosome Mapping/methods , HapMap Project , High-Throughput Nucleotide Sequencing/methods , Humans
5.
Article in English | MEDLINE | ID: mdl-26356333

ABSTRACT

UNLABELLED: Cell-type diversity is driven to a large degree by transcription regulation, i.e., by enhancers. It has recently been shown that in higher eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Even though the binding of transcription factors is sequence-specific, the identification of functionally similar enhancers is very difficult. A similarity measure that detects related regulatory sequences is crucial for understanding the functional correlation between two enhancers. It will allow large-scale analyses, clustering and genome-wide classifications. In this paper we present Under2, a parameter-free alignment-free statistic based on variable-length words. As opposed to traditional alignment-free methods, which are based on fixed-length patterns or, in other words, tied to a fixed resolution, our statistic is built upon variable-length words, and thus multiple resolutions are allowed. This captures the great variability in the lengths of CRMs. We evaluate several alignment-free statistics on simulated data and real ChIP-seq sequences. The new statistic is highly successful in discriminating functionally related enhancers and, in almost all experiments, it outperforms fixed-resolution methods. Finally, experiments on mouse enhancers show that Under2 can separate enhancers active in different tissues. AVAILABILITY: http://www.dei.unipd.it/~ciompin/main/UnderIICRMS.html.
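
For contrast with the variable-length approach, the following is a minimal Python sketch of a classical fixed-length alignment-free statistic, the D2 statistic, which sums the products of k-mer counts shared by two sequences. It is not Under2; the abstract does not name which fixed-resolution statistics were evaluated, so the choice of D2 here is an assumption, as are the function names `kmer_counts` and `d2` and the example sequences.

```python
from collections import Counter


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of fixed length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(x: str, y: str, k: int) -> int:
    """D2 statistic: sum over shared k-mers of count_in_x * count_in_y."""
    cx, cy = kmer_counts(x, k), kmer_counts(y, k)
    return sum(cx[w] * cy[w] for w in cx if w in cy)


# The result depends on the chosen k, i.e. on a single fixed resolution.
print(d2("ACGT", "ACGA", 2))
print(d2("ACGT", "ACGA", 3))
```

The word length `k` must be fixed in advance, which is the limitation that a parameter-free, variable-length statistic is meant to remove.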


Subjects
Enhancer Elements, Genetic/genetics , Genomics/methods , Mammals/genetics , Sequence Analysis, DNA/methods , Animals , Humans , Mice
6.
Algorithms Mol Biol ; 7(1): 34, 2012 Dec 06.
Article in English | MEDLINE | ID: mdl-23216990

ABSTRACT

BACKGROUND: With the progress of modern sequencing technologies, a large number of complete genomes are now available. Traditionally, the comparison of two related genomes is carried out by sequence alignment. There are cases where these techniques cannot be applied, for example if two genomes do not share the same set of genes, or if they are not alignable to each other due to low sequence similarity, rearrangements and inversions, or more specifically to their lengths when the organisms belong to different species. For these cases the comparison of complete genomes can be carried out only with ad hoc methods that are usually called alignment-free methods. METHODS: In this paper we propose a distance function based on subword compositions called Underlying Approach (UA). We prove that the matching statistics, a popular concept in the field of string algorithms able to capture the statistics of common words between two sequences, can be derived from a small set of "independent" subwords, namely the irredundant common subwords. We define a distance-like measure based on these subwords, such that each region of the genomes contributes only once, thus avoiding counting shared subwords multiple times. In a nutshell, this filter discards subwords occurring in regions covered by other, more significant subwords. RESULTS: The Underlying Approach (UA) builds a scoring function based on this set of patterns, called underlying. We prove that this set is by construction linear in the size of the input, without overlaps, and can be efficiently constructed. Results show the validity of our method in the reconstruction of phylogenetic trees, where the Underlying Approach outperforms the current state-of-the-art methods. Moreover, we show that the accuracy of UA is achieved with a very small number of subwords, which in some cases carry meaningful biological information. AVAILABILITY: http://www.dei.unipd.it/~ciompin/main/underlying.html.
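
The matching statistics mentioned above can be sketched in a few lines of Python. This is a naive illustration of the standard definition (for each position of one sequence, the length of the longest substring starting there that also occurs in the other sequence), not the Underlying Approach and not an efficient construction. The function name `matching_statistics` and the toy strings are assumptions introduced for illustration.

```python
def matching_statistics(s: str, t: str) -> list[int]:
    """For each position i of s, length of the longest prefix of s[i:] occurring in t."""
    ms = []
    for i in range(len(s)):
        length = 0
        # Extend the match one character at a time while it still occurs in t.
        while i + length < len(s) and s[i:i + length + 1] in t:
            length += 1
        ms.append(length)
    return ms


# Hypothetical toy strings; real inputs would be genome sequences.
print(matching_statistics("abcab", "cabd"))  # [2, 1, 3, 2, 1]
```

This naive version repeatedly rescans `t`, so it is only suitable for short strings; suffix-tree or suffix-array based algorithms compute the same values far more efficiently on genome-scale inputs.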

7.
J Comput Biol ; 18(12): 1819-29, 2011 Dec.
Article in English | MEDLINE | ID: mdl-21548811

ABSTRACT

The automatic classification of protein sequences into families is of great help for the functional prediction and annotation of new proteins. In this article, we present a method called Irredundant Class that addresses the remote homology detection problem. The best-performing methods that solve this problem are string kernels, which compute a similarity function between pairs of proteins based on their subsequence composition. We provide evidence that almost all string kernels are based on patterns that are not independent, and therefore the associated similarity scores are obtained using a set of redundant features, overestimating the similarity between the proteins. To specifically address this issue, we introduce the class of irredundant common patterns. Loosely speaking, the set of irredundant common patterns is the smallest class of independent patterns that can describe all common patterns in a pair of sequences. We present a classification method based on the statistics of these patterns, named Irredundant Class. Results on benchmark data show that Irredundant Class outperforms most of the string kernels previously proposed, and it achieves results as good as the current state-of-the-art method Local Alignment, but using the same pairwise information only once.


Subjects
Computational Biology/methods , Proteins/chemistry , Sequence Homology, Amino Acid , Amino Acid Sequence , Databases, Protein , Pattern Recognition, Automated , Proteins/classification , ROC Curve , Sequence Alignment