Your browser doesn't support javascript.
loading
Mostrar: 20 | 50 | 100
Resultados 1 - 12 de 12
Filtrar
Mais filtros

Base de dados
País como assunto
Tipo de documento
Intervalo de ano de publicação
1.
Mass Spectrom Rev ; 36(5): 600-614, 2017 09.
Artigo em Inglês | MEDLINE | ID: mdl-26709718

RESUMO

The elucidation of molecular interaction networks is one of the pivotal challenges in the study of biology. Affinity purification-mass spectrometry and other co-complex methods have become widely employed experimental techniques to identify protein complexes. These techniques typically suffer from a high number of false negatives and false positive contaminants due to technical shortcomings and purification biases. To support a diverse range of experimental designs and approaches, a large number of computational methods have been proposed to filter, infer and validate protein interaction networks from experimental pull-down MS data. Nevertheless, this expansion of available methods complicates the selection of the most optimal ones to support systems biology-driven knowledge extraction. In this review, we give an overview of the most commonly used computational methods to process and interpret co-complex results, and we discuss the issues and unsolved problems that still exist within the field. © 2015 Wiley Periodicals, Inc. Mass Spec Rev 36:600-614, 2017.


Assuntos
Biologia Computacional/métodos , Mapeamento de Interação de Proteínas/métodos , Mapas de Interação de Proteínas , Proteínas/análise , Análise por Conglomerados , Bases de Dados de Proteínas , Complexos Multiproteicos/análise , Complexos Multiproteicos/química , Complexos Multiproteicos/metabolismo , Mapeamento de Interação de Proteínas/normas , Proteínas/química , Proteínas/metabolismo , Controle de Qualidade , Reprodutibilidade dos Testes , Fluxo de Trabalho
2.
Brief Bioinform ; 16(2): 216-31, 2015 Mar.
Artigo em Inglês | MEDLINE | ID: mdl-24162173

RESUMO

Over the past two decades, pattern mining techniques have become an integral part of many bioinformatics solutions. Frequent itemset mining is a popular group of pattern mining techniques designed to identify elements that frequently co-occur. An archetypical example is the identification of products that often end up together in the same shopping basket in supermarket transactions. A number of algorithms have been developed to address variations of this computationally non-trivial problem. Frequent itemset mining techniques are able to efficiently capture the characteristics of (complex) data and succinctly summarize it. Owing to these and other interesting properties, these techniques have proven their value in biological data analysis. Nevertheless, information about the bioinformatics applications of these techniques remains scattered. In this primer, we introduce frequent itemset mining and their derived association rules for life scientists. We give an overview of various algorithms, and illustrate how they can be used in several real-life bioinformatics application domains. We end with a discussion of the future potential and open challenges for frequent itemset mining in the life sciences.


Assuntos
Algoritmos , Mineração de Dados/estatística & dados numéricos , Animais , Análise por Conglomerados , Biologia Computacional , Perfilação da Expressão Gênica/estatística & dados numéricos , Redes Reguladoras de Genes , Sequenciamento de Nucleotídeos em Larga Escala/estatística & dados numéricos , Humanos , Reconhecimento Automatizado de Padrão/estatística & dados numéricos , Polimorfismo de Nucleotídeo Único , Software
3.
J Biomed Inform ; 69: 118-127, 2017 05.
Artigo em Inglês | MEDLINE | ID: mdl-28400312

RESUMO

Clinical codes are used for public reporting purposes, are fundamental to determining public financing for hospitals, and form the basis for reimbursement claims to insurance providers. They are assigned to a patient stay to reflect the diagnosis and performed procedures during that stay. This paper aims to enrich algorithms for automated clinical coding by taking a data-driven approach and by using unsupervised and semi-supervised techniques for the extraction of multi-word expressions that convey a generalisable medical meaning (referred to as concepts). Several methods for extracting concepts from text are compared, two of which are constructed from a large unannotated corpus of clinical free text. A distributional semantic model (i.c. the word2vec skip-gram model) is used to generalize over concepts and retrieve relations between them. These methods are validated on three sets of patient stay data, in the disease areas of urology, cardiology, and gastroenterology. The datasets are in Dutch, which introduces a limitation on available concept definitions from expert-based ontologies (e.g. UMLS). The results show that when expert-based knowledge in ontologies is unavailable, concepts derived from raw clinical texts are a reliable alternative. Both concepts derived from raw clinical texts perform and concepts derived from expert-created dictionaries outperform a bag-of-words approach in clinical code assignment. Adding features based on tokens that appear in a semantically similar context has a positive influence for predicting diagnostic codes. Furthermore, the experiments indicate that a distributional semantics model can find relations between semantically related concepts in texts but also introduces erroneous and redundant relations, which can undermine clinical coding performance.


Assuntos
Codificação Clínica , Bases de Conhecimento , Processamento de Linguagem Natural , Semântica , Algoritmos , Humanos , Idioma , Países Baixos
4.
J Proteome Res ; 13(9): 4175-83, 2014 Sep 05.
Artigo em Inglês | MEDLINE | ID: mdl-25004400

RESUMO

Spectral library searching is a popular approach for MS/MS-based peptide identification. Because the size of spectral libraries continues to grow, the performance of searching algorithms is an important issue. This technical note introduces a strategy based on a minimum shared peak count between two spectra to reduce the set of admissible candidate spectra when issuing a query. A theoretical validation through time complexity analysis and an experimental validation based on an implementation of the candidate reduction strategy show that the approach can achieve a reduction of the set of candidate spectra by (at least) an order of magnitude, resulting in a significant improvement in the speed of the search. Meanwhile, more than 99% of the positive search results is retained. This efficient strategy to drastically improve the speed of spectral library searching with a negligible loss of sensitivity can be applied to any current spectral library search tool, irrespective of the employed similarity metric.


Assuntos
Bases de Dados de Proteínas , Biblioteca de Peptídeos , Proteômica/métodos , Software , Algoritmos , Mineração de Dados , Humanos , Proteínas , Leveduras
5.
Proteome Sci ; 12(1): 54, 2014.
Artigo em Inglês | MEDLINE | ID: mdl-25429250

RESUMO

BACKGROUND: Mass spectrometry-based proteomics experiments generate spectra that are rich in information. Often only a fraction of this information is used for peptide/protein identification, whereas a significant proportion of the peaks in a spectrum remain unexplained. In this paper we explore how a specific class of data mining techniques termed "frequent itemset mining" can be employed to discover patterns in the unassigned data, and how such patterns can help us interpret the origin of the unexpected/unexplained peaks. RESULTS: First a model is proposed that describes the origin of the observed peaks in a mass spectrum. For this purpose we use the classical correlative database search algorithm. Peaks that support a positive identification of the spectrum are termed explained peaks. Next, frequent itemset mining techniques are introduced to infer which unexplained peaks are associated in a spectrum. The method is validated on two types of experimental proteomic data. First, peptide mass fingerprint data is analyzed to explain the unassigned peaks in a full scan mass spectrum. Interestingly, a large numbers of experimental spectra reveals several highly frequent unexplained masses, and pattern mining on these frequent masses demonstrates that subsets of these peaks frequently co-occur. Further evaluation shows that several of these co-occurring peaks indeed have a known common origin, and other patterns are promising hypothesis generators for further analysis. Second, the proposed methodology is validated on tandem mass spectrometral data using a public spectral library, where associations within the mass differences of unassigned peaks and peptide modifications are explored. The investigation of the found patterns illustrates that meaningful patterns can be discovered that can be explained by features of the employed technology and found modifications. CONCLUSIONS: This simple approach offers opportunities to monitor accumulating unexplained mass spectrometry data for emerging new patterns, with possible applications for the development of mass exclusion lists, for the refinement of quality control strategies and for a further interpretation of unexplained spectral peaks in mass spectrometry and tandem mass spectrometry.

6.
BMC Bioinformatics ; 12: 405, 2011 Oct 20.
Artigo em Inglês | MEDLINE | ID: mdl-22014236

RESUMO

BACKGROUND: Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. RESULTS: We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. CONCLUSIONS: The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.


Assuntos
Algoritmos , Análise por Conglomerados , Espectroscopia de Ressonância Magnética/métodos , Análise de Variância , Imageamento por Ressonância Magnética , Metabolômica , Software , Fluxo de Trabalho
7.
IEEE/ACM Trans Comput Biol Bioinform ; 16(5): 1496-1507, 2019.
Artigo em Inglês | MEDLINE | ID: mdl-27295680

RESUMO

In this paper, we present a subgroup discovery method to find subgraphs in a graph that are associated with a given set of vertices. The association between a subgraph pattern and a set of vertices is defined by its significant enrichment based on a Bonferroni-corrected hypergeometric probability value. This interestingness measure requires a dedicated pruning procedure to limit the number of subgraph matches that must be calculated. The presented mining algorithm to find associated subgraph patterns in large graphs is therefore designed to efficiently traverse the search space. We demonstrate the operation of this method by applying it on three biological graph data sets and show that we can find associated subgraphs for a biologically relevant set of vertices and that the found subgraphs themselves are biologically interesting.


Assuntos
Biologia Computacional/métodos , Mineração de Dados/métodos , Algoritmos , Redes Reguladoras de Genes , Modelos Estatísticos , Mapeamento de Interação de Proteínas
8.
BioData Min ; 11: 20, 2018.
Artigo em Inglês | MEDLINE | ID: mdl-30202444

RESUMO

Searching for interesting common subgraphs in graph data is a well-studied problem in data mining. Subgraph mining techniques focus on the discovery of patterns in graphs that exhibit a specific network structure that is deemed interesting within these data sets. The definition of which subgraphs are interesting and which are not is highly dependent on the application. These techniques have seen numerous applications and are able to tackle a range of biological research questions, spanning from the detection of common substructures in sets of biomolecular compounds, to the discovery of network motifs in large-scale molecular interaction networks. Thus far, information about the bioinformatics application of subgraph mining remains scattered over heterogeneous literature. In this review, we provide an introduction to subgraph mining for life scientists. We give an overview of various subgraph mining algorithms from a bioinformatics perspective and present several of their potential biomedical applications.

9.
Bioinform Biol Insights ; 10: 37-47, 2016.
Artigo em Inglês | MEDLINE | ID: mdl-27168722

RESUMO

Pattern detection is an inherent task in the analysis and interpretation of complex and continuously accumulating biological data. Numerous itemset mining algorithms have been developed in the last decade to efficiently detect specific pattern classes in data. Although many of these have proven their value for addressing bioinformatics problems, several factors still slow down promising algorithms from gaining popularity in the life science community. Many of these issues stem from the low user-friendliness of these tools and the complexity of their output, which is often large, static, and consequently hard to interpret. Here, we apply three software implementations on common bioinformatics problems and illustrate some of the advantages and disadvantages of each, as well as inherent pitfalls of biological data mining. Frequent itemset mining exists in many different flavors, and users should decide their software choice based on their research question, programming proficiency, and added value of extra features.

10.
BioData Min ; 8: 4, 2015.
Artigo em Inglês | MEDLINE | ID: mdl-25657820

RESUMO

BACKGROUND: The three-dimensional structure of a protein is an essential aspect of its functionality. Despite the large diversity in protein structures and functionality, it is known that there are common patterns and preferences in the contacts between amino acid residues, or between residues and other biomolecules, such as DNA. The discovery and characterization of these patterns is an important research topic within structural biology as it can give fundamental insight into protein structures and can aid in the prediction of unknown structures. RESULTS: Here we apply an efficient spatial pattern miner to search for sets of amino acids that occur frequently in close spatial proximity in the protein structures of the Protein DataBank. This allowed us to mine for a new class of amino acid patterns, that we term FreSCOs (Frequent Spatially Cohesive Component sets), which feature synergetic combinations. To demonstrate the relevance of these FreSCOs, they were compared in relation to the thermostability of the protein structure and the interaction preferences of DNA-protein complexes. In both cases, the results matched well with prior investigations using more complex methods on smaller data sets. CONCLUSIONS: The currently characterized protein structures feature a diverse set of frequent amino acid patterns that can be related to the stability of the protein molecular structure and that are independent from protein function or specific conserved domains.

11.
Artigo em Inglês | MEDLINE | ID: mdl-26356855

RESUMO

In this paper we present a cohesive structural itemset miner aiming to discover interesting patterns in a set of data objects within a multidimensional spatial structure by combining the cohesion and the support of the pattern. We propose two ways to build the itemset miner, VertexOne and VertexAll, in an attempt to find a balance between accuracy and run-times. The experiments show that VertexOne performs better, and finds almost the same itemsets as VertexAll in a much shorter time. The usefulness of the method is demonstrated by applying it to find interesting patterns of amino acids in spatial proximity within a set of proteins based on their atomic coordinates in the protein molecular structure. Several patterns found by the cohesive structural itemset miner contain amino acids that frequently co-occur in the spatial structure, even if they are distant in the primary protein sequence and only brought together by protein folding. Further various indications were found that some of the discovered patterns seem to represent common underlying support structures within the proteins.


Assuntos
Biologia Computacional/métodos , Conformação Proteica , Proteínas/química , Proteínas/metabolismo , Algoritmos , Mineração de Dados , Dobramento de Proteína
12.
Genome Biol ; 12(6): R57, 2011 Jun 22.
Artigo em Inglês | MEDLINE | ID: mdl-21696594

RESUMO

We present BioGraph, a data integration and data mining platform for the exploration and discovery of biomedical information. The platform offers prioritizations of putative disease genes, supported by functional hypotheses. We show that BioGraph can retrospectively confirm recently discovered disease genes and identify potential susceptibility genes, outperforming existing technologies, without requiring prior domain knowledge. Additionally, BioGraph allows for generic biomedical applications beyond gene discovery. BioGraph is accessible at http://www.biograph.be.


Assuntos
Biologia Computacional , Mineração de Dados , Software , Bases de Dados Factuais , Estudo de Associação Genômica Ampla , Humanos , Esquizofrenia/genética , Integração de Sistemas
SELEÇÃO DE REFERÊNCIAS
Detalhe da pesquisa