1.
J Bioinform Comput Biol ; 7(2): 269-85, 2009 Apr.
Article in English | MEDLINE | ID: mdl-19340915

ABSTRACT

In the past decade, many automated prediction methods for the subcellular localization of proteins have been proposed, utilizing a wide range of principles and learning approaches. Based on an experimental evaluation of different methods and their theoretical properties, we propose combining a well-balanced set of existing approaches into new, ensemble-based prediction methods. The experimental evaluation shows that our ensembles improve substantially over the underlying base methods.
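The abstract does not say how the base predictors are combined. As one hedged illustration of the general ensemble idea, the sketch below uses (weighted) majority voting over the localization labels returned by several base methods for a single protein. The voting rule, the optional `weights`, and the label strings are assumptions for illustration, not the authors' combination scheme.

```python
from collections import Counter


def ensemble_predict(base_predictions, weights=None):
    """Combine the labels of several base predictors for one protein by voting.

    base_predictions: one predicted localization label per base method.
    weights: optional per-method weights (hypothetical); equal weights if omitted.
    Ties go to the label that appears first in base_predictions.
    """
    if weights is None:
        weights = [1.0] * len(base_predictions)
    scores = Counter()
    for label, weight in zip(base_predictions, weights):
        scores[label] += weight
    # Counter keeps insertion order and max() returns the first maximal key.
    return max(scores, key=scores.__getitem__)


print(ensemble_predict(["nucleus", "cytoplasm", "nucleus"]))  # prints nucleus
print(ensemble_predict(["nucleus", "cytoplasm", "mitochondrion"], weights=[0.2, 0.5, 0.3]))
```

Weights could, for instance, reflect each base method's validation accuracy. A "well-balanced" ensemble in the abstract's sense also depends on choosing base methods whose errors differ, which voting alone does not address.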


Subjects
Algorithms; Proteins/chemistry; Proteins/metabolism; Sequence Analysis, Protein/methods; Subcellular Fractions/chemistry; Subcellular Fractions/metabolism; Amino Acid Sequence; Molecular Sequence Data; Structure-Activity Relationship
2.
BMC Bioinformatics ; 9: 207, 2008 Apr 23.
Article in English | MEDLINE | ID: mdl-18433469

ABSTRACT

BACKGROUND: The increasing amount of published literature in biomedicine represents an immense source of knowledge, which can be accessed efficiently only by a new generation of automated information extraction tools. Named entity recognition of well-defined objects, such as genes or proteins, has achieved a sufficient level of maturity such that it can form the basis for the next step: the extraction of relations that exist between the recognized entities. Whereas most early work focused on the mere detection of relations, classifying the type of relation is also of great importance, and this is the focus of this work. In this paper we describe an approach that extracts both the existence of a relation and its type. Our work is based on Conditional Random Fields, which have been applied with much success to the task of named entity recognition. RESULTS: We benchmark our approach on two different tasks. The first task is the identification of semantic relations between diseases and treatments. The available data set consists of manually annotated PubMed abstracts. The second task is the identification of relations between genes and diseases from a set of concise phrases, so-called GeneRIF (Gene Reference Into Function) phrases. In our experimental setting, we do not assume that the entities are given, as is often the case in previous relation extraction work. Rather, the extraction of the entities is solved as a subproblem. Compared with other state-of-the-art approaches, we achieve very competitive results on both data sets. To demonstrate the scalability of our solution, we apply our approach to the complete human GeneRIF database. The resulting gene-disease network contains 34758 semantic associations between 4939 genes and 1745 diseases. The gene-disease network is publicly available as a machine-readable RDF graph.
CONCLUSION: We extend the framework of Conditional Random Fields towards the annotation of semantic relations from text and apply it to the biomedical domain. Our approach is based on a rich set of textual features and achieves a performance that is competitive with leading approaches. The model is quite general and can be extended to handle arbitrary biological entities and relation types. The resulting gene-disease network shows that the GeneRIF database provides a rich knowledge source for text mining. Current work is focused on improving the accuracy of detection of entities as well as entity boundaries, which will also greatly improve the relation extraction performance.
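A linear-chain Conditional Random Field assigns each token a label by maximizing the sum of per-token (emission) scores and label-to-label transition scores, and that maximization is done with the Viterbi algorithm. The sketch below shows only this generic decoding step, with hand-set scores and a hypothetical tag set ("O", "DIS", "TREAT"). It is not the authors' extended relation-annotation model, and feature extraction and weight training are omitted.

```python
def viterbi(emissions, transitions, labels):
    """Return the highest-scoring label sequence of a linear-chain model.

    emissions: one dict per token mapping label -> score (e.g. weighted feature sum).
    transitions: dict mapping (previous_label, label) -> score; missing pairs score 0.
    labels: the tag set, e.g. ["O", "DIS", "TREAT"] (hypothetical).
    """
    # best[t][y]: score of the best partial path that ends in label y at token t
    best = [{y: emissions[0][y] for y in labels}]
    back = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((p, best[t - 1][p] + transitions.get((p, y), 0.0)) for p in labels),
                key=lambda item: item[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Trace the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


labels = ["O", "DIS", "TREAT"]
emissions = [
    {"O": 1.0, "DIS": 0.0, "TREAT": 0.0},
    {"O": 0.0, "DIS": 2.0, "TREAT": 0.0},
    {"O": 0.0, "DIS": 1.0, "TREAT": 1.2},
]
print(viterbi(emissions, {}, labels))                      # ['O', 'DIS', 'TREAT']
print(viterbi(emissions, {("DIS", "DIS"): 1.0}, labels))   # ['O', 'DIS', 'DIS']
```

Without transition scores the last token is labeled TREAT. A DIS-to-DIS transition bonus of 1.0 flips it to DIS, which is the kind of label interaction a trained CRF captures and an independent per-token classifier cannot.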


Subjects
Database Management Systems; Natural Language Processing; Biomedical Research/methods; Database Management Systems/standards; Database Management Systems/statistics & numerical data; Databases, Genetic; Disease/classification; Disease/etiology; Genes/physiology; Humans; MEDLINE; Models, Statistical; Semantics; Sequence Analysis; Terminology as Topic; Therapeutics/classification; Vocabulary, Controlled
3.
Bioinformatics ; 22(14): e49-57, 2006 Jul 15.
Article in English | MEDLINE | ID: mdl-16873512

ABSTRACT

MOTIVATION: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectations on the two distributions. Consequently, we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors but also to strings, sequences, graphs, and other common structured data types arising in molecular biology. RESULTS: We study the practical feasibility of an MMD-based test on three central data integration tasks: testing cross-platform comparability of microarray data, cancer diagnosis, and data-content-based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. CONCLUSIONS: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments. AVAILABILITY: http://www.dbs.ifi.lmu.de/~borgward/MMD.
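A common unbiased estimate of the squared MMD averages the kernel values within each sample (excluding each point paired with itself) and subtracts twice the average cross-sample kernel value. The sketch below assumes numeric vectors and a Gaussian kernel with a hand-picked bandwidth `sigma=1.0`; both are illustrative choices. Turning the statistic into an actual test also needs a rejection threshold (for example, from random permutations of the pooled sample), which is not shown.

```python
import math


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel between two equal-length numeric vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of the squared MMD between samples xs and ys (each size >= 2)."""
    m, n = len(xs), len(ys)
    k_xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    k_yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    k_xy = sum(kernel(x, y) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2 * k_xy / (m * n)


near = [(0.0,), (0.1,), (0.2,)]
far = [(5.0,), (5.1,), (5.2,)]
print(mmd2_unbiased(near, near))  # close to 0
print(mmd2_unbiased(near, far))   # close to 2
```

Because the statistic only touches the data through `kernel`, replacing `gaussian_kernel` with a string or graph kernel is what lets the same estimate apply to the structured data types mentioned in the abstract.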


Subjects
Algorithms; Computational Biology/methods; Data Interpretation, Statistical; Databases, Factual; Information Storage and Retrieval/methods; Models, Biological; Models, Statistical; Computer Simulation; Sample Size; Statistical Distributions; Systems Integration
4.
Med Image Comput Comput Assist Interv ; 14(Pt 2): 607-14, 2011.
Article in English | MEDLINE | ID: mdl-21995079

ABSTRACT

Registering CT scans in a body atlas is an important technique for aligning and comparing different CT scans. It is also required for navigating automatically to certain regions of a scan or for identifying sub-volumes automatically. Common solutions to this problem employ landmark detectors and interpolation techniques. However, these solutions are often not applicable if the query scan is very small or consists of only a single slice. Therefore, the research community has proposed methods that are independent of landmark detectors and instead use imaging techniques to register slices on a generalized height scale. In this paper, we propose an improved prediction method for registering single slices. Our solution is based on specialized image descriptors and instance-based learning. The experimental evaluation shows that the new method improves on the accuracy and stability of comparable registration methods while requiring only a single CT slice for the registration.
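Instance-based learning here means predicting a query slice's position on the height scale from already-registered slices with similar descriptors. A minimal k-nearest-neighbour sketch, assuming each slice is summarized by a fixed-length numeric descriptor (hypothetical; the paper's specialized image descriptors are not reproduced) and that neighbours are compared by Euclidean distance and averaged without weighting:

```python
import math


def predict_height(query, atlas, k=3):
    """k-nearest-neighbour estimate of a CT slice's position on a height scale.

    query: descriptor vector of the query slice (hypothetical features).
    atlas: list of (descriptor, height) pairs from already-registered slices.
    Returns the mean height of the k atlas slices with the closest descriptors.
    """
    nearest = sorted(atlas, key=lambda item: math.dist(query, item[0]))[:k]
    return sum(height for _, height in nearest) / len(nearest)


atlas = [((0.0, 0.0), 10.0), ((0.1, 0.0), 12.0), ((5.0, 5.0), 80.0), ((0.0, 0.2), 14.0)]
print(predict_height((0.0, 0.05), atlas, k=3))  # 12.0
```

Averaging several neighbours rather than copying the single closest height is one simple way to gain the kind of stability the abstract reports; distance-weighted averaging is a common alternative.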


Subjects
Image Processing, Computer-Assisted/methods; Tomography, X-Ray Computed/methods; Adolescent; Adult; Aged; Aged, 80 and over; Algorithms; Child; Child, Preschool; Female; Humans; Male; Middle Aged; Models, Statistical; Neck/pathology; Radiography, Thoracic/methods; Software
5.
Pac Symp Biocomput ; : 4-15, 2007.
Article in English | MEDLINE | ID: mdl-17992741

ABSTRACT

It is widely believed that comparing discrepancies in the protein-protein interaction (PPI) networks of individuals will become an important tool in understanding and preventing diseases. Currently, PPI networks for individuals are not available, but gene expression data is becoming easier to obtain and allows us to represent individuals by a co-integrated gene expression/protein interaction network. Two major problems hamper the application of graph kernels, the state-of-the-art methods for whole-graph comparison, to PPI networks. First, these methods do not scale to graphs of the size of a PPI network. Second, missing edges in these interaction networks are biologically relevant for detecting discrepancies, yet these methods do not take this into account. In this article we present graph kernels for biological network comparison that are fast to compute and take missing interactions into account. We evaluate their practical performance on two datasets of co-integrated gene expression/PPI networks.
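To make the role of missing edges concrete, the sketch below defines a deliberately simple kernel for two networks on the same node set (for example, the same genes in two patients). It counts node pairs that are connected in both graphs or missing in both. This equals the inner product of the two edge-indicator vectors plus the inner product of their complements, so it is a valid kernel. It is an illustration under these assumptions, not the kernels proposed in the paper.

```python
from itertools import combinations


def edge_agreement_kernel(nodes, edges_a, edges_b):
    """Count unordered node pairs whose edge status agrees in both graphs.

    nodes: the shared node set (assumption: both graphs use the same nodes).
    edges_a, edges_b: iterables of undirected edges given as 2-tuples.
    A pair contributes 1 if it is an edge in both graphs or in neither.
    """
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    agree = 0
    for pair in combinations(nodes, 2):
        p = frozenset(pair)
        if (p in ea) == (p in eb):
            agree += 1
    return agree


genes = ["g1", "g2", "g3", "g4"]
print(edge_agreement_kernel(genes, [("g1", "g2"), ("g2", "g3")], [("g1", "g2"), ("g3", "g4")]))  # 4
```

Here one shared edge and three shared non-edges give 4 out of 6 possible pairs. Without the non-edge term, the two graphs would share only 1. The loop is quadratic in the number of nodes, so it does not address the scalability problem the abstract raises.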


Subjects
Protein Interaction Mapping/statistics & numerical data; Computational Biology; Databases, Genetic; Disease Progression; Gene Expression Profiling/statistics & numerical data; Humans; Prognosis; Protein Array Analysis/statistics & numerical data
6.
Pac Symp Biocomput ; : 547-58, 2006.
Article in English | MEDLINE | ID: mdl-17094268

ABSTRACT

We present a kernel-based approach to the classification of time series of gene expression profiles. Our method takes into account the dynamic evolution over time as well as the temporal characteristics of the data. More specifically, we model the evolution of the gene expression profiles as a Linear Time Invariant (LTI) dynamical system and estimate its model parameters. A kernel on dynamical systems is then used to classify these time series. We successfully test our approach on a published dataset to predict response to drug therapy in Multiple Sclerosis patients. For pharmacogenomics, our method offers great potential for advanced computational tools in disease diagnosis and in the prognosis of disease and drug therapy outcomes.
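The two steps described (fit LTI parameters per patient, then compare patients with a kernel) can be sketched in a strongly simplified form. This sketch assumes a diagonal transition matrix, so each gene follows x[t+1] ≈ a·x[t] and is fitted independently by least squares, and then compares the fitted coefficient vectors with an RBF kernel. Both simplifications are assumptions made for illustration. The paper's model and its kernel on dynamical systems are richer than this.

```python
import math


def fit_diagonal_lti(series):
    """Least-squares fit of x[t+1] ~= a * x[t] for each gene independently.

    series: list of time points, each a list of expression values (one per gene).
    Returns one coefficient per gene, i.e. the diagonal of the transition matrix.
    """
    n_genes = len(series[0])
    coeffs = []
    for g in range(n_genes):
        num = sum(series[t][g] * series[t + 1][g] for t in range(len(series) - 1))
        den = sum(series[t][g] ** 2 for t in range(len(series) - 1))
        coeffs.append(num / den if den else 0.0)
    return coeffs


def lti_kernel(series_a, series_b, gamma=1.0):
    """RBF kernel on the fitted transition coefficients of two patients."""
    pa, pb = fit_diagonal_lti(series_a), fit_diagonal_lti(series_b)
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(pa, pb)))


patient_a = [[1.0, 2.0], [0.5, 4.0], [0.25, 8.0]]  # gene 1 decays, gene 2 grows
patient_b = [[1.0, 1.0], [0.5, 1.0], [0.25, 1.0]]  # gene 1 decays, gene 2 constant
print(fit_diagonal_lti(patient_a))       # [0.5, 2.0]
print(lti_kernel(patient_a, patient_b))  # exp(-1), about 0.368
```

Kernel values computed this way for every pair of patients form a Gram matrix that a kernel classifier such as an SVM can consume, which is how a kernel on dynamical systems is turned into a classifier of responders and non-responders.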


Subjects
Gene Expression Profiling/statistics & numerical data; Artificial Intelligence; Computational Biology; Databases, Genetic; Humans; Linear Models; Models, Genetic; Multiple Sclerosis/drug therapy; Multiple Sclerosis/genetics; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Pharmacogenetics/statistics & numerical data; Time Factors
7.
Bioinformatics ; 21 Suppl 1: i47-56, 2005 Jun.
Article in English | MEDLINE | ID: mdl-15961493

ABSTRACT

MOTIVATION: Computational approaches to protein function prediction infer protein function by finding proteins with similar sequence, structure, surface clefts, chemical properties, amino acid motifs, interaction partners or phylogenetic profiles. We present a new approach that combines sequential, structural and chemical information into one graph model of proteins. We predict functional class membership of enzymes and non-enzymes using graph kernels and support vector machine classification on these protein graphs. RESULTS: Our graph model, derivable from protein sequence and structure only, is competitive with vector models that require additional protein information, such as the size of surface pockets. If we include this extra information in our graph model, our classifier yields significantly higher accuracy levels than the vector models. Hyperkernels allow us to select and optimally combine the most relevant node attributes in our protein graphs. We have laid the foundation for a protein function prediction system that integrates protein information from various sources efficiently and effectively. AVAILABILITY: More information available via www.dbs.ifi.lmu.de/Mitarbeiter/borgwardt.html.
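A standard way to compare two labeled graphs with a kernel is to count pairs of walks that visit nodes with matching labels in both graphs. This is equivalent to counting walks in their direct product graph. The sketch below is a generic truncated random-walk kernel under the assumptions of single categorical node labels (for example, secondary-structure element types such as "H" and "E"), a hand-picked decay factor `lam`, and a maximum walk length `max_len`. The paper's additional node attributes and hyperkernel-based attribute selection are not shown.

```python
def random_walk_kernel(labels_a, adj_a, labels_b, adj_b, lam=0.1, max_len=3):
    """Truncated random-walk kernel between two node-labelled graphs.

    labels_*: one categorical label per node.
    adj_*: adjacency lists; adj[i] is an iterable of the neighbours of node i.
    Sums, over walk lengths k = 0..max_len, lam**k times the number of pairs of
    equally labelled length-k walks (one walk in each graph).
    """
    # Nodes of the direct product graph: node pairs with matching labels.
    pairs = [(i, j) for i in range(len(labels_a)) for j in range(len(labels_b))
             if labels_a[i] == labels_b[j]]
    index = {p: n for n, p in enumerate(pairs)}
    # walks[n]: number of matching walks of the current length ending at product node n
    walks = [1.0] * len(pairs)
    total = float(len(pairs))  # contribution of length-0 walks
    for k in range(1, max_len + 1):
        nxt = [0.0] * len(pairs)
        for (i, j), n in index.items():
            if walks[n] == 0.0:
                continue
            for i2 in adj_a[i]:
                for j2 in adj_b[j]:
                    m = index.get((i2, j2))
                    if m is not None:
                        nxt[m] += walks[n]
        walks = nxt
        total += lam ** k * sum(walks)
    return total


helix_strand = (["H", "E"], [[1], [0]])  # two connected elements
print(random_walk_kernel(*helix_strand, *helix_strand))  # about 2.222
```

The resulting Gram matrix over all protein graphs can then be passed to a support vector machine, which is the combination of graph kernels and SVM classification that the abstract describes.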


Subjects
Computational Biology/methods; Enzymes/chemistry; Algorithms; Databases, Protein; Models, Statistical; Protein Conformation; Protein Structure, Secondary; Sequence Analysis, Protein/methods; Software