Results 1 - 8 of 8
1.
Brief Bioinform ; 9(2): 102-18, 2008 Mar.
Article in English | MEDLINE | ID: mdl-18310106

ABSTRACT

Mass-spectra based proteomic profiles have received widespread attention as potential tools for biomarker discovery and early disease diagnosis. A major data-analytical problem involved is the extremely high dimensionality (i.e. number of features or variables) of proteomic data, in particular when the sample size is small. This article reviews dimensionality reduction methods that have been used in proteomic biomarker studies. It then focuses on the problem of selecting the most appropriate method for a specific task or dataset, and proposes method combination as a potential alternative to single-method selection. Finally, it points out the potential of novel dimension reduction techniques, in particular those that incorporate domain knowledge through the use of informative priors or causal inference.


Subjects
Biomarkers/analysis, Electronic Data Processing, Proteomics/methods, Research, Algorithms, Electronic Data Processing/instrumentation, Electronic Data Processing/methods, Mass Spectrometry/methods, Proteome/analysis, Research/instrumentation, Research Design
2.
Bioinformatics ; 23(13): i256-63, 2007 Jul 01.
Article in English | MEDLINE | ID: mdl-17646304

ABSTRACT

MOTIVATION: Protein annotation is a task that describes protein X in terms of topic Y. Usually, this is constructed using information from the biomedical literature. Until now, most literature-based protein annotation work has been done manually by human annotators. However, as the number of biomedical papers grows ever more rapidly, manual annotation becomes more difficult, and there is an increasing need to automate the process. Recently, information extraction (IE) has been used to address this problem. Typically, IE requires pre-defined relations and hand-crafted IE rules or annotated corpora, and these requirements are difficult to satisfy in real-world scenarios such as the biomedical domain. In this article, we describe an IE system that requires only sentences labelled by domain experts as relevant or irrelevant to a given topic. RESULTS: We applied our system to meet the annotation needs of a well-known protein family database; the results show that our IE system can annotate proteins with a set of extracted relations by learning relations and IE rules for disease, function and structure from only relevant and irrelevant sentences.


Subjects
Artificial Intelligence, Protein Databases, Natural Language Processing, Automated Pattern Recognition/methods, Periodicals as Topic, Proteins/chemistry, Proteins/classification, Protein Sequence Analysis/methods
3.
Mass Spectrom Rev ; 25(3): 409-49, 2006.
Article in English | MEDLINE | ID: mdl-16463283

ABSTRACT

Among the many applications of mass spectrometry, biomarker pattern discovery from protein mass spectra has aroused considerable interest in the past few years. While research efforts have raised hopes of early and less invasive diagnosis, they have also brought to light the many issues to be tackled before mass-spectra-based proteomic patterns become routine clinical tools. Known issues cover the entire pipeline leading from sample collection through mass spectrometry analytics to biomarker pattern extraction, validation, and interpretation. This study focuses on the data-analytical phase, which takes as input mass spectra of biological specimens and discovers patterns of peak masses and intensities that discriminate between different pathological states. We survey current work and investigate computational issues concerning the different stages of the knowledge discovery process: exploratory analysis, quality control, and diverse transforms of mass spectra, followed by further dimensionality reduction, classification, and model evaluation. We conclude after a brief discussion of the critical biomedical task of analyzing discovered discriminatory patterns to identify their component proteins as well as interpret and validate their biological implications.


Subjects
Mass Spectrometry/methods, Proteins/analysis, Algorithms, Animals, Biomarkers, Computational Biology, Humans, Mass Spectrometry/classification, Chemical Models, Peptide Mapping, Proteomics
4.
Artif Intell Med ; 37(1): 7-18, 2006 May.
Article in English | MEDLINE | ID: mdl-16233974

ABSTRACT

OBJECTIVE: An important problem that arises in hospitals is the monitoring and detection of nosocomial or hospital acquired infections (NIs). This paper describes a retrospective analysis of a prevalence survey of NIs done in the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey. METHODS AND MATERIAL: Standard surveillance strategies are time-consuming and cannot be applied hospital-wide; alternative methods are required. In NI detection viewed as a classification task, the main difficulty resides in the significant imbalance between positive or infected (11%) and negative (89%) cases. To remedy class imbalance, we explore two distinct avenues: (1) a new re-sampling approach in which both over-sampling of rare positives and under-sampling of the noninfected majority rely on synthetic cases (prototypes) generated via class-specific sub-clustering, and (2) a support vector algorithm in which asymmetrical margins are tuned to improve recognition of rare positive cases. RESULTS AND CONCLUSION: Experiments have shown both approaches to be effective for the NI detection problem. Our novel re-sampling strategies perform remarkably better than classical random re-sampling. However, they are outperformed by asymmetrical soft margin support vector machines which attained a sensitivity rate of 92%, significantly better than the highest sensitivity (87%) obtained via prototype-based re-sampling.
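The 11% / 89% class split described above explains why the abstract reports sensitivity instead of plain accuracy. The following is a minimal sketch of that point, not the authors' method: the cohort of 100 patients and the `always_negative` baseline are hypothetical values chosen only to mirror the stated proportions.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fn, tn, fp) for binary labels where 1 means infected."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp, fn, tn, fp


def sensitivity(y_true, y_pred):
    """Fraction of truly infected cases that the classifier detects."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of all cases that are classified correctly."""
    tp, _, tn, _ = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)


# Hypothetical cohort: 11 infected and 89 noninfected patients.
y_true = [1] * 11 + [0] * 89
# Trivial baseline that never predicts an infection.
always_negative = [0] * len(y_true)

baseline_accuracy = accuracy(y_true, always_negative)
baseline_sensitivity = sensitivity(y_true, always_negative)
```

On this hypothetical cohort the trivial baseline looks accurate while detecting no infected patient at all, which is the failure mode that re-sampling and asymmetrical margins are meant to counter.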


Subjects
Cross Infection/epidemiology, Statistical Models, Population Surveillance/methods, Algorithms, Artificial Intelligence, Cluster Analysis, Cross-Sectional Studies, University Hospitals, Humans, Infection Control, ROC Curve, Retrospective Studies, Switzerland/epidemiology
5.
Stud Health Technol Inform ; 116: 193-8, 2005.
Article in English | MEDLINE | ID: mdl-16160258

ABSTRACT

This paper addresses the model selection problem for Support Vector Machines. A hybrid genetic algorithm guided by Direct Simplex Search evolves hyperparameter values using an empirical error estimate as a steering criterion. This approach is specifically tailored to and experimentally evaluated on a health care problem which involves discriminating 11% nosocomially infected patients from 89% noninfected patients. The combination of Direct Simplex Search with GAs is shown to improve the performance of GAs in terms of solution quality and computational efficiency. Unlike most other hyperparameter tuning techniques, our hybrid approach does not require supplementary effort such as computation of derivatives, making it well suited for practical purposes. This method produces encouraging results: it exhibits high performance and good convergence properties.


Subjects
Algorithms, Support Vector Machine, Artificial Intelligence, Humans, Theoretical Models
6.
Stud Health Technol Inform ; 107(Pt 1): 716-20, 2004.
Article in English | MEDLINE | ID: mdl-15360906

ABSTRACT

Nosocomial infections (NIs), those acquired in health care settings, are among the major causes of increased mortality among hospitalized patients. They are a significant burden for patients and health authorities alike; it is thus important to monitor and detect them through an effective surveillance system. This paper describes a retrospective analysis of a prevalence survey of NIs done in the Geneva University Hospital. Our goal is to identify patients with one or more NIs on the basis of clinical and other data collected during the survey. In this two-class classification task, the main difficulty lies in the significant imbalance between positive or infected (11%) and negative (89%) cases. To cope with class imbalance, we investigate one-class SVMs which can be trained to distinguish two classes on the basis of examples from a single class (in this case, only "normal" or noninfected patients). The infected ones are then identified as "abnormal" cases or outliers that deviate significantly from the normal profile. Experimental results are encouraging: whereas standard 2-class SVMs scored a baseline sensitivity of 50.6% on this problem, the one-class approach increased sensitivity to as much as 92.6%. These results are comparable to those obtained by the authors in a previous study on asymmetrical soft margin SVMs; they suggest that one-class SVMs can provide an effective and efficient way of overcoming data imbalance in classification problems.
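The one-class idea in this abstract (learn a profile from normal cases only, then flag deviations as abnormal) can be sketched as follows. This is an illustrative stand-in and not a one-class SVM: it swaps in a simple per-feature z-score rule. The feature values, the two-feature layout, and the threshold of 3.0 are all hypothetical.

```python
import math


def fit_normal_profile(normal_rows):
    """Estimate per-feature mean and standard deviation from normal cases only."""
    n = len(normal_rows)
    dims = len(normal_rows[0])
    means = [sum(r[j] for r in normal_rows) / n for j in range(dims)]
    stds = []
    for j in range(dims):
        var = sum((r[j] - means[j]) ** 2 for r in normal_rows) / n
        stds.append(math.sqrt(var) or 1.0)  # fall back to 1.0 if a feature has zero spread
    return means, stds


def outlier_score(row, profile):
    """Largest absolute z-score of the row across all features."""
    means, stds = profile
    return max(abs(x - m) / s for x, m, s in zip(row, means, stds))


def is_abnormal(row, profile, threshold=3.0):
    """Flag a case whose deviation from the normal profile exceeds the threshold."""
    return outlier_score(row, profile) > threshold


# Hypothetical training set: two made-up measurements per noninfected patient.
normal_rows = [
    [37.0, 7.0],
    [36.8, 6.5],
    [37.2, 7.5],
    [36.9, 6.8],
    [37.1, 7.2],
]
profile = fit_normal_profile(normal_rows)

flag_far = is_abnormal([39.0, 14.0], profile)   # far from the normal profile
flag_near = is_abnormal([37.05, 7.1], profile)  # close to the normal profile
```

No infected example is needed to fit the profile, which is the property that makes single-class training attractive when positives are rare.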


Subjects
Artificial Intelligence, Cross Infection/diagnosis, Algorithms, Cross Infection/epidemiology, Data Collection, University Hospitals, Humans, Infection Control, Population Surveillance, Prevalence, Retrospective Studies, Switzerland/epidemiology
7.
Proteomics ; 4(8): 2320-32, 2004 Aug.
Article in English | MEDLINE | ID: mdl-15274126

ABSTRACT

In this paper we try to identify potential biomarkers for early stroke diagnosis using surface-enhanced laser desorption/ionization mass spectrometry coupled with analysis tools from machine learning and data mining. Data consist of 42 specimen samples, i.e., mass spectra divided into two broad categories, stroke and control specimens. Among the stroke specimens two further categories exist that correspond to ischemic and hemorrhagic stroke; in this paper we limit our data analysis to discriminating between control and stroke specimens. We performed two suites of experiments. In the first one we simply applied a number of different machine learning algorithms; in the second one we chose the best performing algorithm from the first phase and coupled it with a number of different feature selection methods. The reason for this was twofold: first, to establish whether feature selection can indeed improve performance, which our results did not confirm, but more importantly to acquire a small list of potentially interesting biomarkers. Of the different methods explored, the most promising one was support vector machines, which gave us high levels of sensitivity and specificity. Finally, by analyzing the models constructed by support vector machines we produced a small set of 13 features that could be used as potential biomarkers, and which exhibited good performance in terms of sensitivity, specificity and model stability.


Subjects
Mass Spectrometry/methods, Stroke/blood, Stroke/diagnosis, Adult, Aged, Aged 80 and over, Algorithms, Biomarkers, Female, Gene Expression Profiling, Humans, Male, Middle Aged, Protein Array Analysis, Sensitivity and Specificity, Stroke/classification, Stroke/pathology
8.
Proteomics ; 3(9): 1716-9, 2003 Sep.
Article in English | MEDLINE | ID: mdl-12973731

ABSTRACT

We addressed the problem of discriminating between 24 diseased and 17 healthy specimens on the basis of protein mass spectra. To prepare the data, we performed mass to charge ratio (m/z) normalization, baseline elimination, and conversion of absolute peak height measures to height ratios. After preprocessing, the major difficulty encountered was the extremely large number of variables (1676 m/z values) versus the number of examples (41). Dimensionality reduction was treated as an integral part of the classification process; variable selection was coupled with model construction in a single ten-fold cross-validation loop. We explored different experimental setups involving two peak height representations, two variable selection methods, and six induction algorithms, all on both the original 1676-mass data set and on a prescreened 124-mass data set. Highest predictive accuracies (1-2 off-sample misclassifications) were achieved by a multilayer perceptron and Naïve Bayes, with the latter displaying more consistent performance (hence greater reliability) over varying experimental conditions. We attempted to identify the most discriminant peaks (proteins) on the basis of scores assigned by the two variable selection methods and by neural network based sensitivity analysis. These three scoring schemes consistently ranked four peaks as the most relevant discriminators: 11683, 1403, 17350 and 66107.
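Coupling variable selection with model construction inside one cross-validation loop means the selection step sees only the training part of each fold, so held-out specimens never influence which variables are kept. A minimal sketch of that arrangement follows. The scoring rule (absolute difference of class means), the nearest-centroid classifier, and the synthetic data are hypothetical stand-ins, not the selection methods or induction algorithms used in the study.

```python
def select_features(rows, labels, k):
    """Rank features by absolute difference of class means and keep the top k."""
    dims = len(rows[0])
    scores = []
    for j in range(dims):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return sorted(range(dims), key=lambda j: scores[j], reverse=True)[:k]


def centroid(rows, cols):
    """Mean of the selected columns over the given rows."""
    return [sum(r[j] for r in rows) / len(rows) for j in cols]


def predict(row, cols, c_pos, c_neg):
    """Assign the class whose centroid is nearer in squared Euclidean distance."""
    d_pos = sum((row[j] - c) ** 2 for j, c in zip(cols, c_pos))
    d_neg = sum((row[j] - c) ** 2 for j, c in zip(cols, c_neg))
    return 1 if d_pos < d_neg else 0


def cross_validate(rows, labels, n_folds=10, k=2):
    """Count held-out misclassifications; selection is redone in every fold."""
    errors = 0
    for fold in range(n_folds):
        test_idx = [i for i in range(len(rows)) if i % n_folds == fold]
        train_idx = [i for i in range(len(rows)) if i % n_folds != fold]
        tr_rows = [rows[i] for i in train_idx]
        tr_labels = [labels[i] for i in train_idx]
        # Feature selection uses the training fold only, never the test fold.
        cols = select_features(tr_rows, tr_labels, k)
        c_pos = centroid([r for r, y in zip(tr_rows, tr_labels) if y == 1], cols)
        c_neg = centroid([r for r, y in zip(tr_rows, tr_labels) if y == 0], cols)
        errors += sum(predict(rows[i], cols, c_pos, c_neg) != labels[i]
                      for i in test_idx)
    return errors


# Synthetic data: feature 0 separates the classes, features 1 and 2 are noise.
labels = [i % 2 for i in range(20)]
rows = [[5.0 * y + 0.1 * (i % 3), (i * 7 % 5) / 10.0, (i * 3 % 4) / 10.0]
        for i, y in enumerate(labels)]
misclassified = cross_validate(rows, labels, n_folds=10, k=1)
```

Selecting variables once on the full data set before cross-validating would let the test specimens leak into the model, which tends to make the error estimate optimistic when variables vastly outnumber examples. The sketch assumes every training fold contains both classes.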


Subjects
Artificial Intelligence, Lung Neoplasms/diagnosis, Mass Spectrometry/methods, Proteins/chemistry, Algorithms, Computational Biology/methods, Protein Databases, Humans, Mass Spectrometry/statistics & numerical data