Results 1 - 9 of 9
1.
Entropy (Basel) ; 25(1), 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36673295

ABSTRACT

Over the last two decades, kernel methods have played a major role in the modeling and visualization of complex problems in data science. The choice of kernel function remains an open research area and the reasons why some kernels perform better than others are not yet understood. Moreover, the high computational costs of kernel-based methods make it extremely inefficient to use standard model selection methods, such as cross-validation, creating a need for careful kernel design and parameter choice. These reasons justify the prior analysis of kernel matrices, i.e., the mathematical objects generated by kernel functions. This paper explores these topics from an entropic standpoint for the case of kernelized relevance vector machines (RVMs), pinpointing desirable properties of kernel matrices that increase the likelihood of obtaining good model performance in terms of generalization power, and relating these properties to the model's fitting ability. We also derive a heuristic for achieving close-to-optimal modeling results while keeping the computational costs low, thus providing a recipe for efficient analysis when processing resources are limited.

2.
BMC Bioinformatics ; 20(1): 410, 2019 Jul 30.
Article in English | MEDLINE | ID: mdl-31362714

ABSTRACT

BACKGROUND: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting the drug resistance of previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance. RESULTS: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs. CONCLUSIONS: Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.
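
The two categorical kernels compared in this abstract can be illustrated with a short sketch. It is not taken from the catkern repository: the function names, the representation of each residue position as a set of observed alleles, and the simple weighted-average form are all assumptions made for illustration, and the published kernels may be defined differently (for example, with an additional exponential transformation).

```python
from typing import FrozenSet, List, Optional, Sequence

# Hypothetical representation: one frozenset of alleles per protein position,
# so an allele mixture such as M/V becomes frozenset({"M", "V"}).
Position = FrozenSet[str]


def _normalized(weights: Optional[Sequence[float]], n: int) -> List[float]:
    """Return position weights that sum to 1 (uniform when none are given)."""
    if weights is None:
        return [1.0 / n] * n
    total = float(sum(weights))
    return [w / total for w in weights]


def overlap_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted fraction of positions where both sequences match exactly."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    w = _normalized(weights, len(x))
    return sum(wi for wi, a, b in zip(w, x, y) if a == b)


def jaccard_kernel(x: Sequence[Position], y: Sequence[Position],
                   weights: Optional[Sequence[float]] = None) -> float:
    """Weighted mean of the per-position Jaccard index |A & B| / |A | B|.

    Assumes every position holds at least one allele (no empty sets).
    """
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    w = _normalized(weights, len(x))
    return sum(wi * len(a & b) / len(a | b) for wi, a, b in zip(w, x, y))
```

In this sketch a mixture that shares one of two alleles with the other sequence contributes 0.5 at that position under the Jaccard kernel but 0 under the Overlap kernel, which is one plausible reading of why a mixture-aware kernel could help. The `weights` argument stands in for per-position importances such as those the abstract derives from the RF decrease in node impurity.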


Subjects
Algorithms; Computational Biology/methods; Drug Resistance, Viral/genetics; HIV-1/genetics; Anti-HIV Agents/pharmacology; Drug Resistance, Viral/drug effects; HIV Infections/virology; HIV-1/drug effects; HIV-1/isolation & purification; Humans; Linear Models; Principal Component Analysis
3.
Sensors (Basel) ; 16(11), 2016 Oct 26.
Article in English | MEDLINE | ID: mdl-27792165

ABSTRACT

Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Their technological appeal currently resides in their fast performance, high sensitivity and continuous measuring capabilities; however, they are not yet fully understood. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization.


Subjects
Biosensing Techniques/methods; Glucose Oxidase/metabolism; Glucose/analysis; Machine Learning; Benzoquinones/chemistry; Benzoquinones/metabolism; Hydrogen-Ion Concentration; Least-Squares Analysis; Temperature
4.
J Environ Manage ; 151: 317-25, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25585145

ABSTRACT

In this study we use machine learning software (Ichnaea) to generate predictive models for water samples with different concentrations of fecal contamination (point source, moderate and low). We applied several microbial source tracking (MST) methods (host-specific Bacteroides phages, mitochondrial DNA genetic markers, Bifidobacterium adolescentis and Bifidobacterium dentium markers, and bifidobacterial host-specific qPCR), and general indicators (Escherichia coli, enterococci and somatic coliphages) to evaluate the source of contamination in the samples. The results provided data to the Ichnaea software, which evaluated the performance of each method in the different scenarios and determined the source of the contamination. Almost all MST methods in this study correctly determined the origin of fecal contamination at point source and in moderate concentration samples. When the dilution of the fecal pollution increased (below 3 log10 CFU E. coli/100 ml) some of these indicators (bifidobacterial host-specific qPCR, some mitochondrial markers or B. dentium marker) were not suitable because their concentrations decreased below the detection limit. Using the data from point source samples, the Ichnaea software produced models for waters with low levels of fecal pollution. These models included some MST methods, chosen on the basis of their best performance, that were used to determine the source of pollution in this area. Regardless of the methods selected, which could vary depending on the scenario, inductive machine learning methods are a promising tool in MST studies and may represent a leap forward in solving MST cases.


Subjects
Artificial Intelligence; Bacteria/classification; Feces/microbiology; Software; Water Microbiology; Bacteria/isolation & purification; Coliphages; Environmental Monitoring/methods; Real-Time Polymerase Chain Reaction; Water Pollution/analysis
5.
Adv Exp Med Biol ; 696: 45-55, 2011.
Article in English | MEDLINE | ID: mdl-21431545

ABSTRACT

Machine learning methods have recently made significant contributions to solving multidisciplinary problems in the field of cancer classification from microarray gene expression data. These tasks are characterized by a large number of features and few observations, making the modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection, in combination with several off-the-shelf classifiers. The introduction of bootstrap resampling techniques permits the achievement of more stable performance estimates. Our findings show that the proposed methodology permits a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes; a dimensionality reduction technique preserving discrimination capabilities is used for visualization of the selected genes.
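
As a rough illustration of what an entropic filter for gene selection can look like, the sketch below ranks genes by their mutual information with the class label. The abstract does not say which entropic criterion was used, so mutual information is an assumed stand-in; the function names are hypothetical, and expression values are assumed to have been discretized into bins beforehand. Bootstrap resampling is not shown.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence


def entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(feature: Sequence[Hashable],
                       labels: Sequence[Hashable]) -> float:
    """I(feature; labels) = H(feature) + H(labels) - H(feature, labels)."""
    joint = list(zip(feature, labels))
    return entropy(feature) + entropy(labels) - entropy(joint)


def rank_genes(X: Sequence[Sequence[Hashable]],
               y: Sequence[Hashable]) -> List[int]:
    """Return gene (column) indices sorted by decreasing mutual information.

    X holds one row per sample and one discretized value per gene.
    """
    n_genes = len(X[0])
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(n_genes)]
    return sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
```

A filter of this kind scores each gene independently of any classifier, which is what makes it cheap enough for data with many features and few observations; the top-ranked genes would then be passed to whichever classifier is being evaluated.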


Subjects
Gene Expression Profiling/statistics & numerical data; Neoplasms/genetics; Oligonucleotide Array Sequence Analysis/statistics & numerical data; Algorithms; Artificial Intelligence; Computational Biology; Data Mining; Databases, Genetic; Diagnosis, Differential; Female; Humans; Male; Neoplasms/classification; Neoplasms/diagnosis
6.
Front Microbiol ; 12: 609048, 2021.
Article in English | MEDLINE | ID: mdl-33584612

ABSTRACT

The advent of next-generation sequencing technologies allowed relative quantification of microbiome communities and their spatial and temporal variation. In recent years, supervised learning (i.e., prediction of a phenotype of interest) from taxonomic abundances has become increasingly common in the microbiome field. However, a gap exists between supervised and classical unsupervised analyses, which are based on computing ecological dissimilarities for visualization or clustering. Despite this, both approaches face common challenges, like the compositional nature of next-generation sequencing data or the integration of the spatial and temporal dimensions. Here we propose a kernel framework that places unsupervised and supervised microbiome analyses on common ground, including the retrieval of microbial signatures (taxa importances). We define two compositional kernels (Aitchison-RBF and compositional linear) and discuss how to transform non-compositional beta-dissimilarity measures into kernels. Spatial data is integrated with multiple kernel learning, while longitudinal data is evaluated by specific kernels. We illustrate our framework through a single point soil dataset, a human dataset with a spatial component, and a previously unpublished longitudinal dataset concerning pig production. The proposed framework and the case studies are freely available in the kernInt package at https://github.com/elies-ramon/kernInt.
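
One way to read the two compositional kernels named above is as standard kernels applied after a centered log-ratio (clr) transform, since the Aitchison distance between two compositions equals the Euclidean distance between their clr vectors. The sketch below follows that reading. It is not code from the kernInt package: the function names, the `gamma` parameter, and the use of a pseudocount to handle zero counts are assumptions, and the package's own definitions may differ.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    The pseudocount (an assumption of this sketch) avoids log(0);
    with pseudocount=0 every count must be strictly positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def compositional_linear_kernel(x: Sequence[float], y: Sequence[float],
                                pseudocount: float = 1.0) -> float:
    """Inner product of the clr-transformed samples."""
    zx, zy = clr(x, pseudocount), clr(y, pseudocount)
    return sum(a * b for a, b in zip(zx, zy))


def aitchison_rbf_kernel(x: Sequence[float], y: Sequence[float],
                         gamma: float = 0.1,
                         pseudocount: float = 1.0) -> float:
    """RBF kernel on the squared Aitchison distance between two samples."""
    zx, zy = clr(x, pseudocount), clr(y, pseudocount)
    squared_distance = sum((a - b) ** 2 for a, b in zip(zx, zy))
    return math.exp(-gamma * squared_distance)
```

With `pseudocount=0`, two samples that differ only in sequencing depth (for example, counts of 10, 20, 70 versus 1, 2, 7) have identical clr vectors, so the kernel treats them as the same composition. That scale invariance is the property that makes this construction suited to compositional data.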

7.
Genes Genet Syst ; 90(6): 343-56, 2016 Apr 28.
Article in English | MEDLINE | ID: mdl-26960968

ABSTRACT

Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects between about one in 20,000 and one in 400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from an ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results.


Subjects
Gene Regulatory Networks/genetics; Genetic Association Studies; Genetic Predisposition to Disease; Muscular Dystrophy, Facioscapulohumeral/genetics; Algorithms; Gene Expression Regulation; Humans; Machine Learning; Muscle, Skeletal/metabolism; Muscle, Skeletal/pathology; Muscular Dystrophy, Facioscapulohumeral/physiopathology; Mutation; Protein Biosynthesis/genetics
8.
PLoS One ; 8(12): e82071, 2013.
Article in English | MEDLINE | ID: mdl-24349187

ABSTRACT

Facioscapulohumeral muscular dystrophy (FSHD) is an autosomal dominant neuromuscular disorder whose incidence is estimated at between about one in 400,000 and one in 20,000. No effective therapeutic strategies are known to halt progression or reverse muscle weakness and atrophy. It is known that FSHD is caused by modifications located within a D4Z4 repeat array on chromosome 4q, while recent advances have linked these modifications to the DUX4 gene. Unfortunately, the complete mechanisms responsible for the molecular pathogenesis and progressive muscle weakness still remain unknown. Although there are many studies addressing cancer databases from a machine learning perspective, there is no such precedent in the analysis of FSHD. This study aims to fill this gap by analyzing two specific FSHD databases. A feature selection algorithm is used as the main engine to select genes promoting the highest possible classification capacity. The combination of feature selection and classification aims at obtaining simple models (in terms of very low numbers of genes) that are capable of good generalization and that may be associated with the disease. We show that the reported method is highly efficient in finding genes to discern between healthy cases (not affected by FSHD) and FSHD cases, allowing the discovery of very parsimonious models that yield negligible repeated cross-validation error. These models in turn give rise to very simple decision procedures in the form of a decision tree. Current biological evidence regarding these genes shows that they are linked to skeletal muscle processes concerning specific human conditions.


Subjects
Gene Expression Profiling; Muscular Dystrophy, Facioscapulohumeral/classification; Muscular Dystrophy, Facioscapulohumeral/genetics; Algorithms; Cluster Analysis; Databases, Genetic; Gene Expression Regulation; Humans; Models, Genetic
9.
Appl Environ Microbiol ; 72(9): 5915-26, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16957211

ABSTRACT

Several microbes and chemicals have been considered as potential tracers to identify fecal sources in the environment. However, to date, no one approach has been shown to accurately identify the origins of fecal pollution in aquatic environments. In this multilaboratory study, different microbial and chemical indicators were analyzed in order to distinguish human fecal sources from nonhuman fecal sources using wastewaters and slurries from diverse geographical areas within Europe. Twenty-six parameters, which were later combined to form derived variables for statistical analyses, were obtained by performing methods that were achievable in all the participant laboratories: enumeration of fecal coliform bacteria, enterococci, clostridia, somatic coliphages, F-specific RNA phages, bacteriophages infecting Bacteroides fragilis RYC2056 and Bacteroides thetaiotaomicron GA17, and total and sorbitol-fermenting bifidobacteria; genotyping of F-specific RNA phages; biochemical phenotyping of fecal coliform bacteria and enterococci using miniaturized tests; specific detection of Bifidobacterium adolescentis and Bifidobacterium dentium; and measurement of four fecal sterols. A number of potentially useful source indicators were detected (bacteriophages infecting B. thetaiotaomicron, certain genotypes of F-specific bacteriophages, sorbitol-fermenting bifidobacteria, 24-ethylcoprostanol, and epicoprostanol), although no one source identifier alone provided 100% correct classification of the fecal source. Subsequently, 38 variables (both single and derived) were defined from the measured microbial and chemical parameters in order to find the best subset of variables to develop predictive models using the lowest possible number of measured parameters.
To this end, several statistical or machine learning methods were evaluated and provided two successful predictive models based on just two variables, giving 100% correct classification: the ratio of the densities of somatic coliphages and phages infecting Bacteroides thetaiotaomicron to the density of somatic coliphages and the ratio of the densities of fecal coliform bacteria and phages infecting Bacteroides thetaiotaomicron to the density of fecal coliform bacteria. Other models with high rates of correct classification were developed, but in these cases, higher numbers of variables were required.


Subjects
Feces/microbiology; Microbiological Techniques; Water Microbiology; Animals; Artificial Intelligence; Bacteriophages/classification; Bacteriophages/genetics; Bacteriophages/isolation & purification; Enterobacteriaceae/classification; Enterobacteriaceae/genetics; Enterobacteriaceae/isolation & purification; Enterococcus/classification; Enterococcus/genetics; Enterococcus/isolation & purification; Europe; Feces/chemistry; Feces/virology; Humans; Phenotype; Sterols/analysis