Results 1 - 9 of 9
1.
Entropy (Basel) ; 25(1), 2023 Jan 12.
Article in English | MEDLINE | ID: mdl-36673295

ABSTRACT

Kernel methods have played a major role over the last two decades in the modeling and visualization of complex problems in data science. The choice of kernel function remains an open research area, and the reasons why some kernels perform better than others are not yet understood. Moreover, the high computational cost of kernel-based methods makes standard model selection methods, such as cross-validation, extremely inefficient, creating a need for careful kernel design and parameter choice. These reasons justify the prior analysis of kernel matrices, i.e., the mathematical objects generated by kernel functions. This paper explores these topics from an entropic standpoint for the case of kernelized relevance vector machines (RVMs), pinpointing desirable properties of kernel matrices that increase the likelihood of obtaining good model performance in terms of generalization power, and relating these properties to the model's fitting ability. We also derive a heuristic for achieving close-to-optimal modeling results while keeping computational costs low, thus providing a recipe for efficient analysis when processing resources are limited.

2.
BMC Bioinformatics ; 20(1): 410, 2019 Jul 30.
Article in English | MEDLINE | ID: mdl-31362714

ABSTRACT

BACKGROUND: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting the drug resistance of previously unobserved variants is therefore very important for optimal medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance. RESULTS: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs. CONCLUSIONS: Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at https://bitbucket.org/elies_ramon/catkern.


Subject(s)
Algorithms , Computational Biology/methods , Drug Resistance, Viral/genetics , HIV-1/genetics , Anti-HIV Agents/pharmacology , Drug Resistance, Viral/drug effects , HIV Infections/virology , HIV-1/drug effects , HIV-1/isolation & purification , Humans , Linear Models , Principal Component Analysis
3.
Sensors (Basel) ; 16(11), 2016 Oct 26.
Article in English | MEDLINE | ID: mdl-27792165

ABSTRACT

Biosensors are small analytical devices incorporating a biological recognition element and a physico-chemical transducer to convert a biological signal into an electrical reading. Nowadays, their technological appeal resides in their fast performance, high sensitivity and continuous measuring capabilities; however, a full understanding of their behavior is still the subject of ongoing research. This paper aims to contribute to this growing field of biotechnology, with a focus on Glucose-Oxidase Biosensor (GOB) modeling through statistical learning methods from a regression perspective. We model the amperometric response of a GOB with dependent variables under different conditions, such as temperature, benzoquinone, pH and glucose concentrations, by means of several machine learning algorithms. Since the sensitivity of a GOB response is strongly related to these dependent variables, their interactions should be optimized to maximize the output signal, for which a genetic algorithm and simulated annealing are used. We report a model that shows a good generalization error and is consistent with the optimization.


Subject(s)
Biosensing Techniques/methods , Glucose Oxidase/metabolism , Glucose/analysis , Machine Learning , Benzoquinones/chemistry , Benzoquinones/metabolism , Hydrogen-Ion Concentration , Least-Squares Analysis , Temperature
4.
J Environ Manage ; 151: 317-25, 2015 Mar 15.
Article in English | MEDLINE | ID: mdl-25585145

ABSTRACT

In this study we use machine learning software (Ichnaea) to generate predictive models for water samples with different concentrations of fecal contamination (point source, moderate and low). We applied several MST methods (host-specific Bacteroides phages, mitochondrial DNA genetic markers, Bifidobacterium adolescentis and Bifidobacterium dentium markers, and bifidobacterial host-specific qPCR), and general indicators (Escherichia coli, enterococci and somatic coliphages) to evaluate the source of contamination in the samples. The results provided data to the Ichnaea software, which evaluated the performance of each method in the different scenarios and determined the source of the contamination. Almost all MST methods in this study correctly determined the origin of fecal contamination at point source and in moderate concentration samples. When the dilution of the fecal pollution increased (below 3 log10 CFU E. coli/100 ml), some of these indicators (bifidobacterial host-specific qPCR, some mitochondrial markers or the B. dentium marker) were not suitable because their concentrations decreased below the detection limit. Using the data from point source samples, the Ichnaea software produced models for waters with low levels of fecal pollution. These models included some MST methods, chosen on the basis of their best performance, that were used to determine the source of pollution in this area. Regardless of the methods selected, which could vary depending on the scenario, inductive machine learning methods are a promising tool in MST studies and may represent a leap forward in solving MST cases.


Subject(s)
Artificial Intelligence , Bacteria/classification , Feces/microbiology , Software , Water Microbiology , Bacteria/isolation & purification , Coliphages , Environmental Monitoring/methods , Real-Time Polymerase Chain Reaction , Water Pollution/analysis
5.
Adv Exp Med Biol ; 696: 45-55, 2011.
Article in English | MEDLINE | ID: mdl-21431545

ABSTRACT

Machine learning methods have recently made significant efforts toward solving multidisciplinary problems in the field of cancer classification in microarray gene expression data. These tasks are characterized by a large number of features and few observations, making the modeling a nontrivial undertaking. In this study, we apply entropic filter methods for gene selection, in combination with several off-the-shelf classifiers. The introduction of bootstrap resampling techniques permits the achievement of more stable performance estimates. Our findings show that the proposed methodology permits a drastic reduction in dimension, offering attractive solutions in terms of both prediction accuracy and number of explanatory genes; a dimensionality reduction technique preserving discrimination capabilities is used for visualization of the selected genes.


Subject(s)
Gene Expression Profiling/statistics & numerical data , Neoplasms/genetics , Oligonucleotide Array Sequence Analysis/statistics & numerical data , Algorithms , Artificial Intelligence , Computational Biology , Data Mining , Databases, Genetic , Diagnosis, Differential , Female , Humans , Male , Neoplasms/classification , Neoplasms/diagnosis
6.
Front Microbiol ; 12: 609048, 2021.
Article in English | MEDLINE | ID: mdl-33584612

ABSTRACT

The advent of next-generation sequencing technologies allowed relative quantification of microbiome communities and their spatial and temporal variation. In recent years, supervised learning (i.e., prediction of a phenotype of interest) from taxonomic abundances has become increasingly common in the microbiome field. However, a gap exists between supervised and classical unsupervised analyses, based on computing ecological dissimilarities for visualization or clustering. Despite this, both approaches face common challenges, like the compositional nature of next-generation sequencing data or the integration of the spatial and temporal dimensions. Here we propose a kernel framework to place on a common ground the unsupervised and supervised microbiome analyses, including the retrieval of microbial signatures (taxa importances). We define two compositional kernels (Aitchison-RBF and compositional linear) and discuss how to transform non-compositional beta-dissimilarity measures into kernels. Spatial data is integrated with multiple kernel learning, while longitudinal data is evaluated by specific kernels. We illustrate our framework through a single point soil dataset, a human dataset with a spatial component, and a previously unpublished longitudinal dataset concerning pig production. The proposed framework and the case studies are freely available in the kernInt package at https://github.com/elies-ramon/kernInt.

7.
Genes Genet Syst ; 90(6): 343-56, 2016 Apr 28.
Article in English | MEDLINE | ID: mdl-26960968

ABSTRACT

Facioscapulohumeral muscular dystrophy (FSHD) is a neuromuscular disorder that shows a preference for the facial, shoulder and upper arm muscles. FSHD affects about one in 20,000 to one in 400,000 people, and no effective therapeutic strategies are known to halt disease progression or reverse muscle weakness or atrophy. Many genes may be incorrectly regulated in affected muscle tissue, but the mechanisms responsible for the progressive muscle weakness remain largely unknown. Although machine learning (ML) has made significant inroads in biomedical disciplines such as cancer research, no reports have yet addressed FSHD analysis using ML techniques. This study explores a specific FSHD data set from an ML perspective. We report results showing a very promising small group of genes that clearly separates FSHD samples from healthy samples. In addition to numerical prediction figures, we show data visualizations and biological evidence illustrating the potential usefulness of these results.


Subject(s)
Gene Regulatory Networks/genetics , Genetic Association Studies , Genetic Predisposition to Disease , Muscular Dystrophy, Facioscapulohumeral/genetics , Algorithms , Gene Expression Regulation , Humans , Machine Learning , Muscle, Skeletal/metabolism , Muscle, Skeletal/pathology , Muscular Dystrophy, Facioscapulohumeral/physiopathology , Mutation , Protein Biosynthesis/genetics
8.
PLoS One ; 8(12): e82071, 2013.
Article in English | MEDLINE | ID: mdl-24349187

ABSTRACT

Facioscapulohumeral Muscular Dystrophy (FSHD) is an autosomal dominant neuromuscular disorder whose incidence is estimated at between one in 400,000 and one in 20,000. No effective therapeutic strategies are known to halt progression or reverse muscle weakness and atrophy. FSHD is known to be caused by modifications located within a D4Z4 repeat array on chromosome 4q, while recent advances have linked these modifications to the DUX4 gene. Unfortunately, the complete mechanisms responsible for the molecular pathogenesis and progressive muscle weakness still remain unknown. Although there are many studies addressing cancer databases from a machine learning perspective, there is no such precedent in the analysis of FSHD. This study aims to fill this gap by analyzing two specific FSHD databases. A feature selection algorithm is used as the main engine to select genes promoting the highest possible classification capacity. The combination of feature selection and classification aims at obtaining simple models (in terms of very low numbers of genes) that are capable of good generalization and may be associated with the disease. We show that the reported method is highly efficient in finding genes to discern between healthy cases (not affected by FSHD) and FSHD cases, allowing the discovery of very parsimonious models that yield negligible repeated cross-validation error. These models in turn give rise to very simple decision procedures in the form of a decision tree. Current biological evidence regarding these genes shows that they are linked to skeletal muscle processes concerning specific human conditions.


Subject(s)
Gene Expression Profiling , Muscular Dystrophy, Facioscapulohumeral/classification , Muscular Dystrophy, Facioscapulohumeral/genetics , Algorithms , Cluster Analysis , Databases, Genetic , Gene Expression Regulation , Humans , Models, Genetic
9.
Appl Environ Microbiol ; 72(9): 5915-26, 2006 Sep.
Article in English | MEDLINE | ID: mdl-16957211

ABSTRACT

Several microbes and chemicals have been considered as potential tracers to identify fecal sources in the environment. However, to date, no one approach has been shown to accurately identify the origins of fecal pollution in aquatic environments. In this multilaboratory study, different microbial and chemical indicators were analyzed in order to distinguish human fecal sources from nonhuman fecal sources using wastewaters and slurries from diverse geographical areas within Europe. Twenty-six parameters, which were later combined to form derived variables for statistical analyses, were obtained by performing methods that were achievable in all the participant laboratories: enumeration of fecal coliform bacteria, enterococci, clostridia, somatic coliphages, F-specific RNA phages, bacteriophages infecting Bacteroides fragilis RYC2056 and Bacteroides thetaiotaomicron GA17, and total and sorbitol-fermenting bifidobacteria; genotyping of F-specific RNA phages; biochemical phenotyping of fecal coliform bacteria and enterococci using miniaturized tests; specific detection of Bifidobacterium adolescentis and Bifidobacterium dentium; and measurement of four fecal sterols. A number of potentially useful source indicators were detected (bacteriophages infecting B. thetaiotaomicron, certain genotypes of F-specific bacteriophages, sorbitol-fermenting bifidobacteria, 24-ethylcoprostanol, and epicoprostanol), although no one source identifier alone provided 100% correct classification of the fecal source. Subsequently, 38 variables (both single and derived) were defined from the measured microbial and chemical parameters in order to find the best subset of variables to develop predictive models using the lowest possible number of measured parameters. To this end, several statistical or machine learning methods were evaluated and provided two successful predictive models based on just two variables, giving 100% correct classification: the ratio of the densities of somatic coliphages and phages infecting Bacteroides thetaiotaomicron to the density of somatic coliphages and the ratio of the densities of fecal coliform bacteria and phages infecting Bacteroides thetaiotaomicron to the density of fecal coliform bacteria. Other models with high rates of correct classification were developed, but in these cases, higher numbers of variables were required.


Subject(s)
Feces/microbiology , Microbiological Techniques , Water Microbiology , Animals , Artificial Intelligence , Bacteriophages/classification , Bacteriophages/genetics , Bacteriophages/isolation & purification , Enterobacteriaceae/classification , Enterobacteriaceae/genetics , Enterobacteriaceae/isolation & purification , Enterococcus/classification , Enterococcus/genetics , Enterococcus/isolation & purification , Europe , Feces/chemistry , Feces/virology , Humans , Phenotype , Sterols/analysis