Results 1-10 of 10
1.
BMC Bioinformatics ; 25(1): 188, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38745112

ABSTRACT

BACKGROUND: Microbiome dysbiosis has recently been associated with different diseases and disorders. In this context, machine learning (ML) approaches can be useful either to identify new patterns or to learn predictive models. However, the data fed to ML methods can be subject to different sampling, sequencing and preprocessing techniques. Each choice in the pipeline can lead to a different view (i.e., feature set) of the same individuals, which classical (single-view) ML approaches may fail to consider simultaneously. Moreover, some views may be incomplete, i.e., some individuals may be missing from some views, possibly because some measurements are absent or because some features are not available or applicable for all individuals. Multi-view learning methods are a possible way to consider multiple feature sets for the same individuals, but most existing multi-view learning methods are limited to binary classification tasks or cannot work with incomplete views. RESULTS: We propose irBoost.SH, an extension of the multi-view boosting algorithm rBoost.SH, based on multi-armed bandits. irBoost.SH solves multi-class classification tasks and can analyze incomplete views. At each iteration, it identifies one winning view using adversarial multi-armed bandits and uses its predictions to update a shared instance weight distribution in a boosting-based learning process. In our experiments on 5 multi-view microbiome datasets, the model learned by irBoost.SH always outperforms the best model learned from a single view, its closest competitor rBoost.SH, and the model learned by a multi-view approach based on feature concatenation, improving the F1-score by 11.8% in the prediction of Autism Spectrum Disorder and by 114% in the prediction of colorectal cancer. CONCLUSIONS: The proposed method irBoost.SH showed outstanding performance in our experiments, including in comparison with competing approaches. 
The obtained results confirm that irBoost.SH can fruitfully be adopted for the analysis of microbiome data, due to its capability to simultaneously exploit multiple feature sets obtained through different sequencing and preprocessing pipelines.
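The per-iteration loop described above (a bandit selects one view, and that view's weak learner updates a shared instance weight distribution) can be sketched as follows. This is a minimal illustration under stated assumptions, not the published irBoost.SH or rBoost.SH: it assumes complete views, uses Exp3 as the adversarial bandit with reward 1 minus the weighted error, weighted decision stumps as weak learners, and the SAMME multi-class AdaBoost step size. All function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def train_stump(X, y, w):
    """Weighted multi-class decision stump: split one feature at one threshold
    and predict the weighted-majority class on each side."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({row[f] for row in X}):
            left, right = defaultdict(float), defaultdict(float)
            for row, label, wi in zip(X, y, w):
                (left if row[f] <= thr else right)[label] += wi
            l_cls = max(left, key=left.get)  # never empty: thr is an observed value
            r_cls = max(right, key=right.get) if right else l_cls
            err = sum(wi for row, label, wi in zip(X, y, w)
                      if (l_cls if row[f] <= thr else r_cls) != label)
            if best is None or err < best[0]:
                best = (err, f, thr, l_cls, r_cls)
    _, f, thr, l_cls, r_cls = best
    return lambda row: l_cls if row[f] <= thr else r_cls


def bandit_multiview_boost(views, y, n_rounds=50, gamma=0.1, seed=0):
    """views: one feature matrix per view, rows aligned to the same instances."""
    rng = random.Random(seed)
    n, k, n_classes = len(y), len(views), len(set(y))
    w = [1.0 / n] * n          # shared instance weight distribution
    arm_w = [1.0] * k          # Exp3 weights, one arm per view
    ensemble = []              # (view index, weak hypothesis, vote weight)
    for _ in range(n_rounds):
        total = sum(arm_w)
        probs = [(1 - gamma) * a / total + gamma / k for a in arm_w]
        v = rng.choices(range(k), weights=probs)[0]  # this round's "winning" view
        h = train_stump(views[v], y, w)
        miss = [h(x) != label for x, label in zip(views[v], y)]
        err = min(max(sum(wi for wi, m in zip(w, miss) if m), 1e-10), 1 - 1e-10)
        # Exp3 update with the importance-weighted reward (1 - err) / p_v
        arm_w[v] *= math.exp(gamma * (1 - err) / (probs[v] * k))
        top = max(arm_w)
        arm_w = [a / top for a in arm_w]  # rescale to avoid overflow
        # SAMME vote weight; hypotheses no better than chance are not added
        alpha = math.log((1 - err) / err) + math.log(n_classes - 1)
        if alpha <= 0:
            continue
        ensemble.append((v, h, alpha))
        # Up-weight misclassified instances in the shared distribution
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        s = sum(w)
        w = [wi / s for wi in w]
    return ensemble


def predict(ensemble, rows):
    """rows: the instance's feature row in each view, in training-view order."""
    votes = defaultdict(float)
    for v, h, alpha in ensemble:
        votes[h(rows[v])] += alpha
    return max(votes, key=votes.get)
```

Incomplete views are left out here; supporting them would require deciding how instances absent from the selected view enter the error estimate and the weight update.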


Subject(s)
Algorithms, Machine Learning, Microbiota, Humans
2.
Sci Rep ; 13(1): 3205, 2023 02 24.
Article in English | MEDLINE | ID: mdl-36828900

ABSTRACT

Pollen monitoring has become data-intensive in recent years as real-time detectors are deployed to classify airborne pollen grains. Machine learning models, particularly deep learning models, play an essential role in the pollen classification task. In this study, we developed an explainable framework to unveil how a deep learning model for pollen classification reaches its decisions. The model works on data from a single-particle detector (Rapid-E) that records, for each particle, an optical fingerprint consisting of scattered light and laser-induced fluorescence. Morphological properties of a particle are sensed through light scattering, while chemical properties are encoded in the fluorescence spectrum and the fluorescence lifetime induced by a high-resolution laser. Using these three data modalities (scattering, spectrum, and lifetime), deep learning models with millions of parameters are trained to distinguish different pollen classes, but properly understanding the decisions of such black-box models requires additional methods. Our study provides the first results of applying explainable artificial intelligence (xAI) methodology to the pollen classification model. The knowledge extracted about the features that contribute most to predicting particular pollen classes is further examined from the perspective of domain knowledge and compared with available reference data on pollen size, shape, and laboratory spectrofluorometer measurements.
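A simple way to see which modality drives a prediction is occlusion: replace one modality with a baseline and measure how much the class score drops. The abstract does not say which xAI methods were applied, so this is an illustrative technique, not the study's; `score_fn` is a hypothetical stand-in for the trained network's score for one pollen class.

```python
def modality_occlusion(score_fn, sample, baselines):
    """Score drop caused by replacing each modality, in turn, with its baseline.

    sample:    {modality name: feature list} for one particle.
    baselines: replacement values per modality (e.g. zeros or dataset means).
    """
    full = score_fn(sample)
    drops = {}
    for name in sample:
        occluded = dict(sample)          # shallow copy; other modalities unchanged
        occluded[name] = baselines[name]
        drops[name] = full - score_fn(occluded)
    return drops


# Toy linear score standing in for a trained model (hypothetical)
def toy_score(x):
    return 2.0 * sum(x["spectrum"]) + 0.5 * sum(x["scattering"])


sample = {"scattering": [1.0, 1.0], "spectrum": [1.0, 2.0], "lifetime": [3.0]}
baselines = {name: [0.0] * len(vals) for name, vals in sample.items()}
print(modality_occlusion(toy_score, sample, baselines))
# {'scattering': 1.0, 'spectrum': 6.0, 'lifetime': 0.0}
```

Occlusion removes one modality at a time, so it does not capture interactions between modalities.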


Subject(s)
Artificial Intelligence, Deep Learning, Fluorescence Spectrometry, Data Collection, Pollen
3.
Sci Total Environ ; 851(Pt 2): 158234, 2022 Dec 10.
Article in English | MEDLINE | ID: mdl-36007635

ABSTRACT

Pollen is the most common cause of seasonal allergies, affecting over 33% of the European population even when considering grasses alone. Informing the population and clinicians in real time about the actual presence of pollen in the atmosphere is essential to reduce its harmful health and economic impact. Thus, there is a growing network of automatic particle analysers, and reproducible, transferable models are desirable, since otherwise a reference dataset of the local pollen of interest must be collected for each device before it can classify pollen, which is complex and time-consuming. It would therefore be beneficial to incorporate reference datasets collected by other devices in different locations. However, laser-induced data are prone to device-specific noise due to laser and detector sensitivity. This study collected data from two Rapid-E bioaerosol identifiers in Serbia and Italy and implemented a multi-modal convolutional neural network for pollen classification. We showed that models lost performance when trained on data from one device and tested on another, both in terms of recognition ability and in comparison with manual measurements from Hirst-type traps. To enable pollen classification with a single model in both study locations, we first added the missing pollen classes from the other location's dataset, but this gave poor results, implying that data of one pollen class from different devices differ more than data of different pollen classes from one device. Combining all available reference data in a single model enabled the classification of more pollen classes in both study locations. Finally, we implemented a domain adaptation method, which improved the recognition ability and the correlations of the transferred models, but only for some pollen classes.
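One simple baseline for reducing device-specific offsets and gains is to standardize each feature within each device's data before pooling. This is an illustrative sketch, not the domain adaptation method used in the study (which the abstract does not name). It also assumes each device's data have a comparable class mix, which may not hold when the two locations record different pollen classes. The function name is hypothetical.

```python
from statistics import mean, pstdev


def standardize_per_device(features_by_device):
    """Z-score every feature separately within each device's data.

    features_by_device: {device id: list of feature vectors}.
    Returns the same structure with each feature shifted and scaled to zero
    mean and unit variance per device, removing per-device offsets and gains.
    """
    out = {}
    for device, rows in features_by_device.items():
        cols = list(zip(*rows))
        stats = [(mean(c), pstdev(c) or 1.0) for c in cols]  # guard constant features
        out[device] = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]
    return out
```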


Subject(s)
Computer Neural Networks, Pollen, Reproducibility of Results, Atmosphere, Poaceae, Allergens
4.
Sci Rep ; 11(1): 23109, 2021 11 30.
Article in English | MEDLINE | ID: mdl-34848748

ABSTRACT

Tomato is an important commercial product that is perishable by nature and highly susceptible to fungal infection once harvested. Not all tomatoes are equally vulnerable to pathogenic fungi, and early detection of the vulnerable ones can help in taking timely preventive actions, ranging from isolating tomato batches to adjusting storage conditions, and in making the right business decisions, such as quality-based dynamic pricing or better shelf-life estimates. More importantly, early detection of vulnerable produce can help in taking timely action to minimize potential post-harvest losses. This paper investigates near-infrared (NIR) hyperspectral imaging (1000-1700 nm) and machine learning to build models that automatically predict the susceptibility of the sepals of recently harvested tomatoes to future fungal infections. Hyperspectral images of newly harvested tomatoes (cultivar Brioso) from 5 different growers were acquired before the onset of any visible fungal infection. After imaging, the tomatoes were kept for 4 days under controlled conditions suited to fungal germination and growth, and then imaged with normal color cameras. All sepals in the color images were ranked for fungal severity through crowdsourcing, and the rankings were fused into a final severity score for each sepal using principal component analysis. A novel hyperspectral data processing pipeline is presented, which was used to automatically segment the tomato sepals from spectral images containing multiple tomatoes connected via a truss. The key modelling question addressed in this research is whether there is a correlation between the hyperspectral data captured at harvest and the fungal infection observed 4 days later. Using 10-fold and group k-fold cross-validation, XGBoost and Random Forest regression models were trained on the features derived from the hyperspectral data of each sepal in the training set and tested on a hold-out test set. 
The best model reached a Pearson correlation of 0.837, showing a strong linear correlation between the NIR spectra and the future fungal severity of the sepal. The sepal-level predictions were aggregated to predict the susceptibility of individual tomatoes, yielding a correlation of 0.92. Besides modelling, the paper also focuses on model interpretation, particularly on understanding which spectral features are most relevant to the model's predictions. Two approaches to model interpretation were explored, feature importance and SHAP (SHapley Additive exPlanations), and both led to similar conclusions: the NIR range between 1390 and 1420 nm contributes most to the model's final decision.
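Group k-fold cross-validation keeps all samples from one group (for example, all sepals of one tomato, or all tomatoes from one grower) in the same fold, so each test fold contains only unseen groups. The sketch below assumes groups are assigned to folds round-robin and that sepal predictions are aggregated to tomato level by averaging; the abstract does not state either choice, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def group_k_fold(groups, k):
    """Yield (train_idx, test_idx) pairs; every sample of a group lands in one fold.

    Groups are assigned to folds round-robin, so folds are only balanced in
    size when groups have similar sizes.
    """
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups)))}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


def aggregate_by_tomato(sepal_preds, tomato_ids):
    """Average sepal-level predictions into one susceptibility score per tomato."""
    per_tomato = defaultdict(list)
    for pred, tomato in zip(sepal_preds, tomato_ids):
        per_tomato[tomato].append(pred)
    return {t: mean(preds) for t, preds in per_tomato.items()}
```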


Subject(s)
Plant Diseases/genetics, Plant Diseases/microbiology, Solanum lycopersicum/microbiology, Near-Infrared Spectroscopy/methods, Algorithms, Calibration, Agricultural Crops, Deep Learning, Fruit/microbiology, Solanum lycopersicum/genetics, Machine Learning, Microbiology, Automated Pattern Recognition, Plant Diseases/prevention & control, Principal Component Analysis, Reproducibility of Results, Software
5.
Stud Health Technol Inform ; 285: 165-170, 2021 Oct 27.
Article in English | MEDLINE | ID: mdl-34734869

ABSTRACT

In this study, we investigate faecal microbiota composition to evaluate the performance of classification algorithms in identifying Inflammatory Bowel Disease (IBD) and its two types: Crohn's disease (CD) and ulcerative colitis (UC). Of the many algorithms investigated, a random forest (RF) classifier was selected for detailed evaluation in a three-class (CD versus UC versus nonIBD) classification task and two binary (nonIBD versus IBD and CD versus UC) classification tasks. We dealt with class imbalance and performed an extensive parameter search, dimensionality reduction and two-level classification. In three-class classification, our best model reaches an average F1 score of 91%, which confirms the strong connection between IBD and the gastrointestinal microbiome. Among the most important features in three-class classification are the species Staphylococcus hominis, Porphyromonas endodontalis and Slackia piriformis, and the genus Bacteroidetes.
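The two-level scheme (IBD versus nonIBD first, then CD versus UC for IBD cases) and an averaged F1 score can be sketched as follows. Here `is_ibd` and `is_crohns` are hypothetical stand-ins for trained binary classifiers. Reading "average" as the unweighted (macro) mean over the three classes is an assumption; the abstract does not specify the averaging.

```python
def two_level_predict(is_ibd, is_crohns, x):
    """Level 1: IBD vs nonIBD; level 2 (only for IBD): CD vs UC."""
    if not is_ibd(x):
        return "nonIBD"
    return "CD" if is_crohns(x) else "UC"


def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```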


Subject(s)
Ulcerative Colitis, Crohn Disease, Gastrointestinal Microbiome, Inflammatory Bowel Diseases, Actinobacteria, Bacteroidetes, Ulcerative Colitis/diagnosis, Ulcerative Colitis/microbiology, Crohn Disease/diagnosis, Crohn Disease/microbiology, Humans, Inflammatory Bowel Diseases/diagnosis, Inflammatory Bowel Diseases/microbiology, Machine Learning, Porphyromonas endodontalis, Staphylococcus hominis
6.
Sci Rep ; 10(1): 3421, 2020 02 25.
Article in English | MEDLINE | ID: mdl-32099053

ABSTRACT

In this study, we used meteorological parameters and predictive modelling, interpreted through model explanation, to develop stress metrics that indicate the presence of drought and heat stress in a specific environment. We started from the extreme temperature and precipitation indices, modified some of them, and introduced additional drought indices relevant to the analysis. Based on maize's sensitivity to stress, the growing season was divided into four stages. The features were calculated throughout the growing season and split into two groups, one for drought and the other for heat stress. The generated meteorological features were combined with soil features and fed to a random forest regression model for yield prediction. Model explanation gave us the contribution of each feature to yield decrease, from which we estimated the total amount of stress in each environment; this represents a new environmental index. Using this index, we ranked the environments according to their level of stress. More than 2400 hybrids were tested across the environments where they were grown and, based on their yield stability, were marked as either tolerant or susceptible to heat, drought, or combined heat and drought stress. The presented methodology and results were produced within the Syngenta Crop Challenge 2019.
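The step from feature contributions to an environmental stress index can be sketched as summing the yield-reducing contributions of the stress features in each environment and ranking environments by that total. The sketch assumes per-environment contributions are already available from a model-explanation method; the function and feature names are hypothetical.

```python
def stress_index(contributions, stress_features):
    """Total yield reduction attributed to the given stress features.

    contributions: {feature name: contribution to predicted yield} for one
    environment (negative values mean the feature lowered the prediction).
    Returns a non-negative amount; larger means more yield lost to stress.
    """
    return -sum(min(contributions.get(f, 0.0), 0.0) for f in stress_features)


def rank_environments(contribs_by_env, stress_features):
    """Environment ids sorted from most to least stressed."""
    scores = {env: stress_index(c, stress_features) for env, c in contribs_by_env.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

Running it separately on the drought features and on the heat features gives one index per stress type.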


Subject(s)
Acclimatization, Genotype, Heat-Shock Response, Genetic Hybridization, Biological Models, Zea mays, Crop Production, Meteorology, Plant Leaves/genetics, Plant Leaves/growth & development, Zea mays/genetics, Zea mays/growth & development
7.
PLoS One ; 12(9): e0184198, 2017.
Article in English | MEDLINE | ID: mdl-28863173

ABSTRACT

The aim of this work was to develop a method for selecting optimal soybean varieties for the American Midwest using data analytics. We extracted knowledge about 174 varieties from the dataset, which contained information about weather, soil, yield and regional statistical parameters. Next, we predicted the yield of each variety in each of the 6,490 observed subregions of the Midwest. Furthermore, yield was predicted for all possible weather scenarios, approximated by the 15 historical weather instances contained in the dataset. Using the predicted yields and the covariance between varieties across weather scenarios, we performed portfolio optimisation. In this way, for each subregion, we obtained a selection of varieties that proved superior to the others in terms of the amount and stability of yield. According to the rules of the Syngenta Crop Challenge, for which this research was conducted, we aggregated the results across all subregions and selected up to five soybean varieties to be distributed across the network of seed retailers. The work presented in this paper was the winning solution of the Syngenta Crop Challenge 2017.
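The portfolio step can be illustrated with a small mean-variance search. This sketch assumes equal weights within a portfolio, a risk-aversion parameter that trades mean yield against variance across weather scenarios, and exhaustive search over small subsets. It is not the optimisation used in the paper, and exhaustive search is only practical for a handful of candidates, not 174 varieties. The function name is hypothetical.

```python
from itertools import combinations
from statistics import mean, pvariance


def best_portfolio(yields, max_size=5, risk_aversion=1.0):
    """Equal-weight set of varieties with the best mean-variance score.

    yields: {variety: [predicted yield under each weather scenario]}.
    A portfolio's yield in a scenario is the average of its members' yields,
    so its variance across scenarios reflects the covariance between varieties.
    """
    best, best_score = None, float("-inf")
    names = sorted(yields)
    for size in range(1, max_size + 1):
        for combo in combinations(names, size):
            per_scenario = [mean(vals) for vals in zip(*(yields[v] for v in combo))]
            score = mean(per_scenario) - risk_aversion * pvariance(per_scenario)
            if score > best_score:
                best, best_score = combo, score
    return best, best_score
```

Two varieties that fail in opposite scenarios can together beat a stable single variety, which is the diversification effect the covariance term captures.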


Subject(s)
Agricultural Crops, Glycine max/genetics, Weather, Agriculture/methods, Climate Change, Midwestern United States, Statistical Models, Regression Analysis, Seeds/genetics, Uncertainty
8.
Stud Health Technol Inform ; 224: 181-3, 2016.
Article in English | MEDLINE | ID: mdl-27225576

ABSTRACT

Lumbar disc herniation (LDH) is the most common disease requiring surgical intervention in the working population. This study aims to predict return to work after operative treatment of LDH, based on an observational study including 153 patients. The classification problem was approached using decision trees (DT), support vector machines (SVM) and a multilayer perceptron (MLP), combined with the RELIEF algorithm for feature selection. The MLP provided the best recall, 0.86, for the class of patients not returning to work, which, combined with the selected features, enables early identification of, and personalized targeted interventions for, subjects at risk of prolonged disability. The predictive modeling identified the most decisive risk factors for prolonged work absence: psychosocial factors, mobility of the spine, structural changes of the facet joints, and professional factors including standing, sitting and microclimate.
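The RELIEF idea (reward features that differ between an instance and its nearest neighbour from the other class, and penalize features that differ from its nearest neighbour of the same class) can be sketched in its basic two-class form. The variant, distance and sampling used in the study are not stated; this sketch assumes range-normalized Manhattan distance and at least two instances per class, and the function name is hypothetical.

```python
import random


def relief(X, y, n_iter=None, seed=0):
    """Basic two-class Relief feature weights (higher = more relevant)."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    # Per-feature range for normalizing differences to [0, 1]
    ranges = [max(r[f] for r in X) - min(r[f] for r in X) or 1.0 for f in range(d)]

    def diff(f, a, b):
        return abs(a[f] - b[f]) / ranges[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    m = n_iter or n
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = min(hits, key=lambda j: dist(X[i], X[j]))     # nearest same-class
        miss = min(misses, key=lambda j: dist(X[i], X[j]))  # nearest other-class
        for f in range(d):
            w[f] += (diff(f, X[i], X[miss]) - diff(f, X[i], X[hit])) / m
    return w
```

Features can then be ranked by weight and the top ones passed to the classifiers.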


Subject(s)
Diskectomy/methods, Intervertebral Disc Displacement/surgery, Return to Work, Treatment Outcome, Algorithms, Decision Trees, Female, Humans, Male, Microsurgery/methods, Theoretical Models, Occupational Medicine, Serbia, Support Vector Machine
9.
Sci Rep ; 6: 19342, 2016 Jan 13.
Article in English | MEDLINE | ID: mdl-26758042

ABSTRACT

An increasing amount of geo-referenced mobile phone data enables the identification of people's behavioral patterns, habits and movements. From these data, we can extract knowledge potentially useful for many applications, including the one tackled in this study: understanding the spatial variation of epidemics. We explored datasets collected by a cell phone service provider and linked them to spatial HIV prevalence rates estimated from publicly available surveys. For that purpose, 224 features were extracted from mobility and connectivity traces and related to the level of the HIV epidemic in 50 Ivory Coast departments. Using regression models, we evaluated the predictive ability of the extracted features. Several models predicted HIV prevalence values that are highly correlated (>0.7) with the actual values. Through contribution analysis, we identified key elements that correlate with the rate of infections and could serve as a proxy for epidemic monitoring. Our findings indicate that night connectivity and activity, the spatial area covered by users, and overall migrations are strongly linked to HIV. By visualizing the communication and mobility flows, we sought to explain the spatial structure of the epidemic. We discovered that strong ties and hubs in communication and mobility align with HIV hot spots.
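The abstract does not define how the spatial area covered by users was measured. A common mobility metric for it is the radius of gyration: the root-mean-square distance of a user's observed positions from their centroid. The sketch assumes positions already projected to planar coordinates (for example, kilometres); raw latitude/longitude would need a geodesic distance instead. The function name is hypothetical.

```python
import math


def radius_of_gyration(points):
    """RMS distance of a user's observed (x, y) positions from their centroid."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n)
```

Per-user values can then be aggregated per department (for example, averaged) to form a regression feature.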


Subject(s)
Cell Phone, HIV Infections/epidemiology, Population Surveillance, Spatial Analysis, Adolescent, Adult, Medical Geography, Humans, Middle Aged, Prevalence, Serbia/epidemiology, Young Adult
10.
IEEE J Biomed Health Inform ; 19(2): 698-708, 2015 Mar.
Article in English | MEDLINE | ID: mdl-24733033

ABSTRACT

Recent developments in molecular biology and in techniques for genome-wide data acquisition have resulted in an abundance of data to profile genes and predict their function. These datasets may come from diverse sources, and how to address them jointly and fuse them into a single prediction model remains an open question. A prevailing technique for identifying groups of related genes with similar profiles is profile-based clustering. Cluster inference may benefit from consensus across different clustering models. In this paper, we propose a technique that develops separate gene clusters from each available data source and then fuses them by means of nonnegative matrix factorization. We use gene profile data on the budding yeast S. cerevisiae to demonstrate that this approach can successfully integrate heterogeneous datasets and yield high-quality clusters that could not be inferred by simply merging the gene profiles prior to clustering.
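One way to fuse several clusterings with nonnegative matrix factorization is to build a co-association matrix (the fraction of clusterings in which two genes share a cluster), factorize it, and assign each gene to the component with the largest loading. This is an illustrative formulation under that assumption, not necessarily the paper's. The NMF uses the standard Lee-Seung multiplicative updates for the Frobenius loss, and all function names are hypothetical.

```python
import random


def co_association(clusterings):
    """Fraction of clusterings in which each pair of genes shares a cluster."""
    n, r = len(clusterings[0]), len(clusterings)
    m = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    m[i][j] += 1.0 / r
    return m


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def nmf(v, k, n_iter=200, seed=0, eps=1e-12):
    """Lee-Seung multiplicative updates minimizing ||V - WH||_F^2 (W, H >= 0)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(n_iter):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h


def consensus_labels(clusterings, k, **kwargs):
    """Assign each gene to the NMF component with the largest loading in W."""
    w, _ = nmf(co_association(clusterings), k, **kwargs)
    return [max(range(k), key=lambda c: row[c]) for row in w]
```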


Subject(s)
Computational Biology/methods, Gene Expression Profiling/methods, Statistical Models, Algorithms, Cluster Analysis, Genetic Databases, Saccharomyces cerevisiae/genetics, Saccharomyces cerevisiae/metabolism