1.
BMC Bioinformatics ; 25(1): 26, 2024 Jan 15.
Article En | MEDLINE | ID: mdl-38225565

BACKGROUND: In recent years, human microbiome studies have received increasing attention, as this field is considered a potential source of clinical applications. With advances in omics technologies and AI, research focused on discovering potential biomarkers in the human microbiome with machine learning tools has produced positive outcomes. Despite these promising results, several issues remain in these studies: datasets with small numbers of samples, inconsistent results, lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research. RESULTS: Three experiments were performed, analyzing microbiome data from patients/cases with Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we identified a biomarker signature in one dataset and applied it to two other datasets for further validation. The effectiveness of the proposed methodology was compared with other feature selection methods, such as K-Best with F-score, and with random selection as a baseline. The Area Under the Curve (AUC) was employed as a measure of diagnostic accuracy and as the metric for comparing the proposed methodology with the other feature selection methods. Additionally, we used the Matthews Correlation Coefficient (MCC) to evaluate the performance of the methodology and to compare it with the other feature selection methods. CONCLUSIONS: We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis, addressing issues related to data dimensionality, inconsistent results, and validation across independent datasets.
The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy than the other feature selection methods. This methodology is a first step toward increasing reproducibility and providing robust, reliable results.


Autism Spectrum Disorder , Biomedical Research , Diabetes Mellitus, Type 2 , Microbiota , Humans , RNA, Ribosomal, 16S/genetics , Reproducibility of Results , Diabetes Mellitus, Type 2/genetics , Machine Learning , Biomarkers , Microbiota/genetics
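The two evaluation metrics named in this abstract, AUC and MCC, can be illustrated with a short, self-contained sketch. It is not the authors' evaluation code: it assumes binary labels coded as 0/1, computes AUC through its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control, with ties counting one half), and returns an MCC of 0 when any row or column of the confusion matrix is empty, which is a common convention.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, tn, fp, fn) for binary labels coded 0/1."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews Correlation Coefficient; 0.0 when a confusion-matrix margin is empty."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0]
print(mcc(y_true, [1, 0, 0, 0]))          # 1/sqrt(3), about 0.577
print(auc(y_true, [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

Unlike plain accuracy, MCC stays low when a classifier simply favors the majority class, which is why it is often reported alongside AUC on small, imbalanced microbiome cohorts.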
2.
Math Biosci Eng ; 20(12): 20528-20552, 2023 Nov 14.
Article En | MEDLINE | ID: mdl-38124564

Odor is central to food quality. Still, a major challenge is to understand how the odorants present in a given food contribute to its specific odor profile, and how to predict this olfactory outcome from the chemical composition. In this proof-of-concept study, we seek to develop an integrative model that combines expert knowledge, fuzzy logic, and machine learning to predict the quantitative odor description of complex mixtures of odorants. The model output is the intensity of relevant odor sensory attributes, calculated on the basis of the content of odor-active compounds. The core of the model is the mathematically formalized knowledge of four senior flavorists, who provided a set of optimized rules describing the sensory-relevant combinations of odor qualities the experts have in mind when elaborating the target odor sensory attributes. The model first queries analytical and sensory databases in order to standardize, homogenize, and quantitatively code the odor descriptors of the odorants. Then, by means of an ontology, the standardized odor descriptors are translated into a limited number of odor qualities used by the experts. A third step consists of aggregating all the information in terms of odor qualities across all the odorants found in a given product. The final step is a set of knowledge-based fuzzy membership functions that represent the flavorists' expertise and predict the intensity of the target odor sensory descriptors from the products' aggregated odor qualities; several methods for optimizing the fuzzy membership functions were tested. Finally, the model was applied to predict the odor profile of 16 red wines from two grape varieties for which odorant content data were available. The results showed that the model can predict the perceptual outcome of food odor with a certain level of accuracy, and may also provide insights into combinations of odorants not mentioned by the experts.


Artificial Intelligence , Odorants , Smell , Machine Learning , Fuzzy Logic
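The final step of this model, fuzzy membership functions that turn aggregated odor qualities into attribute intensities, can be sketched in a few lines. The abstract does not state the shape of the membership functions or how an expert rule combines several qualities, so this sketch assumes triangular memberships and a minimum-based fuzzy AND; the quality names and breakpoints are hypothetical.

```python
from typing import Dict, Tuple


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership: 0 at or below a, 1 at the peak b, 0 at or above c (a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def rule_strength(qualities: Dict[str, float], rule: Dict[str, Tuple[float, float, float]]) -> float:
    """Degree to which a product satisfies one (hypothetical) expert rule.

    `qualities` maps odor-quality names to aggregated scores; `rule` maps each
    required quality to its (a, b, c) breakpoints. Fuzzy AND is taken as the
    minimum, and a quality absent from the product is scored 0.
    """
    return min(triangular(qualities.get(q, 0.0), *abc) for q, abc in rule.items())


# Hypothetical example: a target attribute that requires both "fruity" and "woody".
product = {"fruity": 0.6, "woody": 0.3}
rule = {"fruity": (0.0, 0.5, 1.0), "woody": (0.0, 0.5, 1.0)}
print(rule_strength(product, rule))  # 0.6
```

The minimum operator means the weakest required quality limits the predicted intensity; the optimization methods mentioned in the abstract would tune breakpoints such as `a`, `b`, `c` against sensory data.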
3.
Clin Transl Allergy ; 13(11): e12306, 2023 Nov.
Article En | MEDLINE | ID: mdl-38006387

BACKGROUND: Inadequate control with inhaled corticosteroids and long-acting β2-agonist bronchodilators is a major concern for patients with severe asthma. The current treatment option for these patients is an add-on biological such as the anti-IgE antibody omalizumab. Despite its accepted use, not all patients benefit from omalizumab. Therefore, there is a need to identify reliable biomarkers that predict omalizumab response. METHODS: Two novel computational algorithms, the machine-learning-based Recursive Ensemble Feature Selection (REFS) and the rule-based Logic Explainable Networks (LEN), were applied to openly accessible mRNA expression data from moderate-to-severe asthma patients to identify genes as predictors of omalizumab response. RESULTS: With REFS, the number of features was reduced from 28,402 genes to 5 genes while obtaining a cross-validated accuracy of 0.975. The 5 responsiveness-predicting genes encode the following proteins: Coiled-coil domain-containing protein 113 (CCDC113), Solute Carrier Family 26 Member 8 (SLC26A8), Protein Phosphatase 1 Regulatory Subunit 3D (PPP1R3D), C-Type Lectin Domain Family 4 Member C (CLEC4C) and LOC100131780 (not annotated). The LEN algorithm identified 4 of the same genes as REFS: CCDC113, SLC26A8, PPP1R3D and LOC100131780. A literature search showed that these 4 responsiveness-predicting genes are associated with mucosal immunity, cell metabolism, and airway remodeling. CONCLUSION AND CLINICAL RELEVANCE: Both computational methods identify the same 4 genes as predictors of omalizumab response in moderate-to-severe asthma patients. The high accuracy obtained indicates that our approach has potential in clinical settings. Future studies in relevant cohort data should validate our computational approach.

4.
Front Microbiol ; 14: 1261889, 2023.
Article En | MEDLINE | ID: mdl-37808286

Predictive analysis of microbiome data within a machine learning (ML) workflow presents numerous domain-specific challenges involving preprocessing, feature selection, predictive modeling, performance estimation, model interpretation, and the extraction of biological information from the results. To assist decision-making, we offer a set of recommendations on algorithm selection, pipeline creation and evaluation, stemming from the COST Action ML4Microbiome. We compared the suggested approaches on a multi-cohort shotgun metagenomics dataset of colorectal cancer patients, focusing on their performance in disease diagnosis and biomarker discovery. We demonstrate that the use of compositional transformations and filtering methods as part of data preprocessing does not always improve the predictive performance of a model. In contrast, multivariate feature selection, such as the Statistically Equivalent Signatures algorithm, was effective in reducing the classification error. When validated on a separate test dataset, this algorithm, in combination with random forest modeling, provided the most accurate performance estimates. Lastly, we show how linear modeling by logistic regression, coupled with visualization techniques such as Individual Conditional Expectation (ICE) plots, can yield interpretable results and offer biological insights. These findings are significant for clinicians and non-experts alike in translational applications.
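The abstract does not name the compositional transformations that were compared. The centered log-ratio (CLR) is one widely used option and is sketched here for a single sample of taxon counts; the pseudocount used to avoid log(0) is an assumption, and other zero-handling strategies exist.

```python
import math
from typing import List, Sequence


def clr(counts: Sequence[float], pseudocount: float = 0.5) -> List[float]:
    """Centered log-ratio transform of one sample's taxon counts.

    Each shifted value x_i + pseudocount becomes log((x_i + pseudocount) / g),
    where g is the geometric mean of the shifted values. The pseudocount is an
    assumed choice that keeps zero counts away from log(0); with
    pseudocount=0 every count must be positive.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    center = sum(logs) / len(logs)  # log of the geometric mean
    return [v - center for v in logs]


sample = [10, 0, 5, 85]  # hypothetical read counts for four taxa
print(clr(sample))
```

CLR values sum to zero within each sample, and with a zero pseudocount they are unchanged when all counts are multiplied by the same constant, so differences in sequencing depth drop out.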

5.
Sci Rep ; 13(1): 15782, 2023 09 22.
Article En | MEDLINE | ID: mdl-37737287

As the COVID-19 pandemic winds down, it leaves behind the serious concern that future, even more disruptive pandemics may eventually surface. One of the crucial steps in handling the SARS-CoV-2 pandemic was being able to detect the presence of the virus accurately and in a timely manner, in order to then develop policies counteracting its spread. However, as the pandemic evolved, new variants with potentially dangerous mutations appeared. Faced with these developments, it becomes clear that fast and reliable techniques are needed to create highly specific molecular tests able to uniquely identify variants of concern (VOCs). Using an automated pipeline built around evolutionary algorithms, we designed primer sets for SARS-CoV-2 (main lineage) and for the VOCs B.1.1.7 (Alpha) and B.1.1.529 (Omicron). Starting from sequences openly available in the GISAID repository, our pipeline was able to deliver the primer sets for the main lineage and each variant in a matter of hours. Preliminary in-silico validation showed that the sequences in the primer sets had high accuracy. A pilot test in a laboratory setting confirmed these results: the developed primers compared favorably against existing commercial versions for the main lineage, and the specific versions for the VOCs B.1.1.7 and B.1.1.529 were successfully tested clinically.


COVID-19 , SARS-CoV-2 , Humans , SARS-CoV-2/genetics , COVID-19/diagnosis , Pandemics , Artificial Intelligence
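A first in-silico screen of candidate primers, of the kind mentioned above, can be sketched as a GC-content check plus exact-match coverage of target versus non-target genomes. This is a deliberate simplification and not the authors' evolutionary pipeline: real in-silico PCR also tolerates mismatches, considers the reverse complement and the paired primer, and checks melting temperature. All sequences below are hypothetical.

```python
from typing import Sequence, Tuple


def gc_content(primer: str) -> float:
    """Fraction of G and C bases in the primer."""
    primer = primer.upper()
    return sum(base in "GC" for base in primer) / len(primer)


def exact_match_rates(primer: str, targets: Sequence[str],
                      others: Sequence[str]) -> Tuple[float, float]:
    """Share of target and of non-target genomes that contain the primer verbatim.

    A highly specific primer should approach (1.0, 0.0). Only the forward
    strand is searched in this simplified sketch.
    """
    p = primer.upper()

    def rate(genomes: Sequence[str]) -> float:
        return sum(p in g.upper() for g in genomes) / len(genomes)

    return rate(targets), rate(others)


targets = ["TTACGGCATT", "GACGGCAAAT"]  # hypothetical variant genomes
others = ["TTTTACGTTT", "ACGGAAAAAA"]   # hypothetical non-target genomes
print(gc_content("ACGGCA"), exact_match_rates("ACGGCA", targets, others))
```

An evolutionary search would use scores like these inside its fitness function, proposing candidate primers and keeping those that maximize target coverage while minimizing off-target matches.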
7.
Foods ; 10(1)2021 Jan 04.
Article En | MEDLINE | ID: mdl-33406629

In recent years, modelling techniques have been increasingly adopted in food processing, especially for cereal-based products, which are among the most consumed foods in the world. Predictive models and simulations make it possible to explore new approaches and optimize processes, potentially helping companies reduce costs and limit carbon emissions. Nevertheless, as the different phases of the food processing chain are highly specialized, advances in modelling are often unknown outside a single domain, and models rarely take into account more than one step. This paper introduces the first high-level overview of modelling techniques employed in different parts of the cereal supply chain, from farming to storage, from drying to milling, and from processing to consumption. This review, issued from a networking project including researchers from over 30 different countries, aims to present the current state of the art in each domain, show common trends and synergies, and finally suggest promising avenues for future research.

8.
Sci Rep ; 11(1): 947, 2021 01 13.
Article En | MEDLINE | ID: mdl-33441822

In this paper, deep learning is coupled with explainable artificial intelligence techniques for the discovery of representative genomic sequences in SARS-CoV-2. A convolutional neural network classifier is first trained on 553 sequences from the National Genomics Data Center repository, separating the genome of different virus strains from the Coronavirus family with 98.73% accuracy. The network's behavior is then analyzed to discover the sequences used by the model to identify SARS-CoV-2, ultimately uncovering sequences exclusive to it. The discovered sequences are validated on samples from the National Center for Biotechnology Information and Global Initiative on Sharing All Influenza Data repositories, and are shown to separate SARS-CoV-2 from different virus strains with near-perfect accuracy. Next, one of the sequences is selected to generate a primer set and tested against other state-of-the-art primer sets, obtaining competitive results. Finally, the primer is synthesized and tested on patient samples (n = 6 previously tested positive), delivering a sensitivity similar to routine diagnostic methods and 100% specificity. The proposed methodology offers substantial added value over existing methods, as it can both automatically identify promising primer sets for a virus from a limited amount of data and deliver effective results in a minimal amount of time. Considering the possibility of future pandemics, these characteristics are invaluable for promptly creating specific detection methods for diagnostics.


DNA Primers/genetics , Deep Learning , Limit of Detection , Polymerase Chain Reaction/methods , SARS-CoV-2/genetics , SARS-CoV-2/isolation & purification
9.
Cancers (Basel) ; 12(7)2020 Jul 03.
Article En | MEDLINE | ID: mdl-32635415

Circulating microRNAs (miRNAs) are small noncoding RNA molecules that can be detected in bodily fluids without the need for major invasive procedures on patients. miRNAs have shown great promise as tumor biomarkers, both to assess the presence of a tumor and to predict its type and subtype. Recently, thanks to the availability of miRNA datasets, machine learning techniques have been successfully applied to tumor classification. The results, however, are difficult for medical experts to assess and interpret, because the algorithms exploit information from thousands of miRNAs. In this work, we propose a novel technique that aims to reduce the necessary information to the smallest possible set of circulating miRNAs. The dimensionality reduction achieved represents a very important first step in a potential, clinically actionable, circulating miRNA-based precision medicine pipeline. While it is currently under discussion whether this first step can be taken, we demonstrate here that it is possible to perform classification tasks by exploiting a recursive feature elimination procedure that integrates a heterogeneous ensemble of high-quality, state-of-the-art classifiers on circulating miRNAs. Heterogeneous ensembles can compensate for the inherent biases of individual classifiers by using different classification algorithms. Selecting features then further eliminates biases emerging from using data from different studies or batches, yielding more robust and reliable outcomes. The proposed approach is first tested on a tumor classification problem, separating 10 different types of cancer with samples collected across 10 different clinical trials, and is later assessed on a cancer subtype classification task, with the aim of distinguishing triple-negative breast cancer from other breast cancer subtypes. Overall, the presented methodology proves effective and compares favorably to other state-of-the-art feature selection methods.
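The general idea of recursive feature elimination driven by a heterogeneous ensemble can be sketched briefly: every ensemble member ranks the remaining features, the ranks are averaged into a consensus, and the feature with the worst consensus rank is dropped until the requested number remain. In this sketch the ensemble members are toy univariate scorers standing in for the feature importances of trained classifiers, and the stopping point is a fixed `keep` rather than an accuracy-based choice; it illustrates the mechanism only and is not the authors' implementation.

```python
from statistics import mean, median, pstdev
from typing import Callable, Dict, List, Sequence

# An ensemble member maps (feature column, 0/1 labels) to a relevance score.
Scorer = Callable[[List[float], Sequence[int]], float]


def _by_class(col, y):
    pos = [v for v, t in zip(col, y) if t == 1]
    neg = [v for v, t in zip(col, y) if t == 0]
    return pos, neg


def mean_gap(col, y) -> float:
    pos, neg = _by_class(col, y)
    return abs(mean(pos) - mean(neg))


def median_gap(col, y) -> float:
    pos, neg = _by_class(col, y)
    return abs(median(pos) - median(neg))


def scaled_gap(col, y) -> float:
    return mean_gap(col, y) / (pstdev(col) + 1e-9)


def consensus_ranks(X, y, features: List[int], scorers: Sequence[Scorer]) -> Dict[int, float]:
    """Average rank of each feature over all scorers (0 = most relevant)."""
    avg = {f: 0.0 for f in features}
    for score in scorers:
        ordered = sorted(features, key=lambda f: score([row[f] for row in X], y), reverse=True)
        for rank, f in enumerate(ordered):
            avg[f] += rank / len(scorers)
    return avg


def recursive_ensemble_selection(X, y, scorers: Sequence[Scorer], keep: int) -> List[int]:
    """Repeatedly drop the feature with the worst consensus rank until `keep` remain."""
    features = list(range(len(X[0])))
    while len(features) > keep:
        ranks = consensus_ranks(X, y, features, scorers)
        features.remove(max(features, key=lambda f: ranks[f]))
    return features


# Toy data: 6 samples x 3 features; feature 0 separates the classes best, feature 1 not at all.
X = [[5, 1, 2], [6, 3, 3], [5, 2, 2], [1, 2, 1], [0, 1, 0], [1, 3, 1]]
y = [1, 1, 1, 0, 0, 0]
print(recursive_ensemble_selection(X, y, [mean_gap, median_gap, scaled_gap], keep=2))  # [0, 2]
```

Averaging ranks across dissimilar members is what lets the ensemble offset the bias of any single scorer; recomputing the ranks after every removal is what makes the procedure recursive.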

10.
BMC Bioinformatics ; 20(1): 480, 2019 Sep 18.
Article En | MEDLINE | ID: mdl-31533612

BACKGROUND: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, some of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNA entities to be measured for discriminating both different types of cancer and normal tissues, is of utmost importance. Feature selection techniques applied in machine learning can help; however, they often provide naive or biased results. RESULTS: An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on consensus on feature relevance from high-accuracy classifiers of different typologies. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples extracted from TCGA. When eight state-of-the-art classifiers are run with the 100-miRNA signature and with the original 1046 features, global accuracy differs by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods, such as UFS, RFE, EN, LASSO, Genetic Algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested with 10-fold cross-validation and different classifiers, and it is applied to several GEO datasets across different platforms, with some classifiers showing more than 90% classification accuracy, which demonstrates its cross-platform applicability. CONCLUSIONS: The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets, across different types of cancer and platforms. 
Furthermore, a bibliographic analysis confirms that 77 out of the 100 miRNAs in the signature appear in lists of circulating miRNAs used in cancer studies, in stem-loop or mature-sequence form. The remaining 23 miRNAs offer potentially promising avenues for future research.


Machine Learning/trends , MicroRNAs/genetics , Neoplasms/classification , Humans
11.
Data Brief ; 25: 104204, 2019 Aug.
Article En | MEDLINE | ID: mdl-31406900

This data article contains annotation data characterizing Multi Criteria Assessment (MCA) Methods proposed in the agri-food sector by researchers from INRA, Europe's largest agricultural research institute (INRA, http://institut.inra.fr/en). MCA can be used to assess and compare agricultural and food systems, and support multi-actor decision making and design of innovative systems for crop production, animal production and processing of agricultural products. These data are stored in a public repository managed by INRA (https://data.inra.fr/; https://doi.org/10.15454/WB51LL).

12.
Foods ; 8(8)2019 Aug 01.
Article En | MEDLINE | ID: mdl-31374833

This paper gives an overview of scientific challenges that occur when performing life-cycle assessment (LCA) in the food supply chain. To evaluate these risks, the Failure Mode and Effect Analysis tool was used. Challenges related to setting the goal and scope of LCA revealed four hot spots: the system boundaries of the LCA; the functional units used; the type and quality of data categories; and the main assumptions and limitations of the study. Within the inventory analysis, challenging issues are associated with the allocation of material and energy flows and of waste streams released to the environment. Impact assessment brings uncertainties in choosing appropriate environmental impacts. Finally, to interpret results, a scientifically sound sensitivity analysis should be performed to check how stable the calculations and results are. The identified challenges pave the way for improving LCA of food supply chains so that results can be compared.
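In Failure Mode and Effect Analysis, each failure mode is conventionally scored for severity, occurrence and detectability (often on 1 to 10 scales), and the product of the three, the risk priority number (RPN), is used to rank hot spots. The abstract does not report the scales or scores used, so this sketch assigns hypothetical values to two of the hot spots it names.

```python
from typing import List, NamedTuple


class FailureMode(NamedTuple):
    name: str
    severity: int    # 1 = negligible effect ... 10 = critical effect (assumed scale)
    occurrence: int  # 1 = rare ... 10 = almost certain (assumed scale)
    detection: int   # 1 = easily detected ... 10 = practically undetectable (assumed scale)

    @property
    def rpn(self) -> int:
        """Risk priority number: severity x occurrence x detection."""
        return self.severity * self.occurrence * self.detection


def rank_by_rpn(modes: List[FailureMode]) -> List[FailureMode]:
    """Highest-risk failure modes first."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical scores for two of the LCA hot spots named above.
modes = [
    FailureMode("functional units", severity=6, occurrence=5, detection=4),
    FailureMode("system boundaries", severity=8, occurrence=6, detection=5),
]
for m in rank_by_rpn(modes):
    print(m.name, m.rpn)
```

Because the RPN is a plain product, a moderate score on all three criteria can outrank a single extreme score, which is one reason FMEA results are usually reviewed by experts rather than read off mechanically.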

13.
Food Funct ; 8(12): 4404-4413, 2017 Dec 13.
Article En | MEDLINE | ID: mdl-29072742

This paper presents a novel model of protein hydrolysis and peptide release by endoproteases. It requires the amino-acid sequence of the protein substrate to run, and uses simple Monte Carlo in silico simulations to qualitatively and quantitatively predict the peptides likely to be produced during the course of the proteolytic reaction. In the present study, the model is applied to pepsin, the gastric protease. Unlike pancreatic proteases, pepsin has low substrate specificity and therefore displays a stochastic behavior that is particularly challenging to model and predict. Two versions of the model are studied and compared with peptidomic data obtained during pepsin hydrolysis of bovine lactoferrin. The first version of the model takes into account cleavage probabilities according to the amino acids in positions P1-P1' only, whereas the second version also accounts for the influence of neighboring amino acids (P4, P3, P2, P2', P3', P4') and peptide terminal ends. The second version of the model was able to reproduce many real-world features of the reported behavior of pepsin, such as the peptide size distribution or the quantity of free amino acids. More remarkably, 50% of the experimentally monitored peptides (44/87) were among the 120 most abundant simulated peptides. The presented methodology has the advantage of being applicable not only to different proteins but also to different enzymes, as long as cleavage frequency data are available.


Lactoferrin/chemistry , Animals , Biocatalysis , Cattle , Computer Simulation , Hydrolysis , Kinetics , Models, Molecular , Pepsin A/chemistry , Peptide Mapping , Peptides/chemistry , Substrate Specificity
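The first (P1-P1' only) version of this model can be sketched as a simple Monte Carlo loop: at each step a random peptide bond is drawn and cleaved with a probability that depends on the residues on either side of it, and peptide abundances are accumulated over many independent runs. The probability table, the default probability for unlisted residue pairs and the step counts are hypothetical; the model described above derives its probabilities from cleavage-frequency data.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Probs = Dict[Tuple[str, str], float]


def simulate_hydrolysis(sequence: str, probs: Probs, steps: int,
                        rng: random.Random, default_p: float = 0.05) -> List[str]:
    """One Monte Carlo run; returns the peptides left after `steps` cleavage attempts."""
    if len(sequence) < 2:
        return [sequence]
    cut = set()  # i in cut: bond between sequence[i] (P1) and sequence[i + 1] (P1') is broken
    for _ in range(steps):
        i = rng.randrange(len(sequence) - 1)
        if i in cut:
            continue
        p = probs.get((sequence[i], sequence[i + 1]), default_p)
        if rng.random() < p:
            cut.add(i)
    peptides, start = [], 0
    for i in sorted(cut):
        peptides.append(sequence[start:i + 1])
        start = i + 1
    peptides.append(sequence[start:])
    return peptides


def peptide_abundance(sequence: str, probs: Probs, steps: int,
                      runs: int, seed: int = 0) -> Counter:
    """Accumulate peptide counts over `runs` independent simulations."""
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(runs):
        counts.update(simulate_hydrolysis(sequence, probs, steps, rng))
    return counts


# Hypothetical example: a few bonds with F or L at P1 are favored; others use default_p.
probs = {("F", "L"): 0.6, ("L", "A"): 0.4, ("F", "G"): 0.6}
print(peptide_abundance("GFLAGFGK", probs, steps=10, runs=1000).most_common(3))
```

The number of steps plays the role of reaction time: few steps leave mostly large fragments, while many steps drive the simulated digest toward short peptides and free amino acids.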