ABSTRACT
Urinary tract infections (UTIs) are a worldwide health problem. Fast and accurate detection of bacterial infection is essential to provide appropriate antibiotherapy to patients and to avoid the emergence of drug-resistant pathogens. While the gold standard requires 24 h to 48 h of bacterial culture prior to MALDI-TOF species identification, we propose a culture-free workflow enabling bacterial identification and quantification in less than 4 h using 1 ml of urine. After rapid and automatable sample preparation, a signature of 82 bacterial peptides, defined by machine learning, was monitored by LC-MS to distinguish the 15 species causing 84% of UTIs. The combination of the sensitivity of the SRM mode on a triple quadrupole TSQ Altis instrument and the robustness of capillary flow enabled us to analyze up to 75 samples per day, with 99.2% accuracy on bacterial inoculations of healthy urines. We have also shown that our method can be used to quantify the spread of the infection, from 8 × 10^4 to 3 × 10^7 CFU/ml. Finally, the workflow was validated on 45 inoculated urines and on 84 UTI-positive urines from patients, with 93.3% and 87.1% agreement, respectively, with the culture-MALDI procedure at a level above 1 × 10^5 CFU/ml, corresponding to an infection requiring antibiotherapy.
ABSTRACT
BACKGROUND: Vitamin C (ascorbate) is a water-soluble antioxidant and an important cofactor for various biosynthetic and regulatory enzymes. Unlike humans, mice can synthesize vitamin C thanks to the key enzyme gulonolactone oxidase (Gulo). In the current investigation, we used Gulo-/- mice, which cannot synthesize their own ascorbate, to determine the impact of this vitamin on both the transcriptomic and proteomic profiles of the whole liver. The study included Gulo-/- mouse groups treated with either sub-optimal or optimal ascorbate concentrations in drinking water. Liver tissues of females and males were collected at the age of four months and divided for transcriptomics and proteomics analysis. Immunoblotting, quantitative RT-PCR, and polysome profiling experiments were also conducted to complement our combined omics studies. RESULTS: Principal component analyses revealed distinctive differences in the mRNA and protein profiles as a function of sex between all the mouse cohorts. Despite such sexual dimorphism, Spearman analyses of transcriptomics data from females and males revealed correlations of hepatic ascorbate levels with transcripts encoding a wide array of biological processes involved in glucose and lipid metabolism as well as in the acute-phase immune response. Moreover, integration of the proteomics data showed that ascorbate modulates the abundance of various enzymes involved in lipid, xenobiotic, organic acid, acetyl-CoA, and steroid metabolism mainly at the transcriptional level, especially in females. However, several proteins of the mitochondrial complex III significantly correlated with ascorbate concentrations in both males and females, unlike their corresponding transcripts. Finally, poly(ribo)some profiling did not reveal a significant difference in enrichment of these mitochondrial complex III mRNAs between Gulo-/- mice treated with sub-optimal and optimal ascorbate levels. CONCLUSIONS: Thus, the abundance of several subunits of the mitochondrial complex III is regulated by ascorbate at the post-transcriptional level. Our extensive omics analyses provide a novel resource of altered gene expression patterns at the transcriptional and post-transcriptional levels under ascorbate deficiency.
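The Spearman analyses mentioned in this abstract rank-transform both variables and then correlate the ranks. A minimal sketch of that calculation follows, using only the Python standard library. The function names and the per-mouse values are hypothetical illustrations, not data or code from the study.

```python
from statistics import mean

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values: hepatic ascorbate and one transcript's abundance per mouse.
ascorbate = [0.2, 0.5, 0.9, 1.4, 2.0]
transcript = [3.1, 3.0, 4.2, 5.5, 9.8]
rho = spearman_rho(ascorbate, transcript)
print(round(rho, 3))
```

Because only ranks enter the calculation, the coefficient is insensitive to monotonic transformations such as log scaling of transcript counts.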
Subject(s)
Ascorbic Acid , Liver , Proteomics , Animals , Ascorbic Acid/metabolism , Liver/metabolism , Liver/drug effects , Female , Male , Mice , L-Gulonolactone Oxidase/genetics , L-Gulonolactone Oxidase/metabolism , Gene Expression Profiling , Transcriptome , Principal Component Analysis , Antioxidants/metabolism
ABSTRACT
Human infection with the coronavirus disease 2019 (COVID-19) is mediated by the binding of the spike protein of the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) to the human angiotensin-converting enzyme 2 (ACE2). Frequent mutations in the receptor-binding domain (RBD) of the spike protein have led to the emergence of variants with increased contagion and can hinder vaccine efficiency. Hence, it is crucial to better understand the binding mechanisms of variant RBDs to human ACE2 and to develop efficient methods to characterize this interaction. In this work, we present an approach that uses machine learning to analyze molecular dynamics simulation trajectories of RBD variants bound to ACE2. Along with the binding free energy calculation, this method was used to characterize the major differences in ACE2-binding capacity of three SARS-CoV-2 RBD variants, namely the original Wuhan strain, Omicron BA.1, and the more recent Omicron BA.5 sublineage. Our analyses assessed the differences in binding free energy and shed light on how they affect the infection rates of the different variants. Furthermore, this approach successfully characterized key binding interactions and could be deployed as an efficient tool to predict different binding inhibitors, paving the way for new preventive and therapeutic strategies.
Subject(s)
Angiotensin-Converting Enzyme 2 , COVID-19 , Machine Learning , Molecular Dynamics Simulation , Protein Binding , SARS-CoV-2 , Spike Glycoprotein, Coronavirus , SARS-CoV-2/metabolism , SARS-CoV-2/genetics , Angiotensin-Converting Enzyme 2/metabolism , Angiotensin-Converting Enzyme 2/chemistry , Humans , Spike Glycoprotein, Coronavirus/metabolism , Spike Glycoprotein, Coronavirus/chemistry , Spike Glycoprotein, Coronavirus/genetics , COVID-19/virology , COVID-19/metabolism , Binding Sites , Mutation , Protein Interaction Domains and Motifs
ABSTRACT
MOTIVATION: The growing production of massive heterogeneous biological data offers opportunities for new discoveries. However, performing multi-omics data analysis is challenging, and researchers are forced to handle the ever-increasing complexity of both data management and the evolution of our biological understanding. Substantial efforts have been made to unify biological datasets into integrated systems. Unfortunately, these systems are not easily scalable, deployable and searchable, locally or globally. RESULTS: This publication presents two tools with a simple structure that can help any data provider, organization or researcher requiring a reliable data search and analysis base. The first tool is Kibio, a scalable and adaptable data store based on the Elasticsearch search engine. The second tool is KibioR, an R package to pull, push and search Kibio datasets or any accessible Elasticsearch-based database. These tools apply a uniform data exchange model and minimize the burden of data management by organizing data into a decentralized, versatile, searchable and shareable structure. Several case studies are presented using multiple databases, from drug characterization to miRNA and pathway identification, emphasizing the ease of use and versatility of the Kibio/KibioR framework. AVAILABILITY AND IMPLEMENTATION: Both KibioR and Elasticsearch are open source. KibioR package source is available at https://github.com/regisoc/kibior and the library on CRAN at https://cran.r-project.org/package=kibior. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
ABSTRACT
Over the past decade, the data-independent acquisition (DIA) mode has gained popularity for broad coverage of complex proteomes by LC-MS/MS and quantification of low-abundance proteins. However, there is no consensus in the literature on the best data acquisition parameters and processing tools to use for this specific application. Here, we present the most comprehensive comparison of DIA workflows on Orbitrap instruments published so far in the field of proteomics. Using a standard mixture of 48 human proteins (UPS1-Sigma) at 8 different concentrations in an E. coli proteome background, we tested 36 workflows including 4 different DIA window acquisition schemes and 6 different software tools (DIA-NN, DIA-Umpire, OpenSWATH, ScaffoldDIA, Skyline, and Spectronaut) with or without the use of a DDA spectral library. On the basis of the number of proteins identified, quantification linearity and reproducibility, as well as sensitivity and specificity in 28 pairwise comparisons of different UPS1 concentrations, we summarize the major considerations and propose guidelines for choosing the DIA workflow best suited for LC-MS/MS proteomic analyses. Our 96 DIA raw files and software outputs have been deposited on ProteomeXchange for testing or developing new DIA processing tools.
Subject(s)
Benchmarking , Proteomics , Chromatography, Liquid , Escherichia coli/genetics , Humans , Proteome , Reproducibility of Results , Software , Tandem Mass Spectrometry
ABSTRACT
PURPOSES: The objectives of this study were to investigate differences in gut microbiota (GM) composition after high dairy intake (HD) compared to adequate dairy intake (AD) and to correlate GM composition variations with the change in glycemic parameters in hyperinsulinemic subjects. METHODS: In this crossover study, 10 hyperinsulinemic adults were randomized to HD (≥ 4 servings/day) or AD (≤ 2 servings/day) for 6 weeks, separated by a 6-week washout period. Fasting insulin and glucose levels were measured after each intervention. Insulin resistance was calculated with the homeostasis model assessment of insulin resistance (HOMA-IR). GM was determined with 16S rRNA-based high-throughput sequencing at the end of each intervention. Paired t tests, correlations and machine learning analyses were performed. RESULTS: Endpoint glycemic parameters were not different between HD and AD intake. After HD compared with AD intake, there was a decrease in the abundance of bacteria in Roseburia and Verrucomicrobia (p = 0.04 and p = 0.02, respectively) and a trend toward increased abundance of Faecalibacteria and Flavonifractor (p = 0.05 and p = 0.06, respectively). The changes in abundance of Coriobacteriia, Erysipelotrichia, and Flavonifractor were negatively correlated with the change in HOMA-IR between the AD and HD phases. Furthermore, a predictive GM signature, including Anaerotruncus, Flavonifractor, Ruminococcaceae, and Subdoligranulum, was related to HOMA-IR. CONCLUSION: Overall, these results suggest that HD modifies the abundance of specific butyrate-producing bacteria in Firmicutes and of bacteria in Verrucomicrobia in hyperinsulinemic individuals. In addition, the butyrate-producing bacteria in the Firmicutes phylum correlate negatively with insulin resistance.
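For readers unfamiliar with HOMA-IR, the standard formula is fasting insulin (µU/mL) multiplied by fasting glucose (mmol/L), divided by 22.5. The sketch below applies it to one participant across the two diet phases. The abstract does not state which units the study used, so the units here are an assumption and all values are hypothetical.

```python
def homa_ir(fasting_insulin_uU_per_ml, fasting_glucose_mmol_per_l):
    """Standard HOMA-IR: insulin (µU/mL) x glucose (mmol/L) / 22.5."""
    return fasting_insulin_uU_per_ml * fasting_glucose_mmol_per_l / 22.5

# Hypothetical fasting values for one participant after each diet phase.
after_ad = homa_ir(15.0, 5.4)   # adequate dairy intake phase
after_hd = homa_ir(12.5, 5.4)   # high dairy intake phase
change = after_hd - after_ad    # the per-subject change that is correlated with GM shifts

print(round(after_ad, 2), round(after_hd, 2), round(change, 2))
```

If glucose is recorded in mg/dL instead, the denominator becomes 405 (22.5 × 18).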
Subject(s)
Gastrointestinal Microbiome , Insulin Resistance , Adult , Cross-Over Studies , Dairy Products , Humans , RNA, Ribosomal, 16S/genetics
ABSTRACT
Fast identification of microbial species in clinical samples is essential to provide appropriate antibiotherapy to the patient and to reduce the prescription of broad-spectrum antimicrobials, which leads to antibiotic resistance. MALDI-TOF-MS technology has become a tool of choice for microbial identification but has several drawbacks: it requires a long bacterial culture step before analysis (≥24 h), has low specificity and is not quantitative. We developed a new strategy for identifying bacterial species in urine using specific LC-MS/MS peptidic signatures. In the first, training step, libraries of peptides are obtained from pure bacterial colonies in DDA mode and their detection in urine is verified in DIA mode; machine learning classifiers (NaiveBayes, BayesNet and Hoeffding tree) are then used to define a peptidic signature that distinguishes each bacterial species from the others. In the second step, this signature is monitored in unknown urine samples using targeted proteomics. This method, allowing bacterial identification in less than 4 h, has been applied to fifteen species representing 84% of all urinary tract infections (UTIs). More than 31,000 peptides in 190 samples were quantified by DIA and classified by machine learning to determine an 82-peptide signature and build a prediction model. This signature was validated for routine use with Parallel Reaction Monitoring on two different instruments. Linearity and reproducibility of the method were demonstrated, as well as its accuracy on donor specimens. Within 4 h and without bacterial culture, our method was able to predict the predominant bacteria infecting a sample in 97% of cases, and in 100% of cases above the standard threshold. This work demonstrates the efficiency of our method for the rapid and specific identification of the bacterial species causing UTIs; it could be extended in the future to other biological specimens and to bacteria having specific virulence or resistance factors.
Subject(s)
Bacteria/classification , Bacterial Proteins/urine , Bacteriuria/urine , Chromatography, Liquid/methods , Machine Learning , Tandem Mass Spectrometry/methods , Bacteria/isolation & purification , Humans , Peptides/urine , Proteomics , Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization
ABSTRACT
BACKGROUND: Perturbation of the major UGT2B17-dependent androgen catabolism pathway has the potential to affect prostate cancer (PCa) progression. The objective was to evaluate UGT2B17 protein expression in primary tumours in relation to hormone levels, disease characteristics and cancer evolution. METHODS: We conducted an analysis of a high-density prostate tumour tissue microarray consisting of 239 localised PCa cases treated by radical prostatectomy (RP). Cox proportional hazard ratio analysis was used to evaluate biochemical recurrence (BCR), and a linear regression model evaluated variations in circulating hormone levels measured by mass spectrometry. The transcriptome of UGT2B17 in PCa was established by using RNA-sequencing data. RESULTS: UGT2B17 expression in primary tumours was associated with node-positive disease at RP and linked to circulating levels of 3α-diol-17 glucuronide, a major circulating DHT metabolite produced by the UGT2B17 pathway. UGT2B17 was an independent prognostic factor linked to BCR after RP, and its overexpression was associated with development of metastasis. Finally, we demonstrated that distinctive alternative promoters dictate UGT2B17-dependent androgen catabolism in localised and metastatic PCa. CONCLUSIONS: The androgen-inactivating gene UGT2B17 is controlled by overlooked regulatory regions in PCa. UGT2B17 expression in primary tumours influences the steroidome, and is associated with relevant clinical outcomes, such as BCR and metastasis.
Subject(s)
Androgens/metabolism , Glucuronosyltransferase/metabolism , Minor Histocompatibility Antigens/metabolism , Prostatic Neoplasms/genetics , Adult , Aged , Disease Progression , Humans , Male , Middle Aged , Prostatic Neoplasms/pathology
ABSTRACT
MicroRNAs (miRNAs) are short single-stranded RNA molecules derived from hairpin-forming precursors that play a crucial role as post-transcriptional regulators in eukaryotes and viruses. In the past years, many microRNA target genes (MTGs) have been identified experimentally. However, because of the high costs of experimental approaches, target gene databases remain incomplete. Although several target prediction programs have been developed in recent years to identify MTGs in silico, their specificity and sensitivity remain low. Here, we propose a new approach called MirAncesTar, which uses ancestral genome reconstruction to boost the accuracy of existing MTG prediction tools for human miRNAs. For each miRNA and each putative human target UTR, our algorithm makes use of existing prediction tools to identify putative target sites in the human UTR, as well as in its mammalian orthologs and inferred ancestral sequences. It then evaluates evidence in support of selective pressure to maintain target site counts (rather than sequences), accounting for the possibility of target site turnover. It finally integrates this measure with several simpler ones using a logistic regression predictor. MirAncesTar improves the accuracy of existing MTG predictors by 26% to 157%. Source code and prediction results for human miRNAs, as well as supporting evolutionary data, are available at http://cs.mcgill.ca/~blanchem/mirancestar.
Subject(s)
Computational Biology/methods , MicroRNAs/genetics , RNA Interference , RNA, Messenger/genetics , Algorithms , Animals , Binding Sites , Computer Simulation , Humans , MicroRNAs/chemistry , RNA, Messenger/chemistry
ABSTRACT
BACKGROUND: Wheat is a major staple crop with broad adaptability to a wide range of environmental conditions. This adaptability involves several stress and developmentally responsive genes, in which microRNAs (miRNAs) have emerged as important regulatory factors. However, the approaches currently used to identify miRNAs in this complex polyploid system focus on conserved and highly expressed miRNAs and regularly miss those that are lineage-specific, condition-specific, or recently evolved. In addition, many environmental and biological factors affecting miRNA expression have not yet been considered, so the repertoire of wheat miRNAs remains incomplete. RESULTS: We developed a conservation-independent technique based on an integrative approach that combines machine learning, bioinformatic tools, biological insights from known miRNA expression profiles and universal criteria of plant miRNAs to identify miRNAs with more confidence. The developed pipeline can potentially identify novel wheat miRNAs that share features common to several species or that are species-specific or clade-specific. It allowed the discovery of 199 miRNA candidates associated with different abiotic stresses and developmental stages. We also highlight from the raw data 267 miRNAs conserved with 43 miRBase families. The predicted miRNAs are highly associated with abiotic stress responses, tolerance and development. GO enrichment analysis showed that they may play biological and physiological roles associated with cold, salt and aluminum (Al) through auxin signaling pathways, regulation of gene expression, ubiquitination, transport, carbohydrates, gibberellins, lipid, glutathione and secondary metabolism, photosynthesis, as well as floral transition and flowering. CONCLUSION: This approach provides a broad repertoire of hexaploid wheat miRNAs associated with abiotic stress responses, tolerance and development. These valuable resources of expressed wheat miRNAs will help in elucidating the regulatory mechanisms involved in freezing and Al responses and tolerance as well as in development and flowering. In the long term, they may help in breeding stress-tolerant plants.
Subject(s)
Computational Biology/methods , MicroRNAs/analysis , RNA, Plant/analysis , Triticum/growth & development , Triticum/genetics , Gene Expression Profiling/methods , Gene Expression Regulation, Plant , Machine Learning , Polyploidy , Species Specificity , Stress, Physiological
ABSTRACT
MicroRNAs (miRNAs) are short RNA species derived from hairpin-forming miRNA precursors (pre-miRNA) and acting as key posttranscriptional regulators. Most computational tools labeled as miRNA predictors are in fact pre-miRNA predictors and provide no information about the putative miRNA location within the pre-miRNA. Sequence and structural features that determine the location of the miRNA, and the extent to which these properties vary from species to species, are poorly understood. We have developed miRdup, a computational predictor for the identification of the most likely miRNA location within a given pre-miRNA or the validation of a candidate miRNA. MiRdup is based on a random forest classifier trained with experimentally validated miRNAs from miRbase, with features that characterize the miRNA-miRNA* duplex. Because we observed that miRNAs have sequence and structural properties that differ between species, mostly in terms of duplex stability, we trained various clade-specific miRdup models and obtained increased accuracy. MiRdup self-trains on the most recent version of miRbase and is easy to use. Combined with existing pre-miRNA predictors, it will be valuable for both de novo mapping of miRNAs and filtering of large sets of candidate miRNAs obtained from transcriptome sequencing projects. MiRdup is open source under the GPLv3 and available at http://www.cs.mcgill.ca/~blanchem/mirdup/.
Subject(s)
Computational Biology/methods , MicroRNAs/analysis , RNA Precursors/analysis , RNA, Plant/analysis , Software , Animals , Internet , Inverted Repeat Sequences , MicroRNAs/genetics , Nucleic Acid Conformation , Plants/genetics , RNA Precursors/genetics , RNA, Plant/genetics , Sensitivity and Specificity , Sequence Analysis, RNA/methods
ABSTRACT
Vitamin C (ascorbic acid) is an important water-soluble antioxidant associated with decreased oxidative stress in type 2 diabetes (T2D) patients. A previous targeted plasma proteomic study has indicated that ascorbic acid is associated with markers of the immune system in healthy subjects. However, the association between the levels of ascorbic acid and blood biomarkers in subjects at risk of developing T2D is still unknown. Serum ascorbic acid was measured by ultra-performance liquid chromatography and serum proteins were quantified by untargeted liquid chromatography-mass spectrometry in 25 hyperinsulinemic subjects who were randomly assigned to a high dairy intake diet or an adequate dairy intake diet for 6 weeks, then crossed over after a 6-week washout period. Spearman correlation followed by gene ontology analyses was performed to identify biological pathways associated with ascorbic acid. Finally, machine learning analysis was performed to obtain a specific serum protein signature that could predict ascorbic acid levels. After adjustments for waist circumference, LDL, HDL, fasting insulin, fasting blood glucose, age, gender, and dairy intake, serum ascorbic acid correlated positively with different aspects of the immune system. Machine learning analysis indicated that a signature composed of 21 features that included 17 proteins (mainly from the immune system), age, sex, waist circumference, and LDL could predict serum ascorbic acid levels in hyperinsulinemic subjects. In conclusion, the results reveal a correlation as well as modulation between serum ascorbic acid levels and proteins that play vital roles in regulating different aspects of the immune response in individuals at risk of T2D. The development of a predictive signature for ascorbic acid will further help the assessment of ascorbic acid status in clinical settings.
Subject(s)
Diabetes Mellitus, Type 2 , Hyperinsulinism , Humans , Ascorbic Acid , Blood Proteins , Lipoproteins, LDL , Proteomics , Waist Circumference , Male , Female
ABSTRACT
Biomedical research takes advantage of omic data, such as transcriptomics, to unravel the complexity of diseases. A conventional strategy identifies transcriptomic biomarkers characterized by expression patterns associated with a phenotype by relying on feature selection approaches. Hybrid ensemble feature selection (HEFS) has become increasingly popular as it ensures robustness of the selected features by performing data and functional perturbations. However, it remains difficult to make the best suited choices at each step when designing such approaches. We conducted an extensive analysis of four possible HEFS scenarios for the identification of Stage IV colorectal, Stage I kidney and lung and Stage III endometrial cancer biomarkers from transcriptomic data. These scenarios investigate the use of two types of feature reduction by filters (differentially expressed genes and variance) conjointly with two types of resampling strategies (repeated holdout by distribution-balanced stratified and random stratified) for downstream feature selection through an aggregation of thousands of wrapped machine learning models. Based on our results, we emphasize the advantages of using HEFS approaches to identify complex disease biomarkers, given their ability to produce generalizable and stable results to both data and functional perturbations. Finally, we highlight critical issues that need to be considered in the design of such strategies.
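The data-perturbation half of a hybrid ensemble scheme can be sketched as repeated stratified holdout: each repeat draws a new class-balanced training split, selects features on it, and the selection frequency across repeats measures stability. The sketch below is a simplified assumption, not the authors' pipeline. It replaces the thousands of wrapped machine learning models with a plain difference-of-class-means score, and the data, function names and parameters are hypothetical.

```python
import random
from collections import Counter
from statistics import mean

def stratified_holdout(labels, train_fraction, rng):
    """Return training indices, sampling the same fraction from each class."""
    train = []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        train.extend(members[: max(1, round(train_fraction * len(members)))])
    return train

def top_features(X, labels, train_idx, k):
    """Keep the k features with the largest |difference of class means| on the training split."""
    scores = []
    for f in range(len(X[0])):
        pos = [X[i][f] for i in train_idx if labels[i] == 1]
        neg = [X[i][f] for i in train_idx if labels[i] == 0]
        scores.append((abs(mean(pos) - mean(neg)), f))
    return [f for _, f in sorted(scores, reverse=True)[:k]]

def selection_frequency(X, labels, repeats=200, train_fraction=0.7, k=2, seed=0):
    """Fraction of repeated holdout splits in which each feature is selected."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(repeats):
        train_idx = stratified_holdout(labels, train_fraction, rng)
        counts.update(top_features(X, labels, train_idx, k))
    return {f: counts[f] / repeats for f in range(len(X[0]))}

# Synthetic toy data: 20 samples, 5 "genes"; only genes 0 and 1 carry signal.
data_rng = random.Random(1)
labels = [0] * 10 + [1] * 10
X = [
    [10 * y + data_rng.uniform(-2, 2) if f < 2 else data_rng.uniform(-2, 2)
     for f in range(5)]
    for y in labels
]
freq = selection_frequency(X, labels)
print(freq)
```

On this deliberately easy toy set the two informative features are selected in every split. On real transcriptomic data the frequencies fall between 0 and 1, and a threshold on them defines the stable signature.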
ABSTRACT
In this study, we introduce an affordable and accessible method that combines optical microscopy and photogrammetry to reconstruct 3D models of Tahitian pearls. We present a novel device designed for acquiring microscopic images around a sphere using translational displacement stages and outline our method for reconstructing these images. We successfully created 3D models of two individual pearl rings, each representing 6.3% of the pearl's surface. Additionally, we generated a combined model representing 10.3% of the pearl's surface. This showcases the potential for reconstructing entire pearls with appropriate instrumentation. We emphasize that our approach extends beyond pearls and spherical objects and can be adapted for various object types using appropriate acquisition devices. We provide a proof of concept demonstrating the feasibility of 3D photogrammetry using optical microscopy. Consequently, our method offers a practical and cost-effective alternative for generating 3D models at a microscopic scale, particularly when detailed internal structure information is unnecessary.
ABSTRACT
The discovery of novel therapeutic targets, defined as proteins with which drugs can interact to induce therapeutic benefits, typically represents the first and most important step of drug discovery. One solution for target discovery is target repositioning, a strategy which relies on the repurposing of known targets for new diseases, leading to new treatments, fewer side effects and potential drug synergies. Biological networks have emerged as powerful tools for integrating heterogeneous data and facilitating the prediction of biological or therapeutic properties. Consequently, they are widely employed to predict new therapeutic targets by characterizing potential candidates, often based on their interactions within a Protein-Protein Interaction (PPI) network and their proximity to genes associated with the disease. However, over-reliance on PPI networks and the assumption that potential targets are necessarily near known genes can introduce biases that may limit the effectiveness of these methods. This study addresses these limitations in two ways. First, it exploits a multi-layer network which incorporates additional information such as gene regulation, metabolite interactions, metabolic pathways, and several disease signatures such as Differentially Expressed Genes, mutated genes, Copy Number Alteration, and structural variants. Second, it extracts relevant features from the network using several approaches, including proximity to disease-associated genes but also unbiased approaches such as propagation-based methods, topological metrics, and module detection algorithms. Using prostate cancer as a case study, the best features were identified and used to train machine learning algorithms to predict 5 novel promising therapeutic targets for prostate cancer: IGF2R, C5AR, RAB7, SETD2 and NPBWR1.
ABSTRACT
Background: Aortic valve stenosis (AS) is a progressive chronic disease with progression rates that vary among patients and are therefore difficult to predict. Objectives: The aim of this study was to predict the progression of AS using comprehensive and longitudinal patient data. Methods: Machine and deep learning algorithms were trained on a data set of 303 patients enrolled in the PROGRESSA (Metabolic Determinants of the Progression of Aortic Stenosis) study who underwent clinical and echocardiographic follow-up on an annual basis. Performance of the models in predicting disease progression over the long (next 5 years) and short (next 2 years) terms was measured and compared to that of a standard clinical model, based on logistic regression, with features commonly used in clinical settings. Results: For each annual follow-up visit including baseline, we trained various supervised learning algorithms to predict disease progression at 2- and 5-year terms. At both terms, LightGBM consistently outperformed other models, with the highest average area under the curve across patient visits (0.85 at 2 years, 0.83 at 5 years). Recurrent neural network-based models (Gated Recurrent Unit and Long Short-Term Memory) and XGBoost also demonstrated strong predictive capabilities, while the clinical model showed the lowest performance. Conclusions: This study demonstrates how an artificial intelligence-guided approach in clinical routine could help enhance risk stratification of AS. It presents models based on multisource comprehensive data to predict disease progression and clinical outcomes in patients with mild-to-moderate AS at baseline.
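The "average area under the curve across patient visits" can be read as computing one ROC AUC per follow-up visit and averaging them. A minimal sketch of that reading follows, using the pairwise (Mann-Whitney) definition of AUC. The visit names, labels and risk scores are hypothetical, and the averaging scheme is an assumption rather than the study's documented procedure.

```python
from statistics import mean

def roc_auc(labels, scores):
    """Probability that a random progressor outscores a random non-progressor.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical predictions: 1 = progression within the horizon, 0 = no progression.
visits = {
    "baseline": ([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80]),
    "year_1":   ([0, 1, 1, 0], [0.20, 0.90, 0.60, 0.30]),
}
auc_per_visit = {name: roc_auc(y, s) for name, (y, s) in visits.items()}
mean_auc = mean(auc_per_visit.values())
print(auc_per_visit, mean_auc)
```

An AUC of 0.5 corresponds to random ranking, which is the natural reference point when comparing models such as LightGBM against the clinical logistic regression baseline.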
ABSTRACT
Megakaryocytes (MKs), integral to platelet production, predominantly reside in the bone marrow (BM) and undergo regulated fragmentation within sinusoid vessels to release platelets into the bloodstream. Inflammatory states and infections influence MK transcription, potentially affecting platelet functionality. Notably, COVID-19 has been associated with altered platelet transcriptomes. In this study, we investigated the hypothesis that severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection could affect the transcriptome of BM MKs. Using spatial transcriptomics to discriminate subpopulations of MKs based on proximity to BM sinusoids, we identified ~19 000 genes in MKs. Machine learning techniques revealed that the transcriptome of healthy murine BM MKs exhibited minimal differences based on proximity to sinusoid vessels. Furthermore, at peak SARS-CoV-2 viremia, when the disease primarily affected the lungs, MKs were not significantly different from those from healthy mice. Conversely, a significant divergence in the MK transcriptome was observed during systemic inflammation, although SARS-CoV-2 RNA was never detected in the BM, and it was no longer detectable in the lungs. Under these conditions, the MK transcriptional landscape was enriched in pathways associated with histone modifications, MK differentiation, NETosis, and autoimmunity, which could not be explained by cell proximity to sinusoid vessels. Notably, the type I interferon signature and calprotectin (S100A8/A9) were not induced in MKs under any condition. However, inflammatory cytokines induced in the blood and lungs of COVID-19 mice were different from those found in the BM, suggesting a discriminating impact of inflammation on this specific subset of cells. Collectively, our data indicate that a new population of BM MKs may emerge through COVID-19-related pathogenesis.
Subject(s)
Bone Marrow , COVID-19 , Megakaryocytes , SARS-CoV-2 , Transcriptome , COVID-19/pathology , COVID-19/virology , COVID-19/genetics , COVID-19/metabolism , Megakaryocytes/metabolism , Megakaryocytes/virology , Animals , SARS-CoV-2/physiology , SARS-CoV-2/genetics , Mice , Bone Marrow/metabolism , Bone Marrow/pathology , Calgranulin B/metabolism , Calgranulin B/genetics , Humans , Calgranulin A/metabolism , Calgranulin A/genetics , Disease Models, Animal
ABSTRACT
Liquid Chromatography Mass Spectrometry (LC-MS) is a powerful method for profiling complex biological samples. However, batch effects typically arise from differences in sample processing protocols, experimental conditions, and data acquisition techniques, significantly impacting the interpretability of results. Correcting batch effects is crucial for the reproducibility of omics research, but current methods are not optimal for the removal of batch effects without compressing the genuine biological variation under study. We propose a suite of Batch Effect Removal Neural Networks (BERNN) to remove batch effects in large LC-MS experiments, with the goal of maximizing sample classification performance between conditions. More importantly, these models must efficiently generalize in batches not seen during training. A comparison of batch effect correction methods across five diverse datasets demonstrated that BERNN models consistently showed the strongest sample classification performance. However, the model producing the greatest classification improvements did not always perform best in terms of batch effect removal. Finally, we show that the overcorrection of batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal while preserving valuable biological diversity in large-scale LC-MS experiments.
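The tension between removing batch effects and preserving biological variation can be seen even with the simplest correction, per-batch mean centering. The sketch below is not BERNN. It is a deliberately naive baseline on hypothetical intensities, showing that centering keeps the case/control gap when conditions are balanced across batches but erases it when batch and condition are confounded.

```python
from collections import defaultdict
from statistics import mean

def center_by_batch(samples):
    """Subtract each batch's mean intensity from every sample in that batch."""
    by_batch = defaultdict(list)
    for batch, _, value in samples:
        by_batch[batch].append(value)
    batch_mean = {b: mean(v) for b, v in by_batch.items()}
    return [(b, c, v - batch_mean[b]) for b, c, v in samples]

def condition_gap(samples):
    """Mean intensity of cases minus mean intensity of controls."""
    case = [v for _, c, v in samples if c == "case"]
    control = [v for _, c, v in samples if c == "control"]
    return mean(case) - mean(control)

# Hypothetical feature: true case-control gap is 2; batch B adds an offset of 5.
balanced = [("A", "control", 10.0), ("A", "case", 12.0),
            ("B", "control", 15.0), ("B", "case", 17.0)]
confounded = [("A", "control", 10.0), ("A", "control", 10.0),
              ("B", "case", 17.0), ("B", "case", 17.0)]

print(condition_gap(balanced), condition_gap(center_by_batch(balanced)))
print(condition_gap(confounded), condition_gap(center_by_batch(confounded)))
```

In the confounded design the raw gap of 7 mixes biology with the batch offset, and centering removes both, which is a small-scale version of the overcorrection the abstract warns about.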
Subject(s)
Liquid Chromatography-Mass Spectrometry , Neural Networks, Computer , Reproducibility of Results
ABSTRACT
Tahitian pearls, artificially cultivated from the black-lipped pearl oyster Pinctada margaritifera, are renowned for their unique color and large size, making the pearl industry vital for the French Polynesian economy. Understanding the mechanisms of pearl formation is essential for enabling quality and sustainable production. In this paper, we explore the process of pearl formation by studying pearl rotation. Here we show, using a deep convolutional neural network, a direct link between the rotation of the pearl during its formation in the oyster and its final shape. We propose a new method for non-invasive pearl monitoring and a model for predicting the final shape of the pearl from rotation data with 81.9% accuracy. These novel resources provide a fresh perspective to study and enhance our comprehension of the overall mechanism of pearl formation, with potential long-term applications for improving pearl production and quality control in the industry.
Subject(s)
Pinctada , Animals , Rotation
ABSTRACT
Machine learning (ML) algorithms are powerful tools to find complex patterns and biomarker signatures when conventional statistical methods fail to identify them. While the ML field has made significant progress, state-of-the-art methodologies to build efficient and non-overfitting models are not always applied in the literature. To this end, automatic programs, such as BioDiscML, were designed to identify biomarker signatures and correlated features while avoiding overfitting through multiple evaluation strategies, such as cross-validation, bootstrapping and repeated holdout. To further improve BioDiscML and reach a broader audience, better visualization support and flexibility in choosing the best models and signatures are needed. Thus, to provide researchers with an easily accessible and usable tool for in-depth investigation of BioDiscML outputs, we developed a visual interaction tool called BioDiscViz. This tool provides summaries, tables and graphics, in the form of Principal Component Analysis (PCA) plots, UMAP, t-SNE, heatmaps and boxplots, for the best model and the correlated features. Furthermore, this tool also provides visual support to extract a consensus signature from BioDiscML models using a combination of filters. BioDiscViz will provide valuable visual support for research using ML and should create new opportunities in this field by opening it to a broader community.