1.

Automatized self-supervised learning for skin lesion screening.

Useini, Vullnet; Tanadini-Lang, Stephanie; Lohmeyer, Quentin; Meboldt, Mirko; Andratschke, Nicolaus; Braun, Ralph P; Barranco García, Javier.

Sci Rep ; 14(1): 12697, 2024 06 03.

Article En | MEDLINE | ID: mdl-38830890

Melanoma, the deadliest form of skin cancer, has seen a steady increase in incidence rates worldwide, posing a significant challenge to dermatologists. Early detection is crucial for improving patient survival rates. However, performing total body screening (TBS), i.e., identifying suspicious lesions or ugly ducklings (UDs) by visual inspection, can be challenging and often requires sound expertise in pigmented lesions. To assist users of varying expertise levels, an artificial intelligence (AI) decision support tool was developed. Our solution identifies and characterizes UDs from real-world wide-field patient images. It employs a state-of-the-art object detection algorithm to locate and isolate all skin lesions present in a patient's total body images. These lesions are then sorted based on their level of suspiciousness using a self-supervised AI approach, tailored to the specific context of the patient under examination. A clinical validation study was conducted to evaluate the tool's performance. The results demonstrated an average sensitivity of 95% for the top-10 AI-identified UDs on skin lesions selected by the majority of experts in pigmented skin lesions. The study also found that the tool increased dermatologists' confidence when formulating a diagnosis, and the average majority agreement with the top-10 AI-identified UDs reached 100% when assisted by our tool. With the development of this AI-based decision support tool, we aim to address the shortage of specialists, enable faster consultation times for patients, and demonstrate the impact and usability of AI-assisted screening. Future developments will include expanding the dataset to include histologically confirmed melanoma and validating the tool for additional body regions.
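
One way to read the top-10 sensitivity figure above is as the share of expert-selected lesions that appear among the k lesions ranked most suspicious by the tool. The sketch below illustrates that reading only; the function name, the lesion identifiers, and the exact definition are assumptions, not the study's evaluation code.

```python
def sensitivity_at_k(ranked_lesion_ids, expert_selected_ids, k=10):
    """Fraction of expert-selected lesions found among the top-k ranked lesions.

    ranked_lesion_ids: lesion ids ordered from most to least suspicious (hypothetical).
    expert_selected_ids: ids chosen by the expert majority (hypothetical).
    """
    expert = set(expert_selected_ids)
    if not expert:
        raise ValueError("at least one expert-selected lesion is required")
    top_k = set(ranked_lesion_ids[:k])
    return len(top_k & expert) / len(expert)


# Illustrative values only: 2 of the 3 expert-selected lesions are in the top 3.
example = sensitivity_at_k(["l7", "l2", "l9", "l4"], ["l2", "l9", "l4"], k=3)
```

Averaging this value over patients would give a per-cohort figure comparable in form to the one reported.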

Early Detection of Cancer , Melanoma , Skin Neoplasms , Supervised Machine Learning , Humans , Skin Neoplasms/diagnosis , Melanoma/diagnosis , Early Detection of Cancer/methods , Artificial Intelligence , Algorithms , Male , Female , Skin/pathology

2.

DREAMER: a computational framework to evaluate readiness of datasets for machine learning.

Ahangaran, Meysam; Zhu, Hanzhi; Li, Ruihui; Yin, Lingkai; Jang, Joseph; Chaudhry, Arnav P; Farrer, Lindsay A; Au, Rhoda; Kolachalama, Vijaya B.

BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.

Article En | MEDLINE | ID: mdl-38831432

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.

Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software

3.

A comparative study of supervised and unsupervised machine learning algorithms applied to human microbiome.

Kalluçi, E; Preni, B; Dhamo, X; Noka, E; Bardhi, S; Macchia, A; Bonetti, G; Dhuli, K; Donato, K; Bertelli, M; Zambrano, L J M; Janaqi, S.

Clin Ter ; 175(3): 98-116, 2024.

Article En | MEDLINE | ID: mdl-38767067

Background: The human microbiome, consisting of diverse bacterial, fungal, protozoan and viral species, exerts a profound influence on various physiological processes and disease susceptibility. However, the complexity of microbiome data has presented significant challenges in the analysis and interpretation of these intricate datasets, leading to the development of specialized software that employs machine learning algorithms for these aims. Methods: In this paper, we analyze raw data taken from 16S rRNA gene sequencing from three studies, including stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensions of the dataset and consider only the most important features. In addition, we employ supervised machine learning algorithms to make predictions. Results: Results show that graph-based techniques reduce the dimension from 255 to 78 features with a modularity score of 0.73 based on different centrality measures. On the other hand, projection methods (non-negative matrix factorization and principal component analysis) reduce the dimensions to 7 features. Furthermore, we apply supervised machine learning algorithms to the most important features obtained from centrality measures and to those obtained from projection methods, finding that the evaluation metrics have approximately the same scores when applying the algorithms to the entire dataset, to 78 features and to 7 features. Conclusions: This study demonstrates the efficacy of graph-based and projection methods in the interpretation of 16S rRNA gene sequencing data. Supervised machine learning on refined features from both approaches yields comparable predictive performance, emphasizing specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas, and Faecalibacterium) as key in predicting patient conditions from raw data.

Microbiota , RNA, Ribosomal, 16S , Supervised Machine Learning , Unsupervised Machine Learning , Humans , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 16S/analysis , Colorectal Neoplasms/microbiology , Gastrointestinal Microbiome/genetics , Algorithms , Feces/microbiology , Adenoma/microbiology

4.

Shifting to machine supervision: annotation-efficient semi and self-supervised learning for automatic medical image segmentation and classification.

Singh, Pranav; Chukkapalli, Raviteja; Chaudhari, Shravan; Chen, Luoyao; Chen, Mei; Pan, Jinqian; Smuda, Craig; Cirrone, Jacopo.

Sci Rep ; 14(1): 10820, 2024 05 11.

Article En | MEDLINE | ID: mdl-38734825

Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. Addressing this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advancements in self-supervised and semi-supervised learning. These techniques engage in auxiliary tasks that do not require labeling, thus simplifying the scaling of machine supervision compared to fully-supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, we observed that self-supervised learning significantly surpassed the performance of supervised methods in the classification of all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully-supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods. The code can be accessed at https://github.com/pranavsinghps1/S4MI .

Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Diagnostic Imaging/methods , Algorithms

5.

Global marine phytoplankton dynamics analysis with machine learning and reanalyzed remote sensing.

Adhikary, Subhrangshu; Tiwari, Surya Prakash; Banerjee, Saikat; Dwivedi, Ashutosh Dhar; Rahman, Syed Masiur.

PeerJ ; 12: e17361, 2024.

Article En | MEDLINE | ID: mdl-38737741

Phytoplankton are the world's largest oxygen producers found in oceans, seas and large water bodies, which play crucial roles in the marine food chain. Unbalanced biogeochemical features like salinity, pH, minerals, etc., can retard their growth. With advancements in better hardware, the usage of Artificial Intelligence techniques is rapidly increasing for creating an intelligent decision-making system. Therefore, we attempt to overcome this gap by using supervised regressions on reanalysis data targeting global phytoplankton levels in global waters. The presented experiment proposes the applications of different supervised machine learning regression techniques such as random forest, extra trees, bagging and histogram-based gradient boosting regressor on reanalysis data obtained from the Copernicus Global Ocean Biogeochemistry Hindcast dataset. Results obtained from the experiment have predicted the phytoplankton levels with a coefficient of determination score (R2) of up to 0.96. After further validation with larger datasets, the model can be deployed in a production environment in an attempt to complement in-situ measurement efforts.
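
The coefficient of determination cited above is conventionally defined as R² = 1 - SS_res / SS_tot. The following is a minimal plain-Python sketch of that definition; the function name and the sample values are illustrative assumptions and are not taken from the study's data or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Illustrative values only: a perfect fit and a mean-only predictor.
perfect = r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
baseline = r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

A score of 1 means the predictions match the observations exactly, while 0 means the model does no better than predicting the mean, so a value of 0.96 indicates that most of the variance in the target is explained.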

Machine Learning , Phytoplankton , Remote Sensing Technology , Remote Sensing Technology/methods , Remote Sensing Technology/instrumentation , Oceans and Seas , Environmental Monitoring/methods , Supervised Machine Learning

6.

Efficient data integration under prior probability shift.

Huang, Ming-Yueh; Qin, Jing; Huang, Chiung-Yu.

Biometrics ; 80(2)2024 Mar 27.

Article En | MEDLINE | ID: mdl-38768225

Conventional supervised learning usually operates under the premise that data are collected from the same underlying population. However, challenges may arise when integrating new data from different populations, resulting in a phenomenon known as dataset shift. This paper focuses on prior probability shift, where the distribution of the outcome varies across datasets but the conditional distribution of features given the outcome remains the same. To tackle the challenges posed by such shift, we propose an estimation algorithm that can efficiently combine information from multiple sources. Unlike existing methods that are restricted to discrete outcomes, the proposed approach accommodates both discrete and continuous outcomes. It also handles high-dimensional covariate vectors through variable selection using an adaptive least absolute shrinkage and selection operator penalty, producing efficient estimates that possess the oracle property. Moreover, a novel semiparametric likelihood ratio test is proposed to check the validity of prior probability shift assumptions by embedding the null conditional density function into Neyman's smooth alternatives (Neyman, 1937) and testing study-specific parameters. We demonstrate the effectiveness of our proposed method through extensive simulations and a real data example. The proposed methods serve as a useful addition to the repertoire of tools for dealing with dataset shifts.
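
The prior probability shift assumption described above can be written compactly. In the sketch below, the subscripts s and t (two data sources), x (features), and y (outcome) are notation assumed here for illustration and are not the paper's own symbols.

```latex
% Prior probability shift between two sources s and t:
% the outcome distribution differs, the feature distribution given the outcome does not.
\begin{align}
  p_s(y) &\neq p_t(y), &
  p_s(x \mid y) &= p_t(x \mid y).
\end{align}
% Consequence via Bayes' rule: as a function of y, the conditional outcome
% distributions differ only by the ratio of the outcome priors.
\begin{equation}
  p_t(y \mid x) \;\propto\; p_s(y \mid x)\,\frac{p_t(y)}{p_s(y)}.
\end{equation}
```

The second relation is why a model fitted on one source can, under this assumption, be adjusted for another source by reweighting with the ratio of outcome distributions.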

Algorithms , Computer Simulation , Models, Statistical , Probability , Humans , Likelihood Functions , Biometry/methods , Data Interpretation, Statistical , Supervised Machine Learning

7.

EEG power spectra parameterization and adaptive channel selection towards semi-supervised seizure prediction.

Li, Hanyi; Liao, Jiahui; Wang, Hongxiao; Zhan, Chang'an A; Yang, Feng.

Comput Biol Med ; 175: 108510, 2024 Jun.

Article En | MEDLINE | ID: mdl-38691913

BACKGROUND: Seizure prediction algorithms have demonstrated their potential in mitigating epilepsy risks by detecting the pre-ictal state using ongoing electroencephalogram (EEG) signals. However, most of them require high-density EEG, which is burdensome to patients for daily monitoring. Moreover, prevailing seizure models require extensive training with large amounts of labeled data, which is very time-consuming and demanding for epileptologists. METHOD: To address these challenges, here we propose an adaptive channel selection strategy and a semi-supervised deep learning model, respectively, to reduce the number of EEG channels and to limit the amount of labeled data required for accurate seizure prediction. Our channel selection module is centered on features from EEG power spectra parameterization that precisely characterize the epileptic activities to identify the seizure-associated channels for each patient. The semi-supervised model integrates generative adversarial networks and bidirectional long short-term memory networks to enhance seizure prediction. RESULTS: Our approach is evaluated on the CHB-MIT and Siena epilepsy datasets. Using only 4 channels, the method demonstrates outstanding performance with an AUC of 93.15% on the CHB-MIT dataset and an AUC of 88.98% on the Siena dataset. Experimental results also demonstrate that our selection approach reduces the model parameters and training time. CONCLUSIONS: Adaptive channel selection coupled with semi-supervised learning can offer a possible basis for a lightweight and computationally efficient seizure prediction system, making daily monitoring practical to improve patients' quality of life.

Electroencephalography , Seizures , Humans , Electroencephalography/methods , Seizures/physiopathology , Seizures/diagnosis , Signal Processing, Computer-Assisted , Deep Learning , Algorithms , Databases, Factual , Epilepsy/physiopathology , Supervised Machine Learning

8.

MA-MIL: Sampling point-level abnormal ECG location method via weakly supervised learning.

Liu, Jin; Li, Jiadong; Duan, Yuxin; Zhou, Yang; Fan, Xiaoxue; Li, Shuo; Chang, Shijie.

Comput Methods Programs Biomed ; 250: 108164, 2024 Jun.

Article En | MEDLINE | ID: mdl-38718709

BACKGROUND AND OBJECTIVE: Current automatic electrocardiogram (ECG) diagnostic systems could provide classification outcomes but often lack explanations for these results. This limitation hampers their application in clinical diagnoses. Previous supervised learning could not highlight abnormal segmentation output accurately enough for clinical application without manual labeling of large ECG datasets. METHOD: In this study, we present a multi-instance learning framework called MA-MIL, which uses a multi-layer and multi-instance structure that is aggregated step by step at different scales. We evaluated our method using the public MIT-BIH dataset and our private dataset. RESULTS: The results show that our model performed well in both ECG classification output and heartbeat-level and sub-heartbeat-level abnormal segment detection, with accuracy and F1 scores of 0.987 and 0.986 for ECG classification and 0.968 and 0.949 for heartbeat-level abnormal detection, respectively. Compared to visualization methods, the IoU values of MA-MIL improved by at least 17% and at most 31% across all categories. CONCLUSIONS: MA-MIL could accurately locate the abnormal ECG segment, offering more trustworthy results for clinical application.
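
The IoU (intersection over union) values mentioned above compare a predicted abnormal segment with a reference segment. A minimal sketch for one-dimensional segments follows; representing each segment as a half-open pair of sample indices is an assumption made here for illustration, and the function name is hypothetical.

```python
def interval_iou(a, b):
    """IoU of two half-open sample-index intervals given as (start, end)."""
    a_start, a_end = a
    b_start, b_end = b
    if a_end <= a_start or b_end <= b_start:
        raise ValueError("intervals must have positive length")
    # Overlap length is zero when the intervals do not intersect.
    intersection = max(0, min(a_end, b_end) - max(a_start, b_start))
    union = (a_end - a_start) + (b_end - b_start) - intersection
    return intersection / union


# Illustrative values only: a predicted segment shifted by half its length.
shifted = interval_iou((0, 10), (5, 15))
```

An IoU of 1 means the predicted and reference segments coincide, and 0 means they do not overlap at all.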

Algorithms , Electrocardiography , Supervised Machine Learning , Electrocardiography/methods , Humans , Heart Rate , Databases, Factual , Signal Processing, Computer-Assisted

9.

VOLTA: an enVironment-aware cOntrastive ceLl represenTation leArning for histopathology.

Nakhli, Ramin; Rich, Katherine; Zhang, Allen; Darbandsari, Amirali; Shenasa, Elahe; Hadjifaradji, Amir; Thiessen, Sidney; Milne, Katy; Jones, Steven J M; McAlpine, Jessica N; Nelson, Brad H; Gilks, C Blake; Farahani, Hossein; Bashashati, Ali.

Nat Commun ; 15(1): 3942, 2024 May 10.

Article En | MEDLINE | ID: mdl-38729933

In clinical oncology, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques necessitate the need for labels, providing manual cell annotations is time-consuming. In this paper, we propose a self-supervised framework (enVironment-aware cOntrastive cell represenTation learning: VOLTA) for cell representation learning in histopathology images using a technique that accounts for the cell's mutual relationship with its environment. We subject our model to extensive experiments on data collected from multiple institutions comprising over 800,000 cells and six cancer types. To showcase the potential of our proposed framework, we apply VOLTA to ovarian and endometrial cancers and demonstrate that our cell representations can be utilized to identify the known histotypes of ovarian cancer and provide insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised models, we provide a framework that can empower discoveries without any annotation data, even in situations where sample sizes are limited.

Endometrial Neoplasms , Ovarian Neoplasms , Humans , Female , Endometrial Neoplasms/pathology , Ovarian Neoplasms/pathology , Machine Learning , Supervised Machine Learning , Algorithms , Image Processing, Computer-Assisted/methods

10.

A culture-independent approach, supervised machine learning, and the characterization of the microbial community composition of coastal areas across the Bay of Bengal and the Arabian Sea.

Rekadwad, Bhagwan Narayan; Shouche, Yogesh Shreepad; Jangid, Kamlesh.

BMC Microbiol ; 24(1): 162, 2024 May 10.

Article En | MEDLINE | ID: mdl-38730339

BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors influencing the differences between these coastlines: the Bay of Bengal's shallower depth and lower salinity; upwelling phenomena due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: The microbial diversity in GA was significantly higher than that in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including candidate phyla radiation (CPR). Statistical analysis, random forest regression, and classification with supervised machine learning models accurately confirmed the diversity of the microbiome. Furthermore, we have identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing, showing promising potential for further study. CONCLUSION: Thus, our study provides valuable insights into the microbial diversity and pollution levels of coastal areas in AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on the biology of coastal ecosystems and biodiversity.

Bacteria , Bays , Microbiota , Phylogeny , RNA, Ribosomal, 16S , Seawater , Supervised Machine Learning , RNA, Ribosomal, 16S/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , DNA, Bacterial/genetics , Salinity , Sequence Analysis, DNA/methods

11.

A cautionary tale about properly vetting datasets used in supervised learning predicting metabolic pathway involvement.

Huckvale, Erik D; Moseley, Hunter N B.

PLoS One ; 19(5): e0299583, 2024.

Article En | MEDLINE | ID: mdl-38696410

The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several prior published machine learning approaches utilized an erroneous KEGG-derived training dataset that used SMILES molecular representation strings (KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by the erroneous presence of these duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that utilized this erroneous KEGG-SMILES dataset so their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction of model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting prior published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
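
The leakage mechanism described above can be illustrated with a small sketch: when exact duplicates are present, k-fold splitting places copies of the same entry in both the training and the test folds. The code below is a simplified illustration under assumed names and toy SMILES strings; it is not the authors' analysis and does not use the KEGG-SMILES dataset.

```python
import random


def deduplicate(entries):
    """Drop exact duplicates while preserving first-seen order."""
    return list(dict.fromkeys(entries))


def leaked_fraction(entries, k=5, seed=0):
    """Mean fraction of test-fold items that also occur in the training folds."""
    items = list(entries)
    if len(items) < k:
        raise ValueError("need at least k entries")
    random.Random(seed).shuffle(items)
    folds = [items[i::k] for i in range(k)]  # k roughly equal folds
    fractions = []
    for i, test in enumerate(folds):
        train = {x for j, fold in enumerate(folds) if j != i for x in fold}
        fractions.append(sum(x in train for x in test) / len(test))
    return sum(fractions) / k


# Toy SMILES strings with repeated entries (illustrative only).
toy = ["CCO", "CCO", "CCN", "c1ccccc1", "CCN", "O=C=O", "CC", "CCC", "CCCC", "CCO"]
leak_before = leaked_fraction(toy)
leak_after = leaked_fraction(deduplicate(toy))
```

De-duplicating before the split, rather than after it, is what guarantees that no test item has an identical copy in the training data.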

Metabolic Networks and Pathways , Supervised Machine Learning , Humans , Datasets as Topic

12.

[Constructing a predictive model for the death risk of patients with septic shock based on supervised machine learning algorithms].

Xie, Zheng; Jin, Jing; Liu, Dongsong; Lu, Shengyi; Yu, Hui; Han, Dong; Sun, Wei; Huang, Ming.

Zhonghua Wei Zhong Bing Ji Jiu Yi Xue ; 36(4): 345-352, 2024 Apr.

Article Zh | MEDLINE | ID: mdl-38813626

OBJECTIVE: To construct and validate the best predictive model for 28-day death risk in patients with septic shock based on different supervised machine learning algorithms. METHODS: The patients with septic shock meeting the Sepsis-3 criteria were selected from Medical Information Mart for Intensive Care-IV v2.0 (MIMIC-IV v2.0). According to the principle of random allocation, 70% of these patients were used as the training set, and 30% as the validation set. Relevant predictive variables were extracted from three aspects: demographic characteristics and basic vital signs, serum indicators within 24 hours of intensive care unit (ICU) admission and complications possibly affecting indicators, functional scoring and advanced life support. The predictive efficacy of models constructed using five mainstream machine learning algorithms including decision tree classification and regression tree (CART), random forest (RF), support vector machine (SVM), linear regression (LR), and super learner [SL; combined CART, RF and extreme gradient boosting (XGBoost)] for 28-day death in patients with septic shock was compared, and the best algorithm model was selected. The optimal predictive variables were determined by intersecting the results from LASSO regression, RF, and XGBoost algorithms, and a predictive model was constructed. The predictive efficacy of the model was validated by drawing receiver operator characteristic curve (ROC curve), the accuracy of the model was assessed using calibration curves, and the practicality of the model was verified through decision curve analysis (DCA). RESULTS: A total of 3 295 patients with septic shock were included, with 2 164 surviving and 1 131 dying within 28 days, resulting in a mortality of 34.32%. Of these, 2 307 were in the training set (with 792 deaths within 28 days, a mortality of 34.33%), and 988 in the validation set (with 339 deaths within 28 days, a mortality of 34.31%). 
Five machine learning models were established based on the training set data. After including variables from three aspects, the area under the ROC curve (AUC) of RF, SVM, and LR machine learning algorithm models for predicting 28-day death in septic shock patients in the validation set was 0.823 [95% confidence interval (95%CI) was 0.795-0.849], 0.823 (95%CI was 0.796-0.849), and 0.810 (95%CI was 0.782-0.838), respectively, which were higher than that of the CART algorithm model (AUC = 0.750, 95%CI was 0.717-0.782) and SL algorithm model (AUC = 0.756, 95%CI was 0.724-0.789). Thus, the above three algorithm models were determined to be the best algorithm models. After integrating variables from three aspects, 16 optimal predictive variables were identified through intersection by LASSO regression, RF, and XGBoost algorithms, including the highest pH value, the highest albumin (Alb), the highest body temperature, the lowest lactic acid (Lac), the highest Lac, the highest serum creatinine (SCr), the highest Ca2+, the lowest hemoglobin (Hb), the lowest white blood cell count (WBC), age, simplified acute physiology score III (SAPS III), the highest WBC, acute physiology score III (APS III), the lowest Na+, body mass index (BMI), and the shortest activated partial thromboplastin time (APTT) within 24 hours of ICU admission. ROC curve analysis showed that the Logistic regression model constructed with the above 16 optimal predictive variables was the best predictive model, with an AUC of 0.806 (95%CI was 0.778-0.835) in the validation set. The calibration curve and DCA curve showed that this model had high accuracy and that the highest net benefit could reach 0.3, significantly outperforming traditional models based on a single functional score [APS III score, SAPS III score, and sequential organ failure assessment (SOFA) score] with AUC (95%CI) of 0.746 (0.715-0.778), 0.765 (0.734-0.796), and 0.625 (0.589-0.661), respectively.
CONCLUSIONS: The Logistic regression model, constructed using 16 optimal predictive variables including pH value, Alb, body temperature, Lac, SCr, Ca2+, Hb, WBC, SAPS III score, APS III score, Na+, BMI, and APTT, is identified as the best predictive model for the 28-day death risk in patients with septic shock. Its performance is stable, with high discriminative ability and accuracy.
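
The cohort counts reported in the results can be cross-checked with simple arithmetic. The sketch below recomputes the split sizes and the mortality percentages from the counts stated in the abstract; the helper name and the dictionary layout are illustrative assumptions.

```python
def mortality_percent(deaths, total):
    """Mortality as a percentage rounded to two decimals."""
    return round(100 * deaths / total, 2)


# (deaths within 28 days, patients) as reported in the abstract.
cohort = {
    "all": (1131, 3295),
    "training": (792, 2307),
    "validation": (339, 988),
}

rates = {name: mortality_percent(d, n) for name, (d, n) in cohort.items()}

# Consistency checks on the reported counts.
survivors = 2164
totals_match = survivors + cohort["all"][0] == cohort["all"][1]
split_matches = cohort["training"][1] + cohort["validation"][1] == cohort["all"][1]
deaths_match = cohort["training"][0] + cohort["validation"][0] == cohort["all"][0]
training_share = round(100 * cohort["training"][1] / cohort["all"][1])
```

The near-identical mortality in the two subsets indicates that the random 70/30 allocation left the outcome balanced between training and validation data.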

Algorithms , Shock, Septic , Supervised Machine Learning , Support Vector Machine , Humans , Shock, Septic/mortality , Shock, Septic/diagnosis , Female , Prognosis , Intensive Care Units , Male , Middle Aged , Machine Learning , Decision Trees

13.

Learning semi-supervised enrichment of longitudinal imaging-genetic data for improved prediction of cognitive decline.

Seo, Hoon; Brand, Lodewijk; Wang, Hua.

BMC Med Inform Decis Mak ; 24(Suppl 1): 61, 2024 May 28.

Article En | MEDLINE | ID: mdl-38807132

BACKGROUND: Alzheimer's Disease (AD) is a progressive memory disorder that causes irreversible cognitive decline. Given that there is currently no cure, it is critical to detect AD in its early stage during the disease progression. Recently, many statistical learning methods have been presented to identify cognitive decline with temporal data, but few of these methods integrate heterogeneous phenotype and genetic information together to improve the accuracy of prediction. In addition, many of these models are often unable to handle incomplete temporal data; this often manifests itself in the removal of records to ensure consistency in the number of records across participants. RESULTS: To address these issues, in this work we propose a novel approach to integrate the genetic data and the longitudinal phenotype data to learn a fixed-length "enriched" biomarker representation derived from the temporal heterogeneous neuroimaging records. Armed with this enriched representation, as a fixed-length vector per participant, conventional machine learning models can be used to predict clinical outcomes associated with AD. CONCLUSION: The proposed method shows improved prediction performance when applied to data derived from the Alzheimer's Disease Neuroimaging Initiative cohort. In addition, our approach can be easily interpreted to allow for the identification and validation of biomarkers associated with cognitive decline.

Alzheimer Disease , Cognitive Dysfunction , Neuroimaging , Humans , Cognitive Dysfunction/genetics , Cognitive Dysfunction/diagnostic imaging , Alzheimer Disease/genetics , Alzheimer Disease/diagnostic imaging , Aged , Longitudinal Studies , Supervised Machine Learning , Female , Male , Machine Learning

14.

GMIM: Self-supervised pre-training for 3D medical image segmentation with adaptive and hierarchical masked image modeling.

Qi, Liangce; Jiang, Zhengang; Shi, Weili; Qu, Feng; Feng, Guanyuan.

Comput Biol Med ; 176: 108547, 2024 Jun.

Article En | MEDLINE | ID: mdl-38728994

Self-supervised pre-training and fully supervised fine-tuning paradigms have received much attention as a way to solve the data annotation problem in deep learning fields. Compared with traditional pre-training on large natural image datasets, medical self-supervised learning methods learn rich representations derived from the unlabeled data itself, thus avoiding the distribution shift between different image domains. However, current state-of-the-art medical pre-training methods are specifically designed for downstream tasks, making them less flexible and difficult to apply to new tasks. In this paper, we propose grid mask image modeling, a flexible and general self-supervised method to pre-train medical vision transformers for 3D medical image segmentation. Our goal is to guide networks to learn the correlations between organs and tissues by reconstructing original images based on partial observations. The relationships are consistent within the human body and invariant to disease type or imaging modality. To achieve this, we design a Siamese framework consisting of an online branch and a target branch. An adaptive and hierarchical masking strategy is employed in the online branch to (1) learn the boundaries or small contextual mutation regions within images and (2) learn high-level semantic representations from deeper layers of the multiscale encoder. In addition, the target branch provides representations for contrastive learning to further reduce representation redundancy. We evaluate our method through segmentation performance on two public datasets. The experimental results demonstrate that our method outperforms other self-supervised methods. Codes are available at https://github.com/mobiletomb/Gmim.

Imaging, Three-Dimensional , Humans , Imaging, Three-Dimensional/methods , Deep Learning , Algorithms , Supervised Machine Learning

15.

Analysis of 3D pathology samples using weakly supervised AI.

Song, Andrew H; Williams, Mane; Williamson, Drew F K; Chow, Sarah S L; Jaume, Guillaume; Gao, Gan; Zhang, Andrew; Chen, Bowen; Baras, Alexander S; Serafin, Robert; Colling, Richard; Downes, Michelle R; Farré, Xavier; Humphrey, Peter; Verrill, Clare; True, Lawrence D; Parwani, Anil V; Liu, Jonathan T C; Mahmood, Faisal.

Cell ; 187(10): 2502-2520.e17, 2024 May 09.

Article En | MEDLINE | ID: mdl-38729110

Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.

Imaging, Three-Dimensional , Prostatic Neoplasms , Supervised Machine Learning , Humans , Male , Deep Learning , Imaging, Three-Dimensional/methods , Prognosis , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , X-Ray Microtomography/methods

16.

Kidney Tumor Classification on CT images using Self-supervised Learning.

Özbay, Erdal; Özbay, Feyza Altunbey; Gharehchopogh, Farhad Soleimanian.

Comput Biol Med ; 176: 108554, 2024 Jun.

Article En | MEDLINE | ID: mdl-38744013

Kidney tumors are among the most common diseases affecting society around the world. The risk of kidney disease increases due to reasons such as consumption of ready-made food and bad habits. Early diagnosis of kidney tumors is essential for effective treatment, reducing side effects, and reducing the number of deaths. With the development of computer-aided diagnostic methods, the need for accurate renal tumor classification is also increasing. Because traditional methods based on manual detection are time-consuming, tedious, and costly, high-accuracy tests can be performed faster and at a lower cost with deep learning (DL) methods in kidney tumor detection (KTD). Among the current challenges regarding artificial intelligence-assisted KTD, obtaining more precise programming information and the capacity to group with high accuracy make clinical determination more vital and bring it to an important point for current treatment in KTD prediction. This encourages us to propose a more effective DL model that can assist specialist physicians in the diagnosis of kidney tumors. In this way, the workload of radiologists can be alleviated and errors in clinical diagnoses that may occur due to the complex structure of the kidney can be prevented. A large amount of data is needed to train the developed methods. Although various studies have been conducted to reduce the amount of data with feature selection techniques, these techniques provide little improvement in classification accuracy. In this paper, a masked autoencoder (MAE) is proposed for KTD, which can produce effective results on datasets containing few samples and can be directly pre-trained and fine-tuned. Self-supervised learning (SSL) is achieved through self-distillation (SD), which can be reintroduced into the configuration loss calculation using masked patches. In SSLSD-KTD, the SD loss is calculated on the latent representations of the encoder and decoder outputs. The encoder obtains local attention, while the decoder transfers its global attention to calculate losses. The SSLSD-KTD method reached 98.04 % classification accuracy on the KAUH-kidney dataset, including 8400 samples, and 82.14 % on the CT-kidney dataset, containing 840 samples. By adding more external information to the SSLSD-KTD method with transfer learning, accuracy results of 99.82 % and 95.24 % were obtained on the same datasets. Experimental results have shown that the SSLSD-KTD method can effectively extract kidney tumor features with limited data and can be an aid or even an alternative for radiologists in decision-making in the diagnosis of the disease.

Kidney Neoplasms , Tomography, X-Ray Computed , Humans , Kidney Neoplasms/diagnostic imaging , Kidney Neoplasms/classification , Tomography, X-Ray Computed/methods , Supervised Machine Learning , Deep Learning , Kidney/diagnostic imaging , Male , Female , Radiographic Image Interpretation, Computer-Assisted/methods

17.

Point based weakly semi-supervised biomarker detection with cross-scale and label assignment in retinal OCT images.

Liu, Xiaoming; Zhu, Xin; Zhang, Ying; Wang, Man.

Comput Methods Programs Biomed ; 251: 108229, 2024 Jun.

Article En | MEDLINE | ID: mdl-38761413

BACKGROUND AND OBJECTIVE: Optical coherence tomography (OCT) is currently one of the most advanced retinal imaging methods. Retinal biomarkers in OCT images are of clinical significance and can assist ophthalmologists in diagnosing lesions. Compared with fundus images, OCT can provide higher resolution segmentation. However, image annotation at the bounding box level needs to be performed carefully by ophthalmologists and is difficult to obtain. In addition, the large variation in shape of different retinal markers and the inconspicuous appearance of biomarkers make it difficult for existing deep learning-based methods to detect them effectively. To overcome the above challenges, we propose a novel network for the detection of retinal biomarkers in OCT images. METHODS: We first address the issue of labeling cost using a novel weakly semi-supervised object detection method with point annotations, which can reduce bounding box-level annotation efforts. To extend the method to the detection of biomarkers in OCT images, we propose multiple consistent regularizations for the point-to-box regression network to deal with the shortage of supervision, with the aim of learning more accurate regression mappings. Furthermore, in the subsequent fully supervised detection, we propose a cross-scale feature enhancement module to alleviate the detection problems caused by the large-scale variation of biomarkers. We also propose a dynamic label assignment strategy to distinguish samples of different importance more flexibly, thereby reducing detection errors due to the indistinguishable appearance of the biomarkers. RESULTS: With our detection network, our regressor achieves an AP of 20.83 % on a 5 % fully labeled dataset partition, surpassing the other compared methods at 5 % and 10 % and even coming close to the 20.87 % achieved by Point DETR under 20 % full labeling conditions. When using Group R-CNN as the point-to-box regressor, our detector achieves 27.21 % AP in the 50 % fully labeled dataset experiment, a 7.42 % AP improvement over our detection network baseline, Faster R-CNN. CONCLUSIONS: The experimental findings not only demonstrate the effectiveness of our approach with minimal bounding box annotations but also highlight the enhanced biomarker detection performance of the proposed module. We have included a detailed algorithmic flow in the supplementary material.

Algorithms , Biomarkers , Retina , Tomography, Optical Coherence , Tomography, Optical Coherence/methods , Humans , Retina/diagnostic imaging , Deep Learning , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Neural Networks, Computer , Image Interpretation, Computer-Assisted/methods

18.

Exploring UMAP in hybrid models of entropy-based and representativeness sampling for active learning in biomedical segmentation.

Tan, Hai Siong; Wang, Kuancheng; Mcbeth, Rafe.

Comput Biol Med ; 176: 108605, 2024 Jun.

Article En | MEDLINE | ID: mdl-38772054

In this work, we study various hybrid models of entropy-based and representativeness sampling techniques in the context of active learning in medical segmentation, in particular examining the role of UMAP (Uniform Manifold Approximation and Projection) as a technique for capturing representativeness. Although UMAP has been shown to be viable as a general-purpose dimension reduction method in diverse areas, its role in deep learning-based medical segmentation has yet to be extensively explored. Using the cardiac and prostate datasets in the Medical Segmentation Decathlon for validation, we found that a novel hybrid Entropy-UMAP sampling technique achieved a statistically significant Dice score advantage over the random baseline (3.2% for cardiac, 4.5% for prostate), and attained the highest Dice coefficient among the 10 distinct active learning methodologies we examined. This provides preliminary evidence that there is an interesting synergy between entropy-based and UMAP methods when the former precedes the latter in a hybrid model of active learning.
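The two-stage idea of applying an entropy criterion first and a representativeness criterion second can be sketched as follows. This is an illustrative outline only, not the authors' implementation. It assumes that each unlabeled sample comes with a flat list of predicted foreground probabilities and a precomputed low-dimensional embedding (for example, one produced by UMAP). Greedy farthest-point selection stands in for the representativeness step, and all function names are hypothetical.

```python
import math


def mean_entropy(prob_map):
    """Mean binary Shannon entropy (in nats) of foreground probabilities."""
    eps = 1e-12
    total = 0.0
    for p in prob_map:
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))
    return total / len(prob_map)


def farthest_point_subset(points, k):
    """Greedy farthest-point selection; returns indices into `points`."""
    if k <= 0 or not points:
        return []
    chosen = [0]  # start from the first point (here: the most uncertain one)
    while len(chosen) < min(k, len(points)):
        # Pick the point whose nearest already-chosen point is farthest away.
        best = max(
            (i for i in range(len(points)) if i not in chosen),
            key=lambda i: min(math.dist(points[i], points[j]) for j in chosen),
        )
        chosen.append(best)
    return chosen


def entropy_then_representativeness(prob_maps, embeddings, pool_size, budget):
    """Select `budget` sample indices: entropy filter first, spread second."""
    # Step 1: keep the `pool_size` most uncertain samples.
    ranked = sorted(range(len(prob_maps)),
                    key=lambda i: mean_entropy(prob_maps[i]),
                    reverse=True)
    pool = ranked[:pool_size]
    # Step 2: within that pool, pick a spread-out subset in embedding space.
    local = farthest_point_subset([embeddings[i] for i in pool], budget)
    return [pool[i] for i in local]
```

The selected indices would then be sent for annotation and added to the labeled set before the next training round.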

Entropy , Humans , Male , Deep Learning , Prostate/diagnostic imaging , Image Processing, Computer-Assisted/methods , Supervised Machine Learning , Heart

19.

Quality-driven deep cross-supervised learning network for semi-supervised medical image segmentation.

Zhang, Zhenxi; Zhou, Heng; Shi, Xiaoran; Ran, Ran; Tian, Chunna; Zhou, Feng.

Comput Biol Med ; 176: 108609, 2024 Jun.

Article En | MEDLINE | ID: mdl-38772056

Semi-supervised medical image segmentation presents a compelling approach to streamline large-scale image analysis, alleviating annotation burdens while maintaining comparable performance. Despite recent strides in cross-supervised training paradigms, challenges persist in addressing sub-network disagreement and in training efficiency and reliability. In response, our paper introduces a novel cross-supervised learning framework, Quality-driven Deep Cross-supervised Learning Network (QDC-Net). QDC-Net incorporates both an evidential sub-network and a vanilla sub-network, leveraging their complementary strengths to effectively handle disagreement. To ensure the reliability and efficiency of semi-supervised training, we introduce a real-time quality estimation of the model's segmentation performance and propose a directional cross-training approach through the design of directional weights. We further design a truncated form of sample-wise loss weighting to mitigate the impact of inaccurate predictions and collapsed samples in semi-supervised training. Extensive experiments on LA and Pancreas-CT datasets demonstrate that QDC-Net surpasses other state-of-the-art methods in semi-supervised medical image segmentation. Code is available at https://github.com/Medsemiseg.

Supervised Machine Learning , Humans , Deep Learning , Image Processing, Computer-Assisted/methods , Pancreas/diagnostic imaging , Tomography, X-Ray Computed

20.

A landmark-supervised registration framework for multi-phase CT images with cross-distillation.

Rao, Fan; Lyu, Tianling; Feng, Zhan; Wu, Yuanfeng; Ni, Yangfan; Zhu, Wentao.

Phys Med Biol ; 69(11), 2024 May 31.

Article En | MEDLINE | ID: mdl-38768601

Objective. Multi-phase computed tomography (CT) has become a leading modality for identifying hepatic tumors. Nevertheless, misalignment between the images of different phases poses a challenge in accurately identifying and analyzing the patient's anatomy. Conventional registration methods typically concentrate on either intensity-based features or landmark-based features in isolation, thereby limiting the accuracy of the registration process. Method. We establish a nonrigid cycle-registration network that leverages semi-supervised learning techniques, wherein a point distance term based on the Euclidean distance between registered landmark points is introduced into the loss function. Additionally, a cross-distillation strategy that incorporates response-based knowledge concerning the distances between feature points is proposed in network training to further improve registration performance. Results. We conducted experiments using multi-centered liver CT datasets to evaluate the performance of the proposed method. The results demonstrate that our method outperforms baseline methods in terms of target registration error. Additionally, Dice scores of the warped tumor masks were calculated. Our method consistently achieved the highest scores among all the compared methods. Specifically, it achieved scores of 82.9% and 82.5% on the hepatocellular carcinoma and the intrahepatic cholangiocarcinoma datasets, respectively. Significance. The superior registration performance indicates its potential to serve as an important tool in hepatic tumor identification and analysis.
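The two quantities named in this abstract, a Euclidean point distance term over paired landmarks and the Dice score of warped masks, can be illustrated with a minimal sketch. The way the distance term is combined with an image similarity loss, the `weight` value, and all function names are assumptions for illustration; the abstract does not give the authors' exact loss.

```python
import math


def dice_score(mask_a, mask_b):
    """Dice coefficient of two flat binary masks (lists of 0/1)."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(a * b for a, b in zip(mask_a, mask_b))
    total = sum(mask_a) + sum(mask_b)
    # Two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def landmark_distance(points_a, points_b):
    """Mean Euclidean distance between paired landmark coordinates."""
    if len(points_a) != len(points_b) or not points_a:
        raise ValueError("need two non-empty landmark lists of equal length")
    return sum(math.dist(p, q) for p, q in zip(points_a, points_b)) / len(points_a)


def total_loss(similarity_loss, warped_landmarks, fixed_landmarks, weight=0.1):
    """Image similarity loss plus a weighted landmark distance term.

    `weight` is a hypothetical hyperparameter balancing the two terms.
    """
    return similarity_loss + weight * landmark_distance(warped_landmarks, fixed_landmarks)
```

For example, `dice_score([1, 1, 0, 0], [1, 0, 0, 0])` compares a warped tumor mask with a reference mask, and `landmark_distance` averages the residual misalignment over all annotated landmark pairs.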

Image Processing, Computer-Assisted , Liver Neoplasms , Tomography, X-Ray Computed , Humans , Image Processing, Computer-Assisted/methods , Liver Neoplasms/diagnostic imaging , Carcinoma, Hepatocellular/diagnostic imaging , Supervised Machine Learning