Results 1 - 20 of 970
1.
J Biomed Opt ; 29(8): 086003, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39099678

ABSTRACT

Significance: Accurate identification of epidermal cells on reflectance confocal microscopy (RCM) images is important in the study of epidermal architecture and topology of both healthy and diseased skin. However, analysis of these images is currently done manually and is therefore time-consuming and subject to human error and inter-expert variability. It is also hindered by low image quality due to noise and heterogeneity. Aim: We aimed to design an automated pipeline for the analysis of the epidermal structure from RCM images. Approach: Two attempts have been made at automatically localizing epidermal cells, called keratinocytes, on RCM images: the first is based on a rotationally symmetric error function mask, and the second on cell morphological features. Here, we propose a dual-task network to automatically identify keratinocytes on RCM images. Each task consists of a cycle generative adversarial network. The first task aims to translate real RCM images into binary images, thus learning the noise and texture model of RCM images, whereas the second task maps Gabor-filtered RCM images into binary images, learning the epidermal structure visible on RCM images. The combination of the two tasks allows one task to constrain the solution space of the other, thus improving overall results. We refine our cell identification by applying the pre-trained StarDist algorithm to detect star-convex shapes, thus closing any incomplete membranes and separating neighboring cells. Results: The results are evaluated both on simulated data and manually annotated real RCM data. Accuracy is measured using recall and precision metrics, which are summarized as the F1-score. Conclusions: We demonstrate that the proposed fully unsupervised method successfully identifies keratinocytes on RCM images of the epidermis, with an accuracy on par with experts' cell identification, is not constrained by limited available annotated data, and can be extended to images acquired using various imaging techniques without retraining.
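As a hedged illustration of the final refinement step (the pre-trained StarDist model for star-convex cell detection) and of the F1-score used for evaluation, a minimal Python sketch follows; the input file name and the preceding cycle-GAN translation are placeholders, not part of the published pipeline.

```python
# Sketch only: pre-trained StarDist applied to a cell-membrane image, plus the
# F1-score summarizing precision and recall. Input path is a placeholder.
import numpy as np
from csbdeep.utils import normalize
from stardist.models import StarDist2D

model = StarDist2D.from_pretrained("2D_versatile_fluo")   # generic pre-trained 2D model
img = np.load("rcm_binary_translation.npy")               # hypothetical cycle-GAN output
labels, _ = model.predict_instances(normalize(img))       # star-convex cell instance masks

def f1_score(tp: int, fp: int, fn: int) -> float:
    """Recall and precision summarized as their harmonic mean."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```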


Subject(s)
Epidermis , Keratinocytes , Microscopy, Confocal , Humans , Microscopy, Confocal/methods , Epidermis/diagnostic imaging , Keratinocytes/cytology , Image Processing, Computer-Assisted/methods , Algorithms , Epidermal Cells , Neural Networks, Computer , Unsupervised Machine Learning
2.
Int J Neural Syst ; 34(10): 2450055, 2024 Oct.
Article in English | MEDLINE | ID: mdl-39136190

ABSTRACT

Automatic seizure detection from electroencephalography (EEG) is of great importance in aiding the diagnosis and treatment of epilepsy because it is convenient and economical. Existing seizure detection methods are usually patient-specific: training and testing are carried out on the same patient, limiting scalability to other patients. To address this issue, we propose a cross-subject seizure detection method via unsupervised domain adaptation. The proposed method aims to obtain seizure-specific information through shallow and deep feature alignments. For shallow feature alignment, we use a convolutional neural network (CNN) to extract seizure-related features. The distribution gap of the shallow features between different patients is minimized by multi-kernel maximum mean discrepancy (MK-MMD). For deep feature alignment, adversarial learning is utilized. The feature extractor learns representations that confuse the domain classifier, making the extracted deep features more generalizable to new patients. The performance of our method is evaluated on the CHB-MIT and Siena databases in epoch-based experiments. Additionally, event-based experiments are also conducted on the CHB-MIT dataset. The results validate the feasibility of our method in diminishing the domain disparities among different patients.
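For readers unfamiliar with the MK-MMD term, a minimal PyTorch sketch of a multi-Gaussian-kernel MMD² between batches of source and target features is shown below; the kernel bandwidths and feature dimensions are illustrative assumptions, not values from the paper.

```python
import torch

def multi_kernel_mmd(source: torch.Tensor, target: torch.Tensor,
                     bandwidths=(1.0, 2.0, 4.0, 8.0)) -> torch.Tensor:
    """Biased estimate of MMD^2 using a sum of Gaussian kernels (MK-MMD style)."""
    n = source.size(0)
    x = torch.cat([source, target], dim=0)
    d2 = torch.cdist(x, x).pow(2)                              # pairwise squared distances
    k = sum(torch.exp(-d2 / (2.0 * b ** 2)) for b in bandwidths)
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()

# Example: align 128-d shallow CNN features from two patients (random stand-ins).
src, tgt = torch.randn(32, 128), torch.randn(32, 128)
loss = multi_kernel_mmd(src, tgt)   # added to the task loss during training
```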


Subject(s)
Electroencephalography , Neural Networks, Computer , Seizures , Unsupervised Machine Learning , Humans , Electroencephalography/methods , Seizures/diagnosis , Seizures/physiopathology , Deep Learning , Signal Processing, Computer-Assisted
3.
Sci Rep ; 14(1): 17956, 2024 08 02.
Article in English | MEDLINE | ID: mdl-39095606

ABSTRACT

The symptoms of diseases can vary among individuals and may remain undetected in the early stages. Detecting these symptoms is crucial in the initial stage to effectively manage and treat cases of varying severity. Machine learning has made major advances in recent years, proving its effectiveness in various healthcare applications. This study aims to identify patterns of symptoms and general rules regarding symptoms among patients using supervised and unsupervised machine learning. The integration of a rule-based machine learning technique and classification methods is utilized to extend a prediction model. This study analyzes patient data available online through the Kaggle repository. After preprocessing the data and exploring descriptive statistics, the Apriori algorithm was applied to identify frequent symptoms and patterns in the discovered rules. Additionally, the study applied several machine learning models for predicting diseases, including stepwise regression, support vector machine, bootstrap forest, boosted trees, and neural-boosted methods. Cross-validation conducted for each model using established criteria showed that stepwise regression outperformed all competitors. Moreover, numerous significant decision rules were extracted in the study, which can streamline clinical applications without the need for additional expertise. These rules enable the prediction of relationships between symptoms and diseases, as well as between different diseases. Therefore, the results obtained in this study have the potential to improve the performance of prediction models. Disease symptoms and general rules can thus be discovered from the dataset using supervised and unsupervised machine learning. Overall, the proposed algorithm can support not only healthcare professionals but also patients who face cost and time constraints in diagnosing and treating these diseases.
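A minimal sketch of Apriori-style rule mining with the mlxtend library is shown below; the toy symptom matrix and the support/confidence thresholds are illustrative, not the study's actual Kaggle data or settings.

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot patient-by-symptom matrix standing in for the Kaggle data.
records = pd.DataFrame(
    [[1, 1, 0, 1], [1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]],
    columns=["fever", "cough", "fatigue", "headache"],
).astype(bool)

frequent = apriori(records, min_support=0.5, use_colnames=True)     # frequent symptom sets
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```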


Subject(s)
Algorithms , Supervised Machine Learning , Unsupervised Machine Learning , Humans , Male , Female , Support Vector Machine , Middle Aged , Adult , Disease
4.
Environ Health Perspect ; 132(8): 85002, 2024 Aug.
Article in English | MEDLINE | ID: mdl-39106156

ABSTRACT

BACKGROUND: The field of toxicology has witnessed substantial advancements in recent years, particularly with the adoption of new approach methodologies (NAMs) to understand and predict chemical toxicity. Class-based methods such as clustering and classification are key to NAMs development and application, aiding the understanding of hazard and risk concerns associated with groups of chemicals without additional laboratory work. Advances in computational chemistry, data generation and availability, and machine learning algorithms represent important opportunities for continued improvement of these techniques to optimize their utility for specific regulatory and research purposes. However, due to their intricacy, deep understanding and careful selection are imperative to align the appropriate methods with their intended applications. OBJECTIVES: This commentary aims to deepen the understanding of class-based approaches by elucidating the pivotal role of chemical similarity (structural and biological) in clustering and classification approaches (CCAs). It addresses the dichotomy between general end point-agnostic similarity, often entailing unsupervised analysis, and end point-specific similarity necessitating supervised learning. The goal is to highlight the nuances of these approaches, their applications, and common misuses. DISCUSSION: Understanding similarity is pivotal in toxicological research involving CCAs. The effectiveness of these approaches depends on the right definition and measure of similarity, which vary with the context and objectives of the study. This choice is influenced by how chemical structures are represented and the respective labels indicating biological activity, if applicable. The distinction between unsupervised clustering and supervised classification methods is vital, requiring the use of end point-agnostic vs. end point-specific similarity definitions. Separate use or combination of these methods requires careful consideration to prevent bias and ensure relevance for the goal of the study. Unsupervised methods use end point-agnostic similarity measures to uncover general structural patterns and relationships, aiding hypothesis generation and facilitating exploration of datasets without the need for predefined labels or explicit guidance. Conversely, supervised techniques demand end point-specific similarity to group chemicals into predefined classes or to train classification models, allowing accurate predictions for new chemicals. Misuse can arise when unsupervised methods are applied to end point-specific contexts, like analog selection in read-across, leading to erroneous conclusions. This commentary provides insights into the significance of similarity and its role in supervised classification and unsupervised clustering approaches. https://doi.org/10.1289/EHP14001.
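To make the notion of end point-agnostic structural similarity concrete, the sketch below clusters toy molecules by Tanimoto distance on Morgan fingerprints using RDKit; the SMILES strings and distance cutoff are illustrative only and carry no regulatory meaning.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

smiles = ["CCO", "CCCO", "c1ccccc1", "c1ccccc1O"]          # toy chemicals
fps = [AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=1024)
       for s in smiles]

n = len(fps)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = 1.0 - DataStructs.TanimotoSimilarity(fps[i], fps[j])

# Agglomerative (unsupervised) clustering on structural, end point-agnostic distance.
clusters = fcluster(linkage(squareform(dist), method="average"), t=0.6, criterion="distance")
```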


Subject(s)
Machine Learning , Cluster Analysis , Unsupervised Machine Learning , Toxicology/methods , Algorithms
5.
Phys Med Biol ; 69(16)2024 Aug 09.
Article in English | MEDLINE | ID: mdl-39119998

ABSTRACT

Objective. Deep learning has markedly enhanced the performance of sparse-view computed tomography reconstruction. However, the dependence of these methods on supervised training using high-quality paired datasets, and the necessity for retraining under varied physical acquisition conditions, constrain their generalizability across new imaging contexts and settings. Approach. To overcome these limitations, we propose an unsupervised approach grounded in the deep image prior framework. Our approach advances beyond the conventional single noise level input by incorporating multi-level linear diffusion noise, significantly mitigating the risk of overfitting. Furthermore, we embed non-local self-similarity as a deep implicit prior within a self-attention network structure, improving the model's capability to identify and utilize repetitive patterns throughout the image. Additionally, leveraging imaging physics, gradient backpropagation is performed between the image domain and projection data space to optimize network weights. Main Results. Evaluations with both simulated and clinical cases demonstrate our method's effective zero-shot adaptability across various projection views, highlighting its robustness and flexibility. Additionally, our approach effectively eliminates noise and streak artifacts while significantly restoring intricate image details. Significance. Our method aims to overcome the limitations in current supervised deep learning-based sparse-view CT reconstruction, offering improved generalizability and adaptability without the need for extensive paired training data.
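The core idea (fit an untrained network to a single scan via data consistency in the projection domain, with randomized input noise to curb overfitting) can be sketched as below; the linear operator, tiny network, and noise schedule are toy stand-ins, not the paper's architecture or diffusion-noise scheme.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
A = torch.randn(1024, 64 * 64) / 64.0            # stand-in forward projector (e.g. Radon)
measured = torch.randn(1024)                     # stand-in sparse-view projection data

net = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64 * 64))
z = torch.randn(64)                              # fixed latent input (deep image prior)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    noise_level = 0.1 * torch.rand(1)            # randomized ("multi-level") input noise
    x = net(z + noise_level * torch.randn_like(z))
    loss = ((A @ x - measured) ** 2).mean()      # data consistency in the projection domain
    opt.zero_grad(); loss.backward(); opt.step()
```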


Subject(s)
Deep Learning , Image Processing, Computer-Assisted , Tomography, X-Ray Computed , Image Processing, Computer-Assisted/methods , Humans , Diffusion , Signal-To-Noise Ratio , Unsupervised Machine Learning
6.
Gigascience ; 13, 2024 01 02.
Article in English | MEDLINE | ID: mdl-39028588

ABSTRACT

BACKGROUND: Integrative analysis of spatially resolved transcriptomics datasets empowers a deeper understanding of complex biological systems. However, integrating multiple tissue sections presents challenges for batch effect removal, particularly when the sections are measured by various technologies or collected at different times. FINDINGS: We propose spatiAlign, an unsupervised contrastive learning model that employs the expression of all measured genes and the spatial location of cells, to integrate multiple tissue sections. It enables the joint downstream analysis of multiple datasets not only in low-dimensional embeddings but also in the reconstructed full expression space. CONCLUSIONS: In benchmarking analysis, spatiAlign outperforms state-of-the-art methods in learning joint and discriminative representations for tissue sections, each potentially characterized by complex batch effects or distinct biological characteristics. Furthermore, we demonstrate the benefits of spatiAlign for the integrative analysis of time-series brain sections, including spatial clustering, differential expression analysis, and particularly trajectory inference that requires a corrected gene expression matrix.


Subject(s)
Gene Expression Profiling , Transcriptome , Unsupervised Machine Learning , Gene Expression Profiling/methods , Computational Biology/methods , Humans , Algorithms , Animals , Cluster Analysis , Brain/metabolism
7.
Neural Comput ; 36(8): 1449-1475, 2024 Jul 19.
Article in English | MEDLINE | ID: mdl-39028957

ABSTRACT

Dimension reduction on neural activity paves a way for unsupervised neural decoding by dissociating the measurement of internal neural pattern reactivation from the measurement of external variable tuning. With assumptions only on the smoothness of latent dynamics and of internal tuning curves, the Poisson gaussian-process latent variable model (P-GPLVM; Wu et al., 2017) is a powerful tool to discover the low-dimensional latent structure for high-dimensional spike trains. However, when given novel neural data, the original model lacks a method to infer their latent trajectories in the learned latent space, limiting its ability for estimating the neural reactivation. Here, we extend the P-GPLVM to enable the latent variable inference of new data constrained by previously learned smoothness and mapping information. We also describe a principled approach for the constrained latent variable inference for temporally compressed patterns of activity, such as those found in population burst events during hippocampal sharp-wave ripples, as well as metrics for assessing the validity of neural pattern reactivation and inferring the encoded experience. Applying these approaches to hippocampal ensemble recordings during active maze exploration, we replicate the result that P-GPLVM learns a latent space encoding the animal's position. We further demonstrate that this latent space can differentiate one maze context from another. By inferring the latent variables of new neural data during running, certain neural patterns are observed to reactivate, in accordance with the similarity of experiences encoded by its nearby neural trajectories in the training data manifold. Finally, reactivation of neural patterns can be estimated for neural activity during population burst events as well, allowing the identification for replay events of versatile behaviors and more general experiences. Thus, our extension of the P-GPLVM framework for unsupervised analysis of neural activity can be used to answer critical questions related to scientific discovery.


Subject(s)
Hippocampus , Models, Neurological , Neurons , Animals , Normal Distribution , Poisson Distribution , Neurons/physiology , Hippocampus/physiology , Action Potentials/physiology , Unsupervised Machine Learning , Rats
8.
Nat Commun ; 15(1): 6112, 2024 Jul 20.
Article in English | MEDLINE | ID: mdl-39030176

ABSTRACT

Ductal carcinoma in situ (DCIS) is a pre-invasive tumor that can progress to invasive breast cancer, a leading cause of cancer death. We generate a large-scale tissue microarray dataset of chromatin images, from 560 samples from 122 female patients in 3 disease stages and 11 phenotypic categories. Using representation learning on chromatin images alone, without multiplexed staining or high-throughput sequencing, we identify eight morphological cell states and tissue features marking DCIS. All cell states are observed in all disease stages with different proportions, indicating that cell states enriched in invasive cancer exist in small fractions in normal breast tissue. Tissue-level analysis reveals significant changes in the spatial organization of cell states across disease stages, which is predictive of disease stage and phenotypic category. Taken together, we show that chromatin imaging represents a powerful measure of cell state and disease stage of DCIS, providing a simple and effective tumor biomarker.


Subject(s)
Breast Neoplasms , Carcinoma, Intraductal, Noninfiltrating , Chromatin , Humans , Female , Carcinoma, Intraductal, Noninfiltrating/pathology , Carcinoma, Intraductal, Noninfiltrating/genetics , Carcinoma, Intraductal, Noninfiltrating/metabolism , Chromatin/metabolism , Breast Neoplasms/pathology , Breast Neoplasms/genetics , Breast Neoplasms/metabolism , Biomarkers, Tumor/metabolism , Biomarkers, Tumor/genetics , Unsupervised Machine Learning , Image Processing, Computer-Assisted/methods , Tissue Array Analysis , Neoplasm Staging
9.
J Neural Eng ; 21(4)2024 Jul 16.
Article in English | MEDLINE | ID: mdl-38968936

ABSTRACT

Objective. Domain adaptation has been recognized as a potent solution to the challenge of limited training data for electroencephalography (EEG) classification tasks. Existing studies primarily focus on homogeneous environments; however, the heterogeneous properties of EEG data arising from device diversity cannot be overlooked. This motivates the development of heterogeneous domain adaptation methods that can fully exploit the knowledge from an auxiliary heterogeneous domain for EEG classification. Approach. In this article, we propose a novel model named informative representation fusion (IRF) to tackle the problem of unsupervised heterogeneous domain adaptation in the context of EEG data. In IRF, we consider different perspectives of data, i.e. independent identically distributed (iid) and non-iid, to learn different representations. Specifically, from the non-iid perspective, IRF models high-order correlations among data by hypergraphs and develops hypergraph encoders to obtain data representations of each domain. From the iid perspective, by applying multi-layer perceptron networks to the source and target domain data, we obtain another type of representation for both domains. Subsequently, an attention mechanism is used to fuse these two types of representations to yield informative features. To learn transferable representations, the maximum mean discrepancy is utilized to align the distributions of the source and target domains based on the fused features. Main results. Experimental results on several real-world datasets demonstrate the effectiveness of the proposed model. Significance. This article handles an EEG classification situation where the source and target EEG data lie in different spaces and, moreover, in an unsupervised learning setting. This situation is common in practice but barely studied in the literature. The proposed model achieves high classification accuracy, and this study is important for the commercial applications of EEG-based BCIs.


Subject(s)
Electroencephalography , Electroencephalography/methods , Electroencephalography/classification , Humans , Unsupervised Machine Learning , Algorithms , Neural Networks, Computer
10.
Sensors (Basel) ; 24(13)2024 Jun 22.
Article in English | MEDLINE | ID: mdl-39000846

ABSTRACT

Global Positioning Systems (GPSs) can collect tracking data to remotely monitor livestock well-being and pasture use. Supervised machine learning requires behavioral observations of monitored animals to identify changes in behavior, which is labor-intensive. Our goal was to identify animal behaviors automatically without using human observations. We designed a novel framework using unsupervised learning techniques. The framework contains two steps. The first step segments cattle tracking data using state-of-the-art time series segmentation algorithms, and the second step groups segments into clusters and then labels the clusters. To evaluate the applicability of our proposed framework, we utilized GPS tracking data collected from five cows in a 1096 ha rangeland pasture. Cow movement pathways were grouped into six behavior clusters based on velocity (m/min) and distance from water. Using velocity again, these six clusters were then classified into walking, grazing, and resting behaviors. The mean velocities for predicted walking, grazing, and resting behavior were 44, 13, and 2 m/min, respectively, similar to values reported in other research. Predicted diurnal behavior patterns showed two primary grazing bouts during early morning and evening, as in other studies. Our study demonstrates that the proposed two-step framework can use unlabeled GPS tracking data to predict cattle behavior without human observations.
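The second (clustering) step could resemble the sketch below, which groups toy segment-level features of velocity and distance from water with k-means and then inspects cluster mean velocities to assign behavior labels; the values and cluster count are illustrative, and the upstream time series segmentation is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy segment-level features: velocity (m/min) and distance from water (m).
rng = np.random.default_rng(0)
velocity = rng.choice([2.0, 13.0, 44.0], size=200) + rng.normal(0, 1.5, 200)
dist_to_water = rng.uniform(0, 2000, 200)
features = np.column_stack([velocity, dist_to_water])

kmeans = KMeans(n_clusters=6, n_init=10, random_state=0).fit(features)
# Each cluster would then be labeled resting / grazing / walking from its mean velocity.
for c in range(6):
    print(c, features[kmeans.labels_ == c, 0].mean())
```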


Subject(s)
Algorithms , Behavior, Animal , Geographic Information Systems , Unsupervised Machine Learning , Cattle , Animals , Behavior, Animal/physiology , Female
11.
BMC Public Health ; 24(1): 1994, 2024 Jul 25.
Article in English | MEDLINE | ID: mdl-39061026

ABSTRACT

BACKGROUND: Recent studies have demonstrated that individuals hospitalized due to COVID-19 can be affected by "long-COVID" symptoms for as long as one year after discharge. OBJECTIVES: Our study objective is to identify data-driven clusters of patients using a novel, unsupervised machine learning technique. METHODS: The study uses data from 437 patients hospitalized in New York City between March 3rd and May 15th of 2020. The data were abstracted from medical records and collected from a follow-up survey administered up to one year post-hospitalization. Hospitalization data included demographics, comorbidities, and in-hospital complications. The survey collected long-COVID symptoms and information on general health, social isolation, and loneliness. To perform the analysis, we created a graph by projecting the data onto eight principal components (PCs) and running the K-nearest neighbors algorithm. We then used Louvain's algorithm to partition this graph into non-overlapping clusters. RESULTS: The cluster analysis produced four clusters with distinct health and social connectivity patterns. The first cluster (n = 141) consisted of patients with both long-COVID neurological symptoms (74%) and social isolation/loneliness. The second cluster (n = 137) consisted of healthy patients who were also more socially connected and not lonely. The third cluster (n = 96) contained patients with neurological symptoms who were socially connected but lonely, and the fourth cluster (n = 63) consisted entirely of patients who had a traumatic COVID-19 hospitalization, were intubated, and suffered symptoms, but were socially connected and experienced recovery. CONCLUSION: The cluster analysis identified social isolation and loneliness as important features associated with long-COVID symptoms and recovery after hospitalization. It also confirms that social isolation and loneliness, though connected, are not necessarily the same. Physicians need to be aware of how social characteristics relate to long-COVID and patients' ability to cope with the resulting symptoms.
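A hedged sketch of the described pipeline (project onto eight principal components, build a k-nearest-neighbors graph, partition it with Louvain) follows, using scikit-learn and networkx; the random feature matrix and the neighbor count are placeholders, not the study's data or tuning.

```python
import numpy as np
import networkx as nx
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(437, 40))                       # stand-in patient feature matrix

pcs = PCA(n_components=8).fit_transform(X)           # project onto eight principal components
knn = kneighbors_graph(pcs, n_neighbors=15, mode="connectivity")
graph = nx.from_numpy_array(knn.toarray())           # undirected patient similarity graph
# Louvain partition into non-overlapping clusters (requires networkx >= 2.8).
clusters = nx.community.louvain_communities(graph, seed=0)
```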


Subject(s)
COVID-19 , Hospitalization , Loneliness , Social Isolation , Humans , COVID-19/epidemiology , COVID-19/psychology , New York City/epidemiology , Male , Female , Hospitalization/statistics & numerical data , Middle Aged , Cluster Analysis , Social Isolation/psychology , Aged , Loneliness/psychology , Adult , Post-Acute COVID-19 Syndrome , Unsupervised Machine Learning , SARS-CoV-2
12.
Comput Methods Programs Biomed ; 254: 108315, 2024 Sep.
Article in English | MEDLINE | ID: mdl-38991373

ABSTRACT

BACKGROUND AND OBJECTIVE: Deep learning usually achieves good performance in the supervised setting, which requires a large amount of labeled data. However, manual labeling of electrocardiograms (ECGs) is laborious and requires substantial medical knowledge. Semi-supervised learning (SSL) provides an effective way of leveraging unlabeled data to improve model performance, offering a way to address this problem. The objective of this study is to improve the performance of cardiovascular disease (CVD) detection by fully utilizing unlabeled ECGs. METHODS: A novel SSL algorithm fusing consistency regularization and pseudo-labeling techniques (CPSS) is proposed. CPSS consists of supervised learning and unsupervised learning. For supervised learning, the labeled ECGs are mapped into prediction vectors by the classifier. The cross-entropy loss function is used to optimize the classifier. For unsupervised learning, the unlabeled ECGs are weakly and strongly augmented, and a consistency loss is used to minimize the difference between the classifier's predictions for the two augmentations. Pseudo-labeling techniques include positive pseudo-labeling (PL) and ranking-based negative pseudo-labeling (RNL). PL introduces pseudo-labels for data with high prediction confidence. RNL assigns negative pseudo-labels to the lower-ranked categories in the prediction vectors to leverage data with low prediction confidence. In this study, VGGNet and ResNet are used as classifiers, which are jointly optimized by labeled and unlabeled ECGs. RESULTS: CPSS has been validated on several databases. With the same number of labeled ECGs (10%), it improves the accuracies over pure supervised learning by 13.59%, 4.60%, and 5.38% in the CPSC2018, PTB-XL, and Chapman databases, respectively. CPSS achieves comparable results to the fully supervised method with only 10% of labeled ECGs, which reduces the labeling workload by 90%. In addition, to verify the practicality of CPSS, a cardiovascular disease monitoring system is designed by heterogeneously deploying the trained classifiers on an SoC (system-on-a-chip), which can detect CVD in real time. CONCLUSION: The results of this study indicate that the proposed CPSS can significantly improve the performance of CVD detection using unlabeled ECGs, which reduces the burden of ECG labeling in deep learning. In addition, the designed monitoring system makes the proposed CPSS promising for real-world applications.
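A simplified PyTorch sketch of the unsupervised branch is given below: pseudo-labels from the weak augmentation supervise the strong augmentation (the consistency term), confident samples receive positive pseudo-labels, and the lowest-ranked classes receive negative pseudo-labels. The confidence threshold, ranking depth, and loss weighting are assumptions, not the paper's CPSS settings.

```python
import torch
import torch.nn.functional as F

def cpss_unlabeled_loss(model, weak, strong, pos_thresh=0.95, neg_rank=3):
    """Sketch: consistency via weak-view pseudo-labels applied to the strong view,
    plus ranking-based negative pseudo-labels for every sample."""
    with torch.no_grad():
        probs = F.softmax(model(weak), dim=1)            # predictions on weak augmentation
        conf, pseudo = probs.max(dim=1)
    logits_s = model(strong)
    # Positive pseudo-labels: high-confidence samples supervise the strong view.
    mask = conf >= pos_thresh
    pos_loss = F.cross_entropy(logits_s[mask], pseudo[mask]) if mask.any() else logits_s.sum() * 0.0
    # Negative pseudo-labels: push down the lowest-ranked (least likely) classes.
    neg_idx = probs.argsort(dim=1)[:, :neg_rank]
    p_strong = F.softmax(logits_s, dim=1).clamp(max=1 - 1e-6)
    neg_loss = -torch.log(1.0 - p_strong).gather(1, neg_idx).mean()
    return pos_loss + neg_loss
```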


Subject(s)
Algorithms , Cardiovascular Diseases , Deep Learning , Electrocardiography , Supervised Machine Learning , Humans , Electrocardiography/methods , Cardiovascular Diseases/diagnosis , Unsupervised Machine Learning , Databases, Factual
13.
BMC Cardiovasc Disord ; 24(1): 343, 2024 Jul 05.
Article in English | MEDLINE | ID: mdl-38969974

ABSTRACT

BACKGROUND: Heart failure (HF) with preserved or mildly reduced ejection fraction includes a heterogenous group of patients. Reclassification into distinct phenogroups to enable targeted interventions is a priority. This study aimed to identify distinct phenogroups, and compare phenogroup characteristics and outcomes, from electronic health record data. METHODS: 2,187 patients admitted to five UK hospitals with a diagnosis of HF and a left ventricular ejection fraction ≥ 40% were identified from the NIHR Health Informatics Collaborative database. Partition-based, model-based, and density-based machine learning clustering techniques were applied. Cox Proportional Hazards and Fine-Gray competing risks models were used to compare outcomes (all-cause mortality and hospitalisation for HF) across phenogroups. RESULTS: Three phenogroups were identified: (1) Younger, predominantly female patients with high prevalence of cardiometabolic and coronary disease; (2) More frail patients, with higher rates of lung disease and atrial fibrillation; (3) Patients characterised by systemic inflammation and high rates of diabetes and renal dysfunction. Survival profiles were distinct, with an increasing risk of all-cause mortality from phenogroups 1 to 3 (p < 0.001). Phenogroup membership significantly improved survival prediction compared to conventional factors. Phenogroups were not predictive of hospitalisation for HF. CONCLUSIONS: Applying unsupervised machine learning to routinely collected electronic health record data identified phenogroups with distinct clinical characteristics and unique survival profiles.
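For the survival comparison across phenogroups, a Cox proportional hazards fit with the lifelines package could look like the sketch below; the bundled Rossi dataset is used purely as a stand-in for the study's EHR-derived data, and in practice the covariates would be phenogroup membership indicators.

```python
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi   # bundled survival dataset used as a stand-in

df = load_rossi()                            # columns: week (time), arrest (event), covariates
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")
print(cph.summary[["coef", "exp(coef)", "p"]])
# In the study, the covariates would instead be phenogroup membership indicators,
# yielding a hazard ratio for each phenogroup relative to the reference group.
```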


Subject(s)
Electronic Health Records , Heart Failure , Stroke Volume , Ventricular Function, Left , Humans , Heart Failure/physiopathology , Heart Failure/diagnosis , Heart Failure/mortality , Female , Male , Aged , Middle Aged , Risk Assessment , United Kingdom/epidemiology , Risk Factors , Prognosis , Aged, 80 and over , Databases, Factual , Unsupervised Machine Learning , Hospitalization , Time Factors , Comorbidity , Cause of Death , Phenotype , Data Mining
14.
Phys Med Biol ; 69(16)2024 Aug 02.
Article in English | MEDLINE | ID: mdl-39047770

ABSTRACT

Objective. Convolutional neural networks (CNNs) are developing rapidly in the field of medical image registration, and U-Net-based architectures further improve registration precision. However, these methods may discard important information during the encoding and decoding steps, leading to a decline in accuracy. To solve this problem, a multi-channel semantic-aware and residual attention mechanism network (MSRA-Net) is proposed in this paper. Approach. Our proposed network achieves efficient information aggregation by extracting the features of different channels. Firstly, a context-aware module (CAM) is designed to extract valuable contextual information, and depth-wise separable convolution is employed in the CAM to alleviate the computational burden. Then, a new multi-channel semantic-aware module (MCSAM) is designed for more comprehensive fusion of up-sampling features. Additionally, a residual attention module is introduced in the up-sampling process to extract more semantic information and minimize information loss. Main results. This study uses the Dice score, average symmetric surface distance, and negative Jacobian determinant as evaluation metrics to assess registration performance. The experimental results demonstrate that the proposed MSRA-Net achieves the highest accuracy compared with several state-of-the-art methods. Moreover, our network demonstrates the highest Dice score across multiple datasets, indicating the superior generalization capability of our model. Significance. The proposed MSRA-Net offers a novel approach to improving medical image registration accuracy, with implications for various clinical applications. Our implementation is available at https://github.com/shy922/MSRA-Net.
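Of the reported metrics, the Dice score is the simplest to state; a minimal NumPy version for binary masks is sketched below (not the authors' implementation).

```python
import numpy as np

def dice_score(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Dice = 2|P ∩ T| / (|P| + |T|) for binary segmentation masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```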


Subject(s)
Imaging, Three-Dimensional , Neural Networks, Computer , Semantics , Imaging, Three-Dimensional/methods , Humans , Unsupervised Machine Learning
15.
BMJ Health Care Inform ; 31(1)2024 Jul 29.
Article in English | MEDLINE | ID: mdl-39074912

ABSTRACT

BACKGROUND: Despite the increasing availability of electronic healthcare record (EHR) data and wide availability of plug-and-play machine learning (ML) Application Programming Interfaces, the adoption of data-driven decision-making within routine hospital workflows has thus far remained limited. Through the lens of deriving clusters of diagnoses by age, this study investigated the type of ML analysis that can be performed using EHR data and how results could be communicated to lay stakeholders. METHODS: Observational EHR data from a tertiary paediatric hospital, containing 61 522 unique patients and 3315 unique ICD-10 diagnosis codes, were used after preprocessing. K-means clustering was applied to identify age distributions of patient diagnoses. The final model was selected using quantitative metrics and expert assessment of the clinical validity of the clusters. Additionally, uncertainty over preprocessing decisions was analysed. FINDINGS: Four age clusters of diseases were identified, broadly aligned to ages 0-1, 1-5, 5-13, and 13-18 years. Diagnoses within the clusters aligned with existing knowledge regarding the propensity of presentation at different ages, and sequential clusters reflected known disease progressions. The results validated similar methodologies within the literature. The impact of uncertainty induced by preprocessing decisions was large at the level of individual diagnoses but not at the population level. Strategies for mitigating, or communicating, this uncertainty were successfully demonstrated. CONCLUSION: Unsupervised ML applied to EHR data identifies clinically relevant age distributions of diagnoses that can augment existing decision-making. However, biases within healthcare datasets dramatically impact results if not appropriately mitigated or communicated.
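A hedged sketch of the clustering and model-selection step (k-means over per-diagnosis age-distribution features, with the silhouette score as one plausible quantitative metric) is shown below; the random feature matrix is a placeholder, silhouette is an assumption rather than the paper's stated metric, and the expert clinical review step is not represented.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(3315, 5))        # stand-in per-diagnosis age-distribution features

scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)
best_k = max(scores, key=scores.get)  # quantitative pick, then reviewed by clinical experts
```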


Subject(s)
Electronic Health Records , Unsupervised Machine Learning , Humans , Child , Child, Preschool , Infant , Adolescent , Cluster Analysis , Infant, Newborn , Male , Female , Age Factors
16.
Biomed Phys Eng Express ; 10(5)2024 Aug 12.
Article in English | MEDLINE | ID: mdl-39019048

ABSTRACT

Precise segmentation of skin cancer lesions at different stages is conducive to early detection and further treatment. Considering the huge cost of obtaining pixel-perfect annotations for this task, segmentation using less expensive image-level labels has become a research direction. Most weakly supervised segmentation with image-level labels uses class activation mapping (CAM) methods, which commonly yield incomplete or insufficient foreground segmentation and false negatives. At the same time, when performing weakly supervised segmentation of skin cancer lesions, ulcers, redness, and swelling may appear near the segmented areas of individual disease categories. This co-occurrence problem affects the model's accuracy in segmenting class-related tissue boundaries. Both issues stem from the loosely constrained nature of image-level labels, which penalize the entire image space. Therefore, providing pixel-level constraints for weak supervision with image-level labels is the key to improving performance. To solve these problems, this paper proposes a joint unsupervised constraint-assisted weakly supervised segmentation model (UCA-WSS). The weakly supervised part of the model adopts a dual-branch adversarial erasure mechanism to generate higher-quality CAMs. The unsupervised part uses contrastive learning and clustering algorithms to generate foreground labels and fine boundary labels that assist segmentation and address the common co-occurrence problems in weakly supervised skin cancer lesion segmentation through unsupervised constraints. The proposed model is evaluated against related models on several public dermatology datasets. Experimental results show that our model performs better on the skin cancer segmentation task than other weakly supervised segmentation models, demonstrating the potential of combining unsupervised constraints with weakly supervised segmentation.


Subject(s)
Algorithms , Semantics , Skin Neoplasms , Humans , Skin Neoplasms/diagnostic imaging , Skin Neoplasms/pathology , Image Processing, Computer-Assisted/methods , Image Interpretation, Computer-Assisted/methods , Supervised Machine Learning , Databases, Factual , Skin/diagnostic imaging , Skin/pathology , Unsupervised Machine Learning
17.
Nat Genet ; 56(8): 1604-1613, 2024 Aug.
Article in English | MEDLINE | ID: mdl-38977853

ABSTRACT

Although high-dimensional clinical data (HDCD) are increasingly available in biobank-scale datasets, their use for genetic discovery remains challenging. Here we introduce an unsupervised deep learning model, Representation Learning for Genetic Discovery on Low-Dimensional Embeddings (REGLE), for discovering associations between genetic variants and HDCD. REGLE leverages variational autoencoders to compute nonlinear disentangled embeddings of HDCD, which become the inputs to genome-wide association studies (GWAS). REGLE can uncover features not captured by existing expert-defined features and enables the creation of accurate disease-specific polygenic risk scores (PRSs) in datasets with very little labeled data. We apply REGLE to perform GWAS on respiratory and circulatory HDCD: spirograms measuring lung function and photoplethysmograms measuring blood volume changes. REGLE replicates known loci while identifying others not previously detected. REGLE embeddings are predictive of overall survival, and PRSs constructed from REGLE loci improve disease prediction across multiple biobanks. Overall, REGLE embeddings contain clinically relevant information beyond that captured by existing expert-defined features, leading to improved genetic discovery and disease prediction.


Subject(s)
Genome-Wide Association Study , Humans , Genome-Wide Association Study/methods , Multifactorial Inheritance/genetics , Genetic Predisposition to Disease , Unsupervised Machine Learning , Genomics/methods , Deep Learning , Polymorphism, Single Nucleotide
18.
PLoS One ; 19(6): e0304017, 2024.
Article in English | MEDLINE | ID: mdl-38870119

ABSTRACT

This article presents an unsupervised method for segmenting brain computed tomography scans. The proposed methodology involves image feature extraction and application of similarity and continuity constraints to generate segmentation maps of the anatomical head structures. Specifically designed for real-world datasets, this approach applies a spatial continuity scoring function tailored to the desired number of structures. The primary objective is to assist medical experts in diagnosis by identifying regions with specific abnormalities. Results indicate a simplified and accessible solution, reducing computational effort, training time, and financial costs. Moreover, the method presents potential for expediting the interpretation of abnormal scans, thereby impacting clinical practice. This proposed approach might serve as a practical tool for segmenting brain computed tomography scans, and make a significant contribution to the analysis of medical images in both research and clinical settings.


Subject(s)
Brain , Tomography, X-Ray Computed , Humans , Tomography, X-Ray Computed/methods , Brain/diagnostic imaging , Image Processing, Computer-Assisted/methods , Algorithms , Unsupervised Machine Learning
19.
BMC Med Inform Decis Mak ; 24(1): 152, 2024 Jun 04.
Article in English | MEDLINE | ID: mdl-38831432

ABSTRACT

BACKGROUND: Machine learning (ML) has emerged as the predominant computational paradigm for analyzing large-scale datasets across diverse domains. The assessment of dataset quality stands as a pivotal precursor to the successful deployment of ML models. In this study, we introduce DREAMER (Data REAdiness for MachinE learning Research), an algorithmic framework leveraging supervised and unsupervised machine learning techniques to autonomously evaluate the suitability of tabular datasets for ML model development. DREAMER is openly accessible as a tool on GitHub and Docker, facilitating its adoption and further refinement within the research community. RESULTS: The proposed model in this study was applied to three distinct tabular datasets, resulting in notable enhancements in their quality with respect to readiness for ML tasks, as assessed through established data quality metrics. Our findings demonstrate the efficacy of the framework in substantially augmenting the original dataset quality, achieved through the elimination of extraneous features and rows. This refinement yielded improved accuracy across both supervised and unsupervised learning methodologies. CONCLUSION: Our software presents an automated framework for data readiness, aimed at enhancing the integrity of raw datasets to facilitate robust utilization within ML pipelines. Through our proposed framework, we streamline the original dataset, resulting in enhanced accuracy and efficiency within the associated ML algorithms.


Subject(s)
Machine Learning , Humans , Datasets as Topic , Unsupervised Machine Learning , Algorithms , Supervised Machine Learning , Software
20.
Radiol Cardiothorac Imaging ; 6(3): e230247, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38900026

ABSTRACT

Purpose To use unsupervised machine learning to identify phenotypic clusters with increased risk of arrhythmic mitral valve prolapse (MVP). Materials and Methods This retrospective study included patients with MVP without hemodynamically significant mitral regurgitation or left ventricular (LV) dysfunction undergoing late gadolinium enhancement (LGE) cardiac MRI between October 2007 and June 2020 in 15 European tertiary centers. The study end point was a composite of sustained ventricular tachycardia, (aborted) sudden cardiac death, or unexplained syncope. An unsupervised, data-driven hierarchical k-means algorithm was used to identify phenotypic clusters. The association between clusters and the study end point was assessed by a Cox proportional hazards model. Results A total of 474 patients (mean age, 47 years ± 16 [SD]; 244 female, 230 male) with two phenotypic clusters were identified. Patients in cluster 2 (199 of 474, 42%) had more severe mitral valve degeneration (ie, bileaflet MVP and leaflet displacement), left and right heart chamber remodeling, and myocardial fibrosis as assessed with LGE cardiac MRI than those in cluster 1. Demographic and clinical features (ie, symptoms, arrhythmias at Holter monitoring) had negligible contribution in differentiating the two clusters. Compared with cluster 1, the risk of developing the study end point over a median follow-up of 39 months was significantly higher in cluster 2 patients (hazard ratio: 3.79 [95% CI: 1.19, 12.12], P = .02) after adjustment for LGE extent. Conclusion Among patients with MVP without significant mitral regurgitation or LV dysfunction, unsupervised machine learning enabled the identification of two phenotypic clusters with distinct arrhythmic outcomes based primarily on cardiac MRI features. These results encourage the use of in-depth imaging-based phenotyping for implementing arrhythmic risk prediction in MVP. Keywords: MR Imaging, Cardiac, Cardiac MRI, Mitral Valve Prolapse, Cluster Analysis, Ventricular Arrhythmia, Sudden Cardiac Death, Unsupervised Machine Learning. Supplemental material is available for this article. © RSNA, 2024.


Subject(s)
Mitral Valve Prolapse , Phenotype , Unsupervised Machine Learning , Humans , Mitral Valve Prolapse/diagnostic imaging , Female , Male , Middle Aged , Retrospective Studies , Registries , Magnetic Resonance Imaging, Cine/methods , Arrhythmias, Cardiac/diagnostic imaging , Arrhythmias, Cardiac/physiopathology , Adult , Magnetic Resonance Imaging