Results 1 - 20 of 1,983
1.
Sci Rep ; 14(1): 10820, 2024 May 11.
Article in English | MEDLINE | ID: mdl-38734825

ABSTRACT

Advancements in clinical treatment are increasingly constrained by the limitations of supervised learning techniques, which depend heavily on large volumes of annotated data. The annotation process is not only costly but also demands substantial time from clinical specialists. To address this issue, we introduce the S4MI (Self-Supervision and Semi-Supervision for Medical Imaging) pipeline, a novel approach that leverages advances in self-supervised and semi-supervised learning. These techniques rely on auxiliary tasks that do not require labeling, which makes machine supervision easier to scale than with fully supervised methods. Our study benchmarks these techniques on three distinct medical imaging datasets to evaluate their effectiveness in classification and segmentation tasks. Notably, self-supervised learning significantly surpassed supervised methods in classification on all evaluated datasets. Remarkably, the semi-supervised approach demonstrated superior outcomes in segmentation, outperforming fully supervised methods while using 50% fewer labels across all datasets. In line with our commitment to contributing to the scientific community, we have made the S4MI code openly accessible, allowing for broader application and further development of these methods. The code can be accessed at https://github.com/pranavsinghps1/S4MI .


Subjects
Image Processing, Computer-Assisted , Supervised Machine Learning , Humans , Image Processing, Computer-Assisted/methods , Diagnostic Imaging/methods , Algorithms
2.
PeerJ ; 12: e17361, 2024.
Article in English | MEDLINE | ID: mdl-38737741

ABSTRACT

Phytoplankton, found in oceans, seas, and other large water bodies, are the world's largest oxygen producers and play a crucial role in the marine food chain. Unbalanced biogeochemical features such as salinity, pH, and mineral content can retard their growth. With advances in hardware, Artificial Intelligence techniques are increasingly used to build intelligent decision-making systems. We therefore apply supervised regression to reanalysis data to estimate phytoplankton levels in global waters. The experiment applies several supervised machine learning regression techniques, namely random forest, extra trees, bagging, and a histogram-based gradient boosting regressor, to reanalysis data obtained from the Copernicus Global Ocean Biogeochemistry Hindcast dataset. The models predicted phytoplankton levels with a coefficient of determination (R²) of up to 0.96. After further validation on larger datasets, the model could be deployed in a production environment to complement in-situ measurement efforts.
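The abstract reports model quality as a coefficient of determination (R²). As a minimal sketch in plain Python (no scikit-learn), R² = 1 - SS_res / SS_tot compares the model's squared errors with those of a baseline that always predicts the mean. The function name and example values are illustrative only, not the study's data or code.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: the model's squared prediction errors.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: errors of a baseline that always predicts the mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted phytoplankton values (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(round(r2_score(observed, predicted), 4))  # 0.98
```

A score of 1.0 means perfect prediction and 0.0 means no better than predicting the mean; R² becomes negative for models that do worse than the mean.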


Subjects
Machine Learning , Phytoplankton , Remote Sensing Technology , Remote Sensing Technology/methods , Remote Sensing Technology/instrumentation , Oceans and Seas , Environmental Monitoring/methods , Supervised Machine Learning
3.
Nat Commun ; 15(1): 3942, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729933

ABSTRACT

In clinical oncology, many diagnostic tasks rely on the identification of cells in histopathology images. While supervised machine learning techniques require labels, providing manual cell annotations is time-consuming. In this paper, we propose a self-supervised framework (enVironment-aware cOntrastive cell represenTation learning: VOLTA) for cell representation learning in histopathology images using a technique that accounts for the cell's mutual relationship with its environment. We subject our model to extensive experiments on data collected from multiple institutions comprising over 800,000 cells and six cancer types. To showcase the potential of our proposed framework, we apply VOLTA to ovarian and endometrial cancers and demonstrate that our cell representations can be used to identify the known histotypes of ovarian cancer and provide insights that link histopathology and molecular subtypes of endometrial cancer. Unlike supervised models, our framework can empower discoveries without any annotated data, even in situations where sample sizes are limited.
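VOLTA is described as contrastive representation learning. The sketch below shows only a generic InfoNCE-style contrastive loss for a single anchor, a standard formulation and not VOLTA's environment-aware objective: the loss is small when the anchor embedding is more similar to its positive view than to the negatives. The toy embeddings, the temperature of 0.1, and the function names are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor: -log of the softmax probability of the positive pair."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[0]


# Hypothetical 2-D cell embeddings.
anchor = [1.0, 0.0]
print(info_nce(anchor, [0.9, 0.1], [[0.0, 1.0], [-1.0, 0.0]]))  # positive aligned: small loss
print(info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]]))  # positive misaligned: larger loss
```

Minimizing this loss over many anchors pulls positive pairs together and pushes negatives apart; how positives and negatives are chosen (here, arbitrary vectors) is where methods such as VOLTA differ.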


Subjects
Endometrial Neoplasms , Ovarian Neoplasms , Humans , Female , Endometrial Neoplasms/pathology , Ovarian Neoplasms/pathology , Machine Learning , Supervised Machine Learning , Algorithms , Image Processing, Computer-Assisted/methods
4.
BMC Microbiol ; 24(1): 162, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38730339

ABSTRACT

BACKGROUND: Coastal areas are subject to various anthropogenic and natural influences. In this study, we investigated and compared the characteristics of two coastal regions, Andhra Pradesh (AP) and Goa (GA), focusing on pollution, anthropogenic activities, and recreational impacts. We explored three main factors influencing the differences between these coastlines: the Bay of Bengal's shallower depth and lower salinity; upwelling due to the thermocline in the Arabian Sea; and high tides that can cause strong currents that transport pollutants and debris. RESULTS: The microbial diversity in GA was significantly higher than in AP, which might be attributed to differences in temperature, soil type, and vegetation cover. 16S rRNA amplicon sequencing and bioinformatics analysis indicated the presence of diverse microbial phyla, including the candidate phyla radiation (CPR). Statistical analysis, random forest regression, and supervised machine learning classification models accurately confirmed the diversity of the microbiome. Furthermore, we identified 450 cultures of heterotrophic, biotechnologically important bacteria. Some strains were identified as novel taxa based on 16S rRNA gene sequencing and show promising potential for further study. CONCLUSION: Our study thus provides valuable insights into the microbial diversity and pollution levels of coastal areas in AP and GA. These findings contribute to a better understanding of the impact of anthropogenic activities and climate variations on the biology and biodiversity of coastal ecosystems.


Subjects
Bacteria , Bays , Microbiota , Phylogeny , RNA, Ribosomal, 16S , Seawater , Supervised Machine Learning , RNA, Ribosomal, 16S/genetics , Bacteria/classification , Bacteria/genetics , Bacteria/isolation & purification , Microbiota/genetics , Seawater/microbiology , India , Bays/microbiology , Biodiversity , DNA, Bacterial/genetics , Salinity , Sequence Analysis, DNA/methods
5.
Comput Biol Med ; 175: 108510, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38691913

ABSTRACT

BACKGROUND: Seizure prediction algorithms have demonstrated their potential to mitigate epilepsy risks by detecting the pre-ictal state from ongoing electroencephalogram (EEG) signals. However, most of them require high-density EEG, which is burdensome for patients during daily monitoring. Moreover, prevailing seizure models require extensive training with large amounts of labeled data, which is time-consuming and demanding for epileptologists. METHOD: To address these challenges, we propose an adaptive channel selection strategy to reduce the number of EEG channels and a semi-supervised deep learning model to limit the amount of labeled data required for accurate seizure prediction. Our channel selection module relies on features from EEG power spectrum parameterization that precisely characterize epileptic activity, in order to identify the seizure-associated channels for each patient. The semi-supervised model integrates generative adversarial networks and bidirectional long short-term memory networks to enhance seizure prediction. RESULTS: Our approach is evaluated on the CHB-MIT and Siena epilepsy datasets. Using only 4 channels, the method achieves an AUC of 93.15% on the CHB-MIT dataset and 88.98% on the Siena dataset. Experimental results also show that our channel selection approach reduces the number of model parameters and the training time. CONCLUSIONS: Adaptive channel selection coupled with semi-supervised learning can provide a basis for a lightweight, computationally efficient seizure prediction system, making daily monitoring practical and improving patients' quality of life.
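The results are reported as AUC. A minimal sketch of computing AUC without a library uses its equivalence to the probability that a randomly chosen positive (for example, a pre-ictal window) receives a higher score than a randomly chosen negative, with ties counted as one half. The labels, scores, and function name are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical windows: 1 = pre-ictal, 0 = interictal; scores are model outputs.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

This pairwise version costs O(P x N) comparisons; a rank-based computation scales better for long EEG recordings with many windows.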


Subjects
Electroencephalography , Seizures , Humans , Electroencephalography/methods , Seizures/physiopathology , Seizures/diagnosis , Signal Processing, Computer-Assisted , Deep Learning , Algorithms , Databases, Factual , Epilepsy/physiopathology , Supervised Machine Learning
6.
PLoS One ; 19(5): e0299583, 2024.
Article in English | MEDLINE | ID: mdl-38696410

ABSTRACT

The mapping of metabolite-specific data to pathways within cellular metabolism is a major data analysis step needed for biochemical interpretation. A variety of machine learning approaches, particularly deep learning approaches, have been used to predict these metabolite-to-pathway mappings, utilizing a training dataset of known metabolite-to-pathway mappings. A few such training datasets have been derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG). However, several previously published machine learning approaches used an erroneous KEGG-derived training dataset that relied on SMILES molecular representation strings (the KEGG-SMILES dataset) and contained a sizable proportion (~26%) of duplicate entries. The presence of so many duplicates taints the training and testing sets generated from k-fold cross-validation of the KEGG-SMILES dataset. Therefore, the k-fold cross-validation performance of the resulting machine learning models was grossly inflated by these erroneous duplicate entries. Here we describe and evaluate the KEGG-SMILES dataset so that others may avoid using it. We also identify the prior publications that used this erroneous KEGG-SMILES dataset so that their machine learning results can be properly and critically evaluated. In addition, we demonstrate the reduction in model k-fold cross-validation (CV) performance after de-duplicating the KEGG-SMILES dataset. This is a cautionary tale about properly vetting previously published benchmark datasets before using them in machine learning approaches. We hope others will avoid similar mistakes.
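The core problem described here is leakage: when a record and its duplicate fall into different folds, the model is tested on data it has already seen during training. The sketch below, using hypothetical (SMILES, pathway) records and unshuffled contiguous folds for readability, counts how many records have an exact duplicate in the training folds when they sit in a test fold, before and after de-duplication. Exact string matching is an assumption: the same molecule can be written as different SMILES strings, so a real audit would first canonicalize them with a cheminformatics toolkit.

```python
def kfold_indices(n, k):
    """Split range(n) into k contiguous folds (no shuffling, to keep the example readable)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def leaked_fraction(records, k):
    """Fraction of records that, while in a test fold, have an exact duplicate in the training folds."""
    leaked = 0
    for test_idx in kfold_indices(len(records), k):
        test_set = set(test_idx)
        train_keys = {records[i] for i in range(len(records)) if i not in test_set}
        leaked += sum(1 for i in test_idx if records[i] in train_keys)
    return leaked / len(records)


def deduplicate(records):
    """Keep the first occurrence of each record, preserving order."""
    return list(dict.fromkeys(records))


# Hypothetical (SMILES, pathway) records containing one exact duplicate.
records = [("CCO", "p1"), ("c1ccccc1", "p2"), ("CCO", "p1"), ("CC(=O)O", "p3")]
print(leaked_fraction(records, 2))               # 0.5
print(leaked_fraction(deduplicate(records), 2))  # 0.0
```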


Subjects
Metabolic Networks and Pathways , Supervised Machine Learning , Humans , Datasets as Topic
7.
Clin Ter ; 175(3): 98-116, 2024.
Article in English | MEDLINE | ID: mdl-38767067

ABSTRACT

Background: The human microbiome, consisting of diverse bacterial, fungal, protozoan, and viral species, exerts a profound influence on various physiological processes and on disease susceptibility. However, the complexity of microbiome data presents significant challenges for the analysis and interpretation of these intricate datasets, which has led to the development of specialized software that employs machine learning algorithms for these purposes. Methods: In this paper, we analyze raw 16S rRNA gene sequencing data from three studies, including stool samples from healthy controls, patients with adenoma, and patients with colorectal cancer. First, we use network-based methods to reduce the dimensionality of the dataset and retain only the most important features. We then employ supervised machine learning algorithms to make predictions. Results: Graph-based techniques reduce the dimensionality from 255 to 78 features, with a modularity score of 0.73, based on different centrality measures. Projection methods (non-negative matrix factorization and principal component analysis), on the other hand, reduce it to 7 features. Furthermore, we apply supervised machine learning algorithms to the most important features obtained from centrality measures and to those obtained from projection methods, finding that the evaluation metrics are approximately the same whether the algorithms are applied to the entire dataset, to 78 features, or to 7 features. Conclusions: This study demonstrates the efficacy of graph-based and projection methods in interpreting 16S rRNA gene sequencing data. Supervised machine learning on refined features from both approaches yields comparable predictive performance, highlighting specific microbial features (Bacteroides, Prevotella, Fusobacterium, Lysinibacillus, Blautia, Sphingomonas, and Faecalibacterium) as key to predicting patient conditions from raw data.
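The abstract selects features with centrality measures computed on a network. As an illustrative sketch only (the study's network construction, centrality measures, and thresholds are not given in the abstract), features can be ranked by degree centrality in a co-abundance network whose edges link taxa with |Pearson r| at or above a threshold. The taxon names, the 0.6 threshold, and the toy abundance table are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def top_k_by_degree(table, k, threshold=0.6):
    """Rank features by degree in a network linking pairs with |Pearson r| >= threshold."""
    names = list(table)
    degree = {name: 0 for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(table[a], table[b])) >= threshold:
                degree[a] += 1
                degree[b] += 1
    # sorted() is stable, so features with equal degree keep their column order.
    return sorted(names, key=lambda name: -degree[name])[:k]


# Hypothetical abundances of four taxa across four samples.
table = {
    "taxon_A": [1, 2, 3, 4],
    "taxon_B": [2, 4, 6, 8],
    "taxon_C": [4, 3, 2, 1],
    "taxon_D": [1, 3, 1, 3],
}
print(top_k_by_degree(table, 3))  # ['taxon_A', 'taxon_B', 'taxon_C']
```

Degree is the simplest centrality measure; betweenness or eigenvector centrality would rank nodes differently on the same network.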


Subjects
Microbiota , RNA, Ribosomal, 16S , Supervised Machine Learning , Unsupervised Machine Learning , Humans , Microbiota/genetics , RNA, Ribosomal, 16S/genetics , RNA, Ribosomal, 16S/analysis , Colorectal Neoplasms/microbiology , Gastrointestinal Microbiome/genetics , Algorithms , Feces/microbiology , Adenoma/microbiology
8.
Biometrics ; 80(2)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38768225

ABSTRACT

Conventional supervised learning usually operates under the premise that data are collected from the same underlying population. However, challenges may arise when integrating new data from different populations, resulting in a phenomenon known as dataset shift. This paper focuses on prior probability shift, where the distribution of the outcome varies across datasets but the conditional distribution of features given the outcome remains the same. To tackle the challenges posed by such shift, we propose an estimation algorithm that can efficiently combine information from multiple sources. Unlike existing methods that are restricted to discrete outcomes, the proposed approach accommodates both discrete and continuous outcomes. It also handles high-dimensional covariate vectors through variable selection using an adaptive least absolute shrinkage and selection operator penalty, producing efficient estimates that possess the oracle property. Moreover, a novel semiparametric likelihood ratio test is proposed to check the validity of prior probability shift assumptions by embedding the null conditional density function into Neyman's smooth alternatives (Neyman, 1937) and testing study-specific parameters. We demonstrate the effectiveness of our proposed method through extensive simulations and a real data example. The proposed methods serve as a useful addition to the repertoire of tools for dealing with dataset shift.
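Under prior probability shift, p(x | y) is shared across populations and only the outcome prior changes, so for a discrete outcome Bayes' rule gives p_t(y | x) ∝ p_s(y | x) · π_t(y) / π_s(y), where π_s and π_t are the source and target priors. The sketch below applies this classic correction with known priors. It is not the paper's estimator, which also handles continuous outcomes and high-dimensional covariates and combines multiple sources; the function name and numbers are illustrative.

```python
def adjust_posteriors(source_posterior, source_prior, target_prior):
    """Prior-shift correction for a discrete outcome: p_t(y|x) ∝ p_s(y|x) * pi_t(y) / pi_s(y)."""
    weighted = {
        y: source_posterior[y] * target_prior[y] / source_prior[y]
        for y in source_posterior
    }
    total = sum(weighted.values())  # renormalise so the adjusted posteriors sum to 1
    return {y: w / total for y, w in weighted.items()}


# Hypothetical: balanced training data, but outcome 1 is rarer in the new population.
print(adjust_posteriors({0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}, {0: 0.8, 1: 0.2}))  # {0: 0.8, 1: 0.2}
```

When the target prior is unknown, it must itself be estimated from unlabeled target data, which is where more elaborate estimators come in.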


Subjects
Algorithms , Computer Simulation , Models, Statistical , Probability , Humans , Likelihood Functions , Biometry/methods , Data Interpretation, Statistical , Supervised Machine Learning
9.
Cell Syst ; 15(5): 475-482.e6, 2024 May 15.
Article in English | MEDLINE | ID: mdl-38754367

ABSTRACT

Image-based spatial transcriptomics methods enable transcriptome-scale gene expression measurements with spatial information but require complex, manually tuned analysis pipelines. We present Polaris, an analysis pipeline for image-based spatial transcriptomics that combines deep-learning models for cell segmentation and spot detection with a probabilistic gene decoder to quantify single-cell gene expression accurately. Polaris offers a unifying, turnkey solution for analyzing spatial transcriptomics data from multiplexed error-robust FISH (MERFISH), sequential fluorescence in situ hybridization (seqFISH), or in situ RNA sequencing (ISS) experiments. Polaris is available through the DeepCell software library (https://github.com/vanvalenlab/deepcell-spots) and https://www.deepcell.org.


Subjects
Deep Learning , Gene Expression Profiling , In Situ Hybridization, Fluorescence , Transcriptome , In Situ Hybridization, Fluorescence/methods , Transcriptome/genetics , Gene Expression Profiling/methods , Software , Humans , Single-Cell Analysis/methods , Image Processing, Computer-Assisted/methods , Single Molecule Imaging/methods , Animals , Supervised Machine Learning
10.
J Neural Eng ; 21(3)2024 May 17.
Article in English | MEDLINE | ID: mdl-38757187

ABSTRACT

Objective. For brain-computer interface (BCI) research, it is crucial to design a motor imagery EEG (MI-EEG) recognition model that has high classification accuracy and strong generalization ability without relying on a large number of labeled training samples. Approach. In this paper, we propose a self-supervised MI-EEG recognition method based on a one-dimensional multi-task convolutional neural network and long short-term memory (1-D MTCNN-LSTM). The model is divided into two stages: a signal transform identification stage and a pattern recognition stage. In the signal transform identification stage, the upstream 1-D MTCNN-LSTM network is trained to recognize the transforms in a signal transform dataset. The backbone network from this stage is then transferred to the pattern recognition stage and fine-tuned with a small amount of labeled data to obtain the final motor imagery recognition model. Main results. The upstream stage achieves more than 95% (up to 100%) recognition accuracy for EEG signal transforms. For MI-EEG pattern recognition, the model obtained recognition accuracies of 82.04% and 87.14%, with F1 scores of 0.7856 and 0.839, on the BCIC-IV-2b and BCIC-IV-2a datasets, respectively. Significance. The improved accuracy demonstrates the superiority of the proposed method, which is expected to enable accurate classification of MI-EEG in BCI systems.
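In the upstream stage the network learns to recognize which transform was applied to an unlabeled signal, so the pretext labels come for free. The abstract does not list the transforms, so the identity, negation, time reversal, and scaling below are hypothetical stand-ins, as are the function and variable names; the sketch only shows how such a pretext dataset can be generated from unlabeled EEG segments.

```python
import random

# Hypothetical transform set; the abstract does not say which transforms the paper uses.
TRANSFORMS = {
    0: lambda x: list(x),               # identity
    1: lambda x: [-v for v in x],       # amplitude negation
    2: lambda x: list(reversed(x)),     # time reversal
    3: lambda x: [2.0 * v for v in x],  # amplitude scaling
}


def make_pretext_dataset(signals, seed=0):
    """Return (transformed_signal, transform_id) pairs built from unlabeled signals."""
    dataset = [(fn(sig), label) for sig in signals for label, fn in TRANSFORMS.items()]
    random.Random(seed).shuffle(dataset)  # shuffle so labels are not in a fixed order
    return dataset


# One hypothetical unlabeled single-channel EEG segment.
for signal, label in make_pretext_dataset([[1.0, 2.0, 3.0]]):
    print(label, signal)
```

A classifier trained on these pairs must learn temporal and amplitude structure to tell the transforms apart, which is what makes its backbone worth transferring to the labeled MI-EEG task.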


Subjects
Brain-Computer Interfaces , Electroencephalography , Imagination , Neural Networks, Computer , Electroencephalography/methods , Humans , Imagination/physiology , Supervised Machine Learning , Pattern Recognition, Automated/methods
11.
BMC Med Inform Decis Mak ; 24(1): 126, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755563

ABSTRACT

BACKGROUND: Abnormality localization in chest X-ray images, essential for diagnosing various diseases, faces significant clinical challenges due to complex interpretation and the growing workload of radiologists. While recent advances in deep learning offer promising solutions, domain inconsistency in cross-domain transfer learning remains a critical issue that hampers the efficiency and accuracy of diagnostic processes. This study aims to address the domain inconsistency problem and improve automatic abnormality localization in heterogeneous chest X-ray image analysis by developing a self-supervised learning strategy called "BarlowTwins-CXR". METHODS: We used two publicly available datasets: the NIH Chest X-ray Dataset and VinDr-CXR. BarlowTwins-CXR uses a two-stage training process. First, self-supervised pre-training was performed using an adjusted Barlow Twins algorithm on the NIH dataset with a ResNet50 backbone pre-trained on ImageNet. This was followed by supervised fine-tuning on the VinDr-CXR dataset using Faster R-CNN with a Feature Pyramid Network (FPN). Performance was evaluated with mean Average Precision (mAP) at an Intersection over Union (IoU) threshold of 50% and with the Area Under the Curve (AUC). RESULTS: Our experiments showed a significant improvement in model performance with BarlowTwins-CXR. The approach achieved a 3% increase in mAP50 compared to traditional ImageNet pre-trained models. In addition, the Ablation CAM method revealed enhanced precision in localizing chest abnormalities. The study involved 112,120 images from the NIH dataset and 18,000 images from the VinDr-CXR dataset, providing robust training and testing samples.
CONCLUSION: BarlowTwins-CXR significantly enhances the efficiency and accuracy of chest X-ray image-based abnormality localization, outperforming traditional transfer learning methods and effectively overcoming domain inconsistency in cross-domain scenarios. Our experimental results demonstrate the potential of self-supervised learning to improve the generalizability of models in medical settings with limited amounts of heterogeneous data. This approach can be instrumental in aiding radiologists, particularly in high-workload environments, offering a promising direction for future AI-driven healthcare solutions.
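Barlow Twins pre-training drives the cross-correlation matrix between the embeddings of two augmented views of the same batch toward the identity: diagonal terms toward 1 (invariance to the augmentation) and off-diagonal terms toward 0 (redundancy reduction). Below is a minimal pure-Python sketch of that standard loss. It does not reproduce the "adjusted" variant used in BarlowTwins-CXR, and the off-diagonal weight of 5e-3 and all names are assumptions.

```python
import math


def standardize_columns(z):
    """Centre and scale each embedding dimension across the batch (batch norm without affine)."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in z]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n) or 1.0  # guard constant columns
        for i in range(n):
            out[i][j] = (z[i][j] - mean) / std
    return out


def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3):
    """Sum of (1 - C_ii)^2 plus lambda * C_ij^2 (i != j), C = cross-correlation of the views."""
    a, b = standardize_columns(z_a), standardize_columns(z_b)
    n, d = len(a), len(a[0])
    loss = 0.0
    for i in range(d):
        for j in range(d):
            c_ij = sum(a[k][i] * b[k][j] for k in range(n)) / n
            loss += (1.0 - c_ij) ** 2 if i == j else lambda_offdiag * c_ij ** 2
    return loss


# Two identical hypothetical views whose 2 embedding dimensions are decorrelated.
z = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
print(barlow_twins_loss(z, z))  # 0.0
```

If the two embedding dimensions were perfectly correlated, each off-diagonal entry would be 1 and the loss would rise by the off-diagonal weight per entry, penalizing redundant features.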


Subjects
Radiography, Thoracic , Supervised Machine Learning , Humans , Deep Learning , Radiographic Image Interpretation, Computer-Assisted/methods , Datasets as Topic
12.
BMC Med Inform Decis Mak ; 24(1): 127, 2024 May 16.
Article in English | MEDLINE | ID: mdl-38755570

ABSTRACT

BACKGROUND: Medical records are a valuable source for understanding patient health conditions. Doctors often use these records to assess health without solely depending on time-consuming and complex examinations. However, these records may not always be directly relevant to a patient's current health issue. For instance, information about common colds may not be relevant to a more specific health condition. While experienced doctors can effectively navigate through unnecessary details in medical records, this excess information presents a challenge for machine learning models in predicting diseases electronically. To address this, we have developed 'al-BERT', a new disease prediction model that leverages the BERT framework. This model is designed to identify crucial information from medical records and use it to predict diseases. 'al-BERT' operates on the principle that the structure of sentences in diagnostic records is similar to regular linguistic patterns. However, just as stuttering in speech can introduce 'noise' or irrelevant information, similar issues can arise in written records, complicating model training. To overcome this, 'al-BERT' incorporates a semi-supervised layer that filters out irrelevant data from patient visitation records. This process aims to refine the data, resulting in more reliable indicators for disease correlations and enhancing the model's predictive accuracy and utility in medical diagnostics. METHOD: To discern noise diseases within patient records, especially those resembling influenza-like illnesses, our approach employs a customized semi-supervised learning algorithm equipped with a focused attention mechanism. This mechanism is specifically calibrated to enhance the model's sensitivity to chronic conditions while concurrently distilling salient features from patient records, thereby augmenting the predictive accuracy and utility of the model in clinical settings. 
We evaluate the performance of al-BERT using real-world health insurance data provided by Taiwan's National Health Insurance. RESULT: In our study, we evaluated our model against two others: one based on BERT that uses complete disease records, and another variant that includes extra filtering techniques. Our findings show that models incorporating filtering mechanisms typically perform better than those using the entire, unfiltered dataset. Our approach resulted in improved outcomes across several key measures: AUC-ROC (an indicator of a model's ability to distinguish between classes), precision (the accuracy of positive predictions), recall (the model's ability to find all relevant cases), and overall accuracy. Most notably, our model showed a 15% improvement in recall compared to the current best-performing method in the field of disease prediction. CONCLUSION: The conducted ablation study affirms the advantages of our attention mechanism and underscores the crucial role of the selection module within al-BERT.


Subjects
Electronic Health Records , Humans , Supervised Machine Learning , Machine Learning
13.
Cell ; 187(10): 2502-2520.e17, 2024 May 09.
Article in English | MEDLINE | ID: mdl-38729110

ABSTRACT

Human tissue, which is inherently three-dimensional (3D), is traditionally examined through standard-of-care histopathology as limited two-dimensional (2D) cross-sections that can insufficiently represent the tissue due to sampling bias. To holistically characterize histomorphology, 3D imaging modalities have been developed, but clinical translation is hampered by complex manual evaluation and lack of computational platforms to distill clinical insights from large, high-resolution datasets. We present TriPath, a deep-learning platform for processing tissue volumes and efficiently predicting clinical outcomes based on 3D morphological features. Recurrence risk-stratification models were trained on prostate cancer specimens imaged with open-top light-sheet microscopy or microcomputed tomography. By comprehensively capturing 3D morphologies, 3D volume-based prognostication achieves superior performance to traditional 2D slice-based approaches, including clinical/histopathological baselines from six certified genitourinary pathologists. Incorporating greater tissue volume improves prognostic performance and mitigates risk prediction variability from sampling bias, further emphasizing the value of capturing larger extents of heterogeneous morphology.


Subjects
Imaging, Three-Dimensional , Prostatic Neoplasms , Humans , Imaging, Three-Dimensional/methods , Prostatic Neoplasms/pathology , Prostatic Neoplasms/diagnostic imaging , Male , Prognosis , Deep Learning , X-Ray Microtomography/methods , Supervised Machine Learning
14.
Comput Methods Programs Biomed ; 250: 108164, 2024 Jun.
Article in English | MEDLINE | ID: mdl-38718709

ABSTRACT

BACKGROUND AND OBJECTIVE: Current automatic electrocardiogram (ECG) diagnostic systems can provide classification outcomes but often lack explanations for these results, which hampers their application in clinical diagnosis. Previous supervised learning approaches could not localize abnormal segments accurately enough for clinical application without manual labeling of large ECG datasets. METHOD: In this study, we present a multi-instance learning framework called MA-MIL, which uses a multi-layer, multi-instance structure that is aggregated step by step at different scales. We evaluated our method on the public MIT-BIH dataset and our private dataset. RESULTS: Our model performed well in both ECG classification and abnormal segment detection at the heartbeat and sub-heartbeat levels, with accuracy and F1 scores of 0.987 and 0.986 for ECG classification and 0.968 and 0.949 for heartbeat-level abnormal detection, respectively. Compared with visualization methods, MA-MIL improved IoU values by at least 17% and at most 31% across all categories. CONCLUSIONS: MA-MIL can accurately locate abnormal ECG segments, offering more trustworthy results for clinical application.
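MA-MIL's localization is compared using IoU between detected and reference abnormal segments. For a 1-D signal such as ECG, IoU is commonly computed on time intervals; the sketch below assumes segments are half-open [start, end) ranges of sample indices, an illustrative convention rather than one stated in the paper, and the function name is hypothetical.

```python
def interval_iou(pred, truth):
    """IoU of two half-open sample-index intervals [start, end) on the ECG time axis."""
    (ps, pe), (ts, te) = pred, truth
    if pe <= ps or te <= ts:
        raise ValueError("intervals must have end > start")
    intersection = max(0, min(pe, te) - max(ps, ts))  # overlap length, 0 if disjoint
    union = (pe - ps) + (te - ts) - intersection
    return intersection / union


# Hypothetical predicted vs. annotated abnormal segment (sample indices).
print(interval_iou((100, 300), (200, 400)))  # 0.3333333333333333
```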


Subjects
Algorithms , Electrocardiography , Supervised Machine Learning , Electrocardiography/methods , Humans , Heart Rate , Databases, Factual , Signal Processing, Computer-Assisted
16.
Med Image Anal ; 94: 103150, 2024 May.
Article in English | MEDLINE | ID: mdl-38574545

ABSTRACT

Self-supervised representation learning can boost the performance of a pre-trained network on downstream tasks for which labeled data is limited. A popular method based on this paradigm, known as contrastive learning, works by constructing sets of positive and negative pairs from the data and then pulling the representations of positive pairs closer while pushing apart those of negative pairs. Although contrastive learning has been shown to improve performance in various classification tasks, its application to image segmentation has been more limited. This stems in part from the difficulty of defining positive and negative pairs for dense feature maps without access to pixel-wise annotations. In this work, we propose a novel self-supervised pre-training method that overcomes the challenges of contrastive learning in image segmentation. Our method leverages Invariant Information Clustering (IIC) as an unsupervised task to learn a local representation of images in the decoder of a segmentation network, but addresses three important drawbacks of this approach: (i) the difficulty of optimizing the loss based on mutual information maximization; (ii) the lack of clustering consistency for different random transformations of the same image; (iii) the poor correspondence of clusters obtained by IIC with region boundaries in the image. Toward this goal, we first introduce a regularized mutual information maximization objective that encourages the learned clusters to be balanced and consistent across different image transformations. We also propose a boundary-aware loss based on cross-correlation, which helps the learned clusters to be more representative of important regions in the image. Compared to contrastive learning applied to dense features, our method does not require computing positive and negative pairs and also enhances interpretability through the visualization of learned clusters.
Comprehensive experiments involving four different medical image segmentation tasks reveal the high effectiveness of our self-supervised representation learning method. Our results show that the proposed method outperforms several state-of-the-art self-supervised and semi-supervised approaches for segmentation by a large margin, reaching a performance close to full supervision with only a few labeled examples.
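The base IIC objective maximizes the mutual information between the cluster assignments of two transformed views of the same input, estimated from their joint distribution over clusters. The sketch below computes only that basic term for soft assignments, without the paper's regularization or boundary-aware loss; inputs are per-sample probability vectors, and the toy values and function name are hypothetical. Training would maximize this quantity, for example by minimizing its negative.

```python
import math


def iic_mutual_information(probs_a, probs_b, eps=1e-12):
    """Mutual information between cluster assignments of two views (IIC objective, maximised)."""
    n, c = len(probs_a), len(probs_a[0])
    # Joint distribution over cluster pairs, averaged over the batch.
    joint = [[sum(probs_a[k][i] * probs_b[k][j] for k in range(n)) / n
              for j in range(c)] for i in range(c)]
    # Symmetrise, since the two views play interchangeable roles.
    joint = [[(joint[i][j] + joint[j][i]) / 2 for j in range(c)] for i in range(c)]
    row = [sum(joint[i]) for i in range(c)]
    col = [sum(joint[i][j] for i in range(c)) for j in range(c)]
    mi = 0.0
    for i in range(c):
        for j in range(c):
            p = joint[i][j]
            if p > eps:  # skip empty cells, whose contribution is 0
                mi += p * math.log(p / (row[i] * col[j]))
    return mi


# Both views confidently agree and use both clusters equally: MI reaches log(2).
agree = [[1.0, 0.0], [0.0, 1.0]]
print(iic_mutual_information(agree, agree))  # 0.6931471805599453
```

Uniform, uninformative assignments give zero mutual information, which is why maximizing it discourages both degenerate single-cluster solutions and inconsistent assignments across views.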


Subjects
Image Processing, Computer-Assisted , Learning , Humans , Supervised Machine Learning
17.
J Neural Eng ; 21(2)2024 Apr 17.
Article in English | MEDLINE | ID: mdl-38588700

ABSTRACT

Objective. Instability of EEG acquisition devices may lead to information loss in certain channels or frequency bands of the collected EEG. Existing models may ignore this phenomenon, which leads to overfitting and poor generalization. Approach. Multiple self-supervised learning tasks are introduced in the proposed model to enhance the generalization of EEG emotion recognition and, to some extent, reduce overfitting. First, channel masking and frequency masking are introduced to simulate the information loss in certain channels and frequency bands caused by EEG instability, and two self-supervised feature reconstruction tasks based on masked graph autoencoders (GAE) are constructed to enhance the generalization of the shared encoder. Second, to take full advantage of the complementary information in these two self-supervised tasks and ensure reliable feature reconstruction, a weight sharing (WS) mechanism is introduced between the two graph decoders. Third, an adaptive weight multi-task loss (AWML) strategy based on homoscedastic uncertainty is adopted to combine the supervised learning loss and the two self-supervised learning losses, further enhancing performance. Main results. Experimental results on the SEED, SEED-V, and DEAP datasets demonstrate that: (i) the proposed model generally achieves higher average emotion classification accuracy than various baselines in both subject-dependent and subject-independent scenarios; (ii) each key module contributes to the performance of the proposed model; (iii) it achieves higher training efficiency and significantly lower model size and computational complexity than the state-of-the-art (SOTA) multi-task-based model; and (iv) its performance is less sensitive to key parameters.
Significance. The introduction of self-supervised learning tasks helps enhance the generalization of the EEG emotion recognition model and reduce overfitting to some extent, and the approach can be adapted to other EEG-based classification tasks.
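The AWML strategy combines three losses using homoscedastic uncertainty. A widely used form of this weighting, popularized by Kendall et al., learns one log-variance s_i per task and minimizes the sum over tasks of exp(-s_i) · L_i + s_i, so tasks with larger loss are automatically down-weighted. The sketch below implements that generic form with hypothetical names; the paper's exact formulation may differ, for example by constant factors of 1/2.

```python
import math


def uncertainty_weighted_loss(losses, log_vars):
    """Combine task losses as sum(exp(-s_i) * L_i + s_i), with s_i = log(sigma_i^2) learned."""
    if len(losses) != len(log_vars):
        raise ValueError("need one log-variance per task loss")
    # exp(-s_i) weights each task; the +s_i term stops s_i from growing without bound.
    return sum(math.exp(-s) * l + s for l, s in zip(losses, log_vars))


# Hypothetical losses: supervised classification, channel-masked and frequency-masked reconstruction.
print(uncertainty_weighted_loss([0.9, 0.3, 0.4], [0.0, 0.0, 0.0]))
```

For a fixed loss L_i, the term is minimized at s_i = ln L_i, which gives that task an effective weight of 1 / L_i; in training, the s_i are updated by gradient descent alongside the network weights.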


Subjects
Electroencephalography , Emotions , Supervised Machine Learning , Supervised Machine Learning/standards , Datasets as Topic , Humans
18.
Sensors (Basel) ; 24(7)2024 Mar 29.
Article in English | MEDLINE | ID: mdl-38610406

ABSTRACT

Wearable sensors could be beneficial for continuously quantifying upper limb motor symptoms in people with Parkinson's disease (PD). This work evaluates the use of two inertial measurement units combined with supervised machine learning models to classify and predict a subset of MDS-UPDRS III subitems in PD. We attached the two compact wearable sensors to the dorsal side of the hands, one per hand, of 33 people with PD and 12 controls. Each participant performed six clinical movement tasks during an MDS-UPDRS III assessment. Random forest (RF) models were trained on the sensor data and motor scores. An overall accuracy of 94% was achieved in classifying the movement tasks. When used to classify the motor scores, the models achieved average areas under the receiver operating characteristic curve (AUC) ranging from 68% to 92%. Motor scores were additionally predicted using an RF regression model. In a comparative analysis, support vector machine models outperformed the RF models on specific tasks. Furthermore, in certain cases our results surpass those reported in the literature. The methods developed in this work serve as a basis for future studies, in which home-based assessments of pharmacological effects on motor function could complement regular clinical assessments.
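The abstract reports motor-score classification quality as averaged AUC values but does not state the averaging scheme. One common choice for multi-class targets such as MDS-UPDRS III subitem scores is the macro average of one-vs-rest AUCs. The minimal, stdlib-only Python sketch below assumes that scheme and computes each binary AUC from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. The function names and toy probabilities are hypothetical.

```python
def binary_auc(scores, labels):
    """AUC as P(random positive outranks random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, y_true, classes):
    """Average one-vs-rest AUC over classes; prob_rows[i][k] = P(class k | sample i)."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if y == cls else 0 for y in y_true]
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / len(aucs)


# Hypothetical predicted probabilities for scores 0, 1, 2 on four recordings.
probs = [[0.8, 0.1, 0.1],
         [0.2, 0.7, 0.1],
         [0.1, 0.2, 0.7],
         [0.6, 0.3, 0.1]]
y = [0, 1, 2, 0]
macro_auc = macro_ovr_auc(probs, y, classes=[0, 1, 2])
```

The pairwise loop costs O(P·N), which is acceptable for cohorts of this size. Larger datasets would typically use a rank-based formulation or a library routine such as scikit-learn's `roc_auc_score`.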


Subjects
Parkinson Disease , Humans , Parkinson Disease/diagnosis , Machine Learning , Movement , Supervised Machine Learning , Hand
19.
Int J Mol Sci ; 25(7)2024 Mar 28.
Article in English | MEDLINE | ID: mdl-38612602

ABSTRACT

Molecular property prediction is an important task in drug discovery, and self-supervised learning methods could improve its performance by exploiting large-scale unlabeled datasets. In this paper, we propose a triple generative self-supervised learning method for molecular property prediction, called TGSS. Three encoders, a bidirectional long short-term memory recurrent neural network (BiLSTM), a Transformer, and a graph attention network (GAT), are used to pre-train the model on molecular sequence and graph structure data to extract molecular features. A variational autoencoder (VAE) is used to reconstruct the features from the three models. In the downstream task, to balance the information from the different molecular features, a feature fusion module is added that assigns a different weight to each feature. In addition, to improve the interpretability of the model, atomic similarity heat maps are introduced to demonstrate the effectiveness and rationality of the molecular feature extraction. Comparative experiments on chemical and biological benchmark datasets demonstrate the accuracy of the proposed method.
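The feature fusion module is described only as assigning a different weight to each feature. A minimal sketch of one common way to do this is shown below. It assumes equally sized feature vectors from the three encoders and one learnable scalar logit per encoder, normalized with a softmax. The fusion rule, function names, and numbers are assumptions for illustration, not the TGSS implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_features(features, logits):
    """Weighted sum of equally sized feature vectors, with weights = softmax(logits)."""
    if len(features) != len(logits):
        raise ValueError("one logit per feature vector is required")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must have the same length")
    weights = softmax(logits)
    fused = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return fused, weights


# Hypothetical 4-dimensional embeddings from the three encoders.
bilstm_feat = [1.0, 0.0, 2.0, 4.0]
transformer_feat = [3.0, 2.0, 0.0, 0.0]
gat_feat = [5.0, 4.0, 2.0, 0.0]

# In training these logits would be learned; log(2) doubles the Transformer's weight.
fused, weights = fuse_features([bilstm_feat, transformer_feat, gat_feat],
                               [0.0, math.log(2.0), 0.0])
```

Normalizing with a softmax keeps the weights positive and summing to one, so the fused vector stays on the same scale as its inputs whatever values the logits take.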


Subjects
Benchmarking , Drug Discovery , Animals , Electric Power Supplies , Estrus , Supervised Machine Learning
20.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38605640

ABSTRACT

Language models pretrained by self-supervised learning (SSL) have been widely used to study protein sequences, whereas few such models have been developed for genomic sequences, and those are limited to a single species. Lacking genomes from different species in their training data, these models cannot effectively leverage evolutionary information. In this study, we developed SpliceBERT, a language model pretrained on primary ribonucleic acid (RNA) sequences from 72 vertebrates by masked language modeling, and applied it to sequence-based modeling of RNA splicing. Pretraining SpliceBERT on diverse species enables effective identification of evolutionarily conserved elements. Moreover, the learned hidden states and attention weights can characterize the biological properties of splice sites. SpliceBERT proved effective on several downstream tasks: zero-shot prediction of variant effects on splicing, prediction of branchpoints in humans, and cross-species prediction of splice sites. Our study highlights the importance of pretraining genomic language models on a diverse range of species and suggests that SSL is a promising approach to improving our understanding of the regulatory logic underlying genomic sequences.
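SpliceBERT is pretrained by masked language modeling, but the abstract does not give the masking details. The sketch below therefore assumes the standard BERT recipe: about 15% of positions are chosen as prediction targets, and of these 80% are replaced by a mask token, 10% by a random nucleotide, and 10% left unchanged. It also assumes single-nucleotide tokens over the RNA alphabet A, C, G, U. These choices, the token string `[MASK]`, and the function name are illustrative assumptions, not SpliceBERT's published configuration.

```python
import random

NUCLEOTIDES = ["A", "C", "G", "U"]  # assumption: one token per nucleotide
MASK = "[MASK]"                     # hypothetical mask token


def mask_sequence(seq, rng, mask_prob=0.15):
    """BERT-style masking: pick ~mask_prob of positions as targets, then replace
    80% of them with MASK, 10% with a random nucleotide, and keep 10% as-is.

    Returns the corrupted token list and {position: original token} for the loss.
    """
    tokens = list(seq)
    if not tokens:
        return [], {}
    n_targets = max(1, round(mask_prob * len(tokens)))
    targets = sorted(rng.sample(range(len(tokens)), n_targets))
    labels = {}
    for i in targets:
        labels[i] = tokens[i]  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            tokens[i] = MASK
        elif r < 0.9:
            tokens[i] = rng.choice(NUCLEOTIDES)
        # otherwise the original token is left in place
    return tokens, labels


seq = "AUGC" * 10  # hypothetical 40-nt sequence
tokens, labels = mask_sequence(seq, random.Random(42))
```

During pretraining, the model would see `tokens` and be trained with a cross-entropy loss only at the positions stored in `labels`. Keeping or randomizing some targets discourages the model from relying on the presence of the mask token.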


Subjects
RNA Splicing , Vertebrates , Animals , Humans , Base Sequence , Vertebrates/genetics , RNA , Supervised Machine Learning