Results 1 - 13 of 13
1.
Article in English | MEDLINE | ID: mdl-38758623

ABSTRACT

Excessive invalid exploration at the beginning of training puts a deep reinforcement learning process at risk of overfitting, which leads to spurious decisions that hinder the agent in subsequent states and explorations. This phenomenon is termed primacy bias in online reinforcement learning. This work systematically investigates primacy bias in online reinforcement learning, discussing its causes and analyzing its characteristics. To learn a policy that generalizes to subsequent states and explorations, we develop an online reinforcement learning framework based on knowledge distillation, termed self-distillation reinforcement learning (SDRL). It allows the agent to transfer its learned knowledge into a randomly initialized policy at regular intervals, and the new policy network replaces the original one in subsequent training. The core idea is that distilling knowledge from the trained policy into another policy can filter out biases, yielding a more generalized policy during learning. Moreover, to keep the new policy from overfitting due to excessive distillation, we add a loss term to the knowledge distillation process, using L2 regularization to improve generalization, and introduce a self-imitation mechanism to accelerate learning on current experiences. Results of several experiments on DMC and Atari 100k suggest that the proposal can eliminate primacy bias in reinforcement learning methods, and that the policy obtained after knowledge distillation helps agents reach higher scores more quickly.
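The abstract does not state the exact form of the distillation objective, so the following is only a rough sketch of what a "distillation loss plus L2 regularization" could look like. The KL-divergence term, the function names, and the coefficient `l2_coef` are all assumptions made for illustration, not the SDRL formulation itself.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, student_weights, l2_coef=1e-3):
    """Hypothetical loss: KL(teacher || student) averaged over the batch,
    plus an L2 penalty on the student's weights (assumed form)."""
    p = softmax(teacher_logits)   # trained (teacher) policy
    q = softmax(student_logits)   # freshly initialized (student) policy
    kl = np.sum(p * (np.log(p) - np.log(q)), axis=-1).mean()
    l2 = l2_coef * sum(np.sum(w ** 2) for w in student_weights)
    return float(kl + l2)

# Example: a batch of 2 states with 3 actions each.
teacher = np.array([[2.0, 0.5, -1.0], [0.0, 1.0, 0.0]])
student = np.zeros((2, 3))
weights = [np.ones((3, 3)) * 0.1]
loss = distillation_loss(teacher, student, weights)
```

In this sketch the KL term pulls the student's action distribution toward the teacher's, while the L2 term discourages large weights; how SDRL balances the two, and how self-imitation enters, is not specified in the abstract.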

2.
Article in English | MEDLINE | ID: mdl-37339032

ABSTRACT

Introducing deep learning technologies into medical image processing requires guaranteed accuracy, especially for high-resolution images relayed through endoscopes. Moreover, methods that rely on supervised learning are ineffective when labeled samples are scarce. Therefore, for end-to-end medical image detection under the strict efficiency and accuracy requirements of endoscopic detection, this work develops an ensemble-learning-based model with a semi-supervised mechanism. To obtain more accurate results from multiple detection models, we propose a new ensemble mechanism, termed the alternative adaptive boosting method (Al-Adaboost), which combines the decisions of two hierarchical models. Specifically, the proposal consists of two modules. One is a local region proposal model with attentive temporal-spatial pathways for bounding box regression and classification; the other is a recurrent attention model (RAM) that provides more precise inferences for further classification based on the regression result. Al-Adaboost adaptively adjusts the weights of the labeled samples and of the two classifiers, and the unlabeled samples are assigned pseudolabels by our model. We evaluate Al-Adaboost on colonoscopy and laryngoscopy data from CVC-ClinicDB and the affiliated hospital of Kaohsiung Medical University. The experimental results demonstrate the feasibility and superiority of our model.
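For readers unfamiliar with adaptive boosting, the weight adjustment mentioned above can be illustrated with one round of the standard (textbook) AdaBoost update. This is not Al-Adaboost, whose alternating scheme and pseudolabeling are not detailed in the abstract; the function name and the ±1 label convention are assumptions.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One round of classic AdaBoost for labels in {-1, +1}.
    Returns the classifier weight alpha and the renormalized sample weights."""
    miss = y_true != y_pred
    err = np.sum(weights[miss]) / np.sum(weights)   # weighted error rate
    err = np.clip(err, 1e-12, 1 - 1e-12)            # avoid log(0) and division by zero
    alpha = 0.5 * np.log((1 - err) / err)           # classifier weight
    # Misclassified samples are up-weighted, correct ones down-weighted.
    new_w = weights * np.exp(np.where(miss, alpha, -alpha))
    return float(alpha), new_w / new_w.sum()

w = np.full(4, 0.25)
y_true = np.array([1, 1, -1, -1])
y_pred = np.array([1, -1, -1, -1])   # one mistake, at index 1
alpha, w_next = adaboost_round(w, y_true, y_pred)
```

After the update, the misclassified sample carries half of the total weight, which is the usual AdaBoost property that forces the next classifier to focus on earlier mistakes.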

3.
IEEE Trans Cybern ; 53(3): 1699-1711, 2023 Mar.
Article in English | MEDLINE | ID: mdl-34506297

ABSTRACT

Some researchers have introduced transfer learning mechanisms to multiagent reinforcement learning (MARL). However, existing works on cross-task transfer for multiagent systems were designed only for homogeneous agents or similar domains. This work proposes an all-purpose cross-task transfer method, called multiagent lateral transfer (MALT), which helps alleviate the training burden of MARL. We discuss several challenges in developing an all-purpose multiagent cross-task transfer learning method and provide a feasible way of reusing knowledge for MARL. In the developed method, inspired by the progressive network, we take features as the transfer object rather than policies or experiences. To achieve more efficient transfer, we assign pretrained policy networks to agents based on clustering, and an attention module is introduced to enhance the transfer framework. The proposed method has no strict requirements for the source task and target task. Compared with existing works, our method can transfer knowledge among heterogeneous agents and also avoid negative transfer when the tasks are entirely different. As far as we know, this article is the first work devoted to all-purpose cross-task transfer for MARL. Several experiments in various scenarios have been conducted to compare the performance of the proposed method with baselines. The results demonstrate that the method is sufficiently flexible for most settings, including cooperative, competitive, homogeneous, and heterogeneous configurations.

4.
BMC Bioinformatics ; 22(Suppl 5): 93, 2021 Nov 08.
Article in English | MEDLINE | ID: mdl-34749631

ABSTRACT

BACKGROUND: Atrial fibrillation is a paroxysmal heart disease that, for most people, produces no obvious symptoms at onset. Outside an episode, the electrocardiogram (ECG) is not significantly different from that of healthy people, which makes the disease difficult to detect and diagnose. However, if atrial fibrillation is not detected and treated early, the condition tends to worsen and the risk of stroke increases. In this paper, P-wave morphology parameters and heart rate variability feature parameters were extracted simultaneously from the ECG. A total of 31 parameters were used as input variables to build an artificial intelligence ensemble learning model. RESULTS: This paper applied three artificial intelligence ensemble learning methods, namely Bagging, AdaBoost, and Stacking, and compared their prediction results. The Stacking ensemble learning method, combined with various models, obtained the best prediction performance, with an accuracy of 92%, sensitivity of 88%, specificity of 96%, positive predictive value of 95.7%, negative predictive value of 88.9%, F1 score of 0.9231, and area under the receiver operating characteristic curve of 0.911. CONCLUSION: For feature extraction, this paper combined P-wave morphology parameters and heart rate variability parameters as input parameters for model training, and validated the value of this parameter combination for improving the model's predictive performance. In calculating the P-wave morphology parameters, the hybrid Taguchi-genetic algorithm was used to obtain more accurate Gaussian function fitting parameters. The prediction model was trained using the Stacking ensemble learning method, which improved model accuracy and can further improve the early prediction of atrial fibrillation.
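As a reminder of how the reported classification metrics are defined, the short sketch below computes them from a confusion matrix. The counts used are hypothetical: they were chosen only because their ratios round to the accuracy, sensitivity, specificity, and predictive values quoted in the abstract, and they are not the study's actual data. The F1 score and AUC are left out of the sketch.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),   # true positive rate (recall)
        "specificity": tn / (tn + fp),   # true negative rate
        "ppv": tp / (tp + fp),           # positive predictive value (precision)
        "npv": tn / (tn + fn),           # negative predictive value
    }

# Hypothetical counts (assumption, not from the paper): 25 positives, 25 negatives.
m = binary_metrics(tp=22, fn=3, tn=24, fp=1)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the function gives accuracy 0.920, sensitivity 0.880, specificity 0.960, PPV 0.957, and NPV 0.889.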


Subjects
Atrial Fibrillation, Algorithms, Artificial Intelligence, Atrial Fibrillation/diagnosis, Electrocardiography, Humans, Machine Learning, ROC Curve
5.
BMC Bioinformatics ; 22(Suppl 5): 94, 2021 Nov 08.
Article in English | MEDLINE | ID: mdl-34749635

ABSTRACT

BACKGROUND: Differentiating and counting the various types of white blood cells (WBC) in bone marrow smears allows the detection of infection, anemia, and leukemia, as well as the monitoring of a course of treatment. However, manually locating, identifying, and counting the different classes of WBC is time-consuming and fatiguing, and classification and counting accuracy depends on the skill and experience of the operators. RESULTS: This paper uses a deep learning method to count cells in color bone marrow microscopic images automatically. The proposed method uses a Faster RCNN and a Feature Pyramid Network to construct a system that handles various illumination levels and accounts for the stability of color components. The dataset of The Second Affiliated Hospital of Zhejiang University is used for training and testing. CONCLUSIONS: The experiments test the effectiveness of the proposed white blood cell classification system using a total of 609 white blood cell images with a resolution of 2560 × 1920. The highest overall correct recognition rate reached 98.8%. The experimental results show that the proposed system is comparable to some state-of-the-art systems. A user interface allows pathologists to operate the system easily.


Subjects
Deep Learning, Leukemia, Bone Marrow, Humans, Computer-Assisted Image Processing, Leukocytes
6.
Sci Prog ; 104(3_suppl): 368504221110856, 2021 07.
Article in English | MEDLINE | ID: mdl-35818893

ABSTRACT

In a pineapple exporting factory, manual lines are usually set up to pick out, from millions of unsorted fruits, those whose tapping sounds indicate that they are not yet ripe, for long-haul transportation. However, human workers cannot listen attentively and make consistent judgments over long hours, and pineapple screening becomes arbitrary after approximately an hour. We developed a non-destructive screening device, placed alongside the conveyor sorter, to classify pineapples automatically. The device makes its judgments by applying a sound source to the skin of a pineapple and analyzing the sound that penetrates the fruit using wavelet kernel decomposition and unsupervised machine learning (ML). Sound tapping relies on good contact with the skin, so we also designed several acoustic couplers to adapt the vibrator to the skin and pick up high-quality penetrated sounds. A Taguchi experiment design was used to determine the most suitable coupler. We found that our unsupervised ML method achieves 98.56% accuracy and a 0.93 F1-score when using a specially designed thorn-board that helps couple the tapping sound to the fruit skin.


Subjects
Ananas, Acoustics, Fruit, Humans, Unsupervised Machine Learning
7.
Comput Med Imaging Graph ; 84: 101763, 2020 09.
Article in English | MEDLINE | ID: mdl-32805673

ABSTRACT

Conventional computer-aided detection systems (CADs) for colonoscopic images utilize shape, texture, or temporal information to detect polyps, so they have limited sensitivity and specificity. This study proposes a method to extract possible polyp features automatically using convolutional neural networks (CNNs). The objective of this work is to build a lightweight dual encoder-decoder model structure for polyp detection in colonoscopy images. Although the proposed model has a relatively shallow structure, it is expected to perform similarly to methods with much deeper structures. The proposed CAD model consists of two sequential encoder-decoder networks that consist of several CNN layers and fully connected layers. The front end of the model is a hetero-associator (also known as a hetero-encoder) that uses backpropagation learning to generate a set of reliably corrupted labeled images with a certain degree of similarity to a ground truth image, which eliminates the need for the large amount of training data that is usually required for medical imaging tasks. This dual CNN architecture generates a set of noisy images that are similar to the labeled data to train its counterpart, the auto-associator (also known as an auto-encoder), in order to increase the successor's discriminative power in classification. The auto-encoder is also equipped with CNNs to simultaneously capture the features of the labeled images that contain noise. The proposed method uses features that are learned from open medical datasets and the dataset of Zhejiang University (ZJU), which contains around one thousand images. The performance of the proposed architecture is compared with a state-of-the-art detection model in terms of the Jaccard index, the DICE similarity score, and two other geometric measures. The improvements in the performance of the proposed model are attributed to the effective reduction in false positives in the auto-encoder and the generation of noisy candidate images by the hetero-encoder.
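The two named overlap metrics have standard definitions, which can be sketched as follows for binary segmentation masks. The function name and the convention of returning 1.0 for two empty masks are assumptions made for this illustration; the paper's own evaluation code is not described in the abstract.

```python
import numpy as np

def jaccard_and_dice(pred_mask, true_mask):
    """Jaccard index |A∩B|/|A∪B| and Dice score 2|A∩B|/(|A|+|B|) for binary masks."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    inter = np.logical_and(pred, true).sum()
    union = np.logical_or(pred, true).sum()
    total = pred.sum() + true.sum()
    # Convention (assumed): two empty masks count as a perfect match.
    jaccard = inter / union if union else 1.0
    dice = 2 * inter / total if total else 1.0
    return float(jaccard), float(dice)

pred = [[1, 1, 0],
        [0, 1, 0]]
true = [[1, 0, 0],
        [0, 1, 1]]
j, d = jaccard_and_dice(pred, true)   # overlap 2, union 4, sizes 3 and 3
```

The two scores are monotonically related (Dice = 2J / (1 + J)), so they rank predictions identically; here J = 0.5 and Dice is about 0.667.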


Subjects
Computer-Assisted Image Processing, Computer Neural Networks, Colonoscopy, Humans, Sensitivity and Specificity
8.
ISA Trans ; 98: 434-444, 2020 Mar.
Article in English | MEDLINE | ID: mdl-31543262

ABSTRACT

In deep reinforcement learning (DRL) systems, it is difficult to design a reward function for complex tasks, so this paper proposes a behavior fusion framework for the actor-critic architecture, which learns the policy based on an advantage function that consists of two value functions. Firstly, the proposed method decomposes a complex task into several sub-tasks and merges the trained policies for those sub-tasks into a unified policy for the complex task, instead of designing a new reward function and training a policy from scratch. Each sub-task is trained individually by an actor-critic algorithm using a simple reward function. These pre-trained sub-tasks are building blocks that are used to rapidly assemble a prototype of a complicated task. Secondly, the proposed method integrates modules in the calculation of the policy gradient by calculating the accumulated returns to reduce variance. Thirdly, two alternative methods to acquire integrated returns for the complicated task are also proposed. The Atari 2600 Pong game and a wafer probe task are used to validate the performance of the proposed methods by comparison with a method that uses a gate network.

9.
Technol Health Care ; 26(1): 17-27, 2018.
Article in English | MEDLINE | ID: mdl-29060950

ABSTRACT

BACKGROUND: Effective neurological rehabilitation requires long-term assessment and treatment. The rapid progress of virtual reality-based assistive technologies and tele-rehabilitation has increased the potential for self-rehabilitation of various neurological injuries under clinical supervision. OBJECTIVE: The objective of this study was to develop a fuzzy inference mechanism for a smart mobile computing system designed to support in-home rehabilitation of patients with neurological injury in the hand by providing an objective means of self-assessment. METHODS: A commercially available tablet computer equipped with a Bluetooth motion sensor was integrated in a splint to obtain a smart assistive device for collecting hand motion data, including writing performance and the corresponding grasp force. A virtual reality game was also embedded in the smart splint to support hand rehabilitation. Quantitative data obtained during the rehabilitation process were modeled by fuzzy logic. Finally, the improvement in hand function was quantified with a fuzzy rule database of expert opinion and experience. RESULTS: Experiments in chronic stroke patients showed that the proposed system is applicable for supporting in-home hand rehabilitation. CONCLUSIONS: The proposed virtual reality system can be customized for specific therapeutic purposes. Commercial development of the system could immediately provide stroke patients with an effective in-home rehabilitation therapy for improving hand problems.


Subjects
Handheld Computers, Fuzzy Logic, Hand/physiology, Stroke Rehabilitation/methods, Telerehabilitation/methods, Virtual Reality, Chronic Disease, Hand Strength, Humans, Assistive Technology, Stroke Rehabilitation/instrumentation, Telerehabilitation/instrumentation, Writing
10.
IEEE Trans Cybern ; 45(5): 964-76, 2015 May.
Article in English | MEDLINE | ID: mdl-25122850

ABSTRACT

In a multiagent system, if agents' experiences can be accessed and assessed by peers for environmental modeling, the burden of exploring unvisited states or unseen situations is alleviated and the learning process is accelerated. Because building an effective and accurate model within a limited time is an important issue, especially for complex environments, this paper introduces a model-based reinforcement learning method based on a tree structure to achieve efficient modeling and lower memory consumption. The proposed algorithm tailors a Dyna-Q architecture to multiagent systems by means of a tree structure for modeling. The tree model built from real experiences is used to generate virtual experiences so that the time spent on learning can be reduced. This model is also suitable for knowledge sharing. The paper is inspired by knowledge sharing methods in multiagent systems, in which an agent can construct a global model from scattered local models held by individual agents. Consequently, modeling accuracy can be increased so as to provide valid simulated experiences for indirect learning at the early stage of learning. To simplify the sharing process, the proposed method applies resampling techniques to graft partial branches of trees containing required and useful experiences disseminated from experienced peers, instead of merging whole trees. The simulation results demonstrate that the proposed sharing method can achieve the objectives of sample efficiency and learning acceleration in multiagent cooperation applications.
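To make the Dyna-Q idea concrete (direct learning from a real transition, followed by planning on virtual experiences replayed from a learned model), here is a minimal single-agent tabular sketch. It uses a plain dictionary as a deterministic model rather than the tree-structured, shareable model described in the abstract, and all names and hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

def dyna_q_step(Q, model, s, a, r, s_next, actions,
                alpha=0.1, gamma=0.95, n_planning=10, rng=random):
    """One generic Dyna-Q step: Q-learning update, model update, then planning."""
    def q_update(s0, a0, r0, s1):
        best_next = max(Q[(s1, b)] for b in actions)
        Q[(s0, a0)] += alpha * (r0 + gamma * best_next - Q[(s0, a0)])

    q_update(s, a, r, s_next)        # direct learning from the real experience
    model[(s, a)] = (r, s_next)      # remember the transition (deterministic model)
    for _ in range(n_planning):      # indirect learning from virtual experiences
        ps, pa = rng.choice(list(model))
        pr, pn = model[(ps, pa)]
        q_update(ps, pa, pr, pn)

Q = defaultdict(float)   # action values, default 0.0
model = {}               # (state, action) -> (reward, next_state)
dyna_q_step(Q, model, s=0, a=0, r=1.0, s_next=1, actions=[0, 1], n_planning=1)
```

Each planning iteration reuses a stored transition without touching the environment, which is where the reduction in real interaction time comes from; the paper's contribution lies in how the model is built as a tree and shared between agents, which this sketch does not attempt.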

11.
IEEE Trans Neural Netw Learn Syst ; 24(5): 776-88, 2013 May.
Article in English | MEDLINE | ID: mdl-24808427

ABSTRACT

The objective of this paper is to accelerate the process of policy improvement in reinforcement learning. The proposed Dyna-style system combines two learning schemes: one utilizes a temporal difference method for direct learning; the other uses relative values for indirect learning in planning between two successive direct learning cycles. Instead of establishing a complicated world model, the approach introduces a simple predictor of average rewards into the actor-critic architecture in the simulation (planning) mode. The relative value of a state, defined as the accumulated differences between immediate reward and average reward, is used to steer the improvement process in the right direction. The proposed learning scheme is applied to control a pendulum system for tracking a desired trajectory to demonstrate its adaptability and robustness. Through reinforcement signals from the environment, the system takes the appropriate action to drive an unknown dynamic system to track desired outputs within a few learning cycles. In labyrinth exploration experiments, the proposed model-free method is compared with a connectionist adaptive heuristic critic and an advanced Dyna-Q learning method. The proposed method outperforms its counterparts in terms of elapsed time and convergence rate.
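The definition of relative value given above (accumulated differences between immediate and average reward) can be illustrated on a finite reward sequence. This is only a sketch under assumed conventions: the average reward is tracked with a simple exponential estimate, the horizon is finite, and the function names and step size `beta` are hypothetical rather than taken from the paper.

```python
def update_average_reward(avg_reward, reward, beta=0.1):
    """Simple running predictor of the average reward (assumed form)."""
    return avg_reward + beta * (reward - avg_reward)

def relative_values(rewards, avg_reward):
    """Relative value at each step t: sum over k >= t of (rewards[k] - avg_reward)."""
    values = [0.0] * len(rewards)
    running = 0.0
    for t in range(len(rewards) - 1, -1, -1):   # accumulate backward in time
        running += rewards[t] - avg_reward
        values[t] = running
    return values

rewards = [1.0, 0.0, 2.0]
vals = relative_values(rewards, avg_reward=0.5)   # [1.5, 1.0, 1.5]
rho = update_average_reward(0.5, reward=2.0)      # 0.65
```

A state whose relative value is positive is expected to yield better-than-average reward from that point on, which is the signal the abstract describes as steering policy improvement.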

12.
IEEE Trans Syst Man Cybern B Cybern ; 35(2): 255-68, 2005 Apr.
Article in English | MEDLINE | ID: mdl-15828654

ABSTRACT

An adaptive multiagent reinforcement learning method for solving congestion control problems on dynamic high-speed networks is presented. Traditional reactive congestion control selects a source rate based on the queue length relative to a predefined threshold. However, determining the congestion threshold and sending rate is difficult and inaccurate due to the propagation delay and the dynamic nature of the networks. A simple and robust cooperative multiagent congestion controller (CMCC) is proposed to solve the problem. It consists of two subsystems: a long-term policy evaluator (the expectation-return predictor) and a short-term rate selector composed of action-value evaluator and stochastic action selector elements. After receiving cooperative reinforcement signals generated by a cooperative fuzzy reward evaluator using game theory, CMCC takes the best action to regulate source flow, achieving high throughput and a low packet loss rate. Through its learning procedures, CMCC can learn to take correct actions adaptively in time-varying environments. Simulation results showed that the proposed approach can increase system utilization and decrease packet losses simultaneously.


Subjects
Algorithms, Artificial Intelligence, Computer Systems, Fuzzy Logic, Information Storage and Retrieval/methods, Statistical Models, Computer-Assisted Signal Processing, Computer Communication Networks, Computer Simulation, Feedback
13.
Article in English | MEDLINE | ID: mdl-18238198

ABSTRACT

Based on feedback linearization theory, this paper presents how a reinforcement learning scheme adopted to construct artificial neural networks (ANNs) can linearize a nonlinear system effectively. The proposed reinforcement linearization learning system (RLLS) consists of two sub-systems: one, the evaluation predictor (EP), is a long-term policy selector, and the other is a short-term action selector composed of linearizing control (LC) and reinforce predictor (RP) elements. In addition, a reference model plays the role of the environment, which provides the reinforcement signal to the linearizing process. The RLLS thus receives reinforcement signals to accomplish the linearizing behavior and control a nonlinear system so that it behaves similarly to the reference model. As a result, the RLLS performs identification and linearization concurrently. Simulation results demonstrate that the proposed learning scheme, applied to linearizing a pendulum system, provides better control reliability and robustness than conventional ANN schemes. Furthermore, a PI controller is used to control the linearized plant, where the affine system behaves like a linear system.
