ABSTRACT
BACKGROUND: Peri-operative chemo-radiotherapy plays an important role in locally advanced gastric cancer, but whether a preoperative strategy can improve long-term prognosis compared with postoperative treatment is unclear. The purpose of this study was to compare oncologic outcomes in locally advanced gastric cancer patients treated with preoperative chemo-radiotherapy (pre-CRT) and postoperative chemo-radiotherapy (post-CRT). METHODS: From January 2009 to April 2019, 222 patients from 2 centers with stage T3/4 and/or N-positive gastric cancer who received pre-CRT or post-CRT were included. After propensity score matching (PSM), local regional control (LC), distant metastasis-free survival (DMFS), disease-free survival (DFS) and overall survival (OS) were compared between the pre- and post-CRT groups using Kaplan-Meier analysis and the log-rank test. RESULTS: The median follow-up period was 30 months. 120 matched cases were generated for analysis. Three-year LC, DMFS, DFS and OS for the pre- vs. post-CRT groups were 93.8% vs. 97.2% (p = 0.244), 78.7% vs. 65.7% (p = 0.017), 74.9% vs. 65.3% (p = 0.042) and 74.4% vs. 61.2% (p = 0.055), respectively. Pre-CRT was significantly associated with DFS in both uni- and multivariate analyses. CONCLUSION: Preoperative CRT showed advantages in oncologic outcome compared with postoperative CRT. TRIAL REGISTRATION: ClinicalTrials.gov NCT01291407, NCT03427684 and NCT04062058, date of registration: Feb 8, 2011.
Subjects
Chemoradiotherapy, Adjuvant/methods , Gastrectomy , Stomach Neoplasms/therapy , Adult , Aged , Chemoradiotherapy, Adjuvant/mortality , Disease-Free Survival , Female , Follow-Up Studies , Humans , Kaplan-Meier Estimate , Male , Middle Aged , Postoperative Period , Preoperative Period , Prognosis , Propensity Score , Stomach Neoplasms/mortality , Survival Rate , Treatment Outcome
ABSTRACT
OBJECTIVE: The response to preoperative chemoradiotherapy (CRT) is difficult to predict, which limits its use in guiding individualized treatment. We examined a surrogate endpoint for long-term outcomes in locally advanced gastric cancer patients after preoperative CRT. METHODS: From April 2012 to April 2019, 95 patients with locally advanced gastric cancer who received preoperative concurrent CRT and who were enrolled in three prospective studies were included. All patients were stage T3/4N+. Local control, distant metastasis-free survival (DMFS), disease-free survival (DFS) and overall survival (OS) were evaluated. Clinicopathological factors related to long-term prognosis were analyzed using univariate and multivariate analyses. The down-staging depth score (DDS), a novel method of evaluating CRT response, was used to predict long-term outcomes. RESULTS: The median follow-up period for survivors was 30 months. The area under the receiver operating characteristic (ROC) curve for prediction by the DDS was 0.728, better than that of pathological complete response (pCR), histological response and ypN0. Decision curve analysis further confirmed that the DDS had the largest net benefit. The DDS cut-off value was 4. pCR and ypN0 were associated with OS (P=0.026 and 0.049). Surgery and DDS were correlated with DMFS, DFS and OS (surgery: P=0.001, <0.001 and <0.001, respectively; DDS: P=0.009, 0.013 and 0.032, respectively). Multivariate analysis showed that the DDS was an independent prognostic factor for DFS (P=0.021). CONCLUSIONS: The DDS is a simple, short-term indicator that serves as a better surrogate endpoint than pCR, histological response and ypN0 for DFS.
ABSTRACT
Assessing the prognosis of patients with hepatocellular carcinoma (HCC) by the number and size of tumors is sometimes difficult. The main purpose of this study was to evaluate the prognostic value of total tumor volume (TTV), which combines the two factors, in patients with HCC who underwent liver resection. We retrospectively reviewed 521 HCC patients treated from January 2001 to December 2008 in our center. Patients were categorized by tertiles of TTV, and the prognostic value of TTV was assessed. With a median follow-up of 116 months, the 1-, 3-, and 5-year overall survival (OS) rates were 93.1%, 69.9%, and 46.3%, respectively. OS differed significantly across TTV tertile groups, and higher TTV was associated with shorter OS (P < 0.001). Multivariate analysis revealed that TTV was an independent prognostic factor for OS. Larger TTV was significantly associated with higher alpha-fetoprotein level, presence of macrovascular invasion, multiple tumor lesions, larger tumor size, and advanced tumor stage (all P < 0.05). Within the first and second tertiles of TTV (TTV ≤ 73.5 cm³), no significant difference in OS was detected between patients within and beyond the Milan criteria (P = 0.183). The TTV-based Cancer of the Liver Italian Program (CLIP) score attained the lowest Akaike information criterion value, the highest χ² value in the likelihood ratio test, and the highest C-index among the tested staging systems. Our results suggest that TTV is a good indicator of tumor burden in patients with HCC. Further studies are warranted to validate its prognostic value.
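The abstract does not define how TTV is computed; a common convention in the HCC literature is to approximate each lesion as a sphere from its measured diameter and sum the volumes. The sketch below follows that assumption; the function names, the spherical approximation, and the second tertile cutoff are illustrative, not taken from the paper (only the 73.5 cm³ boundary appears in the abstract).

```python
import math

def total_tumor_volume(diameters_cm):
    """Approximate total tumor volume (cm^3) by modeling each lesion
    as a sphere of the measured diameter: V = (4/3) * pi * r^3."""
    return sum((4.0 / 3.0) * math.pi * (d / 2.0) ** 3 for d in diameters_cm)

def ttv_tertile(ttv, cutoffs=(73.5, 200.0)):
    """Assign a patient to a tertile group given cohort-derived cutoffs.
    The 73.5 cm^3 boundary is reported in the abstract; the second
    cutoff here is a placeholder."""
    if ttv <= cutoffs[0]:
        return 1
    return 2 if ttv <= cutoffs[1] else 3
```

Under this convention, a single lesion of about 5.2 cm diameter gives TTV ≈ 73.6 cm³, close to the 73.5 cm³ tertile boundary reported above.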
Subjects
Carcinoma, Hepatocellular/mortality , Carcinoma, Hepatocellular/pathology , Hepatectomy/mortality , Liver Neoplasms/mortality , Liver Neoplasms/pathology , Tumor Burden , Carcinoma, Hepatocellular/surgery , Female , Follow-Up Studies , Humans , Liver Neoplasms/surgery , Male , Middle Aged , Neoplasm Staging , Prognosis , Retrospective Studies , Survival Rate
ABSTRACT
For on-policy reinforcement learning (RL), discretizing the action space for continuous control can easily express multiple modes and is straightforward to optimize. However, without considering the inherent ordering between the discrete atomic actions, the explosion in the number of discrete actions can introduce undesired properties and a higher variance for the policy gradient (PG) estimator. In this article, we introduce a straightforward architecture that addresses this issue by constraining the discrete policy to be unimodal using Poisson probability distributions. This unimodal architecture can better leverage the continuity of the underlying continuous action space through explicit unimodal probability distributions. We conduct extensive experiments showing that a discrete policy with a unimodal probability distribution provides significantly faster convergence and higher performance for on-policy RL algorithms in challenging control tasks, especially highly complex ones such as Humanoid. We also provide a theoretical analysis of the variance of the PG estimator, which suggests that our carefully designed unimodal discrete policy retains a lower variance and yields a stable learning process.
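A minimal sketch of the core idea, assuming the policy network outputs a Poisson rate λ and the action space is discretized into K ordered bins; the truncate-and-renormalize step is an illustrative choice, not necessarily the paper's exact parameterization.

```python
import math

def unimodal_poisson_policy(lam, num_actions):
    """Probabilities over K ordered discrete actions from a truncated
    Poisson(lam) pmf, renormalized to sum to 1. The Poisson pmf is
    unimodal with its mode at floor(lam), so probability mass respects
    the ordering of the underlying continuous action space."""
    pmf = [lam ** k * math.exp(-lam) / math.factorial(k)
           for k in range(num_actions)]
    z = sum(pmf)
    return [p / z for p in pmf]

def is_unimodal(probs):
    """Check that the distribution rises to a single peak, then falls."""
    peak = probs.index(max(probs))
    rising = all(probs[i] <= probs[i + 1] for i in range(peak))
    falling = all(probs[i] >= probs[i + 1]
                  for i in range(peak, len(probs) - 1))
    return rising and falling
```

Because neighboring bins receive similar probability by construction, a single scalar λ controls the whole distribution, in contrast to an unconstrained categorical head with K independent logits.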
ABSTRACT
Communication-based multiagent reinforcement learning (MARL) has shown promising results in promoting cooperation by enabling agents to exchange information. However, the existing methods have limitations in large-scale multiagent systems due to high information redundancy, and they tend to overlook the unstable training process caused by the online-trained communication protocol. In this work, we propose a novel method called neighboring variational information flow (NVIF), which enhances communication among neighboring agents by providing them with the maximum information set (MIS), containing more information than the existing methods supply. NVIF compresses the MIS into a compact latent state while adopting neighboring communication. To stabilize the overall training process, we introduce a two-stage training mechanism. We first pretrain the NVIF module using a randomly sampled offline dataset to create a task-agnostic and stable communication protocol, and then use the pretrained protocol to perform online policy training with RL algorithms. Our theoretical analysis indicates that NVIF-proximal policy optimization (PPO), which combines NVIF with PPO, has the potential to promote cooperation with agent-specific rewards. Experimental results demonstrate the superiority of our method in both heterogeneous and homogeneous settings, and additional experiments demonstrate its potential for multitask learning.
ABSTRACT
Predicting the future trajectories of pairwise traffic agents in highly interactive scenarios, such as cut-in, yielding, and merging, is challenging for autonomous driving. Existing works either treat such a problem as a marginal prediction task or perform single-axis factorized joint prediction: the former strategy produces individual predictions without considering future interaction, while the latter conducts conditional trajectory-oriented prediction via agentwise interaction or conditional rollout-oriented prediction via timewise interaction. In this article, we propose a novel double-axis factorized joint prediction pipeline, namely, the conditional goal-oriented trajectory prediction (CGTP) framework, which models future interaction along both the agent and time axes to achieve goal- and trajectory-interactive prediction. First, a goals-of-interest network (GoINet) is designed to extract fine-grained features of goal candidates via hierarchical vectorized representation. Furthermore, we propose a conditional goal prediction network (CGPNet) to produce multimodal goal pairs in an agentwise conditional manner, along with a newly designed goal interactive loss to better learn the joint distribution of the intermediate interpretable modes. Explicitly guided by the goal-pair predictions, we propose a goal-oriented trajectory rollout network (GTRNet) to predict scene-compliant trajectory pairs via timewise interactive rollouts. Extensive experimental results confirm that the proposed CGTP outperforms state-of-the-art (SOTA) prediction models on the Waymo open motion dataset (WOMD), the Argoverse motion forecasting dataset, and an in-house cut-in dataset. Code is available at https://github.com/LiDinga/CGTP/.
ABSTRACT
Recent works have demonstrated that transformers can achieve promising performance in computer vision by exploiting the relationships among image patches with self-attention. However, they only consider the attention in a single feature layer and ignore the complementarity of attention across layers. In this article, we propose broad attention, which improves performance by incorporating the attention relationships of different layers in a vision transformer (ViT); the resulting model is called BViT. Broad attention is implemented via broad connection and parameter-free attention. The broad connection of each transformer layer promotes the transmission and integration of information for BViT. Without introducing additional trainable parameters, parameter-free attention jointly attends to the already available attention information in different layers to extract useful information and build its relationships. Experiments on image classification tasks demonstrate that BViT delivers superior top-1 accuracy of 75.0%/81.6% on ImageNet with 5M/22M parameters. Moreover, we transfer BViT to downstream object recognition benchmarks, achieving 98.9% and 89.9% on CIFAR10 and CIFAR100, respectively, exceeding ViT with fewer parameters. For the generalization test, broad attention in Swin Transformer, T2T-ViT and LVT also brings an improvement of more than 1%. In summary, broad attention is promising for improving the performance of attention-based models. Code and pretrained models are available at https://github.com/DRL/BViT.
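The "parameter-free" part can be illustrated with a toy sketch: attention maps already computed in each layer are aggregated (here by simple elementwise averaging, an illustrative choice rather than the paper's exact operator) and applied to the values, adding no trainable weights.

```python
def parameter_free_broad_attention(layer_attn, values):
    """Aggregate per-layer attention maps (each an N x N list of lists)
    by elementwise averaging, then apply the combined map to the value
    vectors. No new trainable parameters are introduced."""
    n = len(values)
    num_layers = len(layer_attn)
    combined = [[sum(a[i][j] for a in layer_attn) / num_layers
                 for j in range(n)] for i in range(n)]
    dim = len(values[0])
    return [[sum(combined[i][j] * values[j][d] for j in range(n))
             for d in range(dim)] for i in range(n)]
```

The combined map mixes complementary attention patterns from different depths, which is the intuition behind broad attention.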
ABSTRACT
Communicating with each other in a distributed manner and behaving as a group are essential for agents in multi-agent reinforcement learning. However, real-world multi-agent systems are restricted by limited communication bandwidth. If the bandwidth is fully occupied, some agents cannot send messages to others promptly, causing decision delays and impairing cooperation. Recent related work has started to address this problem but still fails to maximally reduce the consumption of communication resources. In this article, we propose an event-triggered communication network (ETCNet) to enhance communication efficiency in multi-agent systems by communicating only when necessary. For different task requirements, two paradigms of the ETCNet framework, the event-triggered sending network (ETSNet) and the event-triggered receiving network (ETRNet), are proposed for learning efficient sending and receiving protocols, respectively. Leveraging information theory, the limited bandwidth is translated into the penalty threshold of an event-triggered strategy, which determines whether an agent participates in communication at each step. The design of the event-triggered strategy is then formulated as a constrained Markov decision problem, and reinforcement learning finds the feasible and optimal communication protocol that satisfies the limited bandwidth constraint. Experiments on typical multi-agent tasks demonstrate that ETCNet outperforms other methods in reducing bandwidth occupancy while largely preserving the cooperative performance of multi-agent systems.
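The event-triggered idea can be sketched as a simple gate: an agent transmits only when its current message deviates from the last transmitted one by more than a threshold. The squared-distance trigger below is an illustrative stand-in; in the paper the threshold is derived from the bandwidth constraint and the triggering strategy itself is learned by RL.

```python
class EventTriggeredSender:
    """Send a message only when it differs enough from the last one
    sent; otherwise receivers keep using the stale message, which
    saves bandwidth at a small cost in freshness."""

    def __init__(self, threshold):
        self.threshold = threshold  # penalty threshold of the trigger
        self.last_sent = None
        self.sends = 0

    def step(self, message):
        if self.last_sent is None or sum(
                (m - s) ** 2
                for m, s in zip(message, self.last_sent)) > self.threshold:
            self.last_sent = list(message)
            self.sends += 1
        return self.last_sent  # what the other agents actually see
```

Over a slowly drifting message stream, most steps trigger no transmission, so bandwidth occupancy drops while receivers still hold an approximately current message.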
ABSTRACT
In single-agent Markov decision processes, an agent can optimize its policy based on its interaction with the environment. In multiplayer Markov games (MGs), however, the interaction is nonstationary due to the behaviors of other players, so the agent has no fixed optimization objective. The challenge becomes finding equilibrium policies for all players. In this research, we treat the evolution of player policies as a dynamical process and propose a novel learning scheme for Nash equilibrium. The core idea is to evolve a player's policy according to not just its current in-game performance, but an aggregation of its performance over history. We show that for a variety of MGs, players in our learning scheme provably converge to a point that approximates a Nash equilibrium. Combined with neural networks, we develop an empirical policy optimization algorithm that is implemented in a reinforcement-learning framework and runs in a distributed way, with each player optimizing its policy based on its own observations. We use two numerical examples to validate the convergence property on small-scale MGs, and a Pong example to show the potential on large games.
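The "aggregate performance over history" idea can be sketched with multiplicative weights on matching pennies. This is a standard no-regret scheme in the same spirit, not the paper's algorithm: each player's policy is driven by its cumulative historical payoff, the last iterates cycle, but the time-averaged policies approach the mixed Nash equilibrium (0.5, 0.5).

```python
import math

def stable_sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def mwu_self_play(rounds=50000, eta=0.02, p0=0.9):
    """Multiplicative-weights self-play on matching pennies. Policies
    are driven by cumulative historical payoffs (kept as log-weights
    for numerical stability); the returned time averages converge to
    the Nash equilibrium (0.5, 0.5)."""
    log_p = [math.log(p0), math.log(1.0 - p0)]  # player 1 wants a match
    log_q = [0.0, 0.0]                          # player 2 wants a mismatch
    avg_p = avg_q = 0.0
    for _ in range(rounds):
        p = stable_sigmoid(log_p[0] - log_p[1])  # P1 prob. of Heads
        q = stable_sigmoid(log_q[0] - log_q[1])  # P2 prob. of Heads
        avg_p += p / rounds
        avg_q += q / rounds
        u1 = [2.0 * q - 1.0, 1.0 - 2.0 * q]  # P1 payoffs for (Heads, Tails)
        u2 = [1.0 - 2.0 * p, 2.0 * p - 1.0]  # P2 payoffs for (Heads, Tails)
        log_p = [w + eta * u for w, u in zip(log_p, u1)]
        log_q = [w + eta * u for w, u in zip(log_q, u2)]
    return avg_p, avg_q
```

Even starting from a heavily biased policy (p0 = 0.9), the history-aggregated averages settle near the equilibrium, while any single iterate keeps oscillating.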
ABSTRACT
Multiagent reinforcement learning methods such as VDN, QMIX, and QTRAN, which adopt the centralized training with decentralized execution (CTDE) framework, have shown promising results in cooperation and competition. However, in some multiagent scenarios, the number of agents and the size of the action set actually vary over time. We call these unshaped scenarios, and the methods mentioned above perform unsatisfactorily in them. In this article, we propose a new method, called Unshaped Networks for Multiagent Systems (UNMAS), that adapts to changes in the number of agents and the size of the action set in multiagent systems. We propose a self-weighting mixing network to factorize the joint action-value. Its adaptation to changes in agent number is attributed to the nonlinear mapping from each agent's Q-value to the joint action-value with individual weights. Besides, to address changes in the action set, each agent constructs an individual action-value network composed of two streams to evaluate the constant environment-oriented subset and the varying unit-oriented subset. We evaluate UNMAS on various StarCraft II micromanagement scenarios and compare the results with several state-of-the-art MARL algorithms. The superiority of UNMAS is demonstrated by its highest winning rates, especially on the most difficult scenario 3s5z_vs_3s6z. The agents learn to perform effective cooperative behaviors where other MARL algorithms fail. Animated demonstrations and source code are provided at https://sites.google.com/view/unmas.
ABSTRACT
The Nash equilibrium is an important concept in game theory. It describes a policy profile in which no player can be exploited by any opponent. We combine game theory, dynamic programming, and recent deep reinforcement learning (DRL) techniques to learn the Nash equilibrium policy for two-player zero-sum Markov games (TZMGs) online. The problem is first formulated as a Bellman minimax equation, and generalized policy iteration (GPI) provides a double-loop iterative way to find the equilibrium. Then, neural networks are introduced to approximate Q functions for large-scale problems. An online minimax Q network learning algorithm is proposed to train the network with observations. Experience replay, dueling network, and double Q-learning are applied to improve the learning process. The contributions are twofold: 1) DRL techniques are combined with GPI to find the TZMG Nash equilibrium for the first time and 2) the convergence of the online learning algorithm with a lookup table and experience replay is proven, a proof that is useful not only for TZMGs but also for single-agent Markov decision problems. Experiments on different examples validate the effectiveness of the proposed algorithm on TZMG problems.
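For intuition, the Bellman minimax backup can be sketched on a tabular TZMG. The pure-strategy maximin below equals the game value only when the stage-game Q matrix has a saddle point; in general a mixed-strategy linear program is required, which is what the paper's GPI scheme handles.

```python
def pure_minimax_value(Q):
    """Value of a matrix game under pure strategies: the protagonist
    maximizes over rows, anticipating the opponent minimizing over
    columns. Equals the true game value when a saddle point exists."""
    return max(min(row) for row in Q)

def minimax_q_update(Q, s, a, b, r, s_next, alpha=0.1, gamma=0.9):
    """One tabular minimax Q-learning step for a two-player zero-sum
    Markov game: bootstrap with the minimax value of the next state
    instead of a single-agent max."""
    target = r + gamma * pure_minimax_value(Q[s_next])
    Q[s][a][b] += alpha * (target - Q[s][a][b])
    return Q[s][a][b]
```

Replacing the tabular `Q` with a neural network and sampling `(s, a, b, r, s_next)` tuples from a replay buffer gives the flavor of the online minimax Q network algorithm described above.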
ABSTRACT
Existing model-based value expansion (MVE) methods typically leverage a world model with a fixed rollout horizon for value estimation to assist policy learning. However, a proper horizon setting is essential to world-model-based policy learning, and choosing an appropriate horizon value is time-consuming, especially for visual control tasks. In this article, we investigate the idea of adaptively using model knowledge for value expansion. We propose a novel world-model-based method called dynamic-horizon MVE (DMVE), which adjusts the use of the world model through adaptive rollout horizon selection. Building on a reconstruction-based technique, both the raw and reconstructed images are used to obtain multihorizon rollouts via latent imagination. A horizon reliability detection approach is then given to select appropriate horizons and obtain more accurate value estimation from the reconstruction-based value expansion errors. Experimental results on mainstream benchmark visual control tasks show that DMVE outperforms all baselines in sample efficiency and final performance. In addition, experiments on an autonomous driving lane-changing task further demonstrate the scalability of our method. The code for DMVE is available at https://github.com/JunjieWang95/dmve.
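The horizon-selection idea can be sketched as follows: compute value expansions for several candidate horizons, score each horizon by a reliability proxy, and keep the estimate from the most reliable horizon. Using the discrepancy between expansions from raw and reconstructed inputs as the proxy is an illustrative stand-in for the paper's reconstruction-based errors.

```python
def dynamic_horizon_value(raw_values, recon_values):
    """Given value-expansion estimates for each candidate rollout
    horizon, computed from raw and reconstructed observations, pick
    the horizon where the two estimates agree best (a reliability
    proxy) and return that horizon and its value estimate."""
    errors = [abs(r, ) if False else abs(r - c)
              for r, c in zip(raw_values, recon_values)]
    best = errors.index(min(errors))
    return best, raw_values[best]
```

Longer horizons compound model error, so when the world model is inaccurate for a state, the agreement-based score naturally falls back to shorter rollouts.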
ABSTRACT
Multisensor fusion-based road segmentation plays an important role in intelligent driving systems since it provides the drivable area. The existing mainstream approach performs feature fusion in the image space, which causes perspective compression of the road and degrades performance on distant road regions. Considering that the bird's-eye view (BEV) of the LiDAR point cloud preserves the spatial structure of the horizontal plane, this article proposes a bidirectional fusion network (BiFNet) to fuse the image and the BEV of the point cloud. The network consists of two modules: 1) the dense space transformation (DST) module, which handles the mutual conversion between the camera image space and the BEV space, and 2) the context-based feature fusion module, which fuses information from the different sensors based on corresponding scene features. The method achieves competitive results on the KITTI dataset.
Subjects
Neural Networks, Computer
ABSTRACT
Although neural architecture search (NAS) can bring improvements to deep models, it typically neglects the precious knowledge embodied in existing models. The high computation and time costs of NAS also mean that we should not search from scratch, but make every attempt to reuse existing knowledge. In this article, we discuss what kinds of knowledge in a model can and should be used for a new architecture design. We then propose a new NAS algorithm, namely ModuleNet, which can fully inherit knowledge from existing convolutional neural networks. To make full use of existing models, we decompose them into different modules that retain their weights and together constitute a knowledge base. We then sample and search for a new architecture according to the knowledge base. Unlike previous search algorithms, and benefiting from the inherited knowledge, our method is able to search for architectures directly in the macrospace with the NSGA-II algorithm, without tuning the parameters in these modules. Experiments show that our strategy can efficiently evaluate the performance of a new architecture even without tuning the weights of convolutional layers. With the help of the inherited knowledge, our search results consistently achieve better performance on various datasets (CIFAR10, CIFAR100, and ImageNet) than the original architectures.
Subjects
Algorithms , Neural Networks, Computer
ABSTRACT
3-D object detection is crucial for many real-world applications and has attracted many researchers' attention. Beyond 2-D object detection, 3-D object detection usually needs to extract appearance, depth, position, and orientation information from light detection and ranging (LiDAR) and camera sensors. However, due to the additional degrees of freedom and vertices, existing detection methods that directly transform from 2-D to 3-D still face several challenges, such as an exploding number of anchors and inefficient or hard-to-optimize objectives. To this end, we present a fast segmentation method for 3-D point clouds that reduces the number of anchors and thus largely decreases the computing cost. Moreover, taking advantage of the 3-D generalized Intersection over Union (GIoU) and L1 losses, we propose a fused loss to facilitate the optimization of 3-D object detection. A series of experiments shows that the proposed method effectively alleviates the abovementioned issues.
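For axis-aligned boxes, 3-D GIoU extends IoU with a penalty based on the smallest enclosing box, which keeps the loss informative even when boxes do not overlap; a minimal sketch follows (the paper's detection boxes are oriented, so its actual computation is more involved).

```python
def volume(box):
    """Volume of an axis-aligned box (xmin, ymin, zmin, xmax, ymax, zmax);
    degenerate or inverted extents clamp to zero."""
    return (max(0.0, box[3] - box[0])
            * max(0.0, box[4] - box[1])
            * max(0.0, box[5] - box[2]))

def giou_3d(a, b):
    """3-D generalized IoU for axis-aligned boxes.
    GIoU = IoU - (C - U) / C, where U is the union volume and C is the
    volume of the smallest enclosing box; 1 - GIoU is used as a loss."""
    inter = tuple(max(a[i], b[i]) for i in range(3)) + \
            tuple(min(a[i + 3], b[i + 3]) for i in range(3))
    vi = volume(inter)
    union = volume(a) + volume(b) - vi
    enclose = tuple(min(a[i], b[i]) for i in range(3)) + \
              tuple(max(a[i + 3], b[i + 3]) for i in range(3))
    vc = volume(enclose)
    return vi / union - (vc - union) / vc
```

Unlike plain IoU, which is zero for all disjoint box pairs, GIoU becomes more negative as the boxes move apart, so its gradient still pulls a predicted box toward the target.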
ABSTRACT
Efficient neural architecture search (ENAS) achieves high efficiency in learning high-performance architectures via parameter sharing and reinforcement learning (RL). In the architecture search phase, ENAS employs a deep scalable architecture as the search space, whose training process consumes most of the search cost, and the time-consuming model training is proportional to the depth of the deep scalable architecture. Through experiments with ENAS on CIFAR-10, we find that reducing the number of layers of the scalable architecture is an effective way to accelerate the search process, but it suffers from a prohibitive performance drop in the architecture estimation phase. In this article, we propose broad neural architecture search (BNAS), in which we elaborately design a broad scalable architecture, dubbed the broad convolutional neural network (BCNN), to solve the above issue. On the one hand, the proposed broad scalable architecture trains quickly due to its shallow topology. We also adopt the RL and parameter sharing used in ENAS as the optimization strategy of BNAS, so the proposed approach achieves higher search efficiency. On the other hand, the broad scalable architecture extracts multi-scale features and enhancement representations and feeds them into a global average pooling (GAP) layer to yield more reasonable and comprehensive representations, so its performance can be assured. In particular, we also develop two variants of BNAS that modify the topology of the BCNN.
To verify the effectiveness of BNAS, several experiments are performed, and the results show that 1) BNAS delivers a search cost of 0.19 days, which is 2.37× less expensive than ENAS, the best-ranked RL-based NAS approach; 2) compared with small-size (0.5 million parameters) and medium-size (1.1 million parameters) models, the architecture learned by BNAS obtains state-of-the-art performance (3.58% and 3.24% test error) on CIFAR-10; and 3) the learned architecture achieves 25.3% top-1 error on ImageNet using just 3.9 million parameters.
Subjects
Learning/classification , Machine Learning , Neural Networks, Computer , Reinforcement, Psychology
ABSTRACT
This paper investigates the automatic exploration problem in unknown environments, which is key to applying robotic systems to social tasks. Solving this problem by stacking decision rules cannot cover diverse environments and sensor properties. Learning-based control methods are adaptive to these scenarios, but they are hampered by low learning efficiency and awkward transferability from simulation to reality. In this paper, we construct a general exploration framework by decomposing the exploration process into decision, planning, and mapping modules, which increases the modularity of the robotic system. Based on this framework, we propose a deep reinforcement learning-based decision algorithm that uses a deep neural network to learn an exploration strategy from the partial map. The results show that the proposed algorithm has better learning efficiency and adaptability to unknown environments. In addition, we conduct experiments on a physical robot, and the results suggest that the learned policy transfers well from simulation to the real robot.
ABSTRACT
Tongue diagnosis has played a pivotal role in traditional Chinese medicine (TCM) for thousands of years. As one of the most important tongue characteristics, the tooth-marked tongue is related to spleen deficiency and can greatly contribute to symptom differentiation and treatment selection. Yet recognizing a tooth-marked tongue is subjective and challenging for TCM practitioners. Most previous studies have concentrated on subjectively selected features of the tooth-marked region and achieved accuracy under 80%. In the present study, we propose an artificial intelligence framework using a deep convolutional neural network (CNN) for the recognition of tooth-marked tongue. First, we constructed a relatively large dataset of 1548 tongue images captured by different equipment. Then, we used the ResNet34 CNN architecture to extract features and perform classification. The overall accuracy of the models was over 90%. Interestingly, the models generalize successfully to images captured by other devices under different illumination. The effectiveness and generalization of our framework may provide an objective and convenient computer-aided tongue diagnosis method for tracking disease progression and evaluating pharmacological effects from an informatics perspective.
ABSTRACT
This paper is concerned with the nonlinear optimization problem of nonzero-sum (NZS) games with unknown drift dynamics. A data-based integral reinforcement learning (IRL) method is proposed to iteratively approximate the Nash equilibrium of NZS games. Furthermore, we prove that the data-based IRL method is equivalent to the model-based policy iteration algorithm, which guarantees the convergence of the proposed method. For implementation purposes, a single-critic neural network structure for NZS games is given. To enhance the applicability of the data-based IRL method, we design updating laws for the critic weights based on offline and online iterative learning, respectively. Note that the experience replay technique is introduced in the online iterative learning, which can improve the convergence rate of the critic weights during learning. The uniform ultimate boundedness of the critic weights is guaranteed using the Lyapunov method. Finally, numerical results demonstrate the effectiveness of the data-based IRL algorithm for nonlinear NZS games with unknown drift dynamics.