Learning 3-opt heuristics for traveling salesman problem via deep reinforcement learning

Jingyan Sui (Institute of Computing Technology, Chinese Academy of Sciences)*; Shi-Zhe Ding (Institute of Computing Technology, Chinese Academy of Sciences); Ruizhi Liu (Institute of Computing Technology, Chinese Academy of Sciences); Liming Xu (Institute of Computing Technology, Chinese Academy of Sciences); Dongbo Bu (Insitute of Computing Technology, Chinese Academy of Sciences)
PMLR Page

Abstract

Traveling salesman problem (TSP) is a classical combinatorial optimization problem. As it represents a large number of important practical problems, it has received extensive studies and a great variety of algorithms have been proposed to solve it, including exact and heuristic algorithm. The success of heuristic algorithms relies heavily on the design of powerful heuristic rules, and most of the existing heuristic rules were manually designed by experienced experts to model their insights and observations on TSP instances and solutions. Recent studies have shown an alternative promising design strategy that directly learns heuristic rules from TSP instances without any manual interference. Here, we report an iterative improvement approach (called Neural-3-OPT) that solves TSP through automatically learning effective 3-opt heuristics via deep reinforcement learning. In the proposed approach, we adopt a pointer network to select 3 links from the current tour, and a feature-wise linear modulation network to select an appropriate way to reconnect the segments after removing the selected 3 links. We demonstrate that our approach achieves state-of-the-art performance on both real TSP instances and randomly-generated instances than, to the best of our knowledge, the existing neural network-based approaches.