A deep reinforcement learning approach for solving the Traveling Salesman Problem with Drone


Creative Commons License

Bogyrbayeva A., Yoon T., Ko H., Lim S., Yun H., Kwon C.

TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, vol.148, 2023 (SCI-Expanded) identifier identifier

  • Publication Type: Article / Article
  • Volume: 148
  • Publication Date: 2023
  • Doi Number: 10.1016/j.trc.2022.103981
  • Journal Name: TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, Environment Index, Geobase, INSPEC, Metadex, Civil Engineering Abstracts
  • Keywords: Vehicle routing, Traveling salesman problem, Drones, Reinforcement learning, Neural networks, NEIGHBORHOOD SEARCH, OPTIMIZATION, LOGISTICS, TRUCK
  • Süleyman Demirel University Affiliated: No

Abstract

Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination-a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. State-less attention-based decoder fails to make such coordination between vehicles. We propose a hybrid model that uses an attention encoder and a Long Short-Term Memory (LSTM) network decoder, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for the coordinated routing of multiple vehicles than the attention-based model. The proposed model demonstrates comparable results as the operations research baseline methods.