Deep Reinforcement Learning for Optimal Sailing Upwind


We describe the application of deep reinforcement learning (DRL) methods to determine the optimal decision policy for sailing a sailboat towards a target point located upwind of the boat’s current position, when wind direction and speed vary according to an unknown stochastic process, as is typical in real sailing races. The dynamics of the sailboat, together with a suitable choice of actions, are formulated as a Markov decision process (MDP), which allows a wide variety of DRL algorithms to be applied. Empirical results show that the learned policy outperforms baseline control algorithms that ignore the variability in wind strength and direction and instead assume that the current wind conditions will persist indefinitely.