Published August 6, 2022 | Version Journal
Journal article (open access)

Reinforcement learning applied to production planning and control


The objective of this paper is to examine the use and applications of reinforcement learning (RL) techniques in the production planning and control (PPC) field, addressing the following PPC areas: facility resource planning, capacity planning, purchase and supply management, production scheduling and inventory management. A total of 181 articles were reviewed. The main RL characteristics, such as method, context, states, actions, reward and highlights, were analysed. The number of agents considered, the applications and the RL software tools used (specifically, programming languages, platforms, application programming interfaces and RL frameworks, among others) were identified. The results showed that RL was applied mainly to production scheduling problems, followed by purchase and supply management. Most of the reviewed RL algorithms were model-free and single-agent, and were applied to simplified PPC environments. Nevertheless, their results seem promising compared with traditional mathematical programming and heuristic/metaheuristic solution methods, particularly when the problems incorporate uncertainty or non-linear properties. Finally, value-based RL approaches are the most widely used, specifically Q-learning and its variants and, for deep RL, deep Q-networks. In recent years, however, the most widely used approach has been the actor-critic method, including advantage actor-critic, proximal policy optimisation, deep deterministic policy gradient and trust region policy optimisation.
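The abstract reports that value-based methods, specifically Q-learning, are the most widely used in the reviewed PPC literature. As a minimal illustration, and not a method taken from any of the reviewed articles, the sketch below applies tabular Q-learning to a hypothetical single-item inventory problem. The capacity, order quantities, demand distribution and cost parameters (`MAX_INV`, `ACTIONS`, `DEMANDS`, `ORDER_COST`, `HOLD_COST`, `STOCKOUT_COST`) and the helper names `step`, `q_learning` and `greedy_policy` are all illustrative assumptions.

```python
import random

# Hypothetical toy single-item inventory problem (illustrative assumption,
# not taken from the reviewed paper).
MAX_INV = 4             # warehouse capacity (units)
ACTIONS = [0, 1, 2, 3]  # order quantities the agent may choose
DEMANDS = [0, 1, 2]     # demand per period, drawn uniformly at random
ORDER_COST = 1.0        # cost per unit ordered
HOLD_COST = 0.5         # cost per unit left in stock at period end
STOCKOUT_COST = 10.0    # penalty per unit of unmet demand


def step(inv, order, demand):
    """One period: receive the order, serve demand, return (next_inv, reward)."""
    stock = min(inv + order, MAX_INV)  # units beyond capacity are discarded
    sold = min(stock, demand)
    lost = demand - sold
    next_inv = stock - sold
    reward = -(ORDER_COST * order + HOLD_COST * next_inv + STOCKOUT_COST * lost)
    return next_inv, reward


def q_learning(steps=100_000, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behaviour policy."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(MAX_INV + 1)]
    inv = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(ACTIONS))  # explore: random action
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q[inv][i])  # exploit
        next_inv, reward = step(inv, ACTIONS[a], rng.choice(DEMANDS))
        # Off-policy TD target: bootstrap from the best action in the next state.
        target = reward + gamma * max(q[next_inv])
        q[inv][a] += alpha * (target - q[inv][a])
        inv = next_inv
    return q


def greedy_policy(q):
    """Order quantity chosen greedily in each inventory state."""
    return [ACTIONS[max(range(len(ACTIONS)), key=lambda i: row[i])] for row in q]


if __name__ == "__main__":
    policy = greedy_policy(q_learning())
    for inv, order in enumerate(policy):
        print(f"inventory={inv}: order {order}")
```

Running the script prints the learned greedy order quantity for each inventory level. A lookup table like `q` only scales to small state spaces; the deep Q-networks mentioned in the abstract replace it with a neural-network approximator so that larger PPC environments can be handled.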


The research leading to these results was funded by the following grants: CADS4.0 (Ref. RTI2018-101344-B-I00) and NIOTOME (Ref. RTI2018-102020-B-I00), financed by MCIN/AEI/10.13039/501100011033 and 'ERDF A way of making Europe'; and the 'Industrial Production and Logistics Optimization in Industry 4.0' (i4OPT) (Ref. PROMETEO/2021/065) and 'Resilient, Sustainable and People-Oriented Supply Chain 5.0 Optimization Using Hybrid Intelligence' (RESPECT) (Ref. IGE/2021/159) projects, funded by the Generalitat Valenciana (Valencian Regional Government).



Additional details

Funding
ZDMP – Zero Defect Manufacturing Platform (825631), European Commission
i4Q – Industrial Data Services for Quality Control in Smart Manufacturing (958205), European Commission