Model-based Reinforcement Learning for Accelerated Learning From CFD Simulations
This thesis presents and evaluates a model-based deep reinforcement learning (DRL) approach to active flow control of the 2D flow past a cylinder, with the goal of accelerating the training of the DRL agent. A von Kármán vortex street forms in the wake of the cylinder. By rotating the cylinder, the DRL agent tries to find a control law that mitigates the vorticity and thereby reduces the drag as well as the oscillations of drag and lift on the cylinder. Since the agent only has access to 400 fixed pressure sensors on the cylinder's surface, a feed-forward neural network is developed that predicts the pressure values of the next state from the pressure values of the previous states and the action taken by the agent. This environment model is applied autoregressively to predict whole trajectories from a single start state. The presented approach shows the general feasibility of model-based trajectory sampling for active flow control using DRL. Furthermore, the influence of the number of consecutive previous states used to predict the next state is investigated, showing that more previous states yield better prediction accuracy. Reducing the number of pressure sensors used as input to the environment model is also investigated with respect to memory consumption and prediction accuracy. The resulting model predicts the next state, comprising 400 pressure values and the drag and lift coefficients, from 30 consecutive time steps, each containing only 16 pressure values plus the action. The influence of the number of neurons per hidden layer is examined as well: prediction accuracy improves as the number of neurons per hidden layer increases, but none of the models enabled stable and promising DRL training.
Although the tested neural network architectures are not sufficient for a working model-based DRL training run, this thesis reveals several pitfalls and challenges of environment modeling for this flow problem and proposes next steps.
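
The autoregressive use of the environment model described above can be illustrated with a minimal sketch. It is not the implementation used in the thesis. Only the dimensions (400 predicted pressure values, 16 input sensors, 30 time steps, drag and lift coefficients as additional outputs) are taken from the abstract. The single hidden layer with 64 neurons, the random untrained weights, the evenly spaced sensor subset, the pairing of each pressure row with its action, and the names `init_mlp`, `predict_next`, `rollout`, and `policy` are assumptions made for illustration.

```python
import numpy as np

N_SENSORS_FULL = 400        # pressure values in the predicted state (from the abstract)
N_SENSORS_IN = 16           # reduced sensor subset used as model input (from the abstract)
N_STEPS = 30                # previous time steps in the input window (from the abstract)
N_OUT = N_SENSORS_FULL + 2  # pressures plus drag and lift coefficient

# Assumption: the 16 input sensors are evenly spaced among the 400.
SENSOR_IDX = np.arange(0, N_SENSORS_FULL, N_SENSORS_FULL // N_SENSORS_IN)


def init_mlp(rng, n_in, n_hidden, n_out):
    """Random, untrained weights for a one-hidden-layer network (placeholder)."""
    return {
        "W1": rng.normal(0.0, 0.05, (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.05, (n_hidden, n_out)),
        "b2": np.zeros(n_out),
    }


def predict_next(params, window):
    """Map a (N_STEPS, N_SENSORS_IN + 1) window to 400 pressures, c_d and c_l."""
    x = window.reshape(-1)  # flatten the window into one feature vector
    h = np.tanh(x @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]


def rollout(params, window, policy, n_steps):
    """Autoregressive rollout: each prediction is fed back into the input window."""
    window = window.copy()
    states = []
    for _ in range(n_steps):
        out = predict_next(params, window)
        pressure = out[:N_SENSORS_FULL]
        action = policy(pressure)  # the agent acts on the predicted pressures
        # Keep only the 16 input sensors, append the action, and slide the window.
        new_row = np.append(pressure[SENSOR_IDX], action)
        window = np.vstack([window[1:], new_row])
        states.append(out)
    return np.array(states), window


rng = np.random.default_rng(0)
params = init_mlp(rng, N_STEPS * (N_SENSORS_IN + 1), 64, N_OUT)
start = rng.normal(size=(N_STEPS, N_SENSORS_IN + 1))  # synthetic start window


def policy(pressure):
    """Hypothetical stand-in for the DRL agent's policy."""
    return float(np.tanh(pressure.mean()))


trajectory, last_window = rollout(params, start, policy, n_steps=5)
```

Because every predicted state re-enters the input window, prediction errors can accumulate over the course of a trajectory, which is one reason the one-step accuracy of the model matters for the DRL training built on top of it.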