Data-driven hospital personnel scheduling optimization through patients prediction

With the rapid development of modern cities, smart-city technologies have become indispensable for solving urban problems. Medical services are one of the key areas affecting the lives of urban residents. In particular, managing a hospital's human resources effectively is a complex and challenging problem that directly affects treatment capacity. Because of the severe shortage of medical personnel, hospitals must produce high-quality schedules to improve both their efficiency and the utilization rate of their human resources. Although there is a large body of research on hospital staff scheduling, few studies jointly consider forecasts of the future patient population, doctor scheduling, and hospital structure, all of which are important in the hospital staff scheduling problem. To address this gap, this paper establishes an optimization system that combines a two-layer mixed-integer linear programming model with an extended Prophet model for hospital personnel scheduling. The model considers factors such as weather, disease types, number of patients, room resources, doctor resources, and working hours, and can quickly obtain a timetable under complex constraints. Finally, the convergence and practicability of the model are verified with real data from a hospital in China.


Introduction
The hospital is the basic element of the healthcare system and is very important to human health. With economic development and rising living standards, people increasingly demand that hospitals provide more efficient and better services. However, because of limits on various resources, most hospitals can only improve the efficiency and quality of their services by optimizing scheduling, especially personnel scheduling. Personnel scheduling is a critical part of hospital scheduling and is also a classic and challenging problem in the field of Operations Research (OR) (Turhan and Bilgen 2020). Good personnel scheduling not only improves the patients' experience but also benefits hospitals, as it increases the efficiency of medical personnel, reduces the idle time of facilities, and decreases hospital operating costs. Moreover, in many countries, the recent shortage of doctors and the high turnover rate (Habsari and Ilyas 2019) have forced hospitals to continuously optimize personnel scheduling to compensate for the resulting decline in service efficiency.
Medical personnel and patients are the two most important factors that make personnel scheduling in hospitals intricate. On the one hand, personnel scheduling is subject to many restrictions, such as working hours and the number of doctors of different types, so optimizing it is very complicated. On the other hand, past studies generally addressed only one aspect, such as predicting the future number of patients, or adjusting future staff schedules based on the historical number of patients, although the two parts should be complementary. The cyclical changes in different types of patients over time, the weather, and other factors are equally important to hospital scheduling. When the number of certain patients is expected to change drastically (for example, the number of colds increases when the season changes), the hospital should adjust the schedule in advance in order to provide better services to patients. At the same time, considering both the hospital and the patients makes the problem more complicated, with many more factors to consider than in ordinary scheduling. Unlike the number of doctors in a hospital, the future number of patients cannot be obtained exactly, which makes it more difficult to schedule medical staff accurately.
This paper proposes a new optimization system combining an extended Prophet prediction model and a two-layer mixed-integer linear programming (MILP) model to address the personnel scheduling problem in hospitals. The extended Prophet prediction model predicts the number of patients, thereby reducing the uncertainty of the patient flow. Then, based on the results of the prediction model, the two-layer MILP optimizes the schedules while simultaneously considering doctors and patients. MILP has been proven to be an effective approach to the scheduling problem in hospitals (Pulido et al. 2014) and to even more complex optimization problems (Schempp et al. 2019; Xu et al. 2019). At the same time, this architecture combining prediction and optimization has been successfully applied to other complicated optimization problems and has demonstrated its practicality and reliability, for example, in the accurate prediction and regulation of the renewable energy supply chain (Yan et al. 2020) and in the rescue demand forecast and distribution plan for sudden disasters (Schempp et al. 2018). This article's contributions can be summarized as:
• This article proposes a new system consisting of prediction and optimization. The system considers the conditions of doctors and patients at the same time, which improves the practicality of the model.
The remainder of this paper is organized as follows. Section 2 gives a literature review on hospital personnel scheduling. In Sect. 3, the problem description and the adopted methodology are given. Section 4 details the two parts of the optimization system, namely prediction and optimization. In Sect. 5, the optimization system is applied to real data from a hospital in China, and the results demonstrate the model's convergence and practicability. Conclusions are provided in Sect. 6.

Related work
Generally, research on personnel scheduling in hospitals can be roughly divided into three main categories according to the scheduling approach: manual scheduling, mathematical programming scheduling, and artificial intelligence (AI) scheduling.
In manual scheduling, the work shift schedules of medical personnel are typically decided by the hospital managers. The schedule repeats over a certain time period, usually 1 week or 1 month, but once determined, it is fixed for a long time. Therefore, manual scheduling minimizes uncertainty and allows doctors, nurses, and even patients to prepare in advance. Marchionno (1987) proposed a set of rules to help schedule nurses in a three-shift rotation. Further, Hu et al. (2009) developed a set of dispatching rules for outpatient examinations based on a statistical analysis of the hospital's historical data, thereby reducing the waiting time of outpatients. However, manual scheduling is not suitable when the hospital's situation changes often or incidents happen, which may cause an unreasonable allocation of medical resources.
Mathematical programming scheduling is a widely used traditional approach to the personnel scheduling problem. This approach obtains results by solving a series of equations, which are formal expressions of various realistic limitations and constraints. Essentially, mathematical programming scheduling searches for an optimal solution over a solution space specified by these constraints. Previous studies have shown that mathematical approaches including integer programming (Jaumard et al. 1998; Gartner and Kolisch 2014; Sitepu et al. 2018; Belien and Demeulemeester 2006; Beaulieu et al. 2000), stochastic programming (Wang and Tang 2014; Kim and Mehrotra 2015; Bagheri et al. 2016), and multi-objective programming (Gharbi et al. 2017; Parr and Thompson 2007; Topaloglu 2006) are powerful techniques for the hospital personnel scheduling problem. As the number of constraints grows, searching for solutions becomes an extremely time-consuming task. Hence, the trend in recent studies is to integrate mathematical programming with other optimization techniques such as simulated annealing (Turhan and Bilgen 2020) and local search (Burke et al. 2010). In this hybrid approach, the final results are mainly determined by mathematical programming rather than by the other optimization techniques. Those techniques only refine the preliminary solutions obtained by mathematical programming, which already satisfy all the hard constraints. Thus, those hybrid approaches are essentially mathematical programming approaches. Although only approximately optimal solutions are obtained, such hybrid approaches can greatly reduce the computational time when the constraints are numerous and complex.
Applying AI techniques to solve urban safety problems (Chen et al. 2016) has become popular in recent years. Spyropoulos (2000) gave an early overview of AI scheduling in the hospital environment, explaining that AI can be applied well to hospital management and medical planning, and summarizing the artificial intelligence achievements available at that time. His conclusion was that artificial intelligence would soon be applied to the practice of hospital scheduling. Further, data-driven emergency management (Song et al. 2020; Haoran et al. 2019), including hospital emergency management (Xia et al. 2019), has gradually been integrated into real systems and applied to daily life. Currently, a variety of artificial intelligence technologies are widely used in hospital scheduling problems. For example, an Artificial Neural Network (ANN) is used to build an information-driven model of patient arrival rate and service time (Rajakumari and Madhunisha 2020); Machine Learning (ML) is used to improve patients' satisfaction with the appointment system (AS) and resource utilization (Srinivas and Ravindran 2018); Reinforcement Learning (RL) is used to find the optimal emergency department patient scheduling plan (Lee and Lee 2020). Khaldi et al. (2019) combine ANN, Ensemble Empirical Mode Decomposition (EEMD), and the ARIMA model to predict weekly patient visits in the emergency department, so as to optimize hospital resources. Unlike mathematical programming scheduling, AI scheduling only guarantees an approximately optimal solution, not necessarily the global optimum. Even so, it is still a popular approach to the hospital personnel scheduling problem because of its strong performance.
Additionally, there are indirect approaches that improve the efficiency and quality of hospital services without directly optimizing the hospital personnel schedule. The most commonly used one is simulation-based optimization. Generally, this approach aims to optimize the allocation of various hospital resources, including room resources, so as to improve efficiency. Sasanfar et al. (2020) proposed a simulation-based optimization approach to evaluate a new layout design and optimize patients' waiting time and staff allocation. Cincar and Ivascu (2019) proposed a multi-agent-based hospital scheduling system that can continuously adjust schedules according to the dynamically changing environment. The simulation-based approach does not guarantee the optimal solution either, but it can obtain results even when the constraints are intricate. Furthermore, by integrating other scheduling approaches, the schedule quality can be further improved.
Overall, three points can be concluded from the above literature review. Firstly, the hospital personnel scheduling problem differs as the requirements and constraints differ, so it is difficult to identify a single best approach. Secondly, most previous studies consider only one role (the doctors, the nurses, or the patients), and very few consider multiple roles simultaneously. In real cases, the interactions among doctors, nurses, and patients are also significant for scheduling. Thirdly, few papers verify their approaches in a real hospital situation and give a clear baseline.

Problem definition
The hospital needs to provide services for all patients who arrive on a given day with limited human resources. Therefore, the hospital needs a reasonable shift schedule to maximize the use of current human resources, meet the service needs of patients, and improve the overall service quality of the hospital. This schedule needs to meet the needs of all patients of different types. Figure 1 shows a two-step optimization system with four inputs (the type of disease, the number of patients with each disease, the weather time series, and the human resources) and two outputs (the predicted number of patients in the next week and the work shift schedule of medical personnel).
Patients with different diseases have different requirements for medical resources. For example, patients with fractures need X-ray assisted examinations, while patients with common colds only need to go to the pharmacy to get medicines after diagnosis by the outpatient doctor. Ignoring the heterogeneity of diseases would reduce the accuracy of the final model. An analysis of the outpatient records of a hospital in Shenzhen shows that the number of visits by different types of patients is highly correlated with the weather in recent days. When the temperature changes suddenly, the number of people who catch a cold increases significantly, while the number of non-urgent visits drops drastically during heavy rain. In the study of solar power grids (Zheng et al. 2020), it was found that, compared with a general artificial neural network, a model that takes the time series into account can improve prediction accuracy. Therefore, in the first step of the model, the prediction part comprehensively considers the factors that may affect the number of visits by different types of patients in the next week (weather, temperature, and the historical number of patient visits), and the number of patients is predicted separately for each type. Only after obtaining the number of patients who may visit each department each day can the subsequent optimization model better plan the future number of doctors.
In the second part of the model, mathematical programming is used to search the solution space for the optimal solution of the optimization model. In traditional scheduling, because there is no forecast of the number of patients visiting in the future, medical resources can only be distributed evenly. As a result, regardless of the number of patients, the number of doctors hardly changes. When the number of patients is large, the patient experience is poor, and when the number of patients is small, medical resources are wasted. Incorporating the output of the prediction model into the optimization model makes it possible to build an optimization model that can be adjusted dynamically.

Prediction
After analysis, it can be found that the number of patients in different departments has three main characteristics: (1) strong periodicity; (2) sensitivity to special dates, such as holidays and school opening times; (3) sensitivity to weather factors, such as rainfall and temperature. The prediction model combines the Prophet model and the Random Forest model to account for these three influencing factors.

Prophet
The Prophet model (Taylor et al. 2018) performs well in dealing with the effects of periodicity and holidays. It is a time series prediction model proposed by Facebook in 2017 that forecasts time series data based on an additive model (Harvey 2006), in which non-linear trends are fit with yearly, weekly, and daily seasonality, plus holiday effects. It works best with time series that have strong seasonal effects and several seasons of historical data. Additionally, Prophet is robust to missing data and shifts in the trend, and typically handles outliers well.
In general, the Prophet model can be written as

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t \tag{1}$$

Among the four terms, $g(t)$ is the trend function, which models non-periodic changes in the value of the time series, $s(t)$ is the periodicity function, $h(t)$ is the holiday effect function, and $\epsilon_t$ represents other kinds of influence factors.
The trend function $g(t)$ has two variants: (1) nonlinear, saturating growth, and (2) linear trend with changepoints. In this case, the number of patients follows a linear trend, so the second variant is used as the trend term. In a linear model, the slope of the curve does not remain constant; it changes at certain specific moments or periodic points, which are called changepoints. Assume there are $S$ changepoints at timestamps $s_j$, $1 \le j \le S$. The change in the growth rate at time $s_j$ is defined as $\delta_j$, with $\delta = \{\delta_1, \delta_2, \ldots, \delta_S\}$. Then the growth rate at time $t$ is $k + \sum_{j: t > s_j} \delta_j$. An indicator function $a(t) \in \{0, 1\}^S$ is defined as

$$a_j(t) = \begin{cases} 1, & t \ge s_j \\ 0, & t < s_j \end{cases} \tag{2}$$

so the growth rate at time $t$ can be written as $k + a(t)^{T}\delta$. Because the growth rate changes at each changepoint, the curve may no longer be continuous, and the offset parameter $m$ must be adjusted to keep it continuous. The adjustment to $m$ at changepoint $j$ is $\gamma_j = -s_j \delta_j$. Therefore, the function for $g(t)$ is

$$g(t) = \left(k + a(t)^{T}\delta\right) t + \left(m + a(t)^{T}\gamma\right) \tag{3}$$

For the periodicity term, the Prophet model uses a Fourier series to represent periodic changes. If $P$ is the period of the time series, then

$$s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right) \tag{4}$$

The holiday term is represented by a one-hot vector that indicates which days are holidays. For each holiday $i$, let $D_i$ be the set of dates affected by the holiday, and let the parameter $\kappa_i$ represent its influence intensity, where $\kappa$ follows a normal distribution, $\kappa \sim \mathrm{Normal}(0, \nu^2)$. If the dataset has $L$ holiday periods, the functions for $h(t)$ are

$$Z(t) = [\mathbf{1}(t \in D_1), \ldots, \mathbf{1}(t \in D_L)], \qquad h(t) = Z(t)\,\kappa \tag{5}$$
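To make the three deterministic components concrete, the following minimal Python sketch evaluates the piecewise-linear trend, the Fourier seasonality, and the holiday term for given parameters. It is an illustration under stated assumptions rather than the Prophet library's implementation: the function names and argument layouts are hypothetical, and in practice the parameters are fitted from data instead of being supplied by hand.

```python
import math

def trend(t, k, m, changepoints, deltas):
    """Piecewise-linear trend g(t) = (k + a(t)^T delta) * t + (m + a(t)^T gamma)."""
    rate, offset = k, m
    for s_j, d_j in zip(changepoints, deltas):
        if t >= s_j:                # a_j(t) = 1 once changepoint s_j has passed
            rate += d_j
            offset += -s_j * d_j    # gamma_j = -s_j * delta_j keeps g(t) continuous
    return rate * t + offset

def seasonality(t, period, coeffs):
    """Fourier series s(t) for coefficient pairs (a_n, b_n), n = 1..N."""
    return sum(
        a_n * math.cos(2 * math.pi * n * t / period)
        + b_n * math.sin(2 * math.pi * n * t / period)
        for n, (a_n, b_n) in enumerate(coeffs, start=1)
    )

def holiday(t, holiday_days, kappas):
    """h(t) = Z(t) kappa, where Z(t) flags whether t lies in each date set D_i."""
    return sum(kap for days, kap in zip(holiday_days, kappas) if t in days)
```

With a single changepoint at t = 10 that lowers the slope from 2 to 1, the trend stays continuous at the changepoint, which is exactly the role of the offset adjustment.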

Random forest
However, the Prophet model cannot consider weather factors such as rainfall and high temperatures, which strongly affect people's willingness to go out. For patients with minor illnesses, heavy rain and high temperatures are likely to make them stay at home rather than go to the hospital that day. Therefore, Random Forest (Breiman 2001) is used to analyze the prediction residuals of the Prophet model and thereby model the impact of rainfall and high temperature on patients' willingness to seek medical attention. Compared with a single decision tree, Random Forest is less prone to overfitting.
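The residual-correction structure can be sketched as follows. To stay self-contained, this sketch deliberately replaces the Random Forest with a much simpler stand-in (the mean residual per weather category); only the pipeline shape is illustrated, namely fitting a second model on the residuals of the baseline forecast and adding its prediction back. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def fit_residual_model(actual, baseline, weather):
    """Learn the mean residual (actual - baseline) for each weather category.

    Stand-in for the Random Forest regressor: any model mapping weather
    features to residuals could be plugged in here.
    """
    buckets = defaultdict(list)
    for y, y_hat, w in zip(actual, baseline, weather):
        buckets[w].append(y - y_hat)
    return {w: mean(r) for w, r in buckets.items()}

def corrected_forecast(baseline, weather, residual_model):
    """Add the predicted residual to the baseline; unseen categories add 0."""
    return [y_hat + residual_model.get(w, 0.0) for y_hat, w in zip(baseline, weather)]

# Hypothetical training data: observed visits, baseline forecast, weather label.
actual = [100, 60, 98, 64]
baseline = [95, 90, 95, 90]
weather = ["dry", "heavy_rain", "dry", "heavy_rain"]
model = fit_residual_model(actual, baseline, weather)
future = corrected_forecast([92, 92], ["heavy_rain", "dry"], model)
```

In this toy data the baseline overestimates visits on rainy days, so the correction lowers the rainy-day forecast and slightly raises the dry-day forecast.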

Patient flow modeling
Several assumptions are made in the optimization model to reduce its complexity: (1) The average walking speed of patients is taken as the walking speed of all patients. The average speed is obtained through fieldwork. (2) The time at which patients arrive at the hospital approximately follows a negative exponential distribution within a day. (3) The patient always chooses the best route. By examining a patient's electronic payment records, the treatment received in the hospital and the medical resources required can be determined. According to the data given by hospital officials, the optimal sequence of actions for patients is as shown in Fig. 2. Combining the patient's payment records and the optimal order of actions provided by the hospital, the patient's action process is as follows: First, the patient goes to the outpatient clinic of the corresponding department for the first diagnosis. Based on the results of the first diagnosis, the patient undergoes further physical examination, CT, treatment, or even minor outpatient surgery. After receiving the initial treatment, the patient returns to the initial outpatient clinic for follow-up, collects the medication after paying the fee, and then leaves the hospital.
In this simplified patient treatment model, the movement trajectory of all patients in the hospital is limited. Because it is difficult to relocate the pharmacy and the rooms that house large-scale medical equipment, the structural optimization of the outpatient department concentrates on the outpatient clinics of each department. Moreover, because the hospital has few rooms for each function (only 2 pharmacies, 10 CT rooms, and no more than 100 outpatient clinics), an enumeration method can be used to obtain the shortest route for patients of different departments to complete the process of seeing a doctor from each clinic. By combining the number of patients in different departments with their treatment pathways, the weighted average distance that each patient needs to travel can be calculated for each clinic when it serves as an outpatient clinic of each department.
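The enumeration described above can be sketched as follows. This is a minimal illustration under assumptions that are not taken from the paper: rooms are given hypothetical 2-D coordinates, a pathway is a list of room types visited in order (with the token "clinic" meaning a return to the starting clinic), and the weights are patient counts per pathway.

```python
from itertools import product

def manhattan(a, b):
    """Manhattan distance between two (x, y) room coordinates."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def shortest_route(clinic, stages, rooms_by_type, coords):
    """Enumerate one room per stage and return the shortest total walking distance.

    The token "clinic" in `stages` means returning to the starting clinic.
    """
    options = [[clinic] if s == "clinic" else rooms_by_type[s] for s in stages]
    best = float("inf")
    for choice in product(*options):
        stops = [clinic, *choice]
        length = sum(manhattan(coords[a], coords[b]) for a, b in zip(stops, stops[1:]))
        best = min(best, length)
    return best

def weighted_average_distance(clinic, pathways, rooms_by_type, coords):
    """Average the shortest route over pathways, weighted by patient counts."""
    total = sum(n * shortest_route(clinic, st, rooms_by_type, coords) for st, n in pathways)
    return total / sum(n for _, n in pathways)

# Hypothetical layout and pathways, for illustration only.
coords = {"A": (0, 0), "CT1": (5, 0), "CT2": (0, 2), "P1": (1, 1), "P2": (10, 10)}
rooms_by_type = {"ct": ["CT1", "CT2"], "pharmacy": ["P1", "P2"]}
pathways = [(["ct", "clinic", "pharmacy"], 10), (["pharmacy"], 30)]
d = weighted_average_distance("A", pathways, rooms_by_type, coords)
```

Because each function has only a handful of rooms, the number of combinations per pathway stays small, which is why plain enumeration is sufficient here.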

Double-layer MILP model
The model aims to minimize the total time the patient spends in the hospital. As shown in formula 6, the time spent by patients in the hospital can be divided into three parts: the time spent in treatment, the time spent walking, and the time spent queuing.

$$\mathrm{Time}_{total} = \mathrm{Time}_{treat} + \mathrm{Time}_{walk} + \mathrm{Time}_{queue} \tag{6}$$

Since the time spent in treatment cannot be reduced, the model focuses on reducing the time patients spend walking and queuing as much as possible. If a single model is used to optimize queuing time and walking time at the same time, its objective function is non-linear. For such a Mixed-Integer Non-Linear Programming (MINLP) problem, the time and memory required to solve the problem increase greatly.
The real data show that the time spent queuing is greater than the time spent walking. If a two-step model is used instead, the first part optimizes the queuing time, which has the greater overall impact, and the second part optimizes the walking time, which has less impact, while keeping the queuing time short. The objective function is then transformed from a complex non-linear polynomial into two simpler linear polynomials, and the time and space required for solving are greatly reduced.

Description:
The purpose of this layer of the optimization model is to minimize the waiting time of patients.This research found that the waiting time of patients has nothing to do with the distribution of outpatient clinics, but is only related to the number of outpatient clinics in corresponding departments and the number of patients.According to the queuing theory formula proposed by Kendall (1953), a queuing and waiting model can be constructed to calculate the average queuing time of patients.
For each department, the number of parallel service desks is equal to the number of outpatient clinics opened. The capacity of the queuing system and the size of the customer source can be regarded as unlimited, but the outpatient clinics must complete treatment for all arriving patients every day. The queuing rule is first come, first served (FCFS), so in Kendall's notation the queuing model has $u$ parallel servers, unlimited system capacity, an unlimited customer source, and an FCFS discipline.
Given parameters:
M, the set of outpatient clinics.
I, the set of outpatient departments.
DAY, the set of optimized days. Since the doctor's schedule covers one week, the optimization horizon is also one week.
$P_k$, the number of doctors in the $k$th department, $k \in I$.
$N_{k,i}$, the number of patients in the $k$th department on the $i$th day. When optimizing the distribution of outpatient clinics through historical data, the number of patients is the average number of patients from Monday to Sunday in the historical data. When optimizing the schedule based on the predicted number of patients, the number of patients is equal to the predicted number of patients in the next week.
Therefore, according to the queuing theory model, the average queuing time of patients when the $k$th department opens $u$ outpatient clinics on the $i$th day can be calculated. The calculation process is as follows.
In formula 7, $\mu$ represents the service efficiency of a single outpatient clinic and $T_s$ is the service time. Different services consume different service times. Taking the outpatient examination as an example, the service time is 5 min on average.
In formula 8, $\lambda_{k,i}$ represents the visit frequency of patients in the $k$th department on the $i$th day, and $T_{day}$ is the daily opening time of the outpatient clinic: 8 h (480 min), from 8 am to 12 noon and from 1:30 pm to 5:30 pm. Every department needs to open at least one outpatient clinic every day, even if there is no corresponding patient.
Fig. 2 The process of patient treatment
In formula 9, if $\rho_{k,i}/u \ge 1$, the arrival rate of patients in a department is greater than the processing rate of the outpatient service, and the queue will grow without bound, which is unacceptable for the hospital. Therefore, in the subsequent calculation of the average queuing time, the queuing time in this situation is set to a very large value ($10^6$ in the experiment) to prevent it from being selected.
Formula 10 gives the probability that the queue length is 0 in the steady state, from which the average queue length can be derived.
Formula 11 gives the average queue length in the steady state. The average queuing time $T^{w}_{k,i,u}$ of patients is then obtained in formula 12 from the queue length and the arrival frequency of patients.
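The calculation chain described for formulas 7 to 12 can be sketched in Python. Since the formulas themselves are not reproduced in this text, the sketch assumes the standard steady-state expressions for a queue with Poisson arrivals, exponential service, and u parallel servers (M/M/u); the function name, the default parameters, and the handling of zero arrivals are illustrative assumptions.

```python
import math

BIG = 10**6  # penalty used when the queue would grow without bound

def average_queue_time(n_patients, u, service_min=5.0, day_min=480.0):
    """Average queuing time in minutes when u clinics serve n_patients per day.

    Assumes the standard M/M/u steady-state formulas (an assumption, not
    necessarily the exact form used in the paper).
    """
    mu = 1.0 / service_min          # service rate of one clinic (formula 7)
    lam = n_patients / day_min      # arrival rate (formula 8)
    if lam == 0:
        return 0.0                  # no patients, no queue
    rho = lam / mu                  # offered load
    if rho / u >= 1:
        return BIG                  # unstable queue (formula 9)
    # probability that the system is empty (formula 10)
    p0 = 1.0 / (
        sum(rho**n / math.factorial(n) for n in range(u))
        + rho**u / (math.factorial(u) * (1 - rho / u))
    )
    # average queue length (formula 11)
    lq = p0 * rho**u * (rho / u) / (math.factorial(u) * (1 - rho / u) ** 2)
    return lq / lam                 # average queuing time (formula 12)
```

For example, with a 5 min service time and a 480 min day, 48 patients served by one clinic give a load of 0.5 and an average wait of 5 min, while 96 patients would overload a single clinic and require at least two.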

Decision variables:
$Q_{k,i,u}$, a matrix consisting only of 0 and 1. $Q_{k,i,u} = 1$ means that the $k$th department should open $u$ outpatient clinics on the $i$th day.
$Q^{max}_{k,u}$, a matrix consisting only of 0 and 1, used to record the maximum number of outpatient clinics opened by each department. $Q^{max}_{k,u} = 0$ means that the number of outpatient clinics open in this department on any day should not be greater than $u$.
$Q^{num}_{k,i}$, a matrix that records the number of doctors working in the outpatient clinics of the $k$th department on the $i$th day. When optimizing the allocation of the number of outpatient clinics in each department through historical data, $Q^{num}_{k,i}$ is used as a variable. However, when future schedules are arranged using the predicted number of patients, $Q^{num}_{k,i}$ is a given parameter that restricts the allocation of outpatient clinics.

Objective function:
As shown in formula 13, the goal is to optimize the overall division of the number of outpatient clinics and the number of outpatient clinics open each day, so that the total queue time of patients during the optimization period is minimized.

Constraints:
For the matrix Q max that records the maximum number of outpatient clinics opened, it is obvious that for each department, this matrix is non-increasing, as shown in formula 14.
The number of outpatient clinics opened by the $k$th department on any day should not be greater than the maximum number of outpatient clinics in the corresponding department, as shown in formula 15.
In the hospital, there is one and only one doctor in each clinic, so the number of clinics opened is equal to the number of doctors on duty in the corresponding department. Therefore, the number of clinics opened each day cannot be greater than the total number of doctors in the corresponding department, as shown in formula 16.
The number of opening clinics is not only limited by the number of doctors, but also by the total number of clinics in the hospital.The total number of outpatient clinics opened cannot be greater than the total number of outpatient clinics in the hospital, as shown in formula 17.
Obviously, each department can only open a fixed number of outpatient clinics a day, as shown in formula 18.
In the hospital's doctor schedule, each doctor cannot work more than 5 days a week, as shown in formula 19.
When optimizing the allocation of the number of outpatients in each department through historical data, if the number of doctors in the k th department on the i th day, as shown in formula 20.
However, in future personnel scheduling arrangements, the allocation of outpatient clinics will be affected by the historical best outpatient allocation, as shown in formula 21.
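The structure of the first layer can be illustrated on a toy instance. The sketch below is not a MILP formulation: it replaces the solver with exhaustive enumeration, which is feasible only for very small instances, and it omits the $Q^{max}$ bookkeeping and the historical-versus-future distinction. It assumes that the objective weights each department-day's average wait by its number of patients and that the wait follows the standard M/M/u form; both are assumptions, as are all names and numbers.

```python
import math
from itertools import product

BIG = 10**6

def wait(n, u, service_min=5.0, day_min=480.0):
    """Average M/M/u queuing time in minutes (assumed standard form)."""
    lam, mu = n / day_min, 1.0 / service_min
    if lam == 0:
        return 0.0
    rho = lam / mu
    if rho / u >= 1:
        return BIG
    tail = rho**u / (math.factorial(u) * (1 - rho / u))
    p0 = 1.0 / (sum(rho**j / math.factorial(j) for j in range(u)) + tail)
    return p0 * tail * (rho / u) / (1 - rho / u) / lam

def first_layer(patients, doctors, total_clinics, max_days=5):
    """Pick clinics per department and day by enumeration.

    patients[k][i]: patients of department k on day i; doctors[k]: P_k.
    """
    depts, days = range(len(doctors)), range(len(patients[0]))
    cells = [(k, i) for k in depts for i in days]
    best_cost, best_plan = float("inf"), None
    # each department opens between 1 and P_k clinics every day
    for plan in product(*(range(1, doctors[k] + 1) for k, _ in cells)):
        u = dict(zip(cells, plan))
        if any(sum(u[k, i] for k in depts) > total_clinics for i in days):
            continue  # hospital-wide limit on open clinics
        if any(sum(u[k, i] for i in days) > max_days * doctors[k] for k in depts):
            continue  # each doctor works at most max_days days
        cost = sum(patients[k][i] * wait(patients[k][i], u[k, i]) for k, i in cells)
        if cost < best_cost:
            best_cost, best_plan = cost, u
    return best_plan, best_cost

# Hypothetical instance: 2 departments, 2 days, 3 clinics in total.
plan, cost = first_layer([[120, 40], [40, 120]], doctors=[2, 2], total_clinics=3)
```

In this instance the busy department receives two clinics on its peak day and one otherwise, which is the dynamic reallocation the first layer is meant to produce.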

Description:
The main optimization goal of the second-layer MILP model is to allocate outpatient clinics to departments so as to minimize the total walking time of patients. The distribution of outpatient clinics is related to the structure of the hospital, the number of patients in different departments, and the different treatment routes followed by patients with different diseases. This model uses a week as the planning unit and, according to the historical and predicted numbers of patients in different departments, dynamically plans outpatient functions, opening hours, and doctors' schedules within the week.
Given parameters:
M, the set of outpatient clinics.
I, the set of outpatient departments.
DAY, the set of optimized days.
$\sum_u u\,Q_{k,i,u}$, a matrix that records the number of doctors working in the outpatient clinics of the $k$th department on the $i$th day, obtained from the first-layer MILP model.
$N_{k,i}$, the number of patients in the $k$th department on the $i$th day. When optimizing the distribution of outpatient clinics through historical data, the number of patients is the average number of patients from Monday to Sunday in the historical data. When optimizing the schedule based on the predicted number of patients, the number of patients is equal to the predicted number of patients in the next week.
$D_{k,j}$, the weighted average distance that patients in the $k$th department need to travel when using the $j$th outpatient clinic.
$v$, the average walking speed of patients, obtained by sampling in the hospital.
Decision variables:
$BSP_{i,j,k}$, a matrix consisting only of 0 and 1. $BSP_{i,j,k} = 1$ means that the $j$th clinic is open on the $i$th day and belongs to the $k$th department.
$BP_{j,k}$, a matrix recording outpatient functions. $BP_{j,k} = 1$ means that the $j$th outpatient clinic is used by the $k$th department. This matrix is a variable when optimizing the distribution of clinics through historical data and a known parameter when optimizing the schedule based on the predicted number of patients.
Objective function:
As shown in formula 22, the goal of this layer is to minimize the average walking time of patients. Because every patient's speed is replaced by the overall average walking speed, the objective function is equivalent to minimizing the average distance patients walk during treatment. Compared with the waiting time, the walking time is relatively small, but reducing the walking distance in the hospital can greatly improve patients' mood and the quality of hospital services.

Constraints:
Since different departments use different medical equipment in their outpatient clinics, the cost of changing the function of a clinic is very high. According to the requirements put forward by the hospital, each outpatient clinic can only be used by one department. Even if a department closes one of its clinics for a day, other departments cannot occupy that clinic, as shown in formula 23.
The number of outpatient clinics in different departments should be equal to the optimal outpatient number allocation of the previous MILP model, as shown in formula 24.
When optimizing the distribution of outpatient clinics based on historical data, once an outpatient clinic has been used by a department, it belongs only to that department, as shown in formula 25.
When optimizing future personnel scheduling, BP j,k is used as a given parameter to constrain the future outpatient distribution, as shown in formula 26.
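The second layer can likewise be illustrated on a toy instance. The sketch below again swaps the MILP solver for exhaustive enumeration and covers a single day. It assumes that patients of a department are spread evenly over that department's clinics and that every department needs at least one clinic; these assumptions, the function name, and the numbers are illustrative only.

```python
from itertools import product

def second_layer(dist, need, patients, speed):
    """Assign clinics to departments to minimize the average walking time.

    dist[k][j]: weighted average distance for department k using clinic j.
    need[k]: clinics required by department k (from the first layer, >= 1).
    patients[k]: patients of department k; speed: average walking speed.
    """
    n_depts, n_clinics = len(need), len(dist[0])
    labels = list(range(n_depts)) + [None]      # None = clinic left unused
    best_time, best_assign = float("inf"), None
    for assign in product(labels, repeat=n_clinics):
        # each clinic serves at most one department; counts match the first layer
        if any(assign.count(k) != need[k] for k in range(n_depts)):
            continue
        total = sum(
            patients[k] * sum(dist[k][j] for j, a in enumerate(assign) if a == k) / need[k]
            for k in range(n_depts)
        )
        time = total / (speed * sum(patients))
        if time < best_time:
            best_time, best_assign = time, assign
    return best_assign, best_time

# Hypothetical instance: both departments prefer clinic 0.
dist = [[10, 12, 30], [8, 20, 25]]
assign, avg_time = second_layer(dist, need=[1, 1], patients=[100, 50], speed=1.0)
```

Here both departments would individually prefer clinic 0, but giving it to the smaller department and clinic 1 to the larger one yields the lowest overall average, which shows why the assignment is solved jointly rather than department by department.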

Dataset
This experiment used data from a hospital in Shenzhen, Guangdong Province, China to verify the feasibility of the model. The data include the CAD structure of the hospital, the electronic payment records of the ten diseases with the largest number of patients in the third quarter of 2019, and the number of doctors in each department. The experiment first counted the number of doctors and the number of outpatient clinics in the different departments of the hospital, as shown in Fig. 3. These data are used as known parameters in the subsequent steps.
Subsequently, the CAD drawing of the hospital is processed, and the Manhattan Distance between the rooms is calculated by analyzing the structure drawing of the hospital, so as to obtain the distance matrix between the rooms in the hospital, as shown in Fig. 4.
The electronic payment slip contains the following items: laboratory fees, inspection fees, treatment fees, surgical fees, western medicine, Chinese herbal medicines, and Chinese patent medicines. According to the information provided by the relevant hospital personnel, the relationship between the payment record and the type of room can be obtained, as shown in Table 1.
In this way, the electronic payment records in the data set are converted into a series of movements between rooms. The distance matrix between rooms is obtained from the CAD drawing of the hospital in the data set, and is shown in Fig. 3. For each clinic, the model uses an enumeration method to find the best path for each medical route starting from the clinic, and records the weighted average path length of all patients in the corresponding department. Figure 5 shows the weighted average path of different clinics in different departments.
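The enumeration described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: a medical route is taken to be an ordered sequence of room types, each room type may have several candidate rooms, the patient does not return to the clinic, and distances are Manhattan distances. All names, coordinates, and patient counts are hypothetical.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def best_path_length(clinic: Point, route: Sequence[str],
                     rooms_by_type: Dict[str, List[Point]]) -> float:
    """Enumerate every choice of one room per step of the route and
    return the shortest total distance starting from the clinic."""
    candidates = [rooms_by_type[room_type] for room_type in route]
    best = float("inf")
    for choice in product(*candidates):
        stops = (clinic,) + choice
        length = sum(manhattan(stops[i], stops[i + 1]) for i in range(len(stops) - 1))
        best = min(best, length)
    return best


def weighted_average_path(clinic: Point, routes: Dict[Tuple[str, ...], int],
                          rooms_by_type: Dict[str, List[Point]]) -> float:
    """Average the best path length over routes, weighted by patient counts."""
    total_patients = sum(routes.values())
    weighted = sum(count * best_path_length(clinic, route, rooms_by_type)
                   for route, count in routes.items())
    return weighted / total_patients


# Hypothetical layout: two laboratories and two pharmacies.
rooms_by_type = {"laboratory": [(10.0, 0.0), (0.0, 30.0)],
                 "pharmacy": [(10.0, 10.0), (50.0, 0.0)]}
# Hypothetical routes derived from payment records, with patient counts.
routes = {("laboratory",): 3, ("laboratory", "pharmacy"): 1}
average = weighted_average_path((0.0, 0.0), routes, rooms_by_type)
```

Enumeration is affordable here because a route contains only a few steps and each room type has only a few candidate rooms; the number of combinations is the product of the candidate counts along the route.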

Prediction
The data used in prediction are the daily numbers of patients in the hospital from July to September 2019. The green line in Fig. 6 shows how the daily number of patients changes over time. The weather data come from the Shenzhen Data Open Platform, an open data platform provided by the Shenzhen municipal government. The red triangles in Fig. 6 indicate the holiday dates input to Prophet, which cover three specific holidays.
In the experiment, 60 days of historical patient data were used to train the model and predict the number of patients in different departments over the next 14 days. The training data have the following characteristics:
1. August 31 to September 1: these days are not an actual holiday but the days on which school reopens after the summer vacation. Students and their parents are much busier during this period and hardly have time to go to the hospital for a minor illness. The effect of this period is similar to that of a holiday, so we treat it as a holiday item.
2. September 13 to September 15: the Mid-Autumn Festival in China. Its impact is very slight, because in 2019 these three days fell on a Friday, Saturday, and Sunday, so the holiday effect is largely offset by the weekend.
3. September 29: people had to work on this day to make up for the longer National Day holiday starting on October 1.
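One way to encode the three holiday items listed above is as a table of (holiday, date) rows. The sketch below builds such rows with the standard library only. The holiday labels and the helper `holiday_rows` are hypothetical, and whether the authors used separate labels per period is an assumption. Prophet accepts holidays as a data frame with `holiday` and `ds` columns, so these rows could be converted with `pandas.DataFrame(holidays)` before being passed to the model.

```python
from datetime import date, timedelta
from typing import Dict, List


def holiday_rows(name: str, start: date, end: date) -> List[Dict[str, object]]:
    """Return one row per calendar day in the inclusive range [start, end]."""
    n_days = (end - start).days + 1
    return [{"holiday": name, "ds": start + timedelta(days=i)} for i in range(n_days)]


# Labels are illustrative; the dates are those listed in the text (year 2019).
holidays = (
    holiday_rows("school_reopening", date(2019, 8, 31), date(2019, 9, 1))
    + holiday_rows("mid_autumn_festival", date(2019, 9, 13), date(2019, 9, 15))
    + holiday_rows("make_up_workday", date(2019, 9, 29), date(2019, 9, 29))
)
```

Using a separate label for each period lets the model estimate a different effect for each one, which matches the observation that the Mid-Autumn Festival had almost no impact while the school reopening did.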
After analysis, five features were selected as the input of the Random Forest model: daily rainfall, daily maximum temperature, the previous day's rainfall, the previous day's maximum temperature, and the next day's rainfall. In subsequent experiments, the Prophet-RandomForest model was compared with two other models, an LSTM and an SVM with a Gaussian kernel. Both are common time series forecasting models that generally perform well. They use one-hot encoding to describe elements such as weather types and holidays, and they use the same data as the Prophet-RandomForest model. The comparison is shown in Fig. 9 and Table 2, with MSE, RMSE, and MAPE as the evaluation metrics. It shows that the Prophet-RandomForest model achieves better results in predicting the future number of patients.
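For reference, the three error metrics used in the comparison can be computed as in the sketch below. The standard definitions are assumed, since the text does not spell them out, and the sample numbers are illustrative rather than taken from the hospital data.

```python
import math
from typing import Dict, Sequence


def forecast_errors(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Compute MSE, RMSE, and MAPE (in percent) for a forecast.
    MAPE assumes that no actual value is zero."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    rmse = math.sqrt(mse)
    mape = 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return {"mse": mse, "rmse": rmse, "mape": mape}


# Illustrative daily patient counts, not real data.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
errors = forecast_errors(actual, predicted)
```

MSE and RMSE depend on the scale of the patient counts, whereas MAPE is a relative error, which makes it easier to compare departments with very different patient volumes.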

Optimization
The optimization model is applied to the distribution of outpatient clinics in the outpatient building of a hospital in Shenzhen. The number of patients in the hospital is shown in Fig. 10. There are far more patients in internal medicine and gynecology than in surgery and dermatology. The original arrangement of the hospital outpatient department is shown in Table 3. However, the number and distribution of outpatient clinics do not match the number of patients. After prediction and optimization, the new clinic distribution is shown in Table 3. Considering the number of doctors in different departments, the original outpatient structure of the hospital turns out to be poorly designed: because the number of outpatient doctors is small, the clinics of many departments are not fully utilized. After optimization, the new clinic arrangement frees up many redundant empty clinics for other purposes.
Taking the data from the first four weeks of September as the test set, we compare the total time patients spend in the hospital under the original arrangement with that under the predicted and optimized arrangement. The results are shown in Table 4.
The optimization model uses the Manhattan distance as the distance that patients move between rooms while seeing a doctor in the hospital. In reality, however, because of the complex structure of the hospital and the uncertainty of patients' movement trajectories, there is a gap between the actual movement distance and the Manhattan distance. To simulate the movement of patients in the hospital more accurately and to estimate the average time needed to see a doctor, this experiment uses the social force model to simulate the trajectories of patients in the hospital.
The social force model, proposed by Helbing et al. (1995), is a multi-particle, self-driven crowd simulation model. In real life, people choose how to advance, that is, their direction and speed, based on their current environment and their destination, and they keep a distance from other people and from obstacles as they move. The model uses social forces to describe how people are affected by their surroundings (obstacles and other pedestrians). In this way, a patient continuously adjusts the current speed according to the current direction and the destination during the movement, which brings the simulation closer to the actual situation.
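A heavily simplified sketch of this idea is given below. It keeps only two terms, a driving force that relaxes the velocity toward the desired speed in the direction of the destination and a circular exponential repulsion from other pedestrians, and it integrates them with an explicit Euler step. The original model also includes walls and a more detailed interaction potential, which are omitted here. All parameter values (desired speed, relaxation time, repulsion strength and range) are hypothetical and are not taken from the experiment.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def social_force_step(pos: Vec, vel: Vec, goal: Vec, others: List[Vec],
                      dt: float = 0.1, desired_speed: float = 1.3, tau: float = 0.5,
                      strength: float = 2.0, interaction_range: float = 0.3) -> Tuple[Vec, Vec]:
    """Advance one pedestrian by one Euler step (unit mass assumed)."""
    # Driving force: relax toward the desired velocity pointing at the goal.
    gx, gy = goal[0] - pos[0], goal[1] - pos[1]
    dist = math.hypot(gx, gy)
    ex, ey = (gx / dist, gy / dist) if dist > 0 else (0.0, 0.0)
    fx = (desired_speed * ex - vel[0]) / tau
    fy = (desired_speed * ey - vel[1]) / tau
    # Repulsive force: pushes away from each other pedestrian, decaying with distance.
    for ox, oy in others:
        rx, ry = pos[0] - ox, pos[1] - oy
        d = math.hypot(rx, ry)
        if d > 0:
            magnitude = strength * math.exp(-d / interaction_range)
            fx += magnitude * rx / d
            fy += magnitude * ry / d
    new_vel = (vel[0] + fx * dt, vel[1] + fy * dt)
    new_pos = (pos[0] + new_vel[0] * dt, pos[1] + new_vel[1] * dt)
    return new_pos, new_vel


def simulate(start: Vec, goal: Vec, others: List[Vec],
             max_steps: int = 300, arrive_radius: float = 0.2) -> List[Vec]:
    """Simulate until the pedestrian is within arrive_radius of the goal."""
    pos, vel = start, (0.0, 0.0)
    trajectory = [pos]
    for _ in range(max_steps):
        if math.hypot(goal[0] - pos[0], goal[1] - pos[1]) < arrive_radius:
            break
        pos, vel = social_force_step(pos, vel, goal, others)
        trajectory.append(pos)
    return trajectory


# One patient walking 10 m past a person standing slightly off the direct line.
trajectory = simulate((0.0, 0.0), (10.0, 0.0), [(5.0, 0.2)])
walking_time = (len(trajectory) - 1) * 0.1  # seconds, with the default dt
```

The simulated walking time is obtained by counting time steps, which is how a trajectory-based estimate can replace the one derived from Manhattan distance and average speed.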
In this experiment, a social force model is used to simulate the movement of people, and the trajectories of some patients are shown in Fig. 11. The patient's trajectory is no longer a straight line but a curve directed toward the destination. By measuring the patient's movement time in the social force model and using it in place of the average movement time derived from the Manhattan distance and the average movement speed, the model obtains a value closer to reality.
Comparing the walking statistics of the social force model before and after optimization, this experiment found that the walking times simulated by the simplified model and by the social force model differ only slightly. Compared with the pre-optimization outpatient distribution, the optimized outpatient distribution reduces the overall walking time by 4%.

Discussion
In the Prophet model, holiday effects are represented as a one-hot vector, so every day within a specific holiday period has the same effect. In real life, however, the first and last days of a holiday often influence people's behavior more than the other days. Taking this into account may improve the performance of the Prophet model. For the Random Forest, traffic conditions are also an important factor affecting the number of patients. They have little effect on people's desire to go out, but they directly affect the number of people who arrive at the hospital. Imagine that a patient plans to go to the hospital at 4 pm. Because of traffic congestion, the patient may not arrive until 6 pm, by which time most outpatient clinics have already closed.

Conclusion
In prediction, we combine Prophet and Random Forest. Compared with the single Prophet model, this improves both the prediction accuracy and the interpretability of the prediction results.
In terms of outpatient personnel scheduling optimization, we comprehensively consider the hospital building structure, the type and number of patients, the electronic payment bills, the number of doctors, and working hours. At the same time, we propose a simplified patient visit model that transforms a patient's electronic bill data into a movement path within the hospital. We then construct a two-layer MILP model that solves for the optimal number and the optimal location of outpatient clinics respectively, while satisfying the work arrangements of doctors.
In general, the new hospital outpatient layout can reduce the walking time and waiting time of patients in the hospital by about 9%, and frees up some spare outpatient clinics.
Defan Feng, Yu Mo and Zhiyao Tang are equal contributors. This work is partially supported by the Guangdong Provincial Key Laboratory (Grant No. 2020B121201001) and the Transnational Partnership for Excellent Research and Education in Big Data and Emergency Management Project funded by the Research Council of Norway (RCN), the Norwegian Centre for International Cooperation in Education (SiU) through the INTPART programme.

Fig. 3 Number of doctors and rooms

Fig. 5 Distance matrix visualization

Figures 7 and 8 show the predicted results. Figure 7 is based on the Prophet model only, and Fig. 8 is based on the combined Prophet-RandomForest model. The data from July 1 to September 12 were used to predict the number of patients from September 13 to September 26. In this prediction, Prophet without Random Forest residual regression has a MAPE of 9.08%. Adding Random Forest residual regression lowers the MAPE to 7.14%, a reduction of 1.94 percentage points. Compared with the basic Prophet model, the Prophet-RandomForest model has improved performance in both experiments.
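The residual regression mentioned here can be written compactly as follows. The notation is an assumption introduced for illustration: $y_t$ is the observed number of patients on day $t$, $\hat{y}^{\mathrm{P}}_t$ is the Prophet forecast, $\mathbf{x}_t$ is the vector of weather features, and $f_{\mathrm{RF}}$ is the Random Forest fitted to the Prophet residuals on the training period. The last line restates the reported MAPE figures.

```latex
\begin{align}
  r_t &= y_t - \hat{y}^{\mathrm{P}}_t
      && \text{(residual of Prophet on the training data)} \\
  \hat{r}_t &= f_{\mathrm{RF}}(\mathbf{x}_t)
      && \text{(Random Forest prediction of the residual)} \\
  \hat{y}_t &= \hat{y}^{\mathrm{P}}_t + \hat{r}_t
      && \text{(combined forecast)} \\
  \Delta\mathrm{MAPE} &= 9.08\% - 7.14\% = 1.94 \text{ percentage points}
\end{align}
```

Under this reading, Prophet captures the trend, weekly pattern, and holiday effects, while the Random Forest explains the part of the remaining error that is related to the weather.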

Fig. 10 Number of patients

Fig. 11 Part of the patient trajectory simulated by the social force model

• This article utilized a two-layer MILP model to optimize the schedules. The two-layer MILP model can consider the conditions of hospitals and patients separately, and compared with the MINLP model, the two-layer MILP model converges faster, which improves the practicability of the model.
• This article considers the number of patients, time and weather in the prediction model. The data show that continuous weather changes also affect the number of patients in the following days. After considering the continuous changes in weather and the influence of time on the number of patients of different types, the prediction model can obtain a more accurate number of future patients of each type.

Table 1 The relationship between payment records and the treatment process received by the patient

Table 3 Original arrangement of the outpatient

Table 4 The total time the patient stays in the hospital each week