Towards a fully personalized food recommendation tool

We present a personalized ingredient-based Deep Learning recommender in the food domain that exploits ingredients and nutrition information to create recipe representations and to propose to every user a more personalized and healthier meal. The recommender will be a critical component in our Meal Prediction Tool (MPT), designed with a focus on the personalization of services and on increasing business efficiency and sustainability in the hospitality, restaurant and catering (HoReCa) industry.


INTRODUCTION
Social media platforms such as TripAdvisor and Yelp are actively changing the global dining landscape by letting consumers interact with data online: users can generate ratings and recommendations or upload a variety of photos and location data. The exploitation of such data is now entering the mainstream of the HoReCa industry, since data-driven services make it possible to predict the needs of diverse and demanding consumer audiences influenced by rapidly evolving food trends and consumption patterns. Analyzing such trends and patterns can help both food consumers and providers make informed decisions about their activities (e.g. designing personalized meal recommenders, or empowering businesses to consider various types of external influences). At the same time, detailed analysis of recipes and cookbooks available online reveals a wealth of data that may lead to a number of new cultural correlations and findings. Such data can be exploited to integrate social factors into the design of food supply chains, putting emphasis on blends of flavours and popular ingredients, perceptions of different tastes, food identities or healthiness factors. This paper presents a personalized ingredient-based Deep Learning recommender in the recipe domain, designed to propose to each user a more personalized and healthier meal based on ingredients and nutrition information (Section 3). The recommender will be a critical component in our MPT, developed in the context of the CAPSELLA project (Section 2). The integration of the recommender will offer the additional possibility to filter the MPT personalized recommendations based on criteria such as food healthiness, considering new aspects previously not addressed by mainstream food services.

THE CAPSELLA MEAL PREDICTION TOOL
The MPT combines different sets of data coming from actual consumers who are searching for dining options. This data provides insights into the demographics of demand and the locations around a city where these searches take place. By sharing data on their specific demands, such as cuisine preferences (e.g. Chinese or vegetarian), occasion (e.g. breakfast, lunch) and specific meals (i.e. by submitting queries in the respective user interface), consumers receive personalized recommendations of restaurants that match their food preferences. The results are ranked based on the proximity of a restaurant to the user location and on sentiment ratings (Fig. 1).

Figure 1: Meal Prediction Tool Customer Interface.

This data then becomes useful for restaurant owners, who now have a way to predict demand, identify food trends in real time and understand the demographics of their audiences. This user-generated data is then combined with other data coming from restaurant menus and social media, which are further analysed to deliver enhanced recommendations that are accurate, personalized and delivered in real time.

Related Work
Recipe recommendation can be tackled by breaking recipes down into ingredients and scoring them based on the ingredients that a user has rated positively [1], or by also taking into account the ingredients that a user dislikes [2]. The use of ingredient complements and networks of substitutes, which identify which ingredients fit well together and which can be substituted, has been proposed [3] as a means to get accurate results. Other approaches [4][5] focus on recommending not only desired recipes, but also healthy ones. For example, the authors in [5] apply a post-filtering approach and re-rank the recommendations according to their healthiness scores. Given that variations in user preferences are mainly context-dependent, in this paper we propose a content-oriented approach that exploits ingredients as well as nutrition information in order to create recipe embeddings, and to propose to each user a more personalized and healthier meal.

Data Collection
Our dataset encompasses 40,225 recipes along with the respective metadata (i.e. recipe name, description, category, preparation steps, ingredients list, nutrition information, user reviews and ratings) collected from allrecipes.com using a standard web crawler. To determine the healthiness of a recipe we followed the international standards introduced by the Food Standards Agency (FSA) [6], and we calculated a score for each of the following nutrients: fat, saturated fat, sugar and sodium, measured per 100g of each recipe. The FSA score relates only to these four nutrients. The scale is green for "healthy", amber for "less healthy" and red for "unhealthy". We derived a single metric following the work of [7], which first assigns an integer value to each color (green=1, amber=2, red=3) and then sums up the values to derive a total score for the recipe. The final score can vary from 4 (very healthy recipe) to 12 (very unhealthy recipe).
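The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the implementation used in this work: the function names are hypothetical, and the per-100g cut-off values in THRESHOLDS are assumed for illustration (they are not given in this paper and should be checked against the FSA guidance in [6]).

```python
# (green_max, amber_max) in grams per 100 g of recipe.
# ASSUMPTION: illustrative cut-offs, not taken from this paper.
THRESHOLDS = {
    "fat": (3.0, 17.5),
    "saturated_fat": (1.5, 5.0),
    "sugar": (5.0, 22.5),
    "sodium": (0.12, 0.6),
}


def nutrient_points(amount_per_100g, green_max, amber_max):
    """Map one nutrient amount to 1 (green), 2 (amber) or 3 (red)."""
    if amount_per_100g <= green_max:
        return 1
    if amount_per_100g <= amber_max:
        return 2
    return 3


def fsa_score(nutrients_per_100g, thresholds=THRESHOLDS):
    """Sum the four colour values; the result lies between 4 and 12."""
    return sum(
        nutrient_points(nutrients_per_100g[name], green_max, amber_max)
        for name, (green_max, amber_max) in thresholds.items()
    )


# Example: amber fat, red saturated fat, green sugar, amber sodium.
example = {"fat": 10.0, "saturated_fat": 6.0, "sugar": 4.0, "sodium": 0.3}
print(fsa_score(example))
```

Keeping the cut-offs in a single dictionary makes it easy to replace the assumed values with the official ones without touching the scoring logic.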

Methodology
We implemented a Paragraph Vector method [8] to identify similar recipes based on the similarity of their recipe embeddings. We used an embedding size of 50 and the distributed bag-of-words (PV-DBOW) model instead of the distributed memory model (PV-DM), since it requires less data storage. The PV-DBOW model takes a paragraph vector as input and is trained to predict words sampled from that paragraph. To get the recipe embeddings, we used vectors with processed ingredients data (i.e. after removing blacklist words like cup, kg, etc.) as input for the Doc2Vec model, and for each user's preferred recipe we retrieved the n most similar recipes (Fig. 2). In this way, we created a list of recipes similar to the recipes rated by every user. We made the assumption that since a user has rated a recipe, she also likes its ingredients. Finally, we sorted the final recipe list on similarity index and FSA score and recommended the top-k recipes. To validate the recommendation results, we used 5-fold cross-validation and the MAP@k performance metric. Our task was to predict the top-k recipes that each user would most likely rate highly. In the second phase (B), we set n=2 (i.e. for each recipe a user has rated we selected the 2 most similar). We report the results after sorting on similarity index as well as on FSA score. The mean FSA scores for the top 5, 10 and 15 recommendations are also reported (Table 1). As expected, if we sort on similarity index after step B we get better accuracies and worse FSA scores.
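The candidate generation, sorting and MAP@k evaluation steps can be illustrated with the sketch below. It rests on several assumptions: recipe embeddings are taken as already computed and passed in as plain lists, cosine similarity stands in for the similarity index, the two sort orders use the other criterion as a tie-breaker, and average precision is normalised by min(|relevant|, k), which is one common variant. All function and parameter names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(rated, embeddings, fsa, n=2, k=5, sort_by="similarity"):
    """Collect the n nearest recipes of each rated recipe, then rank them."""
    best = {}  # candidate id -> highest similarity to any rated recipe
    for recipe in rated:
        sims = [
            (cosine(embeddings[recipe], vec), cand)
            for cand, vec in embeddings.items()
            if cand not in rated
        ]
        for sim, cand in sorted(sims, reverse=True)[:n]:
            best[cand] = max(sim, best.get(cand, sim))
    if sort_by == "fsa":
        # Healthiest first (lowest FSA score), most similar as tie-breaker.
        ranked = sorted(best, key=lambda c: (fsa[c], -best[c]))
    else:
        # Most similar first, healthiest as tie-breaker.
        ranked = sorted(best, key=lambda c: (-best[c], fsa[c]))
    return ranked[:k]


def average_precision_at_k(recommended, relevant, k):
    """AP@k for one user, normalised by min(len(relevant), k)."""
    hits, total = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def map_at_k(recs_by_user, relevant_by_user, k):
    """Mean of AP@k over all users that received recommendations."""
    scores = [
        average_precision_at_k(recs, relevant_by_user[user], k)
        for user, recs in recs_by_user.items()
    ]
    return sum(scores) / len(scores)
```

Calling recommend once with sort_by="similarity" and once with sort_by="fsa" on the same candidates mirrors the two reported settings, and makes the trade-off between accuracy and healthiness directly comparable.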

FUTURE WORK
Future work will focus on the development of a fully personalized meal recommendation engine that will consider user preferences, nutritional needs or restrictions, allergy information and cooking habits in order to make a more robust recommendation.