Walk the line: Toward an efficient user model for recommendations in museums

Contrary to many application domains, recommending items within a museum is not only a question of preferences. Of course, visitors expect suggestions that are likely to interest or please them. However, additional factors should be taken into account. Recent works use visiting styles [1] or the shortest distance between items [2] to adapt the list of recommendations. But, as far as we know, no model in the literature aims at inferring in real time a holistic user model which includes variables such as crowd tolerance, distance tolerance, expected user control, fatigue, congestion points, etc. As a work in progress, we propose a new representation model which includes psychological, physical and social variables so as to increase user satisfaction and enjoyment. We show how we can infer these characteristics from user observations (geolocalization over time, moving speed, ...) and we discuss how we can use them jointly for sequence recommendation. This work is still at an early stage of development and remains more theoretical than experimental.


I. INTRODUCTION
"I keep a close watch on this heart of mine, I keep my eyes wide open all the time" (Johnny Cash, "I Walk the Line", 9th track of the album "With His Hot and Blue Guitar", 1956). These lyrics from the famous folk singer Johnny Cash could fit a museum visiting experience, since art is often a question of discovery, preferences, and emotions. However, the line to follow within a museum does not necessarily have to be straight, nor the same for every user. In this paper, we propose a new way to model users from their behaviors, so as to recommend enjoyable and thought-provoking exhibits to them through efficient, alternative and/or surprising paths within a museum.
Nowadays, mobile devices offer everyone easy access to a huge amount of information and new possibilities of interaction. These devices are particularly relevant for tourism, as they enrich the user experience (augmented reality, serious games) and provide additional historical information to visitors in cities or museums [1]. However, this opportunity comes with two major issues: (i) the sheer amount of data available is far beyond the human capability to process it, and (ii) the context in which the person is situated is essential to really understand his/her needs.
For more than 20 years, researchers have addressed the first issue by designing recommender systems, whose goal is to narrow the available information down to a level that a human mind can process. To do so, the system must have some understanding of the user's preferences in order to conjecture about his/her expectations. This leads to the second issue, context, which is an equally discussed field of research. Perhaps more than anywhere else, taking the context into account is a key part of a relevant recommender system for tourism. In addition, such a context is dynamic and depends on the situation and the visitor. For example, a virtual guide must recommend the best monuments or exhibits for a visitor according to both his/her preferences and his/her physical location in order to optimize his/her quality of experience and path.
These two issues are often seen as two faces of the same problem. For this reason, location-aware recommender systems (LARS) usually rely on the active user's preference model and the user context to come up with recommendations likely to interest him/her [3], [4]. In the frame of museums and physical spaces, we argue that the user experience can be improved if we consider path recommendation as an aggregation problem: in addition to user preferences and context, the recommended items should depend on psychological, physical and social variables, and on some specific constraints. We thus propose a new representation model of users in LARS, and show how it is possible to infer this model from classical observations such as geolocalization over time.
This work takes place in the Horizon 2020 European Union program, which funds a variety of interdisciplinary research and innovation for the economic and social challenges we are facing. More specifically, we are working as part of the CrossCult project, a three-year EU-funded research project started in March 2016 and composed of 11 European institutions and 14 associated partners. In this project, we are given the opportunity to work on a large scale with museums and cities like the London National Gallery or Luxembourg City. Those places have hundreds if not thousands of points of interest, hence the importance of a recommender system capable of giving a visitor the best sample of those points of interest. There are also some archaeological sites from our partners, where visitors can have a direct experience of ancient contexts.
This paper is organized as follows: Section II presents a brief overview of the literature on context-aware and location-aware recommender systems in physical spaces. Section III describes our proposed user model, including the formalism we adopted and our global recommendation architecture. Section IV characterizes how we plan to infer the characteristics of the user, and Section V provides guidelines concerning our future work on this project.

A. Research context
The goal of the CrossCult project is to spur a change in the way European citizens explore, reflect and interpret their common History by asking them to (re-)interpret what they may have learnt, in the light of cross-border connections among historic sources, cultural venues and other citizens' viewpoints. To this aim, the project focuses on three axes, all facilitated by technology and mobile apps: (i) building an extensive knowledge base that makes the connections explicit across an unrestricted set of repositories of digital cultural heritage resources, based on knowledge modeling and semantic reasoning; (ii) creating a technological platform to support the creation of interactive experiences for individuals and groups, who may be in one venue or in several interconnected ones; and (iii) through personalization and content adaptation, creating narratives for the interactive experiences that maximize situational curiosity and serendipitous learning, taking into account the cognitive/emotional profiles of the participants as well as temporal, spatial or other miscellaneous contextual elements.
This context leads naturally to a focus on personalized recommendations in physical spaces, not only of points of interest (POIs) to see, but of sequences of such POIs. CrossCult targets in particular the development of new approaches to such personalized path recommendations, combining trajectory mining with recommendation techniques (mainly knowledge-based, content-based and collaborative filtering).

B. Recommendations in a physical space
Every recommender system relies on a 4-step process in order to provide the active user with interesting items, regardless of the machine learning techniques used [5]. First, it collects raw interaction data, also called observations. Second, it uses these data to infer a high-level abstraction of the active user, whose representation is called the user model. Third, it computes recommendations adapted to the active user's preferences and expectations, based on what it has learnt in the user model. Finally, it has to present these recommendations at the right time and in the right manner through the interface.
Depending on the recommendation algorithm, the third step may require additional inputs, such as a knowledge base or an ontology-based representation of the context. Integrating the context into the recommendation process is a growing research field known as CARS, an acronym for Context-Aware Recommender Systems. In their state of the art, Adomavicius et al. present several approaches, such as contextual modeling and pre/post-filtering methods, that use contextual factors to adapt recommendations to the user's context [3]. However, until recently, very few recommenders considered the spatial properties of users or items. This led to the emergence of a sub-family of CARS, known as Location-Aware Recommender Systems (LARS) [6].
Since the democratization of PDA devices and smartphones along with ubiquitous internet, there is an increasing need for location-aware applications capable of intelligent and personalized services. In the early 2000s, Cheverst et al. [7] built a mobile guide system named GUIDE, whose goal is to help Lancaster visitors efficiently find their way through the city and its points of interest. To do so, the system takes into account contextual information like the location of the visitor, the time and date, the schedule of the visits, the opening and closing times of attractions, etc. The system offers the visitor several functionalities, among which the possibility to get more information about a monument, or to create his/her own route. Even if the GUIDE system did not include any sort of recommendations, it paved the way for contextual recommender systems in physical spaces. Since then, more and more projects have been developed, benefiting from technological progress and the ubiquitous internet revolution. Chou et al. [8] proposed a context-aware museum tour guide based on a semantic web framework. While the authors do not describe how the recommender module works, the final application plans to offer personalized recommendations based on contextual information (visitor interests, exhibits already seen, current location, time available).
The issue of smart routing in museums and the use of personalized recommendations to guide visitors, targeting an increase in Quality of Experience (QoE), has been investigated in many papers and summarized in [9]. Grieser et al. [10] introduce a content-based approach, which relies on the set of exhibits already seen and the textual data related to each of these exhibits (language-based conceptual model) to compute recommendations. In contrast, Bohnert et al. [11] consider a numeric approach by exploiting the user ratings on each exhibit in a collaborative way. In these 2 approaches, the concept of path is not present and recommendations offer very little diversity. This notion of path is discussed by Van Hage et al. [2], who propose to adapt the route of the active visitor according to 3 dimensions: the critiques of the visitor ("give me more of that"), the time constraints (how much time the user is ready to spend in the museum), and the physical distance between items, since their system aims at providing the shortest path.
Also in order to adapt the visitor path, several works take an interest in cognitive and behavioral user characteristics. Naudet et al. [12] propose a recommender system for museum guidance that exploits in particular the users' cognitive style, in addition to their interests. In this context, the authors define a visitor model as a tuple including a cognitive profile, personal interests, personal profile, location, activity and time constraints. The recommender itself computes the sequence of exhibits that best matches each visitor, including actions that are suggested. Coupled with a Facebook game used to retrieve interests and cognitive styles, it has been implemented as a mobile application and tested in a museum in Athens (Greece), where it received good feedback from visitors. Lykourentzou et al. [1] introduce the use of visiting style, representing the way visitors tend to behave and move in a museum [13], so as to handle crowd management in a museum considering personalization and constraints due to the environment. The authors developed an agent-based crowd simulator for museums, taking into account the visiting style together with other parameters linked to the visitors (interest per exhibit, available time, walking speed and time spent per exhibit) and to the museum (number of rooms, exhibits per room, distance per exhibit, distance between rooms, exhibit crowd limit). They show in simulation the benefit of a personalized recommendation path built step by step (recommendations suggest the next exhibit at each step), using a theoretical QoE function of the number of seen and missed exhibits of interest and of the walking time.

C. Discussion
Although recent works and experiments have shown the interest and feasibility of visit personalization, they all target specific cases and focus on the known preferences of each visitor, taking only very partial account of the visitors' characteristics and of the environment in which they move. As an example, the congestion points within the museum can strongly degrade the quality of the user experience if he/she has a low crowd tolerance. In this context, the best path is not always the shortest, since some users are willing to travel a longer distance and see more items to avoid congestion points. Similarly, it is reasonable to consider that users do not retain the same visiting style throughout their visit. They can act like ants at the beginning of the visit (by observing all exhibits of the museum and walking close to exhibits) and, because they run out of time or have an increasing level of fatigue, opt for the grasshopper visiting style (spending a long time to see selected exhibits, but ignoring the rest of the exhibits).
Consequently, we argue that recommending paths in physical spaces (and particularly in museums) requires a better understanding and integration of the variables which could explain users' decisions and maximize their satisfaction. In the next section, we propose a more generic and holistic way to model the active user and his/her environment, so as to address the issues mentioned above. Even if our user model representation is more complex, we will show in Section IV that it is possible to infer it from classical user observations. We will also briefly discuss in Sections IV-B and IV-C ways to jointly consider all these variables to provide recommendations that offer the best compromise.

A. Global Architecture
The architecture of our path recommender system is shown in Figure 1. Just like many real-world systems, our model is a loop, meaning that each step is repeated as long as new information (or feedback) enters the system. The "Observations" module can nevertheless serve as a starting point to describe how the whole architecture functions. As its name implies, this module collects all the available observations. The taxonomy of these observations and the data collection process are described in Section III-B.
All these pieces of information will then be used in the profiling engine to infer a personalized user model whose structure is described in Section III-C. The main goal of this profiling engine is to extract user characteristics from the observations. It will rely on usage mining techniques and deep learning to build a representation of the active user's internal state that is as exhaustive as possible. The "Inference" module inside the profiling engine is detailed in Section IV-A. An evaluation module inside our profiling engine will compare the predicted characteristics in the user model with the user's feedback and a posteriori exploratory behaviors (which are both part of the observations) to (i) remove noise, inconsistencies and false predictions in the user model, and (ii) train the parameters of our inference module in real time so as to improve its performance.
Once the user model is initialized, it will be used for recommendation purposes. The system will also need to extract knowledge about items from the museum database and/or from the web (i.e. processing a large volume of data in order to discover knowledge units that are significant and reusable). Additionally, we will include information about the user context (position of items, opening and closing times, hour and date, map of the museum, global traffic data, exhibit crowd limit, ...). Finally, some external constraints could be taken into account, such as historical scenarios, since one goal of the CrossCult project is to facilitate the reinterpretation of History by visitors under the supervision of historians. We thus plan to interleave recommendations from historians with the user-centered recommendations in a coherent sequence. Once more, the quality of recommendations will be constantly monitored by an evaluation module whose goal is to optimize the aggregation process within this multi-dimensional representation space.
The output of our path recommendation can be a single recommendation of an exhibit at each time step, a set of good alternatives (if the visitor wants to maintain a high level of user control) or, more likely, a sequence of recommendations, since we would like to monitor in real time the progression and the relevance of the path relative to user characteristics and contextual factors. A discussion on how to recommend a path within a museum is provided in Section IV-B. Let us note that, within the frame of the CrossCult project, one objective is to emphasize the inter-connections between partner museums. Thus, most recommended items will be physically present in the visited museum, but some recommendations can be "virtual" (i.e. accessible through the application, but exhibited in another museum). The virtual recommendations will be integrated within the recommended path; they will have no cost as regards the required traveling distance, but they will have a cost as regards the system intrusiveness (see Section IV).

B. Observations
The observations constitute a transcription of all possible events and explicit feedback inside the museum. These events can be related to a single user, a group of users, or the whole population of visitors.
We can distinguish two kinds of observations on a single user: those which are related to user preferences, and those which concern his/her location. Eliciting preferences in a physical space is more complex than in a traditional recommender system. Of course, users still have the possibility to browse the catalog of items through the application so as to explicitly express preferences about items or categories of items (like/dislike, add to favorites, rate, write a tag/opinion, select an emotion on the Abraham-Hicks emotional guidance scale, ...). However, these interactions are very time-consuming and can affect the user experience if we request too much user feedback. We therefore expect a lot of missing data as regards the explicit preferences. Rather than having visitors systematically browse the whole catalog of items, the system will initialize the observation set (and thus the user preference model) by asking visitors to select preferred items or categories in a small list, like Netflix does after subscription. This list could be a representative subset of the themes present in the museum. We will also combine the use of the visitors' smartphone camera with image recognition algorithms to let them rate items in a click. User preferences will also be implicitly inferred from their traveled paths (see Section IV). Finally, users will be able to provide a global rating to express their satisfaction with their visit to the museum.
In addition to the user's explicit preferences, we will have observations about the path traveled by each visitor. Depending on the technology used to gather the localization data (beacon, RFID, Bluetooth, Wi-Fi), we will have a certain degree of accuracy concerning the real position and direction of the visitor (continuous modelling over time or waypoints, estimated position error). To be as adaptable and generic as possible, we propose to take this positioning uncertainty into account in our model as a numerical factor $u \in [0; 1]$, with 0 meaning that the localization data are not precise enough to determine which points of interest are seen by the active visitor, and 1 meaning that the data are precise enough (including the sense of direction) to identify every single point of interest noticed by the active visitor in the museum.
A point of interest is a location where a remarkable entity is present. In an art museum, such an entity could obviously be an exhibit, but it could also be the museum reception, a resting area, a cafeteria and so on. In order to also take into account the sequences where the visitor is not near any defined point of interest, a special label will be used (for example a point of interest named "nothing"). Given a visitor $v$, we note his/her geographic positions from the start of his/her visit to time $t$: $GP_v = \{gp_{v,1}, \ldots, gp_{v,t}\}$. The location of all points of interest is called $POI$. The traveled path of $v$ corresponds to the following ordered set:

$path_v = \{(gp_{v,1}, Pr(poi_1 \mid gp_{v,1}, u), d_{gp_{v,1}}, r_{gp_{v,1}}), \ldots, (gp_{v,t}, Pr(poi_t \mid gp_{v,t}, u), d_{gp_{v,t}}, r_{gp_{v,t}})\}$

with $d_{gp_{v,1}}$ standing for the duration during which the visitor $v$ stood at the position $gp_{v,1}$, $Pr(poi_1 \mid gp_{v,1}, u)$ the probability that $gp_{v,1}$ corresponds to the point of interest $poi_1$ according to the positioning uncertainty $u$ (only the most likely POI is displayed here, but we could easily have a list of all the possible POIs too), and $r_{gp_{v,1}}$ the room to which this location belongs. From this ordered set we will be able to compute a variety of metrics capable of depicting the context of the user. We will present those metrics in Section IV.
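The traveled path defined above can be represented as a list of records, one per geographic position. The following is a minimal sketch, not the project's implementation: the class name `PathStep`, the field names, the (x, y) encoding of positions and the `min_probability` cut-off are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

NOTHING = "nothing"  # special label used when no point of interest is nearby


@dataclass(frozen=True)
class PathStep:
    """One element of path_v (hypothetical encoding of the 4-tuple)."""
    position: Tuple[float, float]  # gp_{v,i}, assumed here to be (x, y) coordinates
    poi: str                       # most likely point of interest at this position
    poi_probability: float         # Pr(poi_i | gp_{v,i}, u)
    duration_s: float              # d_{gp_{v,i}}, time spent at the position in seconds
    room: str                      # r_{gp_{v,i}}, room containing the position


def time_per_poi(path: List[PathStep], min_probability: float = 0.5) -> Dict[str, float]:
    """Sum the standing time per POI, skipping 'nothing' and low-confidence steps."""
    totals: Dict[str, float] = {}
    for step in path:
        if step.poi == NOTHING or step.poi_probability < min_probability:
            continue
        totals[step.poi] = totals.get(step.poi, 0.0) + step.duration_s
    return totals


# Hypothetical path of a visitor v.
path_v = [
    PathStep((0.0, 1.0), "reception", 0.9, 60.0, "hall"),
    PathStep((4.0, 2.5), "exhibit_12", 0.8, 95.0, "room_1"),
    PathStep((6.0, 2.5), NOTHING, 1.0, 20.0, "room_1"),
    PathStep((4.2, 2.4), "exhibit_12", 0.7, 30.0, "room_1"),
]
print(time_per_poi(path_v))
```

Keeping the probability alongside each step lets later metrics discard positions for which the positioning uncertainty makes the POI assignment unreliable.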
From these individual localization data, we can deduce additional information such as the belonging of each visitor to a group, the crowd density, the number of visitors per room, the average traveled distance and speed, etc. If several visitors follow the same path from the beginning, they probably belong to the same group, and we could provide group recommendations rather than individual ones, so as to be sure that they stay together during the visit.
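The group heuristic described above (visitors who follow the same path from the beginning probably belong together) could be sketched as follows. This is only an illustration under simplifying assumptions: paths are reduced to sequences of POI labels, timing is ignored, and the function name `candidate_groups` and the `prefix_len` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def candidate_groups(poi_sequences: Dict[str, List[str]], prefix_len: int = 3) -> List[List[str]]:
    """Group visitors whose first `prefix_len` visited POIs are identical."""
    buckets: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for visitor, sequence in poi_sequences.items():
        if len(sequence) >= prefix_len:  # too-short paths give no evidence yet
            buckets[tuple(sequence[:prefix_len])].append(visitor)
    # Only buckets with at least two visitors form a candidate group.
    return [sorted(members) for members in buckets.values() if len(members) > 1]


# Hypothetical observed POI sequences per visitor.
sequences = {
    "v1": ["a", "b", "c", "d"],
    "v2": ["a", "b", "c", "e"],
    "v3": ["a", "c", "b"],
    "v4": ["a", "b"],
}
print(candidate_groups(sequences))
```

A practical version would presumably also require the visitors to be at the same positions at roughly the same times, which this sketch leaves out.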

C. User Model
The user model is made of explicit data and implicit user characteristics. Among the explicit data, we can directly import/copy the explicit preferences stated as observations (see Section III-B), assuming that the active user has a good knowledge of himself/herself. We can also include some demographic data (age, gender) and the global available time for the visit.
The implicit user characteristics include the visiting style, the fatigue, the implicit preferences, the distance tolerance, the crowd tolerance, the precision tolerance, the system intrusiveness tolerance, and the user control. These characteristics will be inferred thanks to usage mining techniques, as explained in Section IV. Let us note that, as part of a recently started research project, this inference process is just a first proposal to prove the feasibility of such a user model. Once implemented, this first proposal will serve as a proof of concept, while we build a sufficient training dataset containing both observations and global evaluations of users' satisfaction with their visits. We then plan to apply deep learning techniques to this training dataset, so as to create a better abstraction and a more complete user model.
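As an illustration, the explicit data and implicit characteristics listed above could be grouped in a single record. The sketch below is an assumption-laden example rather than the model's actual structure: the class name `UserModel`, all field names, the choice of floats for fatigue and user control, and the `set_tolerance` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

TOLERANCES = {
    "distance_tolerance",
    "crowd_tolerance",
    "precision_tolerance",
    "intrusiveness_tolerance",
}


@dataclass
class UserModel:
    # Explicit data, provided directly by the visitor.
    explicit_preferences: Dict[str, float] = field(default_factory=dict)
    age: Optional[int] = None
    gender: Optional[str] = None
    available_time_min: Optional[float] = None
    # Implicit characteristics, left as None until they are inferred.
    visiting_style: Optional[str] = None
    fatigue: Optional[float] = None  # assumed to be a score, not defined in the text
    implicit_preferences: Dict[str, float] = field(default_factory=dict)
    distance_tolerance: Optional[float] = None
    crowd_tolerance: Optional[float] = None
    precision_tolerance: Optional[float] = None
    intrusiveness_tolerance: Optional[float] = None
    user_control: Optional[float] = None  # assumed to be a score, not defined in the text

    def set_tolerance(self, name: str, value: float) -> None:
        """Store a tolerance after checking that it lies in [0, 1]."""
        if name not in TOLERANCES:
            raise KeyError(f"unknown tolerance: {name}")
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
        setattr(self, name, value)


model = UserModel(age=34, available_time_min=90.0)
model.set_tolerance("crowd_tolerance", 0.2)
print(model.crowd_tolerance, model.visiting_style)
```

Leaving uninferred characteristics as `None` makes it explicit which parts of the model are still unknown at a given point of the visit.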

A. Implicit profiling
To obtain the user characteristics that are unavailable via explicit feedback, our model has to be able to infer them from the observations left by the user. As explained above, we will mainly use the information about the paths traveled by users (positions and speeds collected for each user as frequently as possible).
visiting style - From localization data over time, we will be able to compute the following metrics: $AvT$ as the average time spent at each geographic position in seconds; $Completeness$ as the percentage of exhibits seen by the active visitor; and $Order \in [0; 1]$ as a score to determine whether the visitor follows the natural order of visit in the museum. We plan to use these 3 metrics, as proposed by Kuflik et al. [14], to discover the current visiting style of each visitor. The only difference is that we will base our classification on a short history of locations (rather than the whole traveled path), and we will check whether the active user's visiting style is changing over time. The visiting style of a user will affect the number of items our system should suggest per room.
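The three metrics above can be sketched as follows. The computation of $Order$ is not specified in the text, so the version here (the fraction of consecutive visits that move forward in the natural order) is purely an assumption, as are the function names and the `window` size used for the short history.

```python
from typing import Sequence


def avt(durations_s: Sequence[float]) -> float:
    """AvT: average time (in seconds) spent at each geographic position."""
    return sum(durations_s) / len(durations_s) if durations_s else 0.0


def completeness(seen: Sequence[str], all_exhibits: Sequence[str]) -> float:
    """Completeness: percentage of the exhibits that were seen."""
    exhibits = set(all_exhibits)
    return 100.0 * len(exhibits & set(seen)) / len(exhibits) if exhibits else 0.0


def order_score(seen: Sequence[str], natural_order: Sequence[str]) -> float:
    """Order in [0, 1]: assumed here to be the share of forward moves."""
    rank = {exhibit: i for i, exhibit in enumerate(natural_order)}
    ranks = [rank[e] for e in seen if e in rank]
    if len(ranks) < 2:
        return 1.0  # a single visit cannot contradict the natural order
    forward = sum(1 for a, b in zip(ranks, ranks[1:]) if b > a)
    return forward / (len(ranks) - 1)


# Hypothetical data; only the last `window` steps are used (short history).
natural_order = ["e1", "e2", "e3", "e4", "e5"]
seen = ["e5", "e1", "e2", "e4", "e3"]
durations_s = [10.0, 30.0, 90.0, 60.0, 20.0]
window = 4
recent_seen, recent_durations = seen[-window:], durations_s[-window:]
print(avt(recent_durations),
      completeness(recent_seen, natural_order),
      order_score(recent_seen, natural_order))
```

Recomputing these values on a sliding window is what allows a change of visiting style to be noticed during the visit rather than only at its end.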
fatigue - This characteristic will be inferred from the transitions between visiting styles (fatigue can be detected when visitors move from the ant or butterfly style to the fish or grasshopper style), and/or from the evolution of the duration spent at each geographic position, $d_{v,gp}$. If we note $d_v(t)$ the discrete function that gives the standing duration at each point of interest over time, then fatigue occurs when the derived values $\frac{d\,d_v(t)}{dt}$ decrease for a certain amount of time. The threshold at which the visitor will be considered tired by the system is not yet defined, but will be the subject of future experiments. The system will adapt the number of remaining recommendations within the path to this fatigue metric.

implicit preferences - We can take inspiration from the implicit preference modeling function in [5]. Let us suppose that we want to infer the preference of the active visitor $v$ for the item $poi$. From the information about the active user's traveled path, we can compute a set of normalized criteria such as the time spent in front of $poi$, the visiting frequency ($v$ can visit the same item several times), or the recency of the visit (assuming that there is a progression in the traveled path and that we should favor recently consulted items when computing recommendations). We can then compute the estimated preference for this item as a weighted sum of the different implicit criteria.

If the distance traveled by the active visitor equals the mean distance traveled by the other visitors, i.e. $Dist^v_t = \overline{Dist}_t$, then the active user's traveling speed is similar to that of other visitors, and the distance tolerance $T^v_{distance}$ will therefore be equal to 0.5. If we have access to more information about the visitor (e.g. his/her age and possible physical disabilities), we could adapt the threshold by only considering people belonging to the same subset, or by normalizing the distance tolerance according to the average traveling speed per age. A visitor with a high distance tolerance and a low crowd tolerance may accept a longer alternative path to avoid congestion points inside the museum.
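The fatigue and implicit preference ideas described above could be prototyped as below. Both functions are sketches under stated assumptions: fatigue is read here as "the standing duration has been dropping over the last `window` observations" (the text leaves the threshold undefined), and the criteria names and weights in the example are hypothetical.

```python
from typing import Dict, Sequence


def is_fatigued(durations_s: Sequence[float], window: int = 3) -> bool:
    """True if the last `window` discrete differences of d_v(t) are all negative."""
    diffs = [b - a for a, b in zip(durations_s, durations_s[1:])]
    return len(diffs) >= window and all(d < 0 for d in diffs[-window:])


def implicit_preference(criteria: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of normalized criteria (each in [0, 1]); weights are renormalized."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * criteria[name] for name, w in weights.items()) / total


# Hypothetical standing durations (seconds) at successive points of interest.
print(is_fatigued([60.0, 80.0, 70.0, 50.0, 30.0]))
print(is_fatigued([60.0, 80.0, 70.0, 90.0, 30.0]))

# Hypothetical normalized criteria for one item and assumed weights.
criteria = {"time_spent": 0.8, "frequency": 0.5, "recency": 1.0}
weights = {"time_spent": 0.5, "frequency": 0.2, "recency": 0.3}
print(implicit_preference(criteria, weights))
```

Dividing by the sum of the weights keeps the estimated preference in [0, 1] even when the weights are not normalized beforehand.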
precision tolerance - $T^v_{precision} \in [0; 1]$ where 0 means the visitor gives positive feedback to recommendations of items that maximize the relative diversity in comparison with his/her explicit known preferences [15], and 1 means the visitor gives positive feedback to recommendations of items very similar to his/her past history.

system intrusiveness tolerance - $T^v_{intrusiveness} \in [0; 1]$ where 0 means the visitor does not tolerate any kind of interaction/request from the system and 1 means the opposite. This metric will impact the number and the frequency of recommendations.

user control - The acceptance and adoption rates of each visitor regarding our recommendations will help us to infer their expected level of control [15]. High rates mean that they are willing to follow our recommendations. Conversely, when these rates are low, the system should provide alternative recommendations through the interface, so as to let the user decide on the next step.
In order to be as adaptable as possible, our model will not be given static thresholds to compute the tolerance metrics. As part of the CrossCult project, our system could be used in different museums. In this case, the meaning of a "crowded room" may vary a lot according to the size of the museums themselves. The same applies to the distance and precision metrics. Thus, dynamic thresholds will be computed for each physical space where the system is deployed.
We hypothesize that each of these user characteristics is highly correlated with the visitor's quality of experience and global satisfaction. The implementation of our user model and future studies will help us to confirm these assumptions.

B. Path recommendation
As the algorithm inputs take many different forms, we will rely on various recommendation techniques (content-based, knowledge-based, discovery-based, diversity-based and collaborative filtering). The recommendation process is then, at the same time, a hybridization problem and a sequencing problem. Indeed, a route within a museum can be seen as a probabilistic graphical model. At each step, the system knows the path traveled by the active user until now and measures the transition probabilities toward possible future points of interest. Depending on the quality of the localization data (discrete or continuous, fully or partially observable), the nature of paths (cyclic or acyclic), and the hypothesis of memorylessness (the probability distribution of the next state depending only on the current state, or not), the sequence of recommendations can take the form of an automaton, a Bayesian network, a Markov chain, or a Hidden Markov Model.
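To make the Markov chain option concrete, the sketch below estimates first-order transition probabilities between points of interest from previously observed paths. It assumes memorylessness and fully observed, discrete POI sequences, and it uses observed frequencies only; the function names and the example sequences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def transition_probabilities(sequences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next POI | current POI) from observed POI sequences."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sequence in sequences:
        for current, nxt in zip(sequence, sequence[1:]):
            counts[current][nxt] += 1
    return {
        current: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for current, nexts in counts.items()
    }


def most_likely_next(probs: Dict[str, Dict[str, float]], current: str) -> Optional[str]:
    """Return the most probable next POI, or None if `current` was never left."""
    nexts = probs.get(current)
    if not nexts:
        return None
    return max(nexts, key=nexts.get)


# Hypothetical POI sequences observed for earlier visitors.
observed = [["a", "b", "c"], ["a", "b", "d"], ["a", "c", "d"], ["b", "c"]]
probs = transition_probabilities(observed)
print(probs["a"])
print(most_likely_next(probs, "b"))
```

In the model discussed here, these frequencies would only be a starting point: the transition probabilities are meant to be adjusted by the user preferences, the inferred characteristics, the context and the historians' constraints.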
The transition probabilities between the current state and the possible alternative recommendations (or recommendation paths) will be determined by the user preferences (implicit and explicit), the inferred user characteristics, the user context and the constraints set by historians (who provide interesting stories to entertain and educate visitors). These factors will impact the content and size of the recommended sequence, and the frequency of recommendations.

C. Evaluation framework
Every recommender system needs a framework to evaluate past recommendations in order to improve future ones. There exist many scientific works in the literature about evaluating recommender systems [16]. However, as systems mainly produce only one recommendation at a time, most of the available metrics evaluate the performance at each time step. We plan to develop a generic metric capable of evaluating a whole sequence of recommendations as regards end-user satisfaction. Of course, the reasons which explain the satisfaction may vary from one user to another. As an example, some users may be satisfied if they do not miss the items corresponding to their highest preferences, while others may accept a lower average precision if they avoid congestion points or long distances. We will thus propose a multi-metric evaluation approach that can incorporate all of these factors and give them the appropriate weight according to the user and his/her context.
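One simple way to picture such a multi-metric evaluation is a per-user weighted aggregate of sequence-level metrics. The sketch below is only an assumption about what this could look like: the metric names, the weights and the linear aggregation are hypothetical and are not taken from the evaluation framework itself.

```python
from typing import Dict


def sequence_score(metrics: Dict[str, float], weights: Dict[str, float]) -> float:
    """Aggregate sequence-level metrics (each in [0, 1]) with user-specific weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * metrics[name] for name, w in weights.items()) / total


# Hypothetical metrics measured on two recommended sequences.
sequences_metrics = {
    "sequence_A": {"precision": 0.9, "crowd_avoidance": 0.2, "distance_saving": 0.3},
    "sequence_B": {"precision": 0.6, "crowd_avoidance": 0.9, "distance_saving": 0.8},
}

# Hypothetical weight profiles for two kinds of visitors.
preference_driven = {"precision": 0.8, "crowd_avoidance": 0.1, "distance_saving": 0.1}
crowd_averse = {"precision": 0.2, "crowd_avoidance": 0.5, "distance_saving": 0.3}


def best_sequence(weights: Dict[str, float]) -> str:
    """Name of the sequence with the highest aggregate score for these weights."""
    return max(sequences_metrics, key=lambda name: sequence_score(sequences_metrics[name], weights))


print(best_sequence(preference_driven))
print(best_sequence(crowd_averse))
```

The same two sequences are ranked differently for the two profiles, which is the behavior the multi-metric approach is meant to capture.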

V. CONCLUSION AND PERSPECTIVES
The CrossCult project aims at helping real users to find their way through major cities and museums while giving them the best visiting experience as regards their personal preferences and characteristics. In order to satisfy these needs, we described in this article a multidimensional user model and a recommender architecture capable of taking into account all the crucial contextual features needed to produce suitable recommendations. This proposed formalism is the first step of ongoing research.
In order to test the validity of our theoretical model, we plan to collect observations and build a training dataset with our partner museums. This dataset will contain the localization data and preferences of as many visitors as possible, along with additional feedback at the end of the visit (e.g. a questionnaire given to the visitors who agreed to participate in the experiment). This should allow us to compare the findings of our profiling and recommending engines with what the visitors really experienced.

Fig. 1. Global architecture of our system.

distance tolerance - $T^v_{distance} \in [0; 1]$ where 0 means the visitor has a very low traveling speed and traveled distance, and 1 means the opposite. To compute this metric, we propose to compare the traveled distance $Dist^v_t$ of the active visitor $v$ at the instant $t$ of his/her visit with the traveled distance of other visitors at the same instant. Depending on the attendance level of the museum, we will either use the set of all visitors or the subset of the visitors who entered at the same time as our active visitor. To effectively compare those two values, we propose to use the mean $\overline{Dist}_t = \frac{1}{n}\sum_{i=1}^{n} Dist^i_t$ and the standard deviation $\sigma$ of the traveled distance distribution of the $n$ other visitors.

crowd tolerance - $T^v_{crowd} \in [0; 1]$ where 0 means the visitor avoids crowded areas as much as possible, and 1 means the opposite. We propose to compute this metric similarly to the distance tolerance, by comparing the average local crowd density $Dens^v$ around the active visitor $v$ with the average of the crowd density around every other $n$ visitors, $\overline{Dens} = \frac{1}{n}\sum_{i=1}^{n} Dens^i$.
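The comparison with other visitors used for the distance and crowd tolerances can be sketched as below. The text fixes only one point of the mapping (a visitor at the mean gets 0.5); turning the deviation from the mean into a value in [0, 1] with the normal cumulative distribution function of the z-score is an assumption made for this example, as are the function name and the sample values.

```python
import math
from statistics import mean, pstdev
from typing import Sequence


def tolerance_from_peers(own_value: float, peer_values: Sequence[float]) -> float:
    """Map a visitor's value to [0, 1] relative to the peers' mean and deviation.

    Assumed mapping: normal CDF of the z-score, which yields 0.5 at the mean.
    """
    mu = mean(peer_values)
    sigma = pstdev(peer_values)
    if sigma == 0:
        # All peers are identical: fall back to a three-way comparison.
        if own_value == mu:
            return 0.5
        return 1.0 if own_value > mu else 0.0
    z = (own_value - mu) / sigma
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Hypothetical traveled distances (meters) of other visitors at instant t.
peer_distances = [200.0, 300.0, 400.0]
print(tolerance_from_peers(300.0, peer_distances))
print(tolerance_from_peers(400.0, peer_distances))
```

Because the mean and deviation are recomputed from the visitors of each venue, the resulting thresholds adapt to the size and attendance of the museum; the same function could be applied to local crowd densities to obtain a crowd tolerance.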