A comprehensive mechanism for hotel recommendation to achieve personalized search engine

Search engines are as important as recommender systems for hotel selection. However, the lists recommended by search engines are usually non-personalized and of low accuracy. To deal with these issues, a comprehensive mechanism for hotel recommendation is proposed. In this mechanism, we consider users' personalized preferences by identifying users' attributes of interest, trust and consumption capacity. The quantification method for each attribute is presented using fuzzy theory. Moreover, this paper improves the method for evaluating hotels with respect to the criteria of price, rating, and online reviews by using fuzzy theory. In addition, the proposed approach uses TOPSIS, a classical multi-criteria decision-making method, to further improve the accuracy. Finally, a case study based on Tripadvisor.com is conducted to illustrate the validity of the proposed method for hotel recommendation in search engines. The results of the case study indicate that the method not only solves the problem of non-personalization but also improves the accuracy of the search engine.


Introduction
The development of electronic commerce (e-commerce) and social media has changed the way users book hotels. Tourists are accustomed to booking hotels in advance on e-commerce platforms, such as HolidayInn.com, ctrip.com and so on. However, the large quantity, varied prices and uneven quality of hotels make it difficult for users to find satisfactory hotels. Recommender systems can filter the vast amount of information in networks to assist consumers in making the best choices [1][2][3][4][5][6]. Besides recommender systems, search engines are the other common tools to filter information and recom-is often slight. In these circumstances, it is difficult to distinguish the quality of hotels [7] on the basis of average ratings, which in turn causes low accuracy of the recommendations of search engines. In this study, we propose a new method to deal with the ratings and a comprehensive hotel recommendation method to improve the performance of search engines.
The basic framework of a consumer's purchasing decision-making in online shopping is the same as that in offline shopping. The decision-making process of item selection is the most critical stage of a consumer's online shopping [8]. At this stage, consumers need to compare, analyze and evaluate the selected items according to the purchase criteria, so that they can judge the quality of the items and generate the final evaluation. Research on current recommendation methods often focuses on the demand-evoking part and ignores the process of consumer decision. With respect to search engines, consumers provide their demands directly by entering keywords. Therefore, there is a relatively low requirement on identifying the demands of consumers. In addition, the decision-making process of item selection is more important in methods for search engines than in traditional recommendation methods. In this study, we improve search engines by studying the decision-making process of item selection.
The decision-making process of item selection is influenced by many factors [9][10][11]. The rating of an item is the most commonly used criterion for item selection [12,13]. However, most studies use only a comprehensive score, which results in a serious loss of information [14,15]. In recent years, with the development of text analysis technology, several studies have combined ratings and online reviews to select items [16,17].
As online reviews are in the form of text, which contains more information than ratings, more and more scholars have studied the impact of online reviews and applied them to improve the performance of recommender systems [18][19][20][21]. Most of them utilize online reviews to identify users' or items' preferences by identifying keywords in online reviews, and then use these keywords as the properties of items or users [22,23]. Some other studies use online reviews to evaluate items [24]. Zhang et al. [25] proposed a method to quantify online reviews by using neutrosophic theory. In their study, they did not actually translate online reviews into neutrosophic numbers, but translated ratings into interval-valued neutrosophic numbers instead. In this study, on the basis of the research in [25], we use sentiment analysis technology to transform online reviews into single valued neutrosophic numbers (SVNNs).
Apart from ratings and online reviews, it has been shown that price is a pivotal factor that influences consumers' decision-making [26][27][28][29][30]. Few studies have applied price in product or hotel recommendation. Largely because of regional economy and personal income, different consumers may have different opinions on whether the price of a hotel is expensive or not. As a result, the raw value of the price cannot be used directly to evaluate or recommend hotels. Besides, due to the impact of various factors, such as social position, vanity and income, different consumers have different preferences for products' prices. What is certain is that, for the same consumer, the consumption capacity remains consistent for a long time. The higher the similarity between the hotel's price and the customer's consumption capacity, the greater the likelihood that the hotel will be booked by the customer.
Based on the research outlined above, in the decision-making process of item selection, we combine the impact of ratings, online reviews and price to improve the accuracy of the search engine.
In the hotel recommendation field, some studies are devoted to mining hotel evaluation rules. Yu and Chang [31] designed a hotel evaluation rule with five criteria: distance, traveler preference, room rate, facilities, and rating. Levi et al. [32] mined hotel reviews to determine the importance of each criterion of hotels. Some studies designed personalized hotel recommendation methods by giving different weights to different criteria based on the target user's personalized preference [33,34]. Nevertheless, most existing studies give the same weight to all users who have commented on the hotels. Actually, for a target user, the ratings and online reviews of similar users have more influence than those of dissimilar users. Therefore, in this study, we need to mine users' preferences, identify the similar group, adjust the weights of groups with different similarities to the target user in terms of ratings and online reviews, and deal with the problem of non-personalization in search engines.
Fuzzy tools have been used to model uncertain and vague preferences in recommender systems [34,35]. Usually, the fuzzy sets used in different studies differ because of the different data forms and recommendation strategies. Combining the features of the collected data and the features of each fuzzy set, this paper uses the appropriate fuzzy set to quantify prices, ratings, and online reviews. The form of information about each criterion to evaluate hotels is different, so the method of quantification varies considerably across criteria. For this reason, aggregating the criterion evaluation values directly to sort hotels is not efficient. Instead, we rank and recommend hotels by using the TOPSIS (Technique for Order Preference by Similarity to Ideal Solution) method, which solves multi-criteria decision-making problems [36][37][38][39].
The remainder of this paper is organized as follows. In Section 2, we briefly review some basic concepts related to this study. Section 3 proposes a comprehensive mechanism for hotel recommendation. To verify the feasibility of the method, Section 4 conducts a case study using data from TripAdvisor.com. Finally, Section 5 concludes the study and suggests directions for future research.

Basic concepts
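In this section, we briefly review the definitions of the intuitionistic fuzzy set (IFS), the single valued neutrosophic set (SVNS), and the hesitant fuzzy linguistic term set (HFLS), as well as their operations and distance methods. These definitions will be used in the proposed hotel recommendation method.

Definition 1. [40] Let X be a nonempty classical set, $X = \{x_1, x_2, \ldots, x_n\}$. The intuitionistic fuzzy set defined on X is represented as:

$$A = \{\langle x, \mu_A(x), v_A(x) \rangle \mid x \in X\},$$

where $\mu_A(x) \in [0, 1]$ and $v_A(x) \in [0, 1]$ are the membership and non-membership degrees of x, with $0 \le \mu_A(x) + v_A(x) \le 1$.

Definition 2. [40] Let $A = \{\langle x, \mu_A(x), v_A(x) \rangle \mid x \in X\}$ and $B = \{\langle x, \mu_B(x), v_B(x) \rangle \mid x \in X\}$ be two IFSs on X. The operations of A and B are defined as:

Definition 3. Let $A = \{\langle x, \mu_A(x), v_A(x) \rangle \mid x \in X\}$ and $B = \{\langle x, \mu_B(x), v_B(x) \rangle \mid x \in X\}$ be two IFSs on X, then the normalized Euclidean distance between A and B can be defined as:

Definition 4. [42] Let X be a universe of discourse, then a single valued neutrosophic set A in X is defined as:

$$A = \{\langle x, T_A(x), I_A(x), F_A(x) \rangle \mid x \in X\},$$

where $T_A(x), I_A(x), F_A(x) \in [0, 1]$ and $0 \le T_A(x) + I_A(x) + F_A(x) \le 3$.

Definition 5. [43] Let A and B be two SVNNs. For any x ∈ X, some operations are defined as:

Definition 6. [44] Let A and B be two SVNNs, then the Euclidean distance between A and B is defined as:

When the numbers of elements in two sets are different, complement the set with fewer elements.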
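The two distance measures recalled above can be illustrated with a short sketch. Since the displayed formulas are not reproduced here, the code assumes commonly used forms: for IFSs, the squared differences of membership and non-membership are averaged over 2n terms (the hesitancy term is omitted), and for SVNNs the squared differences of the three components are averaged. The function names `ifs_distance` and `svnn_distance` are hypothetical, and the authors' exact normalization may differ.

```python
import math

def ifs_distance(a, b):
    """Normalized Euclidean distance between two IFSs.

    a, b: lists of (mu, v) pairs over the same universe X.
    Assumption: hesitancy degree is not included in the distance.
    """
    n = len(a)
    total = sum((ma - mb) ** 2 + (va - vb) ** 2
                for (ma, va), (mb, vb) in zip(a, b))
    return math.sqrt(total / (2 * n))

def svnn_distance(a, b):
    """Euclidean distance between two SVNNs given as (T, I, F) triples.

    Assumption: the sum of squares is divided by 3 so the result lies in [0, 1].
    """
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / 3)

if __name__ == "__main__":
    print(ifs_distance([(0.6, 0.2)], [(1.0, 0.0)]))
    print(svnn_distance((0.7, 0.2, 0.1), (1.0, 0.0, 0.0)))
```

Both functions return 0 for identical arguments, which is the property the later ranking step relies on.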

A comprehensive mechanism for hotel recommendation
In this section, we propose a comprehensive hotel recommendation method to improve the performance of search engine. The framework of the proposed method is shown as Fig. 1.
The details of the proposed method will be expatiated in the rest of this section.

The quantification of user attribute
In a socialized business environment, users' data have increased exponentially. User's purchasing records, browsing records, ratings, online reviews, tags and other data, to some extent, reflect user's interest, trust, consumption capacity and other personalized information. How to utilize massive unstructured information, to mine user's attribute information accurately and effectively is one of the key issues in this study. In this part, we will introduce the quantification methods for user's attributes in detail.

User interest.
User interest directly reflects consumer's purchase preference. Mining users' interest is one of the indispensable steps to provide users with accurate recommendations. This study will extract user interest attribute from user's purchasing records and online reviews.
The steps of interest recognition are as follows.
(i) For user u, gather the titles of all hotels user u has stayed in and the user's online reviews to form a long text, denoted as $T_u$.
(ii) For each long text, eliminate irrelevant words such as stop words, and carry out word segmentation and word frequency statistics. Then, classify the high-frequency words. Because the same interest can be expressed in different ways, it is necessary to sort out the high-frequency words further: classify synonyms into one category, and use each category of vocabulary as a feature of user u's interest, denoted as $f_u$.

User trust.
There are some users whose purchasing and comment behaviors are irregular and untruthful. The online reviews and ratings of these users are unconvincing. Their trust degrees are low, and recommendations based on their historic records are inaccurate. The lower the trust of a user, the lower the reliability of the user, and the lower the accuracy of the recommendation based on his historic records. Therefore, the trust of a user is also important to the identification of the similar group, and the users in the similar group should have a high degree of trust. Many e-commerce platforms have mechanisms to evaluate a user's trust; for example, Fig. 2 shows the mechanism used to evaluate a user's trust on Tripadvisor.com. The more "helpful votes" a review gets, the more trustworthy its author is. By comparison, some e-commerce platforms do not have mechanisms to evaluate a user's trust. For the former, the user's trust is standardized as $trust_u = t_u / \max(t)$, where $t_u$ is the trust of user u on the e-commerce platform and $\max(t)$ is the maximum value of all users' trust. For the latter, the trust quantification method needs to be studied further in the future.
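The interest-recognition steps and the trust standardization described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the stop-word list, the synonym table and the function names are hypothetical, word segmentation is reduced to a simple regular expression, and trust is taken to be a helpful-vote count divided by the maximum count.

```python
from collections import Counter
import re

# Hypothetical, tiny resources; a real system would use full lexicons.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "is", "it"}
SYNONYMS = {"spotless": "clean", "tidy": "clean"}

def interest_features(long_text, top_k=5):
    """Return the top_k interest features of a user with their frequencies.

    long_text plays the role of T_u: hotel titles plus the user's reviews.
    """
    words = re.findall(r"[a-z]+", long_text.lower())
    # Drop stop words, then merge synonyms into one category.
    words = [SYNONYMS.get(w, w) for w in words if w not in STOP_WORDS]
    return dict(Counter(words).most_common(top_k))

def standardize_trust(raw_trust):
    """Divide each user's raw trust t_u by the maximum trust max(t)."""
    top = max(raw_trust.values())
    return {u: (t / top if top else 0.0) for u, t in raw_trust.items()}

if __name__ == "__main__":
    print(interest_features("The room was clean and tidy, spotless room", 2))
    print(standardize_trust({"u1": 10, "u2": 5}))
```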
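User consumption capacity. Similar users have similar consumption capacities. A user's consumption capacity can be reflected by the prices of the hotels the user has stayed in. In this study, we utilize linguistic term sets to express users' consumption capacities. Firstly, assume that there is a set S of nine linguistic terms to depict the price, where $S = \{s_{-4}, s_{-3}, s_{-2}, s_{-1}, s_0, s_1, s_2, s_3, s_4\}$, $s_{-4}$ = "Certainly Cheap", $s_{-3}$ = "Significantly Cheap", $s_{-2}$ = "Cheap", $s_{-1}$ = "Somewhat Cheap", $s_0$ = "Medium", $s_1$ = "Somewhat Expensive", $s_2$ = "Expensive", $s_3$ = "Significantly Expensive" and $s_4$ = "Certainly Expensive".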
Secondly, collect the price distributions of different cities, and divide the prices of hotels in each city into 9 levels corresponding to linguistic terms S. For the same city, the higher the price of hotel user booked, the stronger the consumption capacity of user. Finally, for each user, consumption capacity is denoted as H u in the form of linguistic term sets.
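As an illustration of the price-level step, the sketch below maps a hotel price to a subscript in {-4, ..., 4} using the price distribution of its city. The text does not say how the nine levels are cut, so equal-frequency cut points computed with `statistics.quantiles` are an assumption made here, and the function names are hypothetical.

```python
from bisect import bisect_right
import statistics

def price_level(price, city_prices):
    """Return the subscript (-4..4) of the linguistic term for a price.

    Assumption: the nine levels are equal-frequency bins of the city's
    hotel prices (eight cut points).
    """
    cuts = statistics.quantiles(city_prices, n=9)
    return bisect_right(cuts, price) - 4

def consumption_capacity(bookings, prices_by_city):
    """Build H_u: the sorted set of price levels of hotels a user stayed in.

    bookings: list of (city, price) pairs taken from the user's history.
    """
    return sorted({price_level(p, prices_by_city[c]) for c, p in bookings})

if __name__ == "__main__":
    london = list(range(100, 1000, 100))  # made-up price distribution
    print(consumption_capacity([("London", 500), ("London", 950)],
                               {"London": london}))
```

Because the cut points are computed per city, the same price can map to different linguistic terms in different cities, which is the behavior the method requires.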

The definition of similar group
In this study, we identify consumers with high similarity in terms of interest, trust, and consumption capacity as similar group, and they tend to have a similar purchasing preference.
(1) Generally, the more similar the characteristics of hotels and the aspects mentioned when commenting on hotels, the higher the preference similarity between users. The interest preference similarity between target user u and user v can be computed as follows: where $N_u$, $N_v$ are the numbers of interest features for user u and user v; j denotes the jth common interest feature for user u and user v; m is the total number of common interest features for user u and user v; and $freq_{uj}$ and $freq_{vj}$ are the frequencies of the jth common interest feature in the interest features of user u and user v, respectively.
(2) The greater the value of a user's trust, the more the target user trusts that user, which means the higher the trust similarity degree between the target user and the other user. In addition, the greater the trust value of the target user himself, the higher the requirement in trust for the users in the similar group. The method of calculating the trust similarity is shown as: where $trust_u$ and $trust_v$ denote the trust degrees of user u and user v, and $\min(trust_u, trust_v)$ is the minimum value of $trust_u$ and $trust_v$.
(3) For user u and user v, their consumption capacities are denoted as $H_u$ and $H_v$, then the consumption capacity similarity between them is defined as
Then, for user u and user v, the comprehensive similarity is calculated as follows:
According to the comprehensive similarity of users, we divide the users into three groups:
• Similar group ($G_1$): a user whose similarity is higher than $\theta_1$ is a member of the similar group.
• Weak similar group ($G_2$): a user whose similarity is less than $\theta_1$ and higher than $\theta_2$ is a member of the weak similar group.
• Dissimilar group ($G_3$): a user whose similarity is less than $\theta_2$ is a member of the dissimilar group.
The equation to identify each group is as follows:
In this study, for $G_1$, $G_2$, $G_3$, the weight of each group is defined as follows: where user u is the target user, and $sim_{average}(u, G_i)$ is the average of the similarities between the members of group $G_i$ and the target user.
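The grouping rule and the group weights can be sketched as below. The thresholds follow the description above; how ties at exactly θ1 or θ2 are handled is not stated, so they are placed in the lower group here. The weight formula is an assumption: each group's average similarity to the target user is normalized so that the three weights sum to 1. Function names are hypothetical.

```python
def split_groups(similarities, theta1, theta2):
    """Divide users into G1 (similar), G2 (weak similar), G3 (dissimilar).

    similarities: dict user -> comprehensive similarity with the target user.
    """
    groups = {"G1": [], "G2": [], "G3": []}
    for user, sim in similarities.items():
        if sim > theta1:
            groups["G1"].append(user)
        elif sim > theta2:
            groups["G2"].append(user)
        else:
            groups["G3"].append(user)
    return groups

def group_weights(similarities, groups):
    """Assumed weight: normalized average similarity of each group."""
    avg = {g: (sum(similarities[u] for u in members) / len(members)
               if members else 0.0)
           for g, members in groups.items()}
    total = sum(avg.values())
    if total == 0:
        return {g: 1 / len(groups) for g in groups}
    return {g: a / total for g, a in avg.items()}

if __name__ == "__main__":
    sims = {"a": 0.9, "b": 0.5, "c": 0.1}
    g = split_groups(sims, theta1=0.7, theta2=0.3)
    print(g, group_weights(sims, g))
```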

The quantification of hotel evaluation criteria
In order to improve the accuracy of recommendations, this study comprehensively considers three factors, which are rating, online reviews and price as hotel evaluation criteria.
Rating. The rating reflects the satisfaction degree of a user towards a hotel. For a hotel, some users rate it highly, while others give low ratings. Simply taking the average rating of all users conveys little information. In order to retain the rating preferences of different users for the hotel as much as possible, for each group, we use an intuitionistic fuzzy number to express the satisfaction of the group towards the hotel.
In most e-commerce platforms, the rating scale ranges from 1 to 5. A score of 4 or 5 means that the user likes the hotel, and a score of 1 or 2 means that the user does not like the hotel. Therefore, when the scale is in the range of 1 to 5, the ratios of ratings above 3 and below 3 in the group indicate the degree of membership and the degree of non-membership towards the hotel, respectively. $G_1$ represents the similar group, $G_2$ the weak similar group and $G_3$ the dissimilar group.
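The conversion of a group's ratings into an intuitionistic fuzzy number can be sketched directly from the description above. The function name is hypothetical, and returning (0, 0) for a group with no ratings is an assumption, since the text does not cover that case.

```python
def group_rating_ifn(ratings):
    """Express a group's 1-5 ratings of one hotel as (membership, non-membership).

    Membership is the share of ratings above 3; non-membership is the share
    of ratings below 3. Ratings equal to 3 count towards neither.
    """
    if not ratings:
        return (0.0, 0.0)  # assumption: an empty group carries no evidence
    n = len(ratings)
    mu = sum(1 for r in ratings if r > 3) / n
    v = sum(1 for r in ratings if r < 3) / n
    return (mu, v)

if __name__ == "__main__":
    print(group_rating_ifn([5, 4, 3, 1]))
```

For the ratings 5, 4, 3 and 1, two of four are above 3 and one of four is below 3, so the group is represented by the pair (0.5, 0.25).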
Compared to users with low similarity, users with high similarity have a greater impact on the purchasing decision of target user. The ratings and online reviews of similar group are more important than those of weak similar group or dissimilar group are. For each group (G 1 , G 2 , G 3 ), we can calculate their weights based on Equation (13). By giving different groups different weights, it not only makes the result more accurate than directly calculating the average of all users' ratings and online reviews, but also makes the recommendation result personalized.
The comprehensive rating for all users towards the hotel is IF(G):
Online reviews. One online review may contain positive, neutral and negative evaluations at the same time. In order to depict online reviews appropriately and reduce the loss of information, we use single valued neutrosophic numbers to express online reviews. According to the definition of SVNNs, the online review of user u on hotel i is expressed as $R_{ui} = \langle T_{ui}, I_{ui}, F_{ui} \rangle$, where T indicates the positive degree of the online review, I indicates the neutral degree and F indicates the negative degree.
Further, we regard all users in the same group as one virtual user. Then the online reviews of users in the same group are regarded as one online review. The online review for group $G_i$ is denoted as:
Then all users' online reviews are calculated as:
Price. The quantification method of price is similar to that of consumption capacity in Section 3.1. First, define the set of linguistic terms that represent the price. Assume that there is a set S of nine linguistic terms to depict the price, where $S = \{s_{-4}, s_{-3}, s_{-2}, s_{-1}, s_0, s_1, s_2, s_3, s_4\}$, $s_{-4}$ = "Certainly Cheap", $s_{-3}$ = "Significantly Cheap", $s_{-2}$ = "Cheap", $s_{-1}$ = "Somewhat Cheap", $s_0$ = "Medium", $s_1$ = "Somewhat Expensive", $s_2$ = "Expensive", $s_3$ = "Significantly Expensive" and $s_4$ = "Certainly Expensive". Then, collect the price distribution of hotels in each city, and divide the prices into 9 levels corresponding to the price linguistic terms of each city. The price of each hotel can be expressed as a linguistic term, denoted as $s_\beta$. Let the evaluation of price for hotel i be denoted as $P_i$: where $s_\alpha$ is a linguistic term, and α is the average value of the subscripts in the linguistic term set which is used to express the target user's consumption capacity.
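To make the price criterion concrete, the following sketch scores how close a hotel's price level β is to the target user's average level α. The exact expression for the price evaluation is not reproduced in the text, so the linear form 1 - |β - α| / 8 used here is an assumption, chosen only because it equals 1 when the two levels coincide and 0 when they are at opposite ends of the nine-term scale. The function name is hypothetical.

```python
def price_evaluation(beta, capacity_levels):
    """Similarity in [0, 1] between a hotel's price level and a user's capacity.

    beta: subscript of the hotel's price term, in -4..4.
    capacity_levels: subscripts in the user's consumption-capacity set H_u;
    alpha is their average, as described in the text.
    Assumed form: 1 - |beta - alpha| / 8 (8 is the span of the scale).
    """
    alpha = sum(capacity_levels) / len(capacity_levels)
    return 1 - abs(beta - alpha) / 8

if __name__ == "__main__":
    print(price_evaluation(2, [1, 2, 3]))   # price matches capacity
    print(price_evaluation(-4, [4]))        # opposite ends of the scale
```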

The model of recommendation based on TOPSIS
Because the three criteria used to evaluate hotels are expressed in different forms, criterion values cannot be aggregated directly by aggregation operators. The TOPSIS method aggregates all criteria based on a distance measure, and sorts the alternatives according to the degree of closeness. There is no special requirement for the data form of each criterion value. In this part, we sort hotels by the TOPSIS method and form the final recommended list. The steps are as follows.
(1) Obtain the positive ideal solution and the negative ideal solution.
The positive ideal solution consists of the positive ideal value of each criterion, and the negative ideal solution consists of the negative ideal value of each criterion. For the criterion of rating, the positive ideal value is $\langle 1, 0 \rangle$, which means all users give the hotel high ratings (4 or 5, on a scale of 1 to 5). For the criterion of online review, the positive ideal value is $\langle 1, 0, 0 \rangle$, which means all online reviews contain only positive comments, so that the positive-membership degree of the online review is equal to 1. For the criterion of price, the positive ideal value is 1, which means the similarity between the price of the hotel and the consumption capacity of the user is 1. The negative ideal value is completely opposite to the positive ideal value under each criterion.
Then, the positive ideal solution $A^+$ and the negative ideal solution $A^-$ are described as follows:

$$A^+ = \{\langle 1, 0 \rangle, \langle 1, 0, 0 \rangle, 1\}, \qquad A^- = \{\langle 0, 1 \rangle, \langle 0, 0, 1 \rangle, 0\}.$$

(2) Calculate the distance. For each alternative hotel, calculate the distance between the alternative and the positive ideal solution, $D_i^+$, as well as the distance between the alternative and the negative ideal solution, $D_i^-$. For the criteria of rating, online reviews and price, the distances can be calculated based on Equations (1), (2) and (4).
(3) Calculate the Close Index $C_i$:

$$C_i = \frac{D_i^-}{D_i^+ + D_i^-}.$$

(4) Rank hotels based on the Close Index. The greater the value of a hotel's Close Index, the higher the recommended ranking of the hotel. Then recommend the hotels to the target user u based on the new ranking list.
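The ranking steps above can be illustrated with a compact sketch. Several details are assumptions rather than the authors' exact choices: the per-criterion distances use a normalized Euclidean form, the three criterion distances are simply summed without weights, and the Close Index takes the usual TOPSIS form, the distance to the negative ideal divided by the sum of both distances. All names are hypothetical.

```python
import math

# Ideal values per criterion: rating (IFN), online review (SVNN), price.
A_POS = ((1.0, 0.0), (1.0, 0.0, 0.0), 1.0)
A_NEG = ((0.0, 1.0), (0.0, 0.0, 1.0), 0.0)

def _euclid(a, b):
    """Normalized Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def distance(hotel, ideal):
    """Assumed aggregation: unweighted sum of the three criterion distances."""
    rating, review, price = hotel
    return (_euclid(rating, ideal[0]) + _euclid(review, ideal[1])
            + abs(price - ideal[2]))

def rank_hotels(hotels):
    """Return (name, close_index) pairs sorted from best to worst."""
    scores = {}
    for name, values in hotels.items():
        d_pos = distance(values, A_POS)
        d_neg = distance(values, A_NEG)
        scores[name] = d_neg / (d_pos + d_neg)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    demo = {  # made-up evaluation values for three hotels
        "A": ((1.0, 0.0), (1.0, 0.0, 0.0), 1.0),
        "B": ((0.0, 1.0), (0.0, 0.0, 1.0), 0.0),
        "C": ((0.5, 0.25), (0.6, 0.2, 0.2), 0.5),
    }
    for name, score in rank_hotels(demo):
        print(name, round(score, 3))
```

A hotel that coincides with the positive ideal solution gets a Close Index of 1, and one that coincides with the negative ideal solution gets 0, so larger values always rank higher.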

The process of recommendation
In this part, we introduce the process of the mechanism for hotel recommendation to achieve a personalized search engine, based on the methods discussed above. The proposed approach comprises the following steps:
Step 1: Obtain the alternative hotels. Use the search engine on the e-commerce platform to filter the hotels based on the keywords entered by the target user.
Step 2: Identify the similar group for the target user. According to the users' historical records and the similar group identification method proposed in Section 3.2, calculate the similarity for each pair of users using Equations (6)-(9), and divide the users into three groups.
Step 3: Calculate the weight of each group according to Equation (13).
Step 4: Calculate hotel evaluation values under three criteria: price, rating, online review based on the methods proposed in Section 3.3.
Step 5: Use the TOPSIS method to rank the alternatives according to Section 3.4. Bigger values of Close Index are associated with higher ranking.
Step 6: Recommend the hotels based on the ranking list in step 5 to the target user.

A case study based on Tripadvisor.com
A case study is conducted in order to validate the efficiency and applicability of the proposed method on e-commerce website recommender problems.

Dataset
In this section, we use data collected from Tripadvisor.com, which is a travel-sharing site where users can comment on hotels they have stayed in, attractions they have visited and food they have eaten. In this case study, we assume that the target user's demand is to book a hotel in London by searching for the keyword "London hotel". Then we randomly select 10 hotels from the recommended list of the search engine as alternative hotels and collect basic information about these 10 hotels. For each hotel, we collect 20 users' ratings and online reviews. We use these data as the test set. Besides, for each user we collect the user's information and historical comment records about hotels, which are used as the training set. Table 1 displays the 10 hotels' information. The first column is the name of the hotel, the second one is the hotel's ranking among the 10 hotels, the third column is the hotel's ranking among all London hotels based on the search engine, the fourth one is the hotel's price, and the sixth one is the hotel's overall rating. Table 2 shows the users' information, in which the first column is the user's name, the second column is the number of hotels that the user has commented on, the third column is the number of other users who consider the user's comments helpful, and the fourth and fifth columns are the original and standardized trust of the user on Tripadvisor.com. Table 3 is a sample of the detailed records of users' comments on hotels. Since there is too much text in online reviews and hotel descriptions, it is not shown here. In Table 3, the first column is the user's ID, the second column is the hotel the user commented on, the third column shows the user's rating of the hotel, and the fourth and fifth columns show the price and city of the hotel.

Result
Firstly, we identify users' attributes of interest, trust and consumption capacity, respectively. The trust attribute has been standardized according to the third column in Table 2 and the standardized trust for user is shown in the fourth column in Table 2.

Table 3 A sample about the detail records of users commented on hotels
User ID   Hotel               Rating   Price   City of hotel
3         Oakley Hall Hotel   4        1393    Hampshire
3         The Savoy           5        3579    London
3         Flemings
Based on the online reviews and hotel descriptions, the keywords used to represent the users' interests are displayed in Fig. 3; the larger the size of a word, the higher its frequency. The consumption capacity of a user can be obtained from the data in the fourth column of Table 3. We compare the hotel price distributions in several cities, as shown in Fig. 4. The price distributions in different cities are clearly distinct, so the linguistic terms corresponding to the same price may differ between cities.
Then for each user, we calculate the similarity with other users based on Equations (6)-(9) proposed in Section 3.2. A part of the results is shown in Table 4. The first column is the users' ID, and the second to fourth columns are these users' similarities with user 1 in interests, trust, and consumption capacity. The fifth column is the overall similarity. Because user 1 has no historical records, the similarities between him and the others in interests and consumption capacity are zero.
Then, for a target user, for each item, the users who have evaluated the hotel can be divided into three groups. A part of the results is displayed in Table 5. Similarly, the first column is the user ID, and the second to the fourth columns are the users belonging to the similar group, the weak similar group and the dissimilar group, respectively.

Table 5 The results about similarity groups
User ID   G1               G2                 G3
1         2,3,9,13,14,19   5,8,11,12,16,18    4,6,7,10,15,17,20
2         1,3,9,13,14,19   5,8,11,12,16,18    4,6,7,10,15,17,20
3         2,3,5,7,14,20    6,8,9,12,13,19     4,10,11,15,16,17,18

Table 6 The ranking of each hotel for users
Then, based on the method proposed in Section 3.4, each user can obtain a personalized recommended list of hotels. The details are displayed in Table 6. In Table 6, rows 2 to 6 are the 10 hotels' rankings for users 1 to 5 who booked the 41 hotel. For each of the remaining nine hotels, we select one user who has commented on the hotel, and display the 10 hotels' rankings for these users as rows 7 to 15. In the search engine, the ranking of these 10 hotels is {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} for all users. The result in Table 6 indicates that the ranking lists of these 10 hotels differ for most users.

Discussion
From Tables 4 to 6, for all users, the proposed method has the ability to calculate their similarities with other users, to identify their similar groups and to provide them a personalized recommended list. The feasibility of proposed hotel recommendation method has been illustrated. What's more, the results in Table 6 illustrate that the proposed mechanism used for search engine can provide personalized list for users.
In the next step, we discuss the accuracy of the proposed method. In this study, we use the index of Precision to measure the accuracy. Precision is the ratio of the items the user liked to all the recommended items [48]. Because in the test set only one hotel per user has been booked, the definition of Precision is adjusted slightly. In this study, we consider a recommendation successful when the ranking of the hotel the user booked is in the top n (where n is the threshold) among all hotels. Therefore, the Precision is the percentage of successful recommendations among all users. The calculation of Precision is shown as:

$$Precision = \frac{N(s)}{N},$$

where N(s) is the number of successful recommendations, and N is the total number of recommendations provided.
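The adjusted Precision can be computed with a few lines of code. The sketch assumes that, for each user, we already know the position of the hotel the user actually booked in that user's recommended list; the function name is hypothetical.

```python
def precision_at_n(booked_ranks, n):
    """Share of users whose booked hotel is ranked within the top n.

    booked_ranks: for each user, the 1-based rank of the booked hotel in
    the list recommended to that user.
    """
    successes = sum(1 for rank in booked_ranks if rank <= n)  # N(s)
    return successes / len(booked_ranks)                      # N(s) / N

if __name__ == "__main__":
    # Made-up ranks for four users: two of them fall within the top 5.
    print(precision_at_n([1, 3, 6, 10], n=5))
```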
We calculate the Precision of the proposed method (model 1) and search engine currently used in Tripadvisor.com (model 2). Figure 5 exhibits the Precisions with the change of n for two methods.
From Fig. 5, it is clear that the Precisions of the proposed method are always greater than those of model 2. That is to say, the accuracy of the proposed method is improved compared to the search engine currently used in Tripadvisor.com. The hotel's ranking in the search engine will remain the same for a long period. For the users we collected, the accuracy of model 2 for users who booked top hotels is always higher than that for users who booked bottom hotels. For instance, for users who booked hotel 1, the Precision of model 2 is equal to 1, while the Precision of model 2 for users who booked hotel 10 is equal to 0. The Precision is extremely unstable for users with different preferences on booking hotels. Therefore, we analyze the Precision for users with different preferences. Figure 6 shows the Precision after eliminating the users who booked the top-α hotels (where α is a threshold). For example, α = 1 means that we calculate the Precisions only for users who booked hotels 2 to 10. The results of model 1(a) and model 2(a) show the Precisions for model 1 and model 2 respectively when the value of n is 5. The results of model 1(b) and model 2(b) show the Precisions for model 1 and model 2 respectively when the value of n is 3. The results of model 1(c) and model 2(c) show the Precisions for model 1 and model 2 respectively when the value of n is 1. Consistent with the results in Fig. 5, the Precisions of the proposed method are always greater than those of the search engine used in Tripadvisor.com whatever the value of n or α is. In Fig. 6, we can observe that the Precisions of model 2 are zero for the users who prefer the bottom-5 hotels whatever the value of n is. However, the proposed method has stable performance. The result further demonstrates the deficiency of the hotel recommended list generated by the search engine, and indicates that the proposed method can provide users with an accurate and personalized hotel recommended list.
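The comparison after eliminating users who booked the top-α hotels can be sketched as follows. Each record is assumed to hold two numbers per user: the search-engine rank of the hotel the user booked (which is also the rank model 2 gives it) and the rank the proposed method gives it. The function name and the sample records are hypothetical.

```python
def precision_excluding_top(records, alpha, n):
    """Precision of both models after dropping users who booked top-alpha hotels.

    records: list of (engine_rank, model1_rank) pairs, one per user.
    Returns (precision_model1, precision_model2), or None if no user remains.
    """
    kept = [(e, m) for e, m in records if e > alpha]
    if not kept:
        return None
    model1 = sum(1 for _, m in kept if m <= n) / len(kept)
    model2 = sum(1 for e, _ in kept if e <= n) / len(kept)
    return model1, model2

if __name__ == "__main__":
    sample = [(1, 2), (2, 1), (6, 3), (10, 7)]  # made-up ranks
    print(precision_excluding_top(sample, alpha=1, n=5))
```

Since model 2 ranks every hotel identically for all users, its Precision necessarily drops to zero once α reaches n, which is why the static list performs poorly for users who prefer lower-ranked hotels.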
To go a step further, we discuss the accuracy for cold start users. The Precisions are presented in Table 7 (in this part, n = 5). In the same way, the Precisions of model 1 are greater than those of model 2, and when a user has one historical record, the Precision of model 1 is significantly greater than that of model 2. The results in Table 7 show that the accuracy of the proposed method is improved for cold start users, too.

Table 7 The precision for cold start users

Conclusion and future works
This study proposed a comprehensive method for hotel recommendation to improve the performance of search engines and to deal with the problem that search engines cannot provide a personalized hotel recommended list. The proposed method considered users' individualized preferences from the aspects of user interest, user trust and user consumption capacity, and divided users into three groups. Besides, compared to traditional methods, we evaluated hotels on the criteria of price, rating and online reviews, which can provide a more precise recommendation than using a single criterion. For the criteria of rating and online reviews, we gave different weights to different groups. For the criterion of price, we considered the user's consumption capacity. In order to ensure the accuracy of the proposed mechanism, we proposed methods to quantify user attributes and hotel evaluation criteria by using fuzzy theory to express information more efficiently. Moreover, we utilized the TOPSIS method to rank the hotels. A case study based on Tripadvisor.com was conducted to verify the feasibility and efficiency of the proposed method. The results of the case study illustrated that the proposed method can achieve personalized recommendations and improve the accuracy of the search engine.
There are also some limitations in this study. For instance, in the actual decision-making process, many factors affect the decisions of different consumers, and here we only consider the three most important ones. Besides, because the trust quantification method is given only for e-commerce platforms that have a trust evaluation system, the application of the proposed method is limited.
In future research, we will explore some other factors that affect decision-making to different users in different cases to improve the accuracy of recommendation. In addition, the quantification methods of user attributes and hotel evaluation criteria also need to be further improved. Finally, we will apply this method in practice in various fields such as movie recommendation.