K-Nearest neighbor algorithm on implicit feedback to determine SOP


classification to help users choose SOPs according to their preferences, using data classification to produce a list of suggestions or predictions, as in query completion for search [17]. This research therefore uses usage log data from SOP document searches, which reflect user preferences, in the SOP system application.
Usage log data from SOP document searches performed by employees is treated as implicit feedback, under the usual assumption that the more often an employee searches for an SOP document, the closer that document is to the employee's preferences. Applying the k-nearest neighbor algorithm to implicit feedback from employee activity can therefore identify the most relevant SOPs for users. Bayesian algorithms based on reservation logs show significant results when context information is added, with implementations in restaurant ratings [18] and timestamp-based e-commerce ranking [19]. Additionally, monitoring user click behavior has proven effective at reflecting user preferences, which indicates potential for wide application to e-commerce purchase prediction [20]. User interest can also drive recommendations: applying a user location vector to determine tourist destinations produced 65-100% effectiveness in prediction and ranking [21].
K-Nearest Neighbor (KNN) has been shown to be an efficient algorithm for identifying bioluminescent proteins [22]. KNN has also been used to identify relevant news content recommendations based on each user's implicit feedback. Using the KNN algorithm to classify the freshness of fish from their color, and so decide whether the fish is fit for consumption, produces a classification with 91.36% accuracy [23]. KNN also predicts the next day's load with sufficiently high accuracy using only the estimated temperature. KNN yields an accuracy of 94.95%, and a modified k-nearest neighbor produces 99.51% accuracy, in classifying data [24] and instruments [25]. Building on this previous research, this research determines the standard operating procedure (SOP) from implicit feedback by utilizing user preferences, recommending an SOP according to the user's consumption of SOP content in the system, based on data extraction results and monitoring of user search behavior. In this research, applying the KNN algorithm identifies the most relevant SOP for the user.

Implicit Feedback
Implicit feedback is feedback collected automatically by the system that reflects the wishes of the user. This feedback contains only positive values from one class. For example, products that users like, clicks, and bookmarks can each be a single-class positive value. Purchases and login access recorded automatically by the system are also implicit feedback; such feedback is easier to collect than feedback that the user intentionally gives to an item [26,27]. Implicit feedback data is information that can be obtained from users without their explicit feedback, and it can cover not only interactions between users and content but also other available information [28]. A recommendation system based on implicit feedback requires settings that map the feedback to a user preference level.

K-Nearest Neighbor
The KNN algorithm is one of the methods for classification analysis, but in the last few decades the KNN method has also been used for prediction [28]. KNN is a learning algorithm; a typical KNN method classifies a query sample with a majority voting strategy. For each query sample, KNN finds its nearest neighbors in the dataset and then assigns the sample to the class held by most of those neighbors. KNN classifies objects based on the similarity of data with other data [29]. It finds a case by calculating the proximity between the new case and old cases, based on matching the weights of some existing features. Figure 1 shows the steps for calculating the KNN algorithm, which are as follows:
a. Weighting: determine the characteristics in the dataset, namely the sum of the implicit feedback from the user personalization.
b. Calculate the amount of implicit feedback within 5 weeks, multiplied by 80% if it matches the profile and by 20% if it does not.
The distance between two samples is calculated as the Euclidean distance (1):
d(x_1, x_2) = \sqrt{\sum_{i=1}^{n} (x_{1i} - x_{2i})^2}    (1)
where x_1 and x_2 are two samples, x_{1i} and x_{2i} are their variable values, and n is the number of variables.
e. Determine the data label that has the minimum distance.
f. Assign the training data to the data label in step 3.
g. Repeat steps 2 through 4 until the number of each class is k.
Given a test document, the system finds the k nearest neighbors among the classified training documents and obtains the category of the test document from the class distribution of these neighbors. The similarity between the neighbors and the test document can also be used as a weight to obtain a better classification effect [30]. This research is based on implicit feedback data from the usage log of SOP document searches. First, the amount of implicit feedback each user has on each SOP is calculated; then the distance between the user and the SOP document is measured to determine the relevant SOP.
In general, the process flow for applying the KNN method to determine the relevant SOP is shown in Figure 2.
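The distance and majority-vote steps of KNN can be illustrated with a minimal Python sketch. It assumes the distance in (1) is the Euclidean distance; the function names `euclidean` and `knn_classify`, the one-dimensional feature vectors, and all sample values are hypothetical and are not the data used in this study.

```python
import math
from collections import Counter


def euclidean(x1, x2):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def knn_classify(query, training, k):
    """Return the majority label among the k training samples nearest to query.

    training is a list of (feature_vector, label) pairs.
    """
    # Rank all training samples by their distance to the query.
    ranked = sorted(training, key=lambda item: euclidean(query, item[0]))
    # Keep the labels of the k closest samples and take the most common one.
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical weighted feedback values (assumption, not the paper's data).
training = [
    ([4.0], "appropriate"),
    ([3.2], "appropriate"),
    ([2.4], "appropriate"),
    ([1.6], "not appropriate"),
    ([0.8], "not appropriate"),
    ([0.2], "not appropriate"),
]
result = knn_classify([3.0], training, k=3)
print(result)
```

With an odd k and two classes the vote cannot tie; for an even k a tie-breaking rule would have to be chosen, which this sketch does not address.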

Result and Analysis
The K-NN calculation searches the learning data for the group of k objects that are closest (most similar) to the object in the new data or test data. This process calculates the similarity between documents from implicit feedback data in the form of usage logs of SOP document searches. This study uses SOP documents from the Library of Politeknik Pos Indonesia, 11 documents in total. An SOP is a document containing guidelines or a set of rules made to help employees work according to their primary tasks and functions in their respective institutions. This study aims to determine the SOP document that best suits the user's preferences through the user's usage history.

Input Identification
The data used in this study amount to 11 SOP documents. For the input identification process, 10 existing SOP documents go through a preprocessing stage that calculates a rating or value from the implicit feedback. If a document is accessed in accordance with the user's profile, its value is multiplied by 80%; if not, it is multiplied by 20%. This minimizes the influence of accesses that do not match the preferences inferred from the profile. However, it does not rule out the possibility that a recommended SOP is incompatible with the user's profile, because the recommendation relies on the user's preference as expressed in implicit feedback. According to the source, an SOP is accessed at least 5 times within 5 weeks; the dataset is then determined from the difference value of the 5 accesses, which is 3. Next, the labels appropriate and not appropriate are assigned from the total amount of implicit feedback: if the value is greater than or equal to 2.4 the label is appropriate, and if it is below 2.4 the label is not appropriate. The value 2.4 is obtained by multiplying 3 by 80%, so 2.4 serves as the benchmark for labeling SOPs in accordance with the user's profile. The existing SOP documents are represented as SOP38, SOP39, ... SOPn, and the query document is treated as a comparable document, SOPq. Table 1 shows the preprocessing phase, which determines the value of each SOP for one user by observing the history of SOP use over 5 weeks.
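The weighting and labeling rule in this preprocessing stage can be written out as a short Python sketch. The 80% and 20% weights, the reference value 3, and the 2.4 threshold come from the text; the function names and the example access counts are hypothetical. Weights are kept as integer percentages so that 3 x 80% evaluates to exactly 2.4 in floating point.

```python
PROFILE_PERCENT = 80      # access matches the user's profile
NON_PROFILE_PERCENT = 20  # access does not match the user's profile
REFERENCE_ACCESSES = 3    # reference value stated in the text

# Benchmark for labeling: 3 * 80% = 2.4
LABEL_THRESHOLD = REFERENCE_ACCESSES * PROFILE_PERCENT / 100


def weighted_feedback(access_count, matches_profile):
    """Weight a 5-week access count by whether it matches the profile."""
    percent = PROFILE_PERCENT if matches_profile else NON_PROFILE_PERCENT
    return access_count * percent / 100


def label(value):
    """Label a weighted feedback value against the 2.4 benchmark."""
    return "appropriate" if value >= LABEL_THRESHOLD else "not appropriate"


# Hypothetical access counts (assumption, not taken from Table 1).
for count, matches in [(5, True), (5, False), (3, True), (2, True)]:
    value = weighted_feedback(count, matches)
    print(count, matches, value, label(value))
```

Under this rule, five accesses outside the profile (5 x 20% = 1.0) fall below the benchmark, while three accesses within the profile (3 x 80% = 2.4) just reach it.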

Process Identification
After preprocessing, the next step is to determine the classification parameter k by looking at the number of documents in the dataset and calculating the difference; from the dataset above, k=5 is obtained. After the parameter value is determined, the Euclidean distance is calculated as described in (2). Once the Euclidean distance between each document and document Q has been calculated, each document is ranked by its distance value, giving the Euclidean distances and the overall ranking of documents in Table 2.

Identification of Output
In output identification, the document is classified from the ranking using the parameter k=5. This value is used because the dataset contains 10 documents, so the parameter is taken as half the number of documents. In the majority category results, the suitable label is shown by documents D2, D4, and D6, while the unsuitable label is shown by documents D5 and D3, giving a ratio of 2:3 in favor of suitable. Using the majority category, the KNN calculation classifies document Q from these 5 nearest documents: document Q is a document that corresponds to the user in the field of service, because the majority of the 5 documents with the best distance values carry the appropriate label. Figure 3 shows the process to be built into the system to determine the SOP documents. The administrator inputs the SOP document data; users then access the SOP documents, and their search behavior is monitored as implicit feedback stored in the search log. Next, the KNN algorithm calculates the suitability of SOP documents to user preferences. The user then receives recommendations that match their preferences.
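The majority vote reported in this section can be checked with a few lines of Python. The document IDs and their labels are the ones stated in the text; the variable names are hypothetical, and the sketch assumes that these five documents are exactly the k=5 nearest neighbors of document Q.

```python
from collections import Counter

dataset_size = 10
k = dataset_size // 2  # half the number of documents, as stated in the text

# Labels of the five nearest documents as reported in the text.
nearest = {
    "D2": "suitable",
    "D4": "suitable",
    "D6": "suitable",
    "D3": "unsuitable",
    "D5": "unsuitable",
}

# Count the labels and pick the majority class for document Q.
votes = Counter(nearest.values())
majority_label, majority_count = votes.most_common(1)[0]
print(k, dict(votes), majority_label)
```

The count gives three suitable votes against two unsuitable votes, which matches the 2:3 ratio and the suitable label assigned to document Q.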

Result
Experiments using the KNN method with k=5 compute similarities between the test data and the center of the data for each label. This test determines the SOP documents by using feedback from user logs, with greater weight given if the selected document corresponds to the field in the user profile. Based on the test results, 5 SOP documents were obtained in a 2:3 proportion, where ranks 1, 4, and 5 carry the label appropriate and ranks 2 and 3 carry the label not appropriate.
Document Q is a document that matches the user, as seen by comparing the labels of the nearest neighbors retrieved with the KNN method; graph 5 shows the results. Based on Figure 4, the ranking percentages for the test SOP document are obtained from the 5 classification parameters. The percentages for ranks 1 through 5 in the graph show that the majority label for the test data is appropriate for the service.

Conclusion
The present study presents an approach to determine SOP documents using the KNN algorithm for document classification, utilizing implicit feedback from search log data with a weight of 80% for documents that match the profile and 20% for documents that do not. KNN is one of the most popular classifiers, easy to use and reasonably efficient. It classifies by finding the value of k, the number of neighbors used to decide the class. Previous research used the KNN algorithm for web text analysis to improve accuracy automatically, while this study covers not only document classification but also predictions or user ratings.

Discussion
This research, however, is still limited to a case study of library SOPs, with SOP determination within the scope of one division, and it utilizes only implicit feedback in the form of search logs. To produce more accurate results, the approach should be applied in a broader SOP management system, use implicit feedback with parameters beyond the search log, such as user behavior and usage logs, and include a performance evaluation. SOP document retrieval is then expected to show more accurate results.