Classification Of Category Selection Title Undergraduate Thesis Using K-Nearest Neighbor Method

This research builds a system that classifies the category of undergraduate thesis titles using the k-nearest neighbor method. The research is conducted on students of the Informatics Engineering Department, Faculty of Engineering, Universitas Nusantara PGRI Kediri. The system is meant to help department staff and students classify the category of an undergraduate thesis title based on each student's field of interest and field of expertise. The k-nearest neighbor method classifies the thesis title category using several criteria based on the students' interests and their expertise in particular fields or courses. The output of the system is the undergraduate thesis title category of each student, determined from that student's field of interest and field of expertise.


Introduction
The university is the organizer of academic education for students [1]. Students, as the products of a college, can be used as a reference for measuring the success of education. Student achievement can be seen from several factors, such as the cumulative grade point average (GPA), the duration of study, and the undergraduate thesis score.
This research addresses one of these factors: the undergraduate thesis score. According to the Great Dictionary of the Indonesian Language, an undergraduate thesis is a scientific essay that students must write as part of the final requirements of their academic education [2]. Accordingly, every final-year student must take and complete an undergraduate thesis as a graduation requirement. In practice, however, many students have difficulty choosing a thesis category and title that match their field of interest and field of expertise. This research therefore builds a system that classifies the category of student undergraduate thesis titles using the k-nearest neighbor (k-NN) method.
Universitas Nusantara PGRI Kediri is a private university in Kediri City with five faculties. One of them is the Faculty of Engineering, which has five study programs: Informatics Engineering, Mechanical Engineering, Electrical Engineering, Industrial Engineering and Information Systems. This research is conducted in the Informatics Engineering Study Program. It aims to help the department classify the categories of students' undergraduate thesis titles, and to help final-year students choose a thesis title category based on their field of interest and area of expertise.
The research problems are twofold. The first is how to apply the k-nearest neighbor method to classify students' undergraduate thesis title categories based on their field of interest and field of expertise. The second is how to apply the method so that this classification is correct and accurate. The purpose of this research is to apply the k-nearest neighbor method to classify undergraduate thesis title categories based on the field of interest and field of expertise of each student. A previous study built a decision support system that helps students determine the desired thesis title using the simple additive weighting (SAW) method [4].
Research conducted by Ratih Kumalasari Niswatin in 2015, titled Decision Support System New Student Placement Using K-Nearest Neighbor Method, discusses a web-based decision support system. The system uses the k-nearest neighbor method to recommend whether new students should enter the Informatics Engineering or the Information Systems major [5]. Research conducted by Khafiizh Hastuti in 2012, titled Comparative Analysis of Data Mining Classification Algorithm for Non-Active Student Prediction, compares several data mining classification algorithms for predicting non-active students in order to determine the accuracy of each algorithm. The algorithms compared are logistic regression, decision tree, naive Bayes and neural network [6].
According to Alter, a decision support system is an interactive information system that provides information, modeling and data manipulation. It supports decision making in semi-structured and unstructured situations, where no one knows for sure how the decision should be made [7]. Data mining is a term used to describe knowledge discovery in databases. Data mining and knowledge discovery in databases (KDD) are often used interchangeably to describe the process of extracting information from a large database. Data mining techniques are divided into several groups based on the tasks they perform: description, estimation, prediction, classification, clustering and association [3].
Classification is the process of finding a model or function that describes and distinguishes data classes or concepts. The model can then be used to predict the class of objects whose class labels are unknown. Data classification consists of two steps:
1. Learning (the training phase): the classification algorithm analyzes the training data, and what it learns is represented as classification rules.
2. Classification: test data are used to estimate the accuracy of those classification rules [8].
The classification process is based on four components [9]:
1. Class: the dependent, categorical variable that represents the label of an object.
2. Predictor: the independent variables, represented by the characteristics (attributes) of the data.
3. Training dataset: a set of data containing values for both components above, used to determine the suitable class from the predictors.
4. Testing dataset: new data that are classified by the model that has been built, so that the accuracy of the classification can be evaluated.
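The second phase, estimating accuracy on the testing data, can be illustrated with a short Python sketch. The rule `toy_rule` is a hypothetical stand-in for whatever the training phase produces; it is not a rule from this research. The four testing records are invented, and each record pairs its predictor values with its class label.

```python
from typing import Callable, List, Sequence, Tuple

# A labelled record: predictor values paired with its class label.
Record = Tuple[Sequence[float], str]


def accuracy(classify: Callable[[Sequence[float]], str], testing: List[Record]) -> float:
    """Share of testing records whose predicted class equals the known class."""
    if not testing:
        raise ValueError("the testing dataset is empty")
    correct = sum(1 for predictors, label in testing if classify(predictors) == label)
    return correct / len(testing)


# Hypothetical rule standing in for the output of the training phase.
def toy_rule(predictors: Sequence[float]) -> str:
    return "high" if predictors[0] >= 70 else "low"


testing_set: List[Record] = [([80.0], "high"), ([65.0], "low"), ([72.0], "low"), ([50.0], "low")]
print(accuracy(toy_rule, testing_set))  # 0.75
```

Three of the four invented records are classified correctly, so the estimated accuracy is 0.75.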
Several algorithms are commonly used for classification, including the k-nearest neighbor algorithm, decision trees, naive Bayesian classification and support vector machines. Nearest neighbor is an approach to finding a case by calculating the proximity between a new case and an old case, based on a weighted fit of a number of features [3]. The purpose of the nearest neighbor algorithm is to classify a new object based on its attributes and the training samples [10]. K-nearest neighbor is one of the methods used for classification.
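The working principle of k-nearest neighbor is to find the closest distance between the data to be evaluated and their nearest neighbors in the training data. The distance is calculated with the Euclidean distance formula [11]:

$$\mathrm{dist} = \sqrt{\sum_{i=1}^{p} \left(x_{1i} - x_{2i}\right)^2}$$

where x1 is a training record, x2 is the record being evaluated, i indexes the attributes, and p is the number of attributes.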
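The Euclidean formula translates directly into code. Below is a minimal Python sketch. The function name is hypothetical, and the two-attribute records are illustrative only; the records in this research have eight attributes (seven course grades plus the interest code).

```python
import math
from typing import Sequence


def euclidean_distance(x1: Sequence[float], x2: Sequence[float]) -> float:
    """dist = sqrt(sum over i = 1..p of (x1_i - x2_i)^2)."""
    if len(x1) != len(x2):
        raise ValueError("both records must have the same number of attributes p")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


# Two illustrative records with p = 2 attributes.
print(euclidean_distance([80, 75], [83, 71]))  # 5.0
```

Python 3.8 and later also provide `math.dist`, which computes the same value for two equal-length points.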

Research Method
The system is developed using the waterfall method. The stages of the research method are as follows.

1) Requirements Analysis
At this stage, the software and hardware requirements of the system are analyzed.

2) Literature Study
At this stage, information and study materials related to the research are gathered from relevant journals and books.

3) Data Collection
In this phase, the required student data are collected through interviews, observation and documentation. This stage produces a user requirements document describing what the users want from the system. This document is the reference for the system design stage.

4) System Design
System design translates the requirements into a software design before coding begins. This process produces the software architecture, data structures, interface representations and procedural algorithms.

5) Implementation
At this stage, the design is translated into a language the computer can recognize; that is, the system is programmed (coded) according to the design. The system is written in PHP, and MySQL is used as the database. This stage produces the finished system and a report on its development.

Results and Analysis
This section consists of two parts: an explanation of the k-nearest neighbor method and the system design.

K-Nearest Neighbor Method
The system that classifies student thesis title categories in this research is built using the k-nearest neighbor method. This method is chosen because it suits classifying thesis title categories based on students' grades in certain skill courses and on their interest in certain fields. The criteria are the student's interest in a particular category and the grades of seven courses:
- information technology infrastructure planning (PITI)
- software engineering (RPL)
- network
- data mining
- image processing
- programming algorithm (ALPRO)
- database
The steps for classifying a student's undergraduate thesis title category with the k-nearest neighbor method are as follows.

1. Determining Training Data
The training data are the records of Informatics Engineering students of the 2012 intake; the sample in Table 1 consists of twenty student records. The attributes are the PITI, RPL, network, data mining, image processing, ALPRO and database course grades, plus the student's interest. The interest attribute (the thesis title category the student is interested in) is converted into a number: 1 for information systems, 2 for software engineering, 3 for network, 4 for data mining and 5 for image processing. The training data also show the thesis title category of each student.

2. Determining Testing Data
Table 2 contains the testing data for the classification of student thesis title categories. The testing data use the same attributes as the training data: the PITI, RPL, network, data mining, image processing, ALPRO and database course grades, plus the student's interest.

4. Determining K
After the distances between each testing record and the training records have been calculated (step 3, Calculating Distance), the number of nearest records to take is determined.

5. Sorting the Distances
The distances are sorted from smallest to largest, and the 5 records with the smallest distances are taken (K = 5). Figure 3 shows the sorted distances. Of the five shortest distances in Figure 3, three belong to the Image category and two to the SI category. The first testing record, the student Achmad Misbakhul Huda, is therefore classified into the Image (citra) category.
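The steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' PHP implementation. The names `INTEREST_CODE` and `knn_classify` are hypothetical, and the training records are invented for illustration; they are not the Table 1 data. Each feature vector holds the PITI, RPL, network, data mining, image processing, ALPRO and database grades followed by the interest code.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Interest encoding described in the text (category -> number).
INTEREST_CODE = {
    "information systems": 1,
    "software engineering": 2,
    "network": 3,
    "data mining": 4,
    "image": 5,
}

# A training record: 8 feature values plus the known thesis category.
TrainingRecord = Tuple[Sequence[float], str]


def euclidean_distance(x1: Sequence[float], x2: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x1, x2)))


def knn_classify(test: Sequence[float], training: List[TrainingRecord], k: int = 5) -> str:
    # Step 3: distance from the testing record to every training record.
    distances = [(euclidean_distance(features, test), category)
                 for features, category in training]
    # Step 5: sort from the smallest to the largest distance, keep the K nearest.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Majority vote. Counter keeps first-seen order on ties, so a tie goes
    # to the tied category whose closest member is nearest.
    votes = Counter(category for _, category in nearest)
    return votes.most_common(1)[0][0]


# Invented training records (hypothetical values).
training_data: List[TrainingRecord] = [
    ([80, 75, 70, 72, 85, 78, 74, 5], "Image"),
    ([81, 75, 70, 72, 85, 78, 74, 5], "Image"),
    ([82, 75, 70, 72, 85, 78, 74, 5], "Image"),
    ([80, 75, 70, 72, 85, 78, 74, 1], "SI"),
    ([85, 75, 70, 72, 85, 78, 74, 1], "SI"),
    ([60, 60, 60, 60, 60, 60, 60, 3], "Network"),
    ([90, 90, 90, 90, 60, 90, 90, 2], "RPL"),
]

test_record = [80, 75, 70, 72, 85, 78, 74, INTEREST_CODE["image"]]
print(knn_classify(test_record, training_data, k=5))  # Image
```

In this invented example, the five nearest records are three Image and two SI, so the vote mirrors the worked case in the text. With K = 5 and five categories, a tie such as 2-2-1 is possible. The text does not say how ties are resolved, so the first-seen rule here is an assumption.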

System Design
The system for classifying undergraduate thesis title selection with the k-NN method has two users: lecturers and staff of the Informatics Engineering Study Program (Prodi). Figure 4 shows the context diagram of the system. The lecturer entity can classify thesis title selection using the k-NN method and can receive the classification information. The Prodi staff entity can enter the training data that serve as the reference for the k-NN calculation, and can also receive the classification results. A more detailed view of the system is given in Figure 5, the data flow diagram (DFD) for classifying student undergraduate thesis title selection using the k-nearest neighbor method.
Figure 5 shows the data flow diagram of the system. The Prodi staff entity first goes through the login process, and the login data are verified against the login data store. Once verified, the staff user can insert, update and delete training data, which are stored in the training data store. Likewise, the lecturer first logs in, and the login data are verified against the user data in the login data store. Once verified, the lecturer can insert, update and delete testing data, which are stored in the testing data store.
The training data are then used as the reference for the k-NN calculation that classifies the existing testing data. The data stores of the classification system are decomposed into the entity relationship diagram (ERD) shown in Figure 6, which contains four entities: login, training data, testing data and classification data.

The login entity has four attributes: the login id (primary key), username, password and status.

The training data entity has twelve attributes:
- the master data id (primary key)
- number
- name
- interest
- n1, the information technology infrastructure planning course grade
- n2, the software engineering course grade
- n3, the network course grade
- n4, the data mining course grade
- n5, the image processing course grade
- n6, the programming algorithm course grade
- n7, the database course grade
- n8, the undergraduate thesis category
The testing data entity has eleven attributes: the test data id (primary key), number, name and interest, followed by n1 to n7. As in the training data entity, n1 to n7 hold the grades of the information technology infrastructure planning, software engineering, network, data mining, image processing, programming algorithm and database courses. The classification data entity has two attributes: number (primary key) and weight.
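To make the ERD attribute lists concrete, the four entities can be sketched as tables. This is a minimal sketch, not the authors' schema. The paper's system uses MySQL, whereas this sketch uses Python's built-in `sqlite3` module so it runs without a server. The table names, column types, the key column names `id_login`, `id_master` and `id_test`, and the type of `weight` are assumptions. Only the attribute lists and counts come from the text.

```python
import sqlite3

# Hypothetical table/column names and types derived from the ERD description.
SCHEMA = """
CREATE TABLE login (
    id_login INTEGER PRIMARY KEY,
    username TEXT NOT NULL,
    password TEXT NOT NULL,
    status   TEXT NOT NULL
);
CREATE TABLE training_data (
    id_master INTEGER PRIMARY KEY,
    number    TEXT,
    name      TEXT,
    interest  INTEGER,
    n1 REAL, n2 REAL, n3 REAL, n4 REAL, n5 REAL, n6 REAL, n7 REAL,
    n8 TEXT  -- undergraduate thesis category (the class label)
);
CREATE TABLE testing_data (
    id_test  INTEGER PRIMARY KEY,
    number   TEXT,
    name     TEXT,
    interest INTEGER,
    n1 REAL, n2 REAL, n3 REAL, n4 REAL, n5 REAL, n6 REAL, n7 REAL
);
CREATE TABLE classification_data (
    number TEXT PRIMARY KEY,
    weight REAL  -- type assumed; the text only names the attribute
);
"""


def create_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def column_count(conn: sqlite3.Connection, table: str) -> int:
    # PRAGMA table_info returns one row per column of the table.
    return len(conn.execute(f"PRAGMA table_info({table})").fetchall())


conn = create_schema()
for table in ("login", "training_data", "testing_data", "classification_data"):
    print(table, column_count(conn, table))
```

The printed counts (4, 12, 11 and 2) match the attribute counts stated for the login, training data, testing data and classification data entities.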

Conclusion
The conclusions of this research on classifying undergraduate thesis title categories using the k-nearest neighbor method are as follows:
1. The k-nearest neighbor method can be applied to classify a student's undergraduate thesis title category. The classification uses the student's interest and grades in the information technology infrastructure planning, software engineering, network, data mining, image processing, programming algorithm and database courses.
2. The results of the k-nearest neighbor classification are grouped into the information systems, software engineering, networking, data mining and image processing categories.

IJEECS
ISSN: 2502-4752  Classification Of Category Selection Title Undergraduate Thesis… (Ratih Kumalasari Niswatin) 851

3. Calculating Distance
The distance between the attributes of the testing data and the attributes of the training data is calculated with the formula below:

$$\mathrm{dist} = \sqrt{\sum_{i=1}^{p} \left(x_{1i} - x_{2i}\right)^2}$$

Here x1 is the sample data (training data), x2 is the test data (testing data), i indexes the data variables (attributes), dist is the distance, and p is the data dimension. In this calculation, x1 takes its values from the training data in Table 1 and x2 from the testing data in Table 2. The attributes indexed by i are the PITI, RPL, network, data mining, image processing, ALPRO and database course grades, plus the student's interest, so p = 8. The distance from the first testing record is calculated in this way for every training record; the results are shown in Figure 2.

Figure 4. Context Diagram System

The k-NN classification results for student undergraduate thesis titles are stored in the classification data store. They are then displayed by the classification results process, where the lecturer and department staff entities can view them.

Figure 5. Data Flow Diagram System

Figure 6. Entity Relationship Diagram

3. Lecturers and Prodi staff can use the results of this research to classify thesis categories more easily, according to each student's field of interest and field of expertise, in the Informatics Engineering Program, Faculty of Engineering, Universitas Nusantara PGRI Kediri.
This research contributes in two ways. It helps the staff of the Informatics Engineering department classify the categories of student undergraduate thesis titles in the Informatics Engineering Program, Faculty of Engineering, Universitas Nusantara PGRI Kediri. It also helps final-year students select undergraduate thesis title categories based on their areas of interest and areas of expertise. Previous studies underlying this research include the research conducted by Kiki Rizky Ananda S in 2014, titled Decision Support System To Determine Title Thesis Department of Computer Information Techniques Using Simple Additive Weighting Method (SAW) [4].

Table 1. Training Data

Table 2. Testing Data