Prediction of engineering students' academic performance using Artificial Neural Network and Linear Regression: A comparison

Predicting students' performance is important, if not crucial, especially in engineering courses, because it enables strategic intervention before students reach the higher semesters, including the final semester before graduation. This paper presents a comparison between an Artificial Neural Network (ANN) and Linear Regression (LR) in predicting academic performance. Academic achievement was measured by the Cumulative Grade Point Average (CGPA) at semester eight. The study was conducted at the Faculty of Electrical Engineering, Universiti Teknologi MARA (UiTM), Malaysia. Students' results in fundamental subjects at the first semester were used as the independent (input predictor) variables, while the CGPA at the final semester, semester eight, was used as the dependent (output) variable. Model performance was measured using the correlation coefficient R and the Mean Square Error (MSE). The outcomes from both models indicate a strong correlation between the results for fundamental core subjects at semester one or semester three and the final CGPA.


I. INTRODUCTION
Predicting students' performance is an important issue for most academic institutions of higher learning. This has led to many prediction studies involving students from different backgrounds and study areas such as business, medicine and computer technology [1][2][3]. However, most past researchers drew their predictor (independent) variables from demographic profiles. The data were collected mostly from survey forms covering students' prior education, region of residence, gender and Scholastic Aptitude Test (SAT) scores.
In the past, the Statistical Package for the Social Sciences (SPSS) was very popular and was used extensively by researchers in this area. Recently, the Artificial Neural Network (ANN) has been gaining popularity among researchers working on students' performance prediction [2,4,5]. However, these researchers developed NN models using demographic data as inputs. The study presented in this paper instead uses the Grade Points (GP) scored by students in fundamental subjects at the first semester as inputs. Once students are accepted into the Program on the merits set by the Faculty of Electrical Engineering, UiTM, every help should be offered so that they perform well in their studies before graduation, in line with the Vision and Mission of the University [6][7][8].
This paper presents the results of a study that predicts the academic performance of Matriculation and Diploma students using two different methods, Multiple Linear Regression (LR) and Neural Network (NN), and compares the results.

II. DATA COLLECTION AND MODEL BUILDING
Data for the Matriculation and Diploma students were compiled in Excel format and included the identity number, gender, and, for Matriculation students, the Grade Points of subjects taken at semester one and semester three. For the Diploma students, Grade Points were collected from semester three, as these students enter the course at semester three after receiving credit exemption for courses in semesters one and two. The most important item is each student's identity number, as it links the student's records to the final CGPA at semester eight. There were 391 Matriculation students from three batches, namely the July 2005, 2006 and 2007 intakes. The Diploma students totalled 505 from three intakes in July 2006, 2007 and 2008. Students took seven (7) subjects at semester one, but for this study subjects such as Co-curriculum and Laboratory work were omitted as input (independent) variables. At semester three, Laboratory work and Tamadun Islam (Islamic Civilisation) were likewise omitted, as such subjects are not a basic foundation for higher courses along the path of the program.
SPSS is among the most widely used packages for statistical analysis in social science. It is commonly used by market researchers, health researchers, survey companies, government, education researchers and marketing organizations. The original SPSS manual has been described as one of sociology's most influential books. In addition to statistical analysis, data management (case selection, file reshaping, creating derived data) and data documentation (a metadata dictionary is stored in the data file) are features of the base software. The base statistics package includes descriptive statistics (cross tabulation, frequencies), means, ANOVA, correlation, and prediction of numerical outcomes by linear regression (LR) [9]. LR is used to predict the value of one variable from the value of another variable.
When there is more than one input variable, the method becomes multiple linear regression, or simply multiple regression. In this research model, the dependent variable is the final CGPA at semester eight (CGPA8), while the independent variables are the GP scored in fundamental subjects, namely Circuit Theory (CT), Fundamental Electronics (FE), Signals and Systems (SS), Mathematics (MAT) and Communication Theory (COMM).
The same students were then followed through their subjects at semester three. This time, the subjects included Digital System, Material Science, Signals and Systems 2 (SS2), Mathematics 2 (MAT2) and English 1.
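The multiple regression model described above can be sketched in code. The following is a minimal illustration only, not the SPSS procedure used in the study: it fits CGPA8 = b0 + b1·x1 + ... + bk·xk by ordinary least squares through the normal equations. The function names `fit_multiple_regression` and `predict` are hypothetical, the sample uses only two of the five predictors (CT and MAT) for brevity, and every number in it is invented for illustration rather than taken from the study's data.

```python
from typing import List, Sequence


def fit_multiple_regression(
    X: Sequence[Sequence[float]], y: Sequence[float]
) -> List[float]:
    """Ordinary least squares via the normal equations (A^T A) b = A^T y.

    Returns [intercept, b1, ..., bk]. Suitable for small, well-conditioned data.
    """
    rows = [[1.0, *map(float, r)] for r in X]  # prepend the intercept column
    k = len(rows[0])
    # Augmented system [A^T A | A^T y].
    aug = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][k] for i in range(k)]


def predict(coeffs: Sequence[float], x: Sequence[float]) -> float:
    """Evaluate the fitted model for one student's grade points."""
    return coeffs[0] + sum(b * v for b, v in zip(coeffs[1:], x))


# Invented sample: (CT, MAT) grade points at semester one and final CGPA8.
sample_gp = [[4.00, 3.00], [3.00, 3.67], [2.00, 2.33], [3.33, 2.00], [2.67, 3.33]]
sample_cgpa8 = [3.45, 3.20, 2.40, 2.85, 3.00]

coeffs = fit_multiple_regression(sample_gp, sample_cgpa8)
predicted_cgpa8 = predict(coeffs, [3.00, 3.00])
```

Adding the remaining predictors (FE, SS, COMM) only requires extra columns in each input row, provided there are more students than coefficients and the columns are not collinear.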
Tables 1 and 2 depict the model summary and the ANOVA output for Matriculation students at semester one. From Fig. 1, it can be seen that at lower CGPA8 the predicted value tends to be higher than the actual value. From Fig. 2, the expected residual lies above the best-fit line and then falls below it at the higher end.
The procedure was then repeated for the subjects at semester three (DataMaxSem3) with the same number of students, and was also tested on a different data set of 505 Diploma students (DataDipSem3). The following tables depict the model summary and ANOVA output respectively.
From Figures 7 and 8 it can be seen that at lower CGPA the predicted value is higher than the actual (target) value, while at higher CGPA the predicted value is lower than the actual CGPA8.

V. RESULT AND ANALYSIS
Table 8 shows the MSE and MSR for the NN and LR models respectively. Both models yielded the same residual (error) in every case. However, Table 9 shows that R for the NN model is much higher than R for the LR model in all cases. This trend and pattern in the outcomes of the prediction models holds true for all three cases.
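The two performance measures can be sketched as follows. This is a minimal illustration assuming that MSE is the mean of squared differences between actual and predicted CGPA8, and that R is the Pearson correlation between actual and predicted values; the paper does not spell out its exact definitions, and the function names are hypothetical.

```python
import math
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared prediction errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Pearson correlation coefficient between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("inputs must have equal length of at least two")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_a * var_p)
```

Under these definitions the two measures capture different things: MSE reflects the size of the errors, while R reflects how closely the predictions track the ordering and spread of the actual values, so two models can agree on one measure and differ on the other.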

VI. DISCUSSION AND RECOMMENDATION
As seen earlier, students' performance in the fundamental subjects, at semester one for Matriculation students and at semester three for Diploma students, does influence their final CGPA8 upon graduation. These fundamental subjects form the foundation of, or are pre-requisites to, all other subjects in the following semesters of the study plan. Students with a low CGPA at the start are advised to re-schedule and take fewer credit hours during the full semester, and to register for intersession or summer school in order to improve their CGPA. The role of academic advisors (AA) and lecturers is critical at this stage in motivating students, convincing them and inculcating positive thinking until graduation.
Those with a high CGPA at the start of the Program should work smart and steadily maintain it. A high CGPA is extremely difficult to maintain as the total credit hours, which form the denominator of the final CGPA calculation, accumulate. Furthermore, the subject matter becomes more difficult as students progress into the higher semesters of the electrical engineering courses.
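The role of credit hours as the denominator can be illustrated with a short calculation. The sketch below assumes the usual credit-weighted definition of CGPA (total grade points earned divided by total credit hours attempted); the function name `cumulative_gpa`, the credit loads and the semester results are hypothetical and are not taken from the study.

```python
from typing import List, Sequence, Tuple


def cumulative_gpa(semesters: Sequence[Tuple[float, float]]) -> List[float]:
    """Return the CGPA after each semester.

    Each item is (semester GPA, credit hours taken in that semester).
    """
    total_points = 0.0
    total_credits = 0.0
    history = []
    for gpa, credit_hours in semesters:
        total_points += gpa * credit_hours  # grade points earned this semester
        total_credits += credit_hours       # denominator grows every semester
        history.append(total_points / total_credits)
    return history


# Hypothetical student: GPA 3.80 for seven semesters of 18 credit hours each,
# then GPA 3.00 in semester eight.
history = cumulative_gpa([(3.80, 18)] * 7 + [(3.00, 18)])
final_cgpa = round(history[-1], 2)
```

In this invented case the final CGPA comes to about 3.70. Every semester enters the average in proportion to its credit hours, so holding a high CGPA requires scoring at that level in each semester, and a low early CGPA is correspondingly slow to recover.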

VII. CONCLUSION
This paper presented LR and NN models for predicting students' performance across multiple entry levels, namely the Matriculation and Diploma entry levels. The study is limited to Electrical Engineering degree students at the Faculty of Electrical Engineering, UiTM, as the data were obtained from the Students Information Management System (SIMS) developed for the academic communities of Universiti Teknologi MARA. From the findings, both prediction models gave similar results as far as Mean Square Error is concerned. The trend and pattern of the model outcomes hold true for all three cases. This verified that the fundamental subjects had a strong influence on the final CGPA8. The results show that students' ability in engineering fundamentals strongly influences their overall academic performance in an engineering program. Students with a lower CGPA at the start can be motivated to do better through early strategic intervention by academic advisors and lecturers before they reach the final semester. Matriculation intake students can be helped as early as the first semester result, without waiting until semester three. Diploma students can be monitored only from the end of semester three, as they have only six semesters in which to complete the entire Program.
978-1-4799-2332-8/13/$31.00 ©2013 IEEE