Published December 21, 2022 | Version v1
Dataset Open

Prediction of Personality Traits using the Big 5 Framework


The methodology is the core component of any research work: it describes the methods used to obtain the results. Here, the whole implementation is done in Python. The work proceeds through the following steps:

1. Acquire Personality Dataset

Kaggle hosts a collection of datasets and data generators that the machine learning community uses for analysis. The personality prediction dataset was acquired from the Kaggle website. It was collected from 2016 to 2018 through an interactive online personality test constructed from the IPIP (International Personality Item Pool). The dataset can be downloaded as a zip file from the link provided. It consists of two CSV files (test.csv and train.csv). The test.csv file has no missing values, 7 attributes, and a final output label. The dataset is multivariate. Data preprocessing is then applied to check for inconsistent behaviors or trends.
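The check for missing values and attribute count described above can be sketched with pandas. This is only an illustration: the column names `f1` to `f7` and `label`, the sample rows, and the helper `summarize` are hypothetical stand-ins, since the actual headers of train.csv and test.csv are not listed here.

```python
import io

import pandas as pd

# Hypothetical stand-in for one of the CSV files; the real column
# names and values are assumptions made for this sketch.
SAMPLE_CSV = """f1,f2,f3,f4,f5,f6,f7,label
1,20,5,4,3,6,7,serious
0,19,4,5,6,3,2,lively
1,22,7,6,5,4,3,dependable
"""


def summarize(df):
    """Return the row count, column names, and total number of missing cells."""
    return {
        "rows": len(df),
        "columns": list(df.columns),
        "missing_values": int(df.isna().sum().sum()),
    }


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = summarize(df)
print(summary)
```

With the real files, `pd.read_csv("test.csv")` would replace the in-memory sample, assuming the file sits in the working directory.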


2. Data Preprocessing

After data acquisition, the next step is to clean and preprocess the data. The dataset has numerical features. The target is a five-level personality label: serious, lively, responsible, dependable, and extraverted. The preprocessed dataset is then split into training and testing sets by passing the feature values, target values, and test size to the train_test_split function of the scikit-learn package. After the split, the training data is used to fit the Logistic Regression and SVM models, and the test data is used to measure the accuracy of the trained models.
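A minimal sketch of the split step, assuming synthetic data in place of the real dataset. The feature matrix is random, the `random_state` values are arbitrary, and `test_size=0.2` is chosen to match the 80/20 split mentioned in the training step.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 100 samples with 7 numerical attributes (assumed values).
rng = np.random.default_rng(0)
X = rng.integers(1, 9, size=(100, 7))

# The five target levels named in the text, assigned at random for illustration.
labels = np.array(["serious", "lively", "responsible", "dependable", "extraverted"])
y = labels[rng.integers(0, 5, size=100)]

# Pass feature values, target values, and test size to train_test_split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Fixing `random_state` makes the split reproducible between runs; it is a convenience for this sketch, not something the description specifies.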


3. Feature Extraction

The following items were presented on one page, and each was rated on a five-point scale using radio buttons. The order on the page was EXT1, AGR1, CSN1, EST1, OPN1, EXT2, etc. The scale was labeled 1=Disagree, 3=Neutral, 5=Agree.


                EXT1	I am the life of the party.
                EXT2	I don't talk a lot.
                EXT3	I feel comfortable around people.
                EXT4	I am quiet around strangers.
                EST1	I get stressed out easily.
                EST2	I get irritated easily.
                EST3	I worry about things.
                EST4	I change my mood a lot.
                AGR1	I have a soft heart.
                AGR2	I am interested in people.
                AGR3	I insult people.
                AGR4	I am not really interested in others.
                CSN1	I am always prepared.
                CSN2	I leave my belongings around.
                CSN3	I follow a schedule.
                CSN4	I make a mess of things.
                OPN1	I have a rich vocabulary.
                OPN2	I have difficulty understanding abstract ideas.
                OPN3	I do not have a good imagination.
                OPN4	I use difficult words.
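The item codes above combine a three-letter trait prefix with an item number. The sketch below rebuilds the stated page order and groups the codes by prefix. The helper names `page_order` and `group_by_trait` are hypothetical, and the trait order is inferred from the sequence EXT1, AGR1, CSN1, EST1, OPN1, EXT2 given in the text.

```python
import re
from collections import defaultdict

# Trait prefixes in the order they appear on the page (from the stated sequence).
TRAIT_ORDER = ["EXT", "AGR", "CSN", "EST", "OPN"]


def page_order(items_per_trait):
    """Interleave traits by item number: EXT1, AGR1, CSN1, EST1, OPN1, EXT2, ..."""
    return [
        f"{trait}{number}"
        for number in range(1, items_per_trait + 1)
        for trait in TRAIT_ORDER
    ]


def group_by_trait(columns):
    """Group item codes such as 'EXT1' by their three-letter trait prefix."""
    groups = defaultdict(list)
    for column in columns:
        match = re.fullmatch(r"([A-Z]{3})(\d+)", column)
        if match:  # ignore anything that is not an item code
            groups[match.group(1)].append(column)
    return dict(groups)


columns = page_order(4)
groups = group_by_trait(columns)
print(columns[:6])
print(groups["EXT"])
```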


4. Training the Model

A train/test split is a way to measure the accuracy of a model. It is so called because the dataset is split into two sets: a training set (80%) and a testing set (20%). The model is trained on the training set. Here, the models were trained using linear_model.LogisticRegression() and svm.SVC() from the sklearn package.
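The training step might look roughly like the following sketch. The data comes from `make_classification` as a synthetic stand-in with 7 features and 5 classes, and `max_iter=1000` and the `random_state` values are assumptions; the hyperparameters actually used are not stated.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the personality data: 7 features, 5 target classes.
X, y = make_classification(
    n_samples=500,
    n_features=7,
    n_informative=5,
    n_redundant=0,
    n_classes=5,
    n_clusters_per_class=1,
    random_state=0,
)

# 80% of the samples for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Fit both classifiers on the training set only.
log_reg = LogisticRegression(max_iter=1000).fit(X_train, y_train)
svm_clf = SVC().fit(X_train, y_train)

# Predict labels for the held-out test set.
log_reg_pred = log_reg.predict(X_test)
svm_pred = svm_clf.predict(X_test)
print(log_reg_pred[:5], svm_pred[:5])
```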


5. Personality Prediction Output

After training, the Logistic Regression and SVM models are evaluated on the test data using cohen_kappa_score and accuracy_score.
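As an illustration of the two metrics, the sketch below scores a small hand-made set of predictions. The label lists and the helper `evaluate` are hypothetical; in practice the true test labels and each model's predictions would be passed in.

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score


def evaluate(y_true, y_pred):
    """Return accuracy and Cohen's kappa for one set of predictions."""
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "kappa": cohen_kappa_score(y_true, y_pred),
    }


# Hand-made example: four of the five predictions match the true label.
y_true = ["serious", "lively", "responsible", "dependable", "extraverted"]
y_pred = ["serious", "lively", "responsible", "dependable", "lively"]

scores = evaluate(y_true, y_pred)
print(scores)
```

Accuracy is the fraction of correct predictions, while Cohen's kappa also corrects for the agreement expected by chance, which makes it useful when the five classes are unevenly represented.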


