Creating an Additional Class Layer with Machine Learning to counter Overfitting in an Unbalanced Ancient Coin Dataset

Gampe, Sebastian; Tolle, Karsten

doi:10.5281/zenodo.10980380

Published April 16, 2024 | Version v4

Conference paper Open

1. Goethe-University, Frankfurt am Main

We have implemented an approach based on Convolutional Neural Networks (CNN) for mint recognition on our Corpus Nummorum (CN) coin dataset as an alternative to coin type recognition, since we had too few instances for most of the types (classes). However, this shift aggravated an existing problem with our dataset: the extremely unbalanced number of instances per class. While some of our classes consist of only 20 instances, others consist of several hundred. After training our VGG16 model, we unsurprisingly observed overfitting to these “big” classes in the confusion matrix. To reduce this problem, we split the classes with the most images into several smaller ones and called them additional class layers. We used three different machine learning (ML) approaches to perform this breakdown. One is an unsupervised clustering method that requires no additional manual work. The other two are supervised approaches that take the motifs of the coins themselves into account: a) an object detection model that predicts trained entities, and b) a Natural Language Processing (NLP) method that finds entities in the textual descriptions of the coins. Based on the combination of obverse and reverse results from these two approaches, the new additional class layers were defined. After retraining our mint recognition model with these new classes, we evaluated the results based on the confusion matrix. In our case, the best results were observed when the additional class layers were formed with the NLP method.
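The class-splitting idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `add_class_layer` and `parent_class`, the `max_size` threshold, the `#` sub-class naming, and the toy labels are all assumptions made for illustration. The grouping key per image stands in for whatever the chosen method produces, for example a cluster id or a pair of obverse and reverse entities.

```python
from collections import Counter, defaultdict


def add_class_layer(mints, group_keys, max_size):
    """Split every mint class with more than max_size images into sub-classes.

    mints      -- parent class label (mint) for each image
    group_keys -- hypothetical grouping key for each image, e.g. a cluster id
                  or an (obverse entity, reverse entity) pair
    max_size   -- assumed threshold; classes at or below it stay unchanged
    """
    counts = Counter(mints)
    sub_ids = defaultdict(dict)  # mint -> {group key: sub-class index}
    new_labels = []
    for mint, key in zip(mints, group_keys):
        if counts[mint] <= max_size:
            new_labels.append(mint)  # small class: keep the original label
            continue
        ids = sub_ids[mint]
        index = ids.setdefault(key, len(ids))  # next free index for a new key
        new_labels.append(f"{mint}#{index}")
    return new_labels


def parent_class(label):
    """Map a sub-class label back to its mint (assumes no '#' in mint names)."""
    return label.split("#", 1)[0]


# Toy data, purely illustrative: one large class and one small class.
mints = ["mint_A"] * 5 + ["mint_B"] * 2
group_keys = [("head", "eagle")] * 3 + [("head", "horse")] * 2 + [("bust", "ship")] * 2

labels = add_class_layer(mints, group_keys, max_size=3)
print(labels)
```

Mapping predictions back through `parent_class` before building the confusion matrix keeps the evaluation at mint level, so a model trained on the split classes can be compared with one trained on the original classes.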

Files

Files (5.1 MB)

Name	Checksum	Size
Gampe_Tolle-Additional_Class_Layer_v4.pdf	md5:86d733fa22ce810c51274ebb4c6ee2cb	2.5 MB
IR-on-coin-datasets-main.zip	md5:733bc71c8d7126b76c0b86f1622e4056	12.4 kB
NLP-on-multilingual-coin-datasets-1.0.0.zip	md5:9fb339228f3d9e407b5272cd66b209c3	2.6 MB

Additional details

References: Dataset: 10.5281/zenodo.10033993 (DOI)

	All versions	This version
Views	1,090	470
Downloads	642	191
Data volume	3.1 GB	486.0 MB
