Conference paper Open Access

Web acquired image datasets need curation: an examplar pipeline evaluated on Greek food images

Vasileios Sevetlidis; George Pavlidis; Vasileios Arampatzakis; Chairi Kiourt; Spyridon Mouroutsos; Antonios Gasteratos

Mining Web data to create AI-usable datasets is still non-trivial. Despite free access to the data, a dataset useful for machine learning applications cannot be formed by a data mining phase alone. For any given query, the retrieved sample may include duplicated, misclassified or completely irrelevant content. Left uncleaned, such datasets end up faulty, noisy and imbalanced. Curation is therefore necessary to tackle the varying degrees of inconsistency found in the retrieved samples. This paper suggests a pipeline consisting of state-of-the-art, off-the-shelf methods for curating an image dataset retrieved from the Web. As a case study, the pipeline is applied to expanding food datasets with currently uncategorized Greek dishes, leveraging information found in a specialized ontology, with the aim of increasing accuracy in food recognition applications.


