Carinthia dataset

Authors

Corinna Kofler

Sabrina Strauß

Anja Zernig

Ernesto Lazaro Garcia

Michael Boxleitner

Beatrix Mayr

Isabell Dicillia-Kovatsch

Claudia Anna Dohr

Published

February 27, 2024

Abstract
The Carinthia dataset contains Scanning Electron Microscope (SEM) images of defects found on one production layer of unstructured semiconductor wafers. It consists of 4,591 images distributed unevenly across six defect classes.

The author(s) received funding from AIMS5.0 for this work. The project AIMS5.0 is supported by the Chips Joint Undertaking and its members, including the top-up funding by National Funding Authorities from involved countries under grant agreement no. 101112089. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or Chips Joint Undertaking. Neither the European Union nor the Chips Joint Undertaking can be held responsible for them.

Preview of the dataframe

Table 1 shows a preview of the dataframe stored in the file ‘carinthia.csv’. The dataframe has 4,591 rows and three columns, from left to right: ‘image_path’, ‘file_name’ and ‘label’. The column ‘image_path’ contains the relative paths to the images stored in the folder ‘images’, the column ‘file_name’ contains the filenames of the images, and the column ‘label’ contains the defect class of each image.

Number of images in the dataset:  4591
Number of columns:  3
Column headers from list(df.columns): ['image_path', 'file_name', 'label']
Table 1: Preview of the dataset.
   image_path                                        file_name                             label
0  data/images/2661734b21874fdf9e35a4319a9e7462.jpg  2661734b21874fdf9e35a4319a9e7462.jpg  6
1  data/images/0c8af9b8e62f4cb498392e806236e5e0.jpg  0c8af9b8e62f4cb498392e806236e5e0.jpg  6
2  data/images/ecb33a23f90b47fe8ee1dfe3e4d5f0df.jpg  ecb33a23f90b47fe8ee1dfe3e4d5f0df.jpg  6
3  data/images/adc88a3f9fb34ae79142ba9f12a6f1e1.jpg  adc88a3f9fb34ae79142ba9f12a6f1e1.jpg  6
4  data/images/bc228459f662459f80fbce99bedd9d20.jpg  bc228459f662459f80fbce99bedd9d20.jpg  6
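
A preview like the one above could be produced roughly as follows. This is a minimal sketch, not the authors' original code: it assumes that ‘carinthia.csv’ is a comma-separated file with a header row and no stored index column, and the helper name load_carinthia is hypothetical. So that the snippet runs without the dataset, it reads a two-row in-memory stand-in built from the first rows of Table 1.

```python
import io

import pandas as pd

# Column names as documented for 'carinthia.csv'.
EXPECTED_COLUMNS = ["image_path", "file_name", "label"]


def load_carinthia(csv_source):
    """Read the dataset index and check that the documented columns exist.

    `csv_source` may be a path or a file-like object. The comma-separated
    layout with a header row is an assumption about the file.
    """
    df = pd.read_csv(csv_source)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# In-memory stand-in for the real file (first two rows of Table 1).
sample_csv = io.StringIO(
    "image_path,file_name,label\n"
    "data/images/2661734b21874fdf9e35a4319a9e7462.jpg,"
    "2661734b21874fdf9e35a4319a9e7462.jpg,6\n"
    "data/images/0c8af9b8e62f4cb498392e806236e5e0.jpg,"
    "0c8af9b8e62f4cb498392e806236e5e0.jpg,6\n"
)

df = load_carinthia(sample_csv)
print("Number of images in the dataset: ", len(df))
print("Number of columns: ", df.shape[1])
print("Column headers from list(df.columns):", list(df.columns))
print(df.head())
```

With the real dataset, the call would presumably be load_carinthia("carinthia.csv"), with the path adjusted to wherever the file is stored.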

Class distribution of the Carinthia dataset

Figure 1 shows the defect class distribution of the Carinthia dataset. The y-axis lists the six defect classes and the x-axis shows the number of images per class. The bar chart shows that the dataset is highly imbalanced: defect classes ‘2’ and ‘5’, for example, contain only eight and four of the 4,591 images, respectively.

Figure 1: Defect class distribution of the dataset.
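
The counts behind such a bar chart can be computed from the ‘label’ column. The sketch below is illustrative only: the helper names are hypothetical, the labels in toy_labels are made up and do not reflect the real class sizes, and the plotting function assumes matplotlib is installed. The ratio of the largest to the smallest class is used here as one simple, assumed measure of imbalance.

```python
from collections import Counter


def class_distribution(labels):
    """Return a dict mapping each defect class to its image count, sorted by class."""
    return dict(sorted(Counter(labels).items()))


def imbalance_ratio(counts):
    """Ratio between the largest and the smallest class size."""
    return max(counts.values()) / min(counts.values())


def plot_distribution(counts, out_path="class_distribution.png"):
    """Save a horizontal bar chart: classes on the y-axis, image counts on the x-axis."""
    import matplotlib

    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.barh([str(c) for c in counts], list(counts.values()))
    ax.set_xlabel("Number of images")
    ax.set_ylabel("Defect class")
    fig.savefig(out_path)
    plt.close(fig)


# Made-up labels for illustration; NOT the real Carinthia class sizes.
toy_labels = [1] * 5 + [2] * 1 + [6] * 10
counts = class_distribution(toy_labels)
print(counts)
print("Imbalance ratio:", imbalance_ratio(counts))
```

For the real data, the labels would come from the ‘label’ column of the dataframe, for example class_distribution(df["label"].tolist()).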

Image examples of all defect classes

Figure 2 shows example images of all six defect classes. The defect classes are listed in the first column, and the remaining 15 columns show randomly selected example images for each class. Defect classes ‘2’ and ‘5’ contain only eight and four images, respectively, so all of their images are shown and the remaining columns are left empty. The images belonging to defect class ‘6’ show no defect. Such images can nevertheless occur in production if the SEM tool is not properly aligned, in which case the tool images the area adjacent to the defect instead of the defect itself.

Figure 2: Example images of all classes.
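
Selecting up to 15 random images per class, while keeping every image of the classes that have fewer, could be sketched as follows. The function name sample_examples, the fixed seed and the toy file paths are assumptions made for illustration; only the column names ‘image_path’ and ‘label’ come from the dataset description.

```python
import pandas as pd


def sample_examples(df, per_class=15, seed=0):
    """Pick up to `per_class` random image paths for every defect class.

    Classes with fewer than `per_class` images contribute all of their images.
    """
    examples = {}
    for label, group in df.groupby("label"):
        n = min(per_class, len(group))
        examples[label] = group.sample(n=n, random_state=seed)["image_path"].tolist()
    return examples


# Toy dataframe with hypothetical paths: a small class (4 images) and a large one (20 images).
toy_df = pd.DataFrame(
    {
        "image_path": [f"data/images/example_{i}.jpg" for i in range(24)],
        "label": [5] * 4 + [6] * 20,
    }
)

examples = sample_examples(toy_df)
for label, paths in examples.items():
    print(label, len(paths))
```

Taking the minimum of the class size and the requested number avoids the error that sampling without replacement raises when a class holds fewer images than requested.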

Image features

The defects are usually located in the center of the images. In addition, the images of defect classes ‘1’ to ‘5’ have a black border with a width of three pixels on their right side. For defect class ‘6’, this black border has a width of two pixels. An example image is shown in Figure 3.

Figure 3: Typical example image of the Carinthia dataset.
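
The width of the black border can be checked, and the border cropped, with a few lines of NumPy. This is a sketch under stated assumptions: the function names are hypothetical, and the darkness threshold of 10 is an assumed tolerance for JPEG compression noise, not a value taken from the dataset. The demonstration uses synthetic arrays instead of real images.

```python
import numpy as np


def right_border_width(image, threshold=10):
    """Count the contiguous columns at the right edge whose pixels are all <= threshold."""
    gray = np.asarray(image)
    if gray.ndim == 3:
        # For colour images, a pixel counts as dark only if every channel is dark.
        gray = gray.max(axis=2)
    dark_columns = (gray <= threshold).all(axis=0)
    width = 0
    for is_dark in dark_columns[::-1]:  # walk from the right edge inwards
        if not is_dark:
            break
        width += 1
    return width


def crop_right_border(image, threshold=10):
    """Return the image without its dark right-hand border."""
    arr = np.asarray(image)
    width = right_border_width(arr, threshold)
    return arr[:, : arr.shape[1] - width]


# Synthetic grey images with a three-pixel and a two-pixel black border.
img_three = np.full((8, 10), 128, dtype=np.uint8)
img_three[:, -3:] = 0
img_two = np.full((8, 10), 128, dtype=np.uint8)
img_two[:, -2:] = 0

print(right_border_width(img_three), right_border_width(img_two))
print(crop_right_border(img_three).shape)
```

A real image could be passed in as an array, for instance np.asarray(Image.open(path).convert("L")) if Pillow is available. Whether cropping the border is desirable depends on the downstream task, since the border width differs between class ‘6’ and the other classes.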

Figure 4 shows an example image with a square visible in the center, below the actual defect. Such squares can originate from framing effects induced by electrons during the alignment of the SEM tool.

Figure 4: Example image with a visible framing effect.