Published June 24, 2024 | Version 2
Dataset Open

Clothing Dataset for Second-Hand Fashion

  • 1. RISE Research Institutes of Sweden AB


Description

Second-Hand Fashion Dataset

Overview

The dataset originates from projects focused on the sorting of used clothes within a sorting facility. The primary objective is to classify each garment into one of several categories to determine its ultimate destination: reuse, reuse outside Sweden (export), recycling, repair, remake, or thermal waste. 

The dataset has 31,997 clothing items, up from the 3,000 items in version 1. Data collection started in spring 2022 under the Vinnova-funded project "AI for resource-efficient circular fashion" and involves three institutions: RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB. The dataset has received further support through the EU project CISUTAC (cisutac.eu).

Project page 

- Webpage: second-hand-fashion
- Contact: farrukh.nauman@ri.se

Dataset Details

- The dataset contains 31,997 clothing items, each with a unique item ID in a datetime format. The items are organized into three station folders: `station1`, `station2`, and `station3`. The `station1` and `station2` folders contain images and annotations from Wargön Innovation AB, while the `station3` folder contains data from Myrorna AB. Each clothing item comes with a JSON file containing annotations and three images, although some brand images are missing (see below).

- Three images are provided for each clothing item:
  1. Front view.
  2. Back view.
  3. Brand label close-up. About 4000-5000 brand images are missing because they showed personal information, such as people's hands or faces, and were removed for privacy reasons. In addition, some clothing items had no brand label to begin with.

- Images come primarily in two resolutions: `1280x720` and `1920x1080`. The garments are photographed on a table. Images taken before January 2023 show a measuring tape in the background, while later images have a square grid pattern in which each square measures `10x10` cm.

- Each JSON file contains a list of annotations, some of which require nuanced interpretation (see `labels.py` for the options):
    - `usage`: Arguably the most critical label, usage indicates the garment's intended pathway. Options include 'Reuse,' 'Repair,' 'Remake,' 'Recycle,' 'Export' (reuse outside Sweden), and 'Energy recovery' (thermal waste). About 99% of the garments fall into the 'Reuse,' 'Export,' or 'Recycle' categories.
    - `price`: The price field should be viewed as suggestive rather than definitive. Pricing models in the second-hand industry vary widely, including pricing by weight, brand, demand, or fixed value. Wargön Innovation AB does not determine actual pricing.
    - `trend`: This field refers to the general style of the garment, not a time-dependent trend as in some other datasets (e.g., Visuelle 2.0). It might be more accurately labeled as 'style.'
    - `material`: Material annotations are mostly based on the readings from a Near Infrared (NIR) scanner and in some cases from the garment's brand label.
    - Damage-related attributes include:
        - `condition` (1-5 scale, 5 being the best)
        - `pilling` (1-5 scale, 5 meaning no pilling)
        - `stains`, `holes`, `smell` (each with options 'None,' 'Minor,' 'Major'). 
        
        Note: `holes` and `smell` were introduced after November 17th, 2022, and `stains` previously had only 'Yes'/'No' options. For `station1` and `station2`, we introduced additional damage location labels to assist in damage detection:

            "damageimage": "back",
            "damageloc": "bottom left",
            "damage": "stain ",
            "damage2image": "front",
            "damage2loc": "None",
            "damage2": "",
            "damage3image": "back",
            "damage3loc": "bottom right",
            "damage3": "stain"

        This excerpt is taken from the file `labels_2024_04_05_08_47_35.json`. Additionally, we annotated a few hundred images with bounding boxes, which we aim to release at a later date.
    - `comments`: The comments field is mostly empty, but sometimes contains important information about the garment, such as a detailed text description of the damage. 

- Whenever possible, ISO standards have been followed to define these attributes on a 1-5 scale (e.g., `pilling`).

- Gold dataset: The marker `Test` in the comments field identifies garments that were annotated multiple times by different annotators so that annotator agreement can be compared. These 100 garments were annotated twice at Wargön Innovation AB (search within `station1/[dec2022,feb2023]`) and once at Myrorna AB (see the `station3/test100` folder for JSON files containing their annotations).

- The data has been annotated by a group of expert second-hand sorters at Wargön Innovation AB and Myrorna AB. 

- Some attributes, such as `price`, should be considered with caution. Many distinct pricing models exist in the second-hand industry:
  - Price by weight
  - Price by brand and demand (similar to first-hand fashion)
  - Generic pricing at a fixed value (e.g., 1 Euro or 10 SEK)
  
  Wargön Innovation AB does not set prices in practice, so its prices (`station1` and `station2`) are suggestive only. Myrorna AB (`station3`), in contrast, resells the garments and sets the actual prices.
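
The damage location labels described above come as three numbered key groups, which can be awkward to work with directly. The following is a minimal Python sketch, not part of the dataset's own tooling, that flattens them into a list and parses the datetime-style item ID. It assumes each JSON file decodes to a flat object with the keys shown in the excerpt, and that the `labels_` prefix and timestamp layout of the one quoted file name hold for other files; both are assumptions, and the function names `damage_records` and `parse_item_id` are hypothetical.

```python
from datetime import datetime
from pathlib import Path


def parse_item_id(filename: str) -> datetime:
    """Parse the timestamp in a name like 'labels_2024_04_05_08_47_35.json'.

    Assumption: the 'labels_' prefix and the year_month_day_hour_minute_second
    layout are inferred from the single file name quoted in the description.
    """
    stamp = Path(filename).stem.removeprefix("labels_")  # Python 3.9+
    return datetime.strptime(stamp, "%Y_%m_%d_%H_%M_%S")


def damage_records(annotation: dict) -> list[dict]:
    """Flatten the damage / damage2 / damage3 key groups into a list.

    Entries whose damage type is empty are skipped. Values are stripped
    because the excerpt contains trailing whitespace ("stain ").
    """
    records = []
    for n in ("", "2", "3"):
        kind = str(annotation.get(f"damage{n}", "")).strip()
        if not kind:
            continue
        records.append(
            {
                "damage": kind,
                "image": annotation.get(f"damage{n}image"),
                "location": annotation.get(f"damage{n}loc"),
            }
        )
    return records


# The excerpt from the description, reproduced as a Python dict.
example = {
    "damageimage": "back",
    "damageloc": "bottom left",
    "damage": "stain ",
    "damage2image": "front",
    "damage2loc": "None",
    "damage2": "",
    "damage3image": "back",
    "damage3loc": "bottom right",
    "damage3": "stain",
}

if __name__ == "__main__":
    for record in damage_records(example):
        print(record)
    print(parse_item_id("labels_2024_04_05_08_47_35.json"))
```

For the excerpt, the second group is dropped because its `damage2` value is empty, leaving two stain records on the back image.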

Comments

- We received feedback on version 1 that some images were too blurry or poorly lit. Image quality has improved slightly, but remains largely similar to version 1.
- We also learned that some data items were duplicates. Several duplicate images were removed, but about 400 still remain.
- Some users found the `tar.gz` format used in version 1 inconvenient. We have now switched to `.zip`.
- Most JSON files parse fine using any standard JSON reader, but a handful that are problematic have been set aside in the `json_errors` folder. 
- Extra care was taken not to leak personal information. This is why the `annotator` attribute has no entries in the JSON files in `station1/sep2023`: annotators had entered their real names there. Since then, we have used internally assigned IDs.
- Many brand images contained people's hands, faces, or other personal information. We have removed about 4000-5000 brand images for privacy reasons. 
- Please inform us immediately if you find any personal information in the dataset:
    - Farrukh Nauman (RISE AB): `farrukh.nauman@ri.se`, 
    - Susanne Eriksson (Wargön Innovation AB): `susanne.eriksson@wargoninnovation.se`,  
    - Gabriella Engstrom (Wargön Innovation AB): `gabriella.engstrom@wargoninnovation.se`.

We went through 100k images three times to ensure no personal information is leaked, but we are human and can make mistakes.
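
Given the notes above on problematic JSON files and empty `annotator` entries, a loader may want to be tolerant. The sketch below walks an extracted copy of the dataset, skips anything under a `json_errors` folder, ignores files that fail to parse, and tallies the `usage` label. It is an illustration under assumptions: the function names `iter_annotations` and `summarize` are hypothetical, and each annotation file is assumed to decode to a JSON object.

```python
import json
from collections import Counter
from pathlib import Path


def iter_annotations(root: Path):
    """Yield (path, annotation) for each JSON file that parses to an object."""
    for path in sorted(root.rglob("*.json")):
        if "json_errors" in path.parts:
            continue  # files the authors set aside as problematic
        try:
            annotation = json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            continue  # tolerate any other malformed file
        if isinstance(annotation, dict):
            yield path, annotation


def summarize(root: Path) -> tuple[Counter, int]:
    """Count `usage` labels and annotations with a missing or empty `annotator`."""
    usage = Counter()
    missing_annotator = 0
    for _, annotation in iter_annotations(root):
        usage[str(annotation.get("usage", "<missing>"))] += 1
        if not annotation.get("annotator"):
            missing_annotator += 1
    return usage, missing_annotator
```

Calling `summarize(Path("circular_fashion_v2"))` on an extracted copy (the directory name here is a placeholder) returns the label counts together with the number of annotations lacking an annotator.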

Partners

The data collection for this dataset has been carried out in collaboration with the following partners:

1. RISE Research Institutes of Sweden AB: RISE is a leading research institute dedicated to advancing innovation and sustainability across various sectors, including fashion and textiles.

2. Wargön Innovation AB: Wargön Innovation is an expert in sustainable and circular fashion solutions, contributing valuable insights and expertise to the dataset creation.

3. Myrorna AB: Myrorna is Sweden's oldest chain of stores for collecting clothes and furnishings that can be reused. 

License

CC-BY 4.0. Please refer to the LICENSE file for more details. 

Acknowledgments

This dataset was made possible through the collaborative efforts of RISE Research Institutes of Sweden AB, Wargön Innovation AB, and Myrorna AB, with funding from Vinnova and support from the EU project CISUTAC. We extend our gratitude to all the expert second-hand sorters and annotators who contributed their expertise to this project.

Files

circular_fashion_v2.zip

Files (26.9 GB)

md5:d33884913c52a0c4e323787f70a182c1
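
Because the archive is large, it can be worth checking the download against the MD5 checksum listed above before extracting it. The sketch below streams the file in chunks so the full 26.9 GB never has to fit in memory. The function names and the local path in the usage note are assumptions for illustration.

```python
import hashlib
from pathlib import Path

# Checksum as published in the file listing.
EXPECTED_MD5 = "d33884913c52a0c4e323787f70a182c1"


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_archive(path: Path, expected: str = EXPECTED_MD5) -> bool:
    """Compare a file's MD5 digest with the expected value, ignoring case."""
    return file_md5(path) == expected.lower()
```

For example, `verify_archive(Path("circular_fashion_v2.zip"))` returns `True` only if the local file matches the published checksum. MD5 is used here solely because it is the checksum the record provides; it detects corrupted downloads but is not a security guarantee.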

Additional details

Funding

CISUTAC – Circular & Sustainable Textiles & Clothing 101060375
European Commission