Published May 23, 2025 | Version v4
Dataset Open

Metal Arc Welding

Description

Predictive Quality Arc Welding Dataset

The dataset comprises various current and voltage time series. Currents and voltages are sampled synchronously at a frequency of 100 kHz, with a maximum permissible error of 0.5%.
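As a rough illustration of what the 100 kHz sampling rate implies, the sketch below derives the sample period and a matching time axis. The names `SAMPLING_RATE_HZ` and `time_axis` are illustrative and not part of the dataset. Applying the raw rate to a cycle of 200 samples assumes that preprocessing did not resample the signals, which the description does not state.

```python
import numpy as np

SAMPLING_RATE_HZ = 100_000  # 100 kHz, as stated in the description


def time_axis(n_samples: int, rate_hz: float = SAMPLING_RATE_HZ) -> np.ndarray:
    """Return time stamps in seconds for n_samples equally spaced samples."""
    return np.arange(n_samples) / rate_hz


sample_period_s = 1.0 / SAMPLING_RATE_HZ  # 10 microseconds between samples
# Assumption: a cycle holds 200 samples recorded at the raw sampling rate.
t = time_axis(200)
```

Under that assumption, `t` can serve as the x-axis when plotting a single voltage or current cycle.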

 

Preprocessed Data

Column Name     Description
--------------  -------------------------------------------------------------------------
labels          Quality label (0: bad weld quality | 1: good weld quality | -1: no label)
exp_ids         ID of the experiment run
welding_run_id  ID of the welding run
V_000           Voltage at the beginning of the cycle (t_0)
...             Voltage from (t_1) to (t_198)
V_199           Voltage at the end of the cycle (t_199)
I_000           Current at the beginning of the cycle (t_0)
...             Current from (t_1) to (t_198)
I_199           Current at the end of the cycle (t_199)
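The column layout above can be reproduced programmatically, which is handy for selecting features or discarding unlabeled cycles. The following is a minimal sketch: `drop_unlabeled` and the synthetic `demo` frame are hypothetical helpers, not part of the dataset. The label column is a parameter because the table lists it as `labels` while the reading sample accesses `label`; check the CSV header to see which applies.

```python
import numpy as np
import pandas as pd

N_STEPS = 200  # V_000 ... V_199 and I_000 ... I_199, per the table above

# Zero-padded, three-digit column names for both signals.
voltage_cols = [f"V_{i:03d}" for i in range(N_STEPS)]
current_cols = [f"I_{i:03d}" for i in range(N_STEPS)]


def drop_unlabeled(df: pd.DataFrame, label_col: str = "labels") -> pd.DataFrame:
    """Keep only rows whose quality label is 0 (bad) or 1 (good)."""
    return df[df[label_col] != -1].reset_index(drop=True)


# Tiny synthetic frame standing in for the real CSV (values are made up).
rng = np.random.default_rng(0)
demo = pd.DataFrame(
    rng.normal(size=(3, 2 * N_STEPS)),
    columns=voltage_cols + current_cols,
)
demo.insert(0, "labels", [1, -1, 0])

labeled = drop_unlabeled(demo)  # the row labeled -1 is removed
```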

Code Sample: Reading the Data

import logging
import numpy as np
import pandas as pd  


def convert_to_np(data: pd.DataFrame) -> tuple[np.ndarray, np.ndarray, np.ndarray, np.ndarray]:
    """
    Convert DataFrame to numpy arrays, separating labels, experiment IDs, welding run IDs, and features.

    Args:
        data (pd.DataFrame): Input DataFrame containing 'label', 'exp_id', 'welding_run_id', and feature columns.

    Returns:
        tuple: A tuple containing:
            - labels (np.ndarray): Array of labels
            - exp_ids (np.ndarray): Array of experiment IDs
            - welding_run_ids (np.ndarray): Array of welding run IDs
            - data (np.ndarray): Combined array of shape (n_cycles, n_timesteps, 2);
              channel 0 is current, channel 1 is voltage
    """
    logging.info("Converting data to numpy array")
    # Metadata column names as used in this sample; adjust them if the CSV header differs.
    labels = data["label"].values
    exp_ids = data["exp_id"].values
    welding_run_ids = data["welding_run_id"].values
    data = data.drop(columns=["label", "exp_id", "welding_run_id"])

    # Select the feature columns by prefix: "V_" for voltage, "I_" for current.
    cols_v = data.columns[data.columns.str.startswith("V_")]
    cols_i = data.columns[data.columns.str.startswith("I_")]

    current_data = data[cols_i].values
    voltage_data = data[cols_v].values

    # Stack along a new last axis so each time step carries (current, voltage).
    data = np.stack([current_data, voltage_data], axis=2)

    return labels, exp_ids, welding_run_ids, data


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)

    data_path = "data/welding_data.csv"
    data = pd.read_csv(data_path)
    labels, exp_ids, welding_run_ids, data = convert_to_np(data)
    logging.info(
        f"Data shapes - labels: {labels.shape}, exp_ids: {exp_ids.shape}, "
        f"welding_run_ids: {welding_run_ids.shape}, data: {data.shape}"
    )
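
Because the CSV file is listed at 2.7 GB, loading it in one call may require several gigabytes of memory. The sketch below shows one possible alternative that streams the file in chunks using the `chunksize` and `usecols` options of `pd.read_csv`. The helper `count_labels_in_chunks` and the in-memory demo CSV are hypothetical, and the column name `label` follows the sample above.

```python
import io

import pandas as pd


def count_labels_in_chunks(source, label_col: str = "label", chunksize: int = 50_000) -> dict:
    """Stream a CSV in chunks and count how often each label value occurs."""
    counts: dict = {}
    # Reading only the label column keeps each chunk small.
    for chunk in pd.read_csv(source, usecols=[label_col], chunksize=chunksize):
        for value, n in chunk[label_col].value_counts().items():
            counts[int(value)] = counts.get(int(value), 0) + int(n)
    return counts


# Small in-memory CSV standing in for data/welding_data.csv (values are made up).
demo_csv = io.StringIO("label,exp_id,welding_run_id\n1,0,0\n-1,0,0\n1,1,2\n0,1,3\n")
label_counts = count_labels_in_chunks(demo_csv, chunksize=2)
```

For the real file, pass the path (for example `"data/welding_data.csv"`) instead of the in-memory buffer.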

 

Files

welding_data.csv (2.7 GB)
md5:63489186e5ddae7fed6941444041cb68
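
The listed MD5 checksum can be used to confirm that a download completed without corruption. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `md5_of_file` and `is_intact` are illustrative, and the local path in the usage note is an assumption taken from the code sample.

```python
import hashlib

# Checksum listed for welding_data.csv.
EXPECTED_MD5 = "63489186e5ddae7fed6941444041cb68"


def md5_of_file(path, block_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in 1 MiB blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def is_intact(path, expected: str = EXPECTED_MD5) -> bool:
    """Return True if the file's MD5 digest matches the expected checksum."""
    return md5_of_file(path) == expected.lower()
```

For example, `is_intact("data/welding_data.csv")` would return `True` for an uncorrupted copy stored at that path.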

Additional details

Related works

Is cited by
Publication: 10.1016/j.procir.2023.09.123 (DOI)
Publication: 10.1145/3627673.368003 (DOI)