Published January 24, 2026 | Version v1.0
Dataset Open

Arabic Sign Language (ArSL) Dataset

  • 1. Ibn Tofail University

Description

Overview

This dataset contains right-hand landmark coordinates extracted from Arabic Sign Language (ArSL) video recordings using the MediaPipe framework. The dataset is designed for training and evaluating machine learning and deep learning models for ArSL recognition.

Dataset Description

  • Total Samples: 7,010
  • Number of Signs: 31 (28 Arabic alphabet letters + 3 control signs)
  • Features per Sample: 89 (42 raw coordinates + 47 engineered geometric features)
  • Data Format: CSV (Comma-Separated Values)
  • Collection Method: Personal video recordings processed with the MediaPipe Hands model
  • Hand Landmarks: 21 keypoints per frame (x, y coordinates)

Sign Inventory

Arabic Alphabet (28 signs)

Alef (ا), Ba2 (ب), Ta2 (ت), Tha2 (ث), Jim (ج), 7a2 (ح), Kha2 (خ), Dal (د), Thal (ذ), Ra2 (ر), Zayn (ز), Sin (س), Chin (ش), SSad (ص), DDad (ض), TTa2 (ط), TTha2 (ظ), 3ayn (ع), Ghayn (غ), Fa2 (ف), 9af (ق), Kaf (ك), Lam (ل), Mim (م), Noon (ن), Ha2 (ه), Waw (و), Ya2 (ي)

Control Signs (3 signs)

  • Space (مسافة): Letter separation
  • Delete (مسح): Error correction
  • Finish (إنهاء): Translation completion
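
The three control signs suggest how a stream of recognized signs could be turned into text. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the control labels follow the `Sign_` naming pattern (`Sign_Space` and `Sign_Delete` are inferred from that pattern; only `Sign_Finish` appears in this description), that Space inserts a space character, and it uses a hypothetical `LABEL_TO_CHAR` mapping covering only three letters.

```python
# Hypothetical mapping from dataset labels to characters (only three letters shown).
LABEL_TO_CHAR = {
    "Sign_Alef": "ا",
    "Sign_Ba2": "ب",
    "Sign_Ta2": "ت",
}


def decode_signs(labels):
    """Turn a sequence of predicted sign labels into text, applying control signs."""
    buffer = []
    for label in labels:
        if label == "Sign_Finish":
            break  # translation completion: ignore anything after this sign
        if label == "Sign_Space":  # assumed label name
            buffer.append(" ")
        elif label == "Sign_Delete":  # assumed label name
            if buffer:
                buffer.pop()  # error correction: drop the last character
        else:
            buffer.append(LABEL_TO_CHAR[label])
    return "".join(buffer)
```

In a live system the labels would come from a classifier's per-frame predictions, usually after some smoothing so that one held sign is not emitted many times.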

Files Included

1. `ArSL_dataset.csv`

Primary dataset file with categorical sign labels.

Structure:

  • Column 1: Sign label (categorical: Sign_Alef, Sign_Ba2, ..., Sign_Finish)
  • Columns 2-43: Raw landmark coordinates (21 landmarks × 2 coordinates: x0, y0, x1, y1, ..., x20, y20)
  • Columns 44-90: Engineered geometric features (47 features: means, angles, distances)

Total Columns: 90 (1 label + 89 features)
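
The column layout above can be expressed as positional slices. The sketch below uses only the positions stated in this description, since the exact header names of the engineered columns are not given here. The helper names `split_row` and `raw_to_points` are illustrative, not part of the dataset.

```python
N_LANDMARKS = 21
N_RAW = N_LANDMARKS * 2        # 42 raw coordinates
N_ENGINEERED = 18 + 14 + 15    # 47 engineered features

# Zero-based positions for ArSL_dataset.csv (label first).
LABEL_COL = 0
RAW_COLS = slice(1, 1 + N_RAW)                          # columns 2-43
ENG_COLS = slice(1 + N_RAW, 1 + N_RAW + N_ENGINEERED)   # columns 44-90


def split_row(row):
    """Split one parsed CSV data row into (label, raw, engineered)."""
    expected = 1 + N_RAW + N_ENGINEERED
    if len(row) != expected:
        raise ValueError(f"expected {expected} columns, got {len(row)}")
    label = row[LABEL_COL]
    raw = [float(v) for v in row[RAW_COLS]]
    engineered = [float(v) for v in row[ENG_COLS]]
    return label, raw, engineered


def raw_to_points(raw):
    """Regroup the flat x0, y0, ..., x20, y20 list into 21 (x, y) tuples."""
    return list(zip(raw[0::2], raw[1::2]))
```

Rows can be fed in from `csv.reader` after skipping the header line; with pandas, the same slices work with `df.iloc[:, RAW_COLS]`.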

2. `ArSL_dataset_encoded.csv`

Dataset with one-hot encoded sign labels for direct model training.

Structure:

  • Columns 1-89: Same 89 features as File 1
  • Columns 90-120: One-hot encoded sign labels (31 binary columns)

Total Columns: 120 (89 features + 31 binary labels)
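
Some classifiers expect an integer class index rather than a one-hot vector. The sketch below splits a row of the encoded file by position and recovers the index of the active label column. It is written under two assumptions: the one-hot cells may be stored either as 0/1 or as True/False, and the order of the 31 label columns must be read from the CSV header because it is not stated in this description.

```python
N_FEATURES = 89
N_CLASSES = 31


def _to_bit(value):
    """Parse a one-hot cell stored as 0/1 or True/False (storage format assumed)."""
    text = str(value).strip().lower()
    if text in ("1", "1.0", "true"):
        return 1
    if text in ("0", "0.0", "false"):
        return 0
    raise ValueError(f"not a binary value: {value!r}")


def split_encoded_row(row):
    """Split one data row of the encoded file into (features, onehot)."""
    if len(row) != N_FEATURES + N_CLASSES:
        raise ValueError(f"expected {N_FEATURES + N_CLASSES} columns, got {len(row)}")
    features = [float(v) for v in row[:N_FEATURES]]
    onehot = [_to_bit(v) for v in row[N_FEATURES:]]
    return features, onehot


def onehot_to_index(onehot):
    """Return the position of the single active label column."""
    if sum(onehot) != 1:
        raise ValueError("expected exactly one active label")
    return onehot.index(1)
```

The returned index refers to a position among the 31 label columns, so the matching sign name is the header of that column.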

Feature Engineering Details

Raw Features (42 features)

MediaPipe hand landmarks: 21 keypoints × (x, y) coordinates

Engineered Features (47 features)

1. Mean Coordinates (18 features): Mean positions of key landmarks
   - Pairs: [0,4], [0,8], [0,12], [0,16], [0,20], [4,8], [4,12], [4,16], [4,20]

2. Angular Features (14 features): Orientational relationships between landmarks
   - Triplets: [1,2,3], [2,3,4], [0,5,6], [5,6,7], [6,7,8], [0,9,10], [9,10,11], [10,11,12], [0,13,14], [13,14,15], [14,15,16], [0,17,18], [17,18,19], [18,19,20]

3. Distance Features (15 features): Scale-invariant geometric measurements
   - Pairs: [0,4], [0,8], [0,12], [0,16], [0,20], [4,8], [4,12], [4,16], [4,20], [8,12], [8,16], [8,20], [12,16], [12,20], [16,20]
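
The lists above give the landmark indices behind each feature group but not the formulas. The sketch below shows one plausible way to compute the 47 values from 21 (x, y) landmarks. It assumes that each mean feature is the midpoint of a landmark pair (2 values per pair, 18 in total), that each angle is the unsigned angle in radians at the middle landmark of a triplet, and that each distance is a plain Euclidean distance. Any normalization behind the "scale-invariant" description is not reproduced, so the output may not match the published columns.

```python
import math
from itertools import combinations

MEAN_PAIRS = [(0, 4), (0, 8), (0, 12), (0, 16), (0, 20),
              (4, 8), (4, 12), (4, 16), (4, 20)]
ANGLE_TRIPLETS = [(1, 2, 3), (2, 3, 4),
                  (0, 5, 6), (5, 6, 7), (6, 7, 8),
                  (0, 9, 10), (9, 10, 11), (10, 11, 12),
                  (0, 13, 14), (13, 14, 15), (14, 15, 16),
                  (0, 17, 18), (17, 18, 19), (18, 19, 20)]
# All 15 pairs among the wrist (0) and the five fingertips, in the listed order.
DISTANCE_PAIRS = list(combinations([0, 4, 8, 12, 16, 20], 2))


def angle_at(a, b, c):
    """Unsigned angle in radians at point b between rays b->a and b->c (assumed definition)."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))


def engineer_features(points):
    """Compute 18 midpoint, 14 angle, and 15 distance values from 21 (x, y) points."""
    if len(points) != 21:
        raise ValueError("expected 21 landmarks")
    means = []
    for a, b in MEAN_PAIRS:
        (xa, ya), (xb, yb) = points[a], points[b]
        means += [(xa + xb) / 2, (ya + yb) / 2]
    angles = [angle_at(points[a], points[b], points[c]) for a, b, c in ANGLE_TRIPLETS]
    distances = [math.dist(points[a], points[b]) for a, b in DISTANCE_PAIRS]
    return means + angles + distances
```

In MediaPipe Hands, landmark 0 is the wrist and landmarks 4, 8, 12, 16, and 20 are the fingertips, so the mean and distance pairs relate the wrist and fingertips to one another, while the triplets follow consecutive joints along each finger.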

Usage Example

# python
import pandas as pd

# Load the dataset with categorical labels
# (file names are case-sensitive on most systems)
df = pd.read_csv('ArSL_dataset.csv')

# Or load the one-hot encoded version
df_encoded = pd.read_csv('ArSL_dataset_encoded.csv')

# Split the encoded version into features and labels
X = df_encoded.iloc[:, :89]  # 89 features
y = df_encoded.iloc[:, 89:]  # 31 one-hot labels

# Your ML/DL model training here

License

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.

Contact

Version History

  • v1.0 (January 2025): Initial release with 7,010 samples

Acknowledgments

This dataset was created as part of research on Arabic Sign Language recognition systems to enhance communication accessibility for the Arabic-speaking deaf community.

Related Publications

This dataset has been used in the following research:

1. "Arabic Sign Language Detection using MediaPipe and Machine Learning Techniques." International Conference on Computational Intelligence Approaches and Applications (ICCIAA), April 28-30, 2025. doi: 10.1109/ICCIAA65327.2025.11013250.

2. "Arabic Sign Language Recognition using MediaPipe and Deep Neural Network." International Conference on Optimization and Applications (ICOA), October 16-17, 2025. doi: 10.1109/ICOA66896.2025.11236819.

3. "Comparative Analysis of Machine Learning and Deep Learning Models for Arabic Sign Language Recognition: Performance, Complexity, and Efficiency Evaluation." [in preparation]

Files (24.3 MB)

  • md5:80f92dfdf29019f83d110eb4d98e3b8c (12.0 MB)
  • md5:d423d899472800d73a8414b7ff7ec231 (12.3 MB)
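
The MD5 checksums listed for the two files can be used to confirm that a download is intact. The following is a minimal sketch using Python's standard `hashlib` module. Because the listing does not pair each checksum with a file name, the sketch only checks whether a file's digest is one of the two listed values. The helper names are illustrative.

```python
import hashlib

# Checksums as listed for this record (file-to-checksum pairing not stated).
LISTED_MD5 = (
    "80f92dfdf29019f83d110eb4d98e3b8c",
    "d423d899472800d73a8414b7ff7ec231",
)


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected):
    """Compare a file's digest with an expected value such as 'md5:<hex>'."""
    expected_hex = expected.split(":", 1)[-1].strip().lower()
    return md5_of_file(path) == expected_hex
```

For example, `md5_of_file("ArSL_dataset.csv") in LISTED_MD5` should be true for an uncorrupted copy of either file.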

Additional details

Related works

Is supplement to
Conference paper: 10.1109/ICCIAA65327.2025.11013250 (DOI)
Conference paper: 10.1109/ICOA66896.2025.11236819 (DOI)

References