Arabic Sign Language (ArSL) Dataset
Authors/Creators
- 1. Ibn Tofail University
Description
Overview
This dataset contains right-hand landmark coordinates extracted from Arabic Sign Language (ArSL) video recordings using the MediaPipe framework. The dataset is designed for training and evaluating machine learning and deep learning models for ArSL recognition.
Dataset Description
- Total Samples: 7,010
- Number of Signs: 31 (28 Arabic alphabet letters + 3 control signs)
- Features per Sample: 89 (42 raw coordinates + 47 engineered geometric features)
- Data Format: CSV (Comma-Separated Values)
- Collection Method: Personal video recordings processed with the MediaPipe Hands model
- Hand Landmarks: 21 keypoints per frame (x, y coordinates)
Sign Inventory
Arabic Alphabet (28 signs)
Alef (ا), Ba2 (ب), Ta2 (ت), Tha2 (ث), Jim (ج), 7a2 (ح), Kha2 (خ), Dal (د), Thal (ذ), Ra2 (ر), Zayn (ز), Sin (س), Chin (ش), SSad (ص), DDad (ض), TTa2 (ط), TTha2 (ظ), 3ayn (ع), Ghayn (غ), Fa2 (ف), 9af (ق), Kaf (ك), Lam (ل), Mim (م), Noon (ن), Ha2 (ه), Waw (و), Ya2 (ي)
Control Signs (3 signs)
- Space (مسافة): Letter separation
- Delete (مسح): Error correction
- Finish (إنهاء): Translation completion
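The control signs suggest how a stream of recognized signs could be turned into text. The following is a minimal sketch, not the authors' implementation. It assumes that labels follow the `Sign_<Name>` pattern, that the control labels are named `Sign_Space` and `Sign_Delete` (only `Sign_Finish` is confirmed by this description), that Space inserts a space character, that Delete removes the last emitted character, and that Finish ends decoding. `decode_signs` is a hypothetical helper name.

```python
# Sign names and Arabic letters, in the order given in the Sign Inventory.
NAMES = ("Alef Ba2 Ta2 Tha2 Jim 7a2 Kha2 Dal Thal Ra2 Zayn Sin Chin SSad "
         "DDad TTa2 TTha2 3ayn Ghayn Fa2 9af Kaf Lam Mim Noon Ha2 Waw Ya2").split()
ARABIC = "ابتثجحخدذرزسشصضطظعغفقكلمنهوي"
LETTERS = dict(zip(NAMES, ARABIC))


def decode_signs(labels):
    """Turn a sequence of predicted sign labels into a text string.

    Assumed semantics (not specified in detail by the dataset description):
    Space appends ' ', Delete drops the last character, Finish stops decoding.
    """
    out = []
    for label in labels:
        # Strip the assumed 'Sign_' prefix if present.
        name = label[5:] if label.startswith("Sign_") else label
        if name == "Finish":
            break
        if name == "Space":
            out.append(" ")
        elif name == "Delete":
            if out:
                out.pop()
        elif name in LETTERS:
            out.append(LETTERS[name])
        else:
            raise ValueError(f"unknown sign label: {label!r}")
    return "".join(out)


# Ba2, Alef, Dal, then Delete removes Dal, then Ba2, then Finish.
word = decode_signs(["Sign_Ba2", "Sign_Alef", "Sign_Dal",
                     "Sign_Delete", "Sign_Ba2", "Sign_Finish"])
```

In a live system the labels would come from a classifier applied frame by frame, so some debouncing of repeated predictions would likely be needed before a decoder like this one.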
Files Included
1. `ArSL_dataset.csv`
Primary dataset file with categorical sign labels.
Structure:
- Column 1: Sign label (categorical: Sign_Alef, Sign_Ba2, ..., Sign_Finish)
- Columns 2-43: Raw landmark coordinates (21 landmarks × 2 coordinates: x0, y0, x1, y1, ..., x20, y20)
- Columns 44-90: Engineered geometric features (47 features: means, angles, distances)
Total Columns: 90 (1 label + 89 features)
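The column layout above can be applied to a single row as follows. This is an illustrative sketch: `split_row` is a hypothetical helper, and the demo row is synthetic rather than taken from the dataset. It relies only on the documented positions (1 label, 42 interleaved raw coordinates, 47 engineered features).

```python
N_LANDMARKS = 21
N_RAW = 2 * N_LANDMARKS   # 42 raw coordinates: x0, y0, ..., x20, y20
N_ENGINEERED = 47


def split_row(row):
    """Split one 90-value row into (label, 21 (x, y) landmarks, 47 engineered values)."""
    expected = 1 + N_RAW + N_ENGINEERED
    if len(row) != expected:
        raise ValueError(f"expected {expected} values, got {len(row)}")
    label = row[0]
    raw = [float(v) for v in row[1:1 + N_RAW]]
    engineered = [float(v) for v in row[1 + N_RAW:]]
    # De-interleave x0, y0, x1, y1, ... into (x, y) pairs, one per landmark.
    landmarks = list(zip(raw[0::2], raw[1::2]))
    return label, landmarks, engineered


# Synthetic row for illustration only (not real data).
demo_row = ["Sign_Alef"] + [str(i / 100) for i in range(89)]
label, landmarks, engineered = split_row(demo_row)
```

Rows in this shape can be obtained from `csv.reader` after skipping the header line, assuming the file has one.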
2. `ArSL_dataset_encoded.csv`
Dataset with one-hot encoded sign labels for direct model training.
Structure:
- Columns 1-89: Same 89 features as File 1
- Columns 90-120: One-hot encoded sign labels (31 binary columns)
Total Columns: 120 (89 features + 31 binary labels)
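For models that expect integer class indices instead of one-hot vectors, the label block can be collapsed again. The sketch below assumes the one-hot cells are stored as `0`/`1` (or `0.0`/`1.0`, or `True`/`False`); the actual cell format and the order of the 31 label columns are not stated here and should be read from the file header. `split_encoded_row` is a hypothetical helper name.

```python
N_FEATURES = 89
N_CLASSES = 31


def _to_bit(value):
    """Interpret a CSV cell as 0 or 1 (accepted spellings are an assumption)."""
    return 1 if str(value).strip().lower() in {"1", "1.0", "true"} else 0


def split_encoded_row(row):
    """Split one 120-value row into (89 features, class index in 0..30)."""
    if len(row) != N_FEATURES + N_CLASSES:
        raise ValueError(f"expected {N_FEATURES + N_CLASSES} values, got {len(row)}")
    features = [float(v) for v in row[:N_FEATURES]]
    one_hot = [_to_bit(v) for v in row[N_FEATURES:]]
    # A valid one-hot vector has exactly one active column.
    if sum(one_hot) != 1:
        raise ValueError("label block is not a valid one-hot vector")
    return features, one_hot.index(1)


# Synthetic row for illustration only: class index 4 is active.
demo_row = ["0.5"] * N_FEATURES + ["0"] * N_CLASSES
demo_row[N_FEATURES + 4] = "1"
features, class_index = split_encoded_row(demo_row)
```

The returned index refers to the position within the label block, so mapping it to a sign name requires the header names of columns 90-120.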
Feature Engineering Details
Raw Features (42 features)
MediaPipe hand landmarks: 21 keypoints × (x, y) coordinates
Engineered Features (47 features)
1. Mean Coordinates (18 features): Mean (x, y) positions of key landmark pairs (9 pairs × 2 coordinates)
- Pairs: [0,4], [0,8], [0,12], [0,16], [0,20], [4,8], [4,12], [4,16], [4,20]
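One plausible reading of the mean-coordinate features is the midpoint of each listed landmark pair, which gives 9 × 2 = 18 values. The sketch below follows that reading; the exact definition and the output ordering used in the CSV are assumptions. In the MediaPipe Hands landmark scheme, index 0 is the wrist and indices 4, 8, 12, 16, and 20 are the fingertips.

```python
MEAN_PAIRS = [(0, 4), (0, 8), (0, 12), (0, 16), (0, 20),
              (4, 8), (4, 12), (4, 16), (4, 20)]


def mean_coordinates(landmarks):
    """Return 18 values: the (x, y) midpoint of each pair in MEAN_PAIRS.

    `landmarks` is a sequence of 21 (x, y) tuples indexed by landmark id.
    """
    features = []
    for i, j in MEAN_PAIRS:
        (xi, yi), (xj, yj) = landmarks[i], landmarks[j]
        features.extend([(xi + xj) / 2, (yi + yj) / 2])
    return features


# Synthetic landmarks for illustration only.
demo_landmarks = [(float(k), float(2 * k)) for k in range(21)]
demo_means = mean_coordinates(demo_landmarks)
```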
2. Angular Features (14 features): Orientational relationships between landmarks
- Triplets: [1,2,3], [2,3,4], [0,5,6], [5,6,7], [6,7,8], [0,9,10], [9,10,11], [10,11,12], [0,13,14], [13,14,15], [14,15,16], [0,17,18], [17,18,19], [18,19,20]
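The angular features can be sketched as the angle at the middle landmark of each triplet. This is an assumption: the description does not say which landmark is the vertex, whether angles are signed, or whether they are stored in degrees or radians. The sketch returns unsigned angles in degrees.

```python
import math

ANGLE_TRIPLETS = [(1, 2, 3), (2, 3, 4),
                  (0, 5, 6), (5, 6, 7), (6, 7, 8),
                  (0, 9, 10), (9, 10, 11), (10, 11, 12),
                  (0, 13, 14), (13, 14, 15), (14, 15, 16),
                  (0, 17, 18), (17, 18, 19), (18, 19, 20)]


def joint_angle(a, b, c):
    """Unsigned angle in degrees at vertex b between rays b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    # atan2 of |cross| and dot stays stable for nearly collinear points.
    return math.degrees(math.atan2(abs(cross), dot))


def angular_features(landmarks):
    """Return 14 angles, one per triplet in ANGLE_TRIPLETS."""
    return [joint_angle(landmarks[i], landmarks[j], landmarks[k])
            for i, j, k in ANGLE_TRIPLETS]
```

Because each angle depends only on the directions between landmarks, this formulation is unaffected by translation and uniform scaling of the hand in the image.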
3. Distance Features (15 features): Scale-invariant geometric measurements
- Pairs: [0,4], [0,8], [0,12], [0,16], [0,20], [4,8], [4,12], [4,16], [4,20], [8,12], [8,16], [8,20], [12,16], [12,20], [16,20]
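The 15 pairs are exactly the pairwise combinations of the wrist (0) and the five fingertips (4, 8, 12, 16, 20). A minimal sketch of the distance features, assuming Euclidean distance, is shown below. How the dataset achieves scale invariance is not specified; the optional `reference` argument, which divides by the length between two chosen landmarks, is a hypothetical normalization and not the authors' documented method.

```python
import math
from itertools import combinations

KEYPOINTS = [0, 4, 8, 12, 16, 20]          # wrist + five fingertips
DISTANCE_PAIRS = list(combinations(KEYPOINTS, 2))


def distance_features(landmarks, reference=None):
    """Return 15 Euclidean distances, one per pair in DISTANCE_PAIRS.

    If `reference` is a pair of landmark ids, every distance is divided by
    the distance between those two landmarks (an assumed normalization).
    """
    dists = [math.dist(landmarks[i], landmarks[j]) for i, j in DISTANCE_PAIRS]
    if reference is not None:
        ref = math.dist(landmarks[reference[0]], landmarks[reference[1]])
        if ref == 0:
            raise ValueError("reference landmarks coincide")
        dists = [d / ref for d in dists]
    return dists


# Synthetic landmarks for illustration only.
demo_landmarks = [(float(k), 0.0) for k in range(21)]
raw_dists = distance_features(demo_landmarks)
norm_dists = distance_features(demo_landmarks, reference=(0, 9))
```

With a reference length, scaling all landmarks by a constant factor leaves the normalized distances unchanged, which is one way to obtain the scale invariance the description mentions.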
Usage Example
import pandas as pd
# Load the dataset with categorical labels (file name casing matches "Files Included")
df = pd.read_csv('ArSL_dataset.csv')
# Or load the one-hot encoded version
df_encoded = pd.read_csv('ArSL_dataset_encoded.csv')
# Split features and labels by column position
X = df_encoded.iloc[:, :89]  # 89 features
y = df_encoded.iloc[:, 89:]  # 31 one-hot label columns
# Your ML/DL model training here
License
This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Contact
- Email: youssef.farhan@uit.ac.ma
- Institution: Ibn Tofail University, Kenitra, Morocco
Version History
- v1.0 (January 2025): Initial release with 7,010 samples
Acknowledgments
This dataset was created as part of research on Arabic Sign Language recognition systems to enhance communication accessibility for the Arabic-speaking deaf community.
Related Publications
This dataset has been used in the following research:
1. "Arabic Sign Language Detection using MediaPipe and Machine Learning Techniques." International Conference on Computational Intelligence Approaches and Applications (ICCIAA), April 28-30, 2025. doi: 10.1109/ICCIAA65327.2025.11013250.
2. "Arabic Sign Language Recognition using MediaPipe and Deep Neural Network." International Conference on Optimization and Applications (ICOA), October 16-17, 2025. doi: 10.1109/ICOA66896.2025.11236819.
3. "Comparative Analysis of Machine Learning and Deep Learning Models for Arabic Sign Language Recognition: Performance, Complexity, and Efficiency Evaluation." [in preparation]
Files
Total size: 24.3 MB
| MD5 checksum | Size |
|---|---|
| 80f92dfdf29019f83d110eb4d98e3b8c | 12.0 MB |
| d423d899472800d73a8414b7ff7ec231 | 12.3 MB |
Additional details
Related works
- Is supplement to
- Conference paper: 10.1109/ICCIAA65327.2025.11013250 (DOI)
- Conference paper: 10.1109/ICOA66896.2025.11236819 (DOI)
References
- Zhang, F., Bazarevsky, V., Vakunov, A., Tkachenka, A., Sung, G., Chang, C.-L., & Grundmann, M. (2020). MediaPipe Hands: On-device Real-time Hand Tracking. arXiv:2006.10214. Available: https://ai.google.dev/edge/mediapipe/solutions/vision/hand_landmarker?hl=fr (accessed Dec. 20, 2025).