SWL-LSE: SignaMed Word-Level LSE, a Dataset of Spanish Sign Language Health Signs
Authors/Creators
- Universidade de Vigo (Hosting institution)
- Alba-Castro, José Luis (Data manager)
- Vázquez Enríquez, Manuel (Data collector)
- Pérez Pérez, Ania (Data curator)
- Cabeza-Pereiro, María del Carmen (Data collector)
- Docio-Fernandez, Laura (Data manager)
- Confederación Estatal de Personas Sordas (Data collector)
- FAXPG (Data collector)
Description
SWL-LSE Dataset
SWL-LSE stands for SignaMed Word-Level LSE, where LSE is Lengua de Signos Española (Spanish Sign Language).
Overview
The dataset consists of 8,000 sign sequences from 300 different sign classes related to the health domain. Each class is represented by an RGB video that serves as the dictionary sign. These dictionary signs were reproduced by 124 signers, including deaf individuals, interpreters, and L2 Spanish Sign Language (LSE) students, using their webcams or mobile phones via the SignaMed platform (https://signamed.web.app). For privacy reasons, only the skeleton data is shared.
The process of collecting the dataset is described in:
Vázquez-Enríquez, M.; Alba-Castro, J.L.; Pérez-Pérez, A.; Cabeza-Pereiro, C.; Docío-Fernández, L. SignaMed: a Cooperative Bilingual LSE-Spanish Dictionary in the Healthcare Domain. In Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources; Efthimiou, E.; Fotinea, S.E.; Hanke, T.; Hochgesang, J.A.; Mesch, J.; Schulder, M., Eds., Torino, Italia, 2024; pp. 386–394.
The dataset itself and the pipeline for training and running a skeleton-based baseline model are described in the GitHub repository (https://github.com/mvazquezgts/SWL-LSE) and in the following paper:
Vázquez-Enríquez, M.; Alba-Castro, J.L.; Docío-Fernández, L.; Rodríguez-Banga, E. SWL-LSE: A Dataset of Health-Related Signs in Spanish Sign Language with an ISLR Baseline Method. Technologies 2024, 12(10), 205. https://doi.org/10.3390/technologies12100205
Files
1. VIDEOS_REF.zip
- Description: RGB videos recorded in lab conditions that represent each sign-class
- Total files: 300
2. videos_ref_annotations.csv
- Description: CSV file mapping each video filename to its class ID and its gloss in Spanish. Columns: FILENAME,CLASS_ID,LABEL.
- Total files: 1
3. ANNOTATIONS.zip
- Description: Three CSV files (train, validation, and test splits) mapping each sample filename to its class ID. Columns: FILENAME,CLASS_ID.
- Total files: 3
4. MEDIAPIPE.zip
- Description: Pickle files containing the full output of MediaPipe using its Heavy model. Each .pkl file contains the outputs of MediaPipe Holistic legacy, MediaPipe Pose and MediaPipe Hands. Each file is packaged as a dictionary: dict_keys(['pose', 'hands', 'holistic_legacy'])
- Total files: 8000
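The file layout above can be read with the Python standard library. The following is a minimal sketch, not the authors' loading code: the function names are hypothetical, the CSV is assumed to be comma-separated with an optional FILENAME,CLASS_ID header row, and CLASS_ID is assumed to be an integer. The internal structure stored under each dictionary key is not described here, so the sketch only checks the three documented top-level keys.

```python
import csv
import pickle

# Top-level keys listed in the MEDIAPIPE.zip description.
EXPECTED_KEYS = {"pose", "hands", "holistic_legacy"}


def read_split(csv_path):
    """Return {filename: class_id} from a FILENAME,CLASS_ID annotation file."""
    labels = {}
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for row in csv.reader(handle):
            # Skip blank lines and an optional header row (assumption).
            if not row or row[0].strip().upper() == "FILENAME":
                continue
            # Assumption: CLASS_ID is stored as an integer.
            labels[row[0].strip()] = int(row[1])
    return labels


def load_sample(pkl_path):
    """Load one .pkl file and verify the documented top-level keys."""
    with open(pkl_path, "rb") as handle:
        # pickle can execute code on load: only open files from a trusted source.
        sample = pickle.load(handle)
    missing = EXPECTED_KEYS - set(sample)
    if missing:
        raise KeyError(f"{pkl_path}: missing keys {sorted(missing)}")
    return sample
```

A call such as `read_split("train.csv")` would return the filename-to-class mapping for one split; the file name is a placeholder, since the exact names inside ANNOTATIONS.zip are not listed here.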
Usage
Researchers and practitioners in pattern recognition, machine learning, and sign language linguistics may find this dataset valuable for:
- Training/testing machine learning models for isolated sign language recognition or gesture recognition.
- Analyzing patterns in how signs are realized across signers.
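Before training a recognition model, it can help to check how many samples each class has in a split. The sketch below is an illustration only: `summarize_classes` is a hypothetical helper, and it assumes the class IDs have already been read from one of the annotation CSV files into a list.

```python
from collections import Counter


def summarize_classes(class_ids):
    """Summarize how samples are distributed over sign classes."""
    counts = Counter(class_ids)
    per_class = counts.values()
    return {
        "num_classes": len(counts),               # distinct classes seen
        "num_samples": sum(per_class),            # total sequences in the split
        "min_per_class": min(per_class, default=0),
        "max_per_class": max(per_class, default=0),
    }
```

Comparing `num_classes` against the 300 classes stated in the overview is a quick way to confirm that a split covers every sign.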
Acknowledgments
This dataset is a collaborative effort of the following research groups and entities:
- Group of Multimedia Technologies (GTM) from the atlanTTic Research Center of University of Vigo (Spain)
- Group of Discourse and Society (GRADES) from the School of Philology and Translation of University of Vigo (Spain)
- Federation of Deaf People Galician Associations (FAXPG)
- Fundación CNSE-DILSE
Gratitude is extended to them for their contributions and support.
Additional details
Funding
- Ministerio de Ciencia, Innovación y Universidades
- TECNOLOGIAS PARA LA INCLUSION EN LENGUA DE SIGNOS: BASES DE DATOS, RECONOCIMIENTO Y TRADUCCION PID2021-123988OB-C32 (funded by MICIU/AEI/10.13039/501100011033)
- Ministerio de Ciencia, Innovación y Universidades
- DESARROLLO DE UNA APLICACION DE TRADUCCION DE LENGUA DE SIGNOS EN ENTORNOS DE SALUD PDC2022-133766-I00
References
- Vázquez-Enríquez, M.; Alba-Castro, J.L.; Pérez-Pérez, A.; Cabeza-Pereiro, C.; Docío-Fernández, L. SignaMed: a Cooperative Bilingual LSE-Spanish Dictionary in the Healthcare Domain. In Proceedings of the LREC-COLING 2024 11th Workshop on the Representation and Processing of Sign Languages: Evaluation of Sign Language Resources; Efthimiou, E.; Fotinea, S.E.; Hanke, T.; Hochgesang, J.A.; Mesch, J.; Schulder, M., Eds., Torino, Italia, 2024; pp. 386–394.
- Vázquez-Enríquez, M.; Alba-Castro, J.L.; Docío-Fernández, L.; Rodríguez-Banga, E. SWL-LSE: A Dataset of Health-Related Signs in Spanish Sign Language with an ISLR Baseline Method. Technologies 2024, 12, 205. https://doi.org/10.3390/technologies12100205