
Overview

This paper presents a new dataset for Kazakh-Russian Sign Language (KRSL), created for Sign Language Processing. In 2020, Kazakhstan's schools switched rapidly to online mode because of the COVID-19 pandemic. Every working day, the El-arna TV channel broadcast video lessons for grades 1 to 11 with sign language translation. This gave us the opportunity to record a corpus with a large vocabulary and spontaneous sign language interpretation. The corpus contains video recordings of Kazakhstan's online school lessons, interpreted into Kazakh-Russian Sign Language by 7 interpreters. So far, we have collected and cleaned 890 hours of video material. A custom annotation tool was created to make data annotation simple and easy for members of the deaf community to use. To date, around 300 hours of video have been annotated with glosses, and 4,009 of the 4,547 lessons have been transcribed with automatic speech-to-text software.
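To put the figures above in proportion, the following minimal sketch computes the annotation coverage they imply. It uses only the numbers quoted in the overview; the variable and function names are illustrative and are not part of the dataset or its tooling, and the 300-hour figure is approximate, so the first percentage is an estimate.

```python
# Figures quoted in the overview (the annotated-hours value is approximate).
TOTAL_HOURS = 890            # hours of video collected and cleaned
ANNOTATED_HOURS = 300        # hours annotated with glosses ("around 300")
TOTAL_LESSONS = 4547         # lessons in the corpus
TRANSCRIBED_LESSONS = 4009   # lessons transcribed with speech-to-text


def share(part: float, whole: float) -> float:
    """Return part/whole expressed as a percentage."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


gloss_share = share(ANNOTATED_HOURS, TOTAL_HOURS)
transcript_share = share(TRANSCRIBED_LESSONS, TOTAL_LESSONS)
untranscribed = TOTAL_LESSONS - TRANSCRIBED_LESSONS

print(f"Gloss-annotated: about {gloss_share:.1f}% of hours")
print(f"Transcribed: {transcript_share:.1f}% of lessons "
      f"({untranscribed} lessons remaining)")
```

By these figures, roughly a third of the video hours carry gloss annotations, while about 88% of lessons have automatic transcripts, leaving 538 lessons without one.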

Download

Sample videos from the dataset
Annotations: Transcripts, Gloss annotation

We are currently working on reducing the overall size of the dataset (>250 GB) and converting the videos to a single format. We will upload the finalized dataset by the end of June 2022.

Citation

Please cite the following reference in papers using this dataset:

Acknowledgment

This work was supported by the Nazarbayev University Faculty Development Competitive Research Grant Program 2019-2021, "Kazakh Sign Language Automatic Recognition System (K-SLARS)". The award number is 110119FD4545.

KRSL-OnlineSchool: Large Vocabulary Kazakh-Russian Sign Language Dataset
