Quran Speech Recognition Dataset
This dataset is designed for training and evaluating Quranic speech recognition models, with a focus on syllable-based transcription. It pairs audio recordings with their syllabized transcriptions and draws on both professional reciters and regular users, so that it covers real-world usage scenarios.
Research Group for Quranic Speech Recognition
Tarek ELDEEB
Dr. Moustafa Elshafi - Zewail City
Creative Commons Attribution 4.0 International (CC BY 4.0)
Arabic (ar)
https://archive.org/details/quran-speech-dataset
https://utweb.rainberrytv.com/gui/share.html#link=magnet%3A%3Fxt%3Durn%3Abtih%3A34e50d0fd9afb7f308883b14e5e60f6532e30141%26dn%3Dquran-speech-dataset%26ws%3Dhttp%253a%252f%252fia601500.us.archive.org%252f23%252fitems%252f%26tr%3Dhttp%253a%252f%252fbt1.archive.org%253a6969%252fannounce%26tr%3Dhttp%253a%252f%252fbt2.archive.org%253a6969%252fannounce
11004
63691.96
17.69
24.71
16000 Hz
16-bit
Mono (1)
Audio recordings from professional Quranic reciters, syllabized using automated Tajweed rule-based software. Preprocessing included text normalization and resampling to 16 kHz.
2823
21161.93
5.88
24.83
16000 Hz
16-bit
Mono (1)
1169
1654
Test set includes a mix of professional reciters and regular users to simulate real-world usage scenarios. Transcriptions were syllabized using automated software based on Tajweed rules.
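The figures listed for the two splits can be cross-checked with a little arithmetic. The sketch below is only illustrative: it assumes that the unlabeled values are, in order, the number of clips, the total duration in seconds, and the total duration in hours, and that 1169 and 1654 are the two test subgroups. The card does not state these labels or which subgroup is which, so the dictionary keys are hypothetical names.

```python
# Figures copied from the card above; the key names are assumed labels.
SPLITS = {
    "train": {"clips": 11004, "total_seconds": 63691.96, "reported_hours": 17.69},
    "test": {"clips": 2823, "total_seconds": 21161.93, "reported_hours": 5.88},
}

# The two test subgroup counts; the card does not say which is which.
TEST_SUBGROUPS = (1169, 1654)


def summarize(split):
    """Return (total hours, mean clip length in seconds), rounded to 2 places."""
    hours = split["total_seconds"] / 3600
    mean_clip_seconds = split["total_seconds"] / split["clips"]
    return round(hours, 2), round(mean_clip_seconds, 2)


for name, split in SPLITS.items():
    hours, mean_clip_seconds = summarize(split)
    consistent = hours == split["reported_hours"]
    print(name, hours, mean_clip_seconds, consistent)

# The subgroup counts should add up to the test clip count.
print(sum(TEST_SUBGROUPS) == SPLITS["test"]["clips"])
```

Under these assumptions the hour totals follow from the second totals, and the two subgroup counts add up to the 2823 test clips.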
Removed unnecessary characters and ensured syllable alignment with Quranic Tajweed rules.
Resampled all audio files to 16 kHz to match the Wav2Vec 2.0 pre-trained model requirements.
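A quick way to confirm that a file matches the stated format (16000 Hz, 16-bit, mono) is to read its header. The following is a minimal sketch using Python's standard `wave` module. It assumes the audio is stored as uncompressed WAV, which the card does not state, and the function names are hypothetical helpers rather than part of any dataset tooling.

```python
import wave

# Target format stated on the card: 16000 Hz, 16-bit (2 bytes), mono.
EXPECTED = {"sample_rate": 16000, "sample_width_bytes": 2, "channels": 1}


def wav_format(path):
    """Read the sample rate, sample width, and channel count from a WAV header."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate": wav.getframerate(),
            "sample_width_bytes": wav.getsampwidth(),
            "channels": wav.getnchannels(),
        }


def matches_expected(path):
    """Return True if the file header matches the card's stated format."""
    return wav_format(path) == EXPECTED
```

Calling `matches_expected` on each file before training would flag any recording that still needs resampling or downmixing.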