Quran Speech Recognition Dataset

This dataset is designed for training and evaluating Quranic speech recognition models, with a focus on syllable-based transcription. It includes audio recordings and corresponding syllabized transcriptions from professional reciters and regular users, ensuring coverage of real-world scenarios.

Research group: Research Group for Quranic Speech Recognition
Contributors: Tarek ELDEEB; Dr. Moustafa Elshafi - Zewailcity
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
Language: Arabic (ar)
Download: https://archive.org/details/quran-speech-dataset
Torrent: https://utweb.rainberrytv.com/gui/share.html#link=magnet%3A%3Fxt%3Durn%3Abtih%3A34e50d0fd9afb7f308883b14e5e60f6532e30141%26dn%3Dquran-speech-dataset%26ws%3Dhttp%253a%252f%252fia601500.us.archive.org%252f23%252fitems%252f%26tr%3Dhttp%253a%252f%252fbt1.archive.org%253a6969%252fannounce%26tr%3Dhttp%253a%252f%252fbt2.archive.org%253a6969%252fannounce

Training set
Recordings: 11004
Total duration: 63691.96 seconds (17.69 hours)
Unlabeled value: 24.71 (label lost in the source; probably the longest recording, in seconds)
Audio format: 16000 Hz, 16-bit, Mono (1)
Description: Audio recordings from professional Quranic reciters, syllabized using automated Tajweed rule-based software.
Preprocessing: Text normalization and resampling to match the required format.

Test set
Recordings: 2823
Total duration: 21161.93 seconds (5.88 hours)
Unlabeled value: 24.83 (label lost in the source; probably the longest recording, in seconds)
Audio format: 16000 Hz, 16-bit, Mono (1)
Breakdown: 1169 and 1654 recordings (the two counts sum to 2823; the source does not say which is professional reciters and which is regular users)
Description: The test set includes a mix of professional reciters and regular users to simulate real-world usage scenarios. Transcriptions were syllabized using automated software based on Tajweed rules.
Preprocessing: Removed unnecessary characters and ensured syllable alignment with Quranic Tajweed rules. Resampled all audio files to 16 kHz to match the Wav2Vec 2.0 pre-trained model requirements.
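
The stated audio format (16000 Hz, 16-bit, mono) and the duration totals can be checked against a local copy of the recordings. The following is a minimal sketch using only the Python standard library. It assumes the recordings are stored as WAV files, which the dataset description does not confirm; the function names `wav_info` and `summarize` are illustrative and are not part of any tooling shipped with the dataset.

```python
import wave

# Values taken from the dataset description.
EXPECTED_RATE = 16000      # Hz
EXPECTED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
EXPECTED_CHANNELS = 1      # mono


def wav_info(path):
    """Return (sample_rate, sample_width_bytes, channels, duration_seconds)."""
    # Assumption: the file is an uncompressed PCM WAV readable by `wave`.
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, width, channels, duration


def summarize(paths):
    """Check each file against the stated format and total up the durations."""
    expected = (EXPECTED_RATE, EXPECTED_SAMPLE_WIDTH, EXPECTED_CHANNELS)
    durations = []
    mismatched = []
    for path in paths:
        rate, width, channels, duration = wav_info(path)
        if (rate, width, channels) != expected:
            mismatched.append(str(path))
        durations.append(duration)
    total_seconds = sum(durations)
    return {
        "samples": len(durations),
        "total_seconds": total_seconds,
        "total_hours": total_seconds / 3600,  # e.g. 63691.96 s is about 17.69 h
        "longest_seconds": max(durations, default=0.0),
        "mismatched": mismatched,
    }
```

Calling `summarize` on the list of WAV paths for one split returns the recording count, the total duration in seconds and hours, the longest recording, and any files whose sample rate, bit depth, or channel count differs from the stated format.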