Ayoub-Laachir committed on
Commit d021e05 · verified · 1 Parent(s): 1744f1a

Update README.md

  1. README.md +111 -0
README.md CHANGED
@@ -59,6 +59,117 @@ These metrics demonstrate the model's ability to accurately transcribe Moroccan
  The fine-tuned model shows improved handling of Darija-specific words, sentence structure, and overall accuracy.

+ ## Audio Transcription Script with PEFT Layers
+
+ This script demonstrates how to transcribe audio files with the Whisper Large V3 model fine-tuned for Moroccan Darija. It loads the LoRA adapter with PEFT (Parameter-Efficient Fine-Tuning) and merges it into the base model before inference.
+
+ ### Required Libraries
+
+ Before running the script, install the following libraries (in a Jupyter or Colab notebook, prefix each command with `!`):
+
+ ```bash
+ pip install --upgrade pip
+ pip install --upgrade transformers accelerate librosa soundfile pydub
+ pip install peft==0.3.0  # Install the PEFT library
+ ```
+ ```python
+ import torch
+ from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline
+ import librosa
+ import soundfile as sf
+ from pydub import AudioSegment
+ from peft import PeftModel, PeftConfig  # Import PEFT classes
+
+ # Use the GPU if available, otherwise fall back to the CPU
+ device = "cuda:0" if torch.cuda.is_available() else "cpu"
+ torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32
+
+ # Configuration for the base Whisper model
+ base_model_name = "openai/whisper-large-v3"  # Base Whisper model
+ processor = AutoProcessor.from_pretrained(base_model_name)  # Load the processor
+
+ # Load the configuration of the fine-tuned LoRA adapter
+ model_name = "Ayoub-Laachir/MaghrebVoice_OnlyLoRaLayers"  # Fine-tuned model with LoRA layers
+ peft_config = PeftConfig.from_pretrained(model_name)  # Load PEFT configuration
+
+ # Load the base model in the same dtype that is passed to the pipeline below,
+ # so the input features and the model weights do not end up with mismatched dtypes
+ base_model = AutoModelForSpeechSeq2Seq.from_pretrained(base_model_name, torch_dtype=torch_dtype).to(device)
+
+ # Attach the LoRA adapter to the base model
+ model = PeftModel.from_pretrained(base_model, model_name).to(device)
+
+ # Merge the LoRA weights into the base model
+ model = model.merge_and_unload()
+
+ # Configuration for transcription
+ config = {
+     "language": "arabic",  # Language for transcription
+     "task": "transcribe",  # Task type
+     "chunk_length_s": 30,  # Length of each audio chunk in seconds
+     "stride_length_s": 5,  # Overlapping context kept on each side of a chunk, in seconds
+ }
+
+ # Initialize the automatic speech recognition pipeline
+ pipe = pipeline(
+     "automatic-speech-recognition",
+     model=model,  # Use the merged model
+     tokenizer=processor.tokenizer,
+     feature_extractor=processor.feature_extractor,
+     torch_dtype=torch_dtype,
+     device=device,
+     chunk_length_s=config["chunk_length_s"],
+     stride_length_s=config["stride_length_s"],
+ )
+
+ # Convert audio to 16kHz sampling rate
+ def convert_audio_to_16khz(input_path, output_path):
+     audio, sr = librosa.load(input_path, sr=None)  # Load the audio file
+     audio_16k = librosa.resample(audio, orig_sr=sr, target_sr=16000)  # Resample to 16kHz
+     sf.write(output_path, audio_16k, 16000)  # Save the converted audio
+
+ # Format time in HH:MM:SS.milliseconds
+ def format_time(seconds):
+     hours = int(seconds // 3600)
+     minutes = int((seconds % 3600) // 60)
+     seconds = seconds % 60
+     return f"{hours:02d}:{minutes:02d}:{seconds:06.3f}"
+
+ # Transcribe audio file
+ def transcribe_audio(audio_path):
+     try:
+         # Transcribe with timestamps, passing the configured language and task to Whisper
+         result = pipe(
+             audio_path,
+             return_timestamps=True,
+             generate_kwargs={"language": config["language"], "task": config["task"]},
+         )
+         return result["chunks"]  # Return transcription chunks
+     except Exception as e:
+         print(f"Error transcribing audio: {e}")
+         return None
+
+ # Main function to execute the transcription process
+ def main():
+     # Specify input and output audio paths (update paths as needed)
+     input_audio_path = "/path/to/your/input/audio.mp3"  # Replace with your input audio path
+     output_audio_path = "/path/to/your/output/audio_16khz.wav"  # Replace with your output audio path
+
+     # Convert audio to 16kHz
+     convert_audio_to_16khz(input_audio_path, output_audio_path)
+
+     # Transcribe the converted audio
+     transcription_chunks = transcribe_audio(output_audio_path)
+
+     if transcription_chunks:
+         print("WEBVTT\n")  # Print header for WEBVTT format
+         for chunk in transcription_chunks:
+             start_time = format_time(chunk["timestamp"][0])  # Format start time
+             end_time = format_time(chunk["timestamp"][1])  # Format end time
+             text = chunk["text"]  # Get the transcribed text
+             print(f"{start_time} --> {end_time}")  # Print time range
+             print(f"{text}\n")  # Print transcribed text
+     else:
+         print("Transcription failed.")
+
+ if __name__ == "__main__":
+     main()
+ ```
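
The final chunk returned by a Whisper pipeline can carry an end timestamp of `None` when the audio stops mid-segment, in which case `format_time` in the script would raise a `TypeError`. The following is a minimal sketch of one way to guard against that, assuming chunks shaped like the pipeline output (`{"timestamp": (start, end), "text": ...}`). The helper names `format_vtt_time` and `chunks_to_webvtt`, the `audio_duration` parameter, and the sample chunks are all illustrative assumptions, not part of the original script.

```python
def format_vtt_time(seconds):
    """Format seconds as HH:MM:SS.mmm, working in integer milliseconds."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chunks_to_webvtt(chunks, audio_duration=None):
    """Render pipeline-style chunks as WebVTT text.

    A missing end timestamp falls back to audio_duration (a hypothetical
    parameter) or, if that is not given, to the chunk's own start time.
    """
    lines = ["WEBVTT", ""]
    for chunk in chunks:
        start, end = chunk["timestamp"]
        if end is None:
            end = audio_duration if audio_duration is not None else start
        lines.append(f"{format_vtt_time(start)} --> {format_vtt_time(end)}")
        lines.append(chunk["text"].strip())
        lines.append("")  # Blank line terminates the cue
    return "\n".join(lines)


# Made-up chunks for illustration only; the second one has no end timestamp
example = [
    {"timestamp": (0.0, 2.5), "text": " first segment"},
    {"timestamp": (2.5, None), "text": " second segment"},
]
print(chunks_to_webvtt(example, audio_duration=4.0))
```

Working in integer milliseconds also avoids a seconds field that rounds up to `60.000` for values just below a minute boundary.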
+
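
The script sets `chunk_length_s=30` and `stride_length_s=5`. The Transformers pipeline documentation describes the stride as context kept on both the left and the right of each chunk, so consecutive windows would advance by 30 - 5 - 5 = 20 seconds. The sketch below only illustrates that arithmetic under this assumption; it is not the library's implementation, and `chunk_windows` is a hypothetical helper with simplified boundary handling.

```python
def chunk_windows(duration_s, chunk_length_s=30.0, stride_length_s=5.0):
    """Return (start, end) windows in seconds covering an audio file.

    Assumes a symmetric stride on both sides of each chunk, so the window
    advances by chunk_length_s - 2 * stride_length_s on every step.
    """
    step = chunk_length_s - 2 * stride_length_s
    if step <= 0:
        raise ValueError("chunk_length_s must exceed twice stride_length_s")
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_length_s, duration_s)
        windows.append((start, end))
        if end >= duration_s:  # This window reaches the end of the audio
            break
        start += step
    return windows


# Windows for a 70-second file with the settings used in the script
for start, end in chunk_windows(70.0):
    print(f"{start:5.1f}s -> {end:5.1f}s")
```

Under these assumptions, adjacent windows share 10 seconds of audio, which gives the pipeline context for stitching the per-chunk transcriptions together.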
  ## Challenges and Future Improvements
  ### Challenges Encountered
  - Diverse spellings of words in Moroccan Darija