---
license: llama3
---

![](https://lh7-rt.googleusercontent.com/docsz/AD_4nXeiuCm7c8lEwEJuRey9kiVZsRn2W-b4pWlu3-X534V3YmVuVc2ZL-NXg2RkzSOOS2JXGHutDuyyNAUtdJI65jGTo8jT9Y99tMi4H4MqL44Uc5QKG77B0d6-JfIkZHFaUA71-RtjyYZWVIhqsNZcx8-OMaA?key=xt3VSDoCbmTY7o-cwwOFwQ)

# QuantFactory/DiarizationLM-8b-Fisher-v1-GGUF

This is a quantized version of [google/DiarizationLM-8b-Fisher-v1](https://huggingface.co/google/DiarizationLM-8b-Fisher-v1), created using llama.cpp.

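Since this repository hosts GGUF quantizations, the files can also be loaded directly with a llama.cpp binding. The snippet below is a minimal, unofficial sketch using the `llama-cpp-python` package; the GGUF file name is hypothetical, so substitute the quant you actually download from this repository. The original model card further below shows usage with the full-precision checkpoint via `transformers`.

```python
# Unofficial sketch: run a GGUF quant of this model with llama-cpp-python.
# The model_path below is a hypothetical file name; point it at the GGUF file
# you downloaded from this repository.
from llama_cpp import Llama

llm = Llama(
    model_path="DiarizationLM-8b-Fisher-v1.Q4_K_M.gguf",  # hypothetical file name
    n_ctx=4096,        # matches the model's maximal sequence length
    n_gpu_layers=-1,   # offload all layers to the GPU if one is available
)

hypothesis = "<speaker:1> Hello, how are you doing <speaker:2> today? I am doing well."
output = llm(hypothesis + " --> ", max_tokens=256, stop=["[eod]"])
print(output["choices"][0]["text"])
```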

# Original Model Card

**This is not an officially supported Google product.**

## Overview

Note: This model is outdated. Please use [google/DiarizationLM-8b-Fisher-v2](https://huggingface.co/google/DiarizationLM-8b-Fisher-v2) instead.

[DiarizationLM](https://arxiv.org/abs/2401.03506) model finetuned on the training subset of the Fisher corpus.

* Foundation model: [unsloth/llama-3-8b-bnb-4bit](https://huggingface.co/unsloth/llama-3-8b-bnb-4bit)
* Finetuning scripts: https://github.com/google/speaker-id/tree/master/DiarizationLM/unsloth

## Training config

This model is finetuned on the training subset of the Fisher corpus, using a LoRA adapter of rank 256. The total number of trainable parameters is 671,088,640. With a batch size of 16, the model was trained for 25,400 steps, which is roughly 8 epochs of the training data.

We use the `mixed` flavor during training, meaning we combine data from the `hyp2ora` and `deg2ref` flavors. After the prompt builder, we have a total of 51,063 prompt-completion pairs in our training set.

The finetuning took more than 4 days on a Google Cloud VM instance with one NVIDIA A100 GPU (80GB memory).

The maximal prompt length for this model is 6,000 characters, including the " --> " suffix. The maximal sequence length is 4,096 tokens.

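To make the prompt format concrete, the sketch below shows how an input hypothesis might be turned into a prompt and checked against the documented 6,000-character limit. It is illustrative only; `MAX_PROMPT_CHARS` and `build_prompt` are hypothetical names, not part of the diarizationlm package.

```python
# Illustrative sketch of the prompt constraints described above.
# These names are hypothetical and not part of the diarizationlm API.
MAX_PROMPT_CHARS = 6000   # maximal prompt length, including the suffix
PROMPT_SUFFIX = " --> "   # the model expects this suffix after the hypothesis

def build_prompt(hypothesis: str) -> str:
    """Append the suffix and enforce the documented length limit."""
    prompt = hypothesis + PROMPT_SUFFIX
    if len(prompt) > MAX_PROMPT_CHARS:
        raise ValueError(
            f"Prompt is {len(prompt)} characters, but the model was finetuned "
            f"with prompts of at most {MAX_PROMPT_CHARS} characters; split "
            "longer transcripts into smaller segments."
        )
    return prompt
```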
## Metrics

### Fisher testing set

| System                         | WER (%) | WDER (%) | cpWER (%) |
| ------------------------------ | ------- | -------- | --------- |
| USM + turn-to-diarize baseline | 15.48   | 5.32     | 21.19     |
| + This model                   | -       | 4.40     | 19.76     |

### Callhome testing set

| System                         | WER (%) | WDER (%) | cpWER (%) |
| ------------------------------ | ------- | -------- | --------- |
| USM + turn-to-diarize baseline | 15.36   | 7.72     | 24.39     |
| + This model                   | -       | 12.27    | 30.80     |

## Usage

First, you need to install two packages:

```
pip install transformers diarizationlm
```

On a machine with a GPU and CUDA, you can use the model by running the following script:

```python
from transformers import LlamaForCausalLM, AutoTokenizer
from diarizationlm import utils

HYPOTHESIS = """<speaker:1> Hello, how are you doing <speaker:2> today? I am doing well. What about <speaker:1> you? I'm doing well, too. Thank you."""

print("Loading model...")
tokenizer = AutoTokenizer.from_pretrained("google/DiarizationLM-8b-Fisher-v1", device_map="cuda")
model = LlamaForCausalLM.from_pretrained("google/DiarizationLM-8b-Fisher-v1", device_map="cuda")

print("Tokenizing input...")
# The model expects the hypothesis followed by the " --> " suffix.
inputs = tokenizer([HYPOTHESIS + " --> "], return_tensors="pt").to("cuda")

print("Generating completion...")
# max_new_tokens must be an integer, so round the 1.2x token budget down.
outputs = model.generate(**inputs,
                         max_new_tokens=int(inputs.input_ids.shape[1] * 1.2),
                         use_cache=False)

print("Decoding completion...")
completion = tokenizer.batch_decode(outputs[:, inputs.input_ids.shape[1]:],
                                    skip_special_tokens=True)[0]

print("Transferring completion to hypothesis text...")
transferred_completion = utils.transfer_llm_completion(completion, HYPOTHESIS)

print("========================================")
print("Hypothesis:", HYPOTHESIS)
print("========================================")
print("Completion:", completion)
print("========================================")
print("Transferred completion:", transferred_completion)
print("========================================")
```

The output will look like the following:

```
Loading model...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████| 4/4 [00:13<00:00,  3.32s/it]
generation_config.json: 100%|████████████████████████████████████████████████████████████| 172/172 [00:00<00:00, 992kB/s]
Tokenizing input...
Generating completion...
Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.
Decoding completion...
Transferring completion to hypothesis text...
========================================
Hypothesis: <speaker:1> Hello, how are you doing <speaker:2> today? I am doing well. What about <speaker:1> you? I'm doing well, too. Thank you.
========================================
Completion: <speaker:1> Hello, how are you doing today? <speaker:2> i am doing well. What about you? <speaker:1> i'm doing well, too. Thank you. [eod] [eod] <speaker:2
========================================
Transferred completion: <speaker:1> Hello, how are you doing today? <speaker:2> I am doing well. What about you? <speaker:1> I'm doing well, too. Thank you.
========================================
```

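Note how the completion rewrites the hypothesis with new speaker labels (and lowercased words), while the transferred completion keeps the original hypothesis words and only adopts the completion's speaker labels; that transfer is what `utils.transfer_llm_completion` does. The sketch below is a deliberately simplified, purely illustrative version of that idea, not the actual library code: it assumes the completion contains the same words as the hypothesis in the same order, whereas the real implementation uses a more robust text alignment.

```python
# Purely illustrative: a simplified stand-in for what a completion-to-hypothesis
# transfer does. The real diarizationlm.utils.transfer_llm_completion aligns the
# two texts properly; this sketch assumes a word-for-word correspondence.
import re

def simple_transfer(completion: str, hypothesis: str) -> str:
    # Collect one speaker label per word of the completion.
    labels, spk = [], None
    for token in completion.split():
        m = re.match(r"<speaker:(\d+)>?", token)
        if m:
            spk = m.group(1)
        elif token != "[eod]":
            labels.append(spk)

    # Re-emit the original hypothesis words, but with the completion's labels.
    out, prev, i = [], None, 0
    for token in hypothesis.split():
        if token.startswith("<speaker:"):
            continue  # drop the original speaker labels
        label = labels[i] if i < len(labels) else prev
        if label != prev:
            out.append(f"<speaker:{label}>")
            prev = label
        out.append(token)
        i += 1
    return " ".join(out)

# Applied to the example above, this reproduces the "Transferred completion" line.
```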
## Citation

Our paper is cited as:

```
@article{wang2024diarizationlm,
  title={{DiarizationLM: Speaker Diarization Post-Processing with Large Language Models}},
  author={Quan Wang and Yiling Huang and Guanlong Zhao and Evan Clark and Wei Xia and Hank Liao},
  journal={arXiv preprint arXiv:2401.03506},
  year={2024}
}
```