can't be loaded into llama.cpp
The model can't be loaded into llama.cpp. I encountered a validation error:
ValidationError: 1 validation error for LlamaCpp
__root__
Could not load Llama model from path: /Users/fred/Documents/models/AquilaChat2-7B-16K.Q4_0.gguf. Received error (type=value_error)
Sorry for the inconvenience caused.
In fact, I didn't expect anyone to use it, since it is already an old model. Because of my poor network connection (500 kB/s at most) and the fact that I'm a middle school student, I had planned to test it after the Spring Festival, when I would have plenty of time to download and test it.
I just tested it on Google Colab and confirmed that it does have issues, but I don't know how to fix them yet.
Here is the log:
bin/main -m AquilaChat2-7B-16K.Q2_K.gguf --color -c 16384 --temp 0.7 --repeat_penalty 1.1 -n -1 -i -ins # command
Log start
main: build = 1 (d62520e)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1706691807
llama_model_loader: loaded meta data with 22 key-value pairs and 291 tensors from AquilaChat2-7B-16K.Q2_K.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = ..
llama_model_loader: - kv 2: llama.context_length u32 = 16384
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 11008
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 32
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 10: llama.rope.scaling.type str = linear
llama_model_loader: - kv 11: llama.rope.scaling.factor f32 = 8.000000
llama_model_loader: - kv 12: general.file_type u32 = 10
llama_model_loader: - kv 13: tokenizer.ggml.model str = llama
llama_model_loader: - kv 14: tokenizer.ggml.tokens arr[str,100008] = ["<|endoftext|>", "!", "\"", "#", "$"...
llama_model_loader: - kv 15: tokenizer.ggml.scores arr[f32,100008] = [-1000.000000, -1000.000000, -1000.00...
llama_model_loader: - kv 16: tokenizer.ggml.token_type arr[i32,100008] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 17: tokenizer.ggml.bos_token_id u32 = 0
llama_model_loader: - kv 18: tokenizer.ggml.eos_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 21: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q2_K: 129 tensors
llama_model_loader: - type q3_K: 96 tensors
llama_model_loader: - type q6_K: 1 tensors
llama_model_load: error loading model: _Map_base::at
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'AquilaChat2-7B-16K.Q2_K.gguf'
main: error: unable to load model
My guess is that this is caused by choosing the wrong vocab-type during conversion, but I'm not sure. I'll test that theory later.
To be honest, it's really weird that the model quantizes without any problems but then crashes when you run it.
Sorry again for the inconvenience, and thank you for pointing out such a serious issue.
My guess was correct: it is indeed a vocab-type problem.
I just fixed this issue. I will push the new files as soon as I finish quantizing.
Thanks for your testing and support!!!
Going to sleep now. I will push the commit tomorrow morning (UTC+8).
OK then, I can speak Chinese too. It's really impressive that you're only a middle school student and already doing this much with cutting-edge tech. I'm currently comparing different Chinese models, and I was too lazy to download the tensors and convert them myself, so I went looking for ready-made GGUF files on HF. Yours is the only upload of Aquila's 7B chat model here, haha.
By the way, is this model really that old / released that long ago? I only saw in llama.cpp that it's supported, took a quick look at the official site, searched for a GGUF, and ended up here.
> Yours is the only upload of Aquila's 7B chat model here, haha.

That's true. I suspect it's because nobody really cares about 7B models other than Mistral's (to be fair, Mistral's 7B really is strong). I only quantized this one because my GPU is terrible and there weren't really any good Chinese models around, so I wanted at least something to fall back on.

> OK then, I can speak Chinese too.

I treated it as English practice.

> By the way, is this model really that old / released that long ago?

Pretty old. I'd already been planning to quantize it a month before my holiday started, and by then it had been sitting on my computer for over a month, so its release is even older than that. Though it depends on how you define "old".

> I was too lazy to download the tensors and convert them myself, so I went looking for ready-made GGUF files on HF

Actually, there are quite a few pitfalls there.
llama.cpp's support for the Aquila family of models was most likely broken when they added the hfft vocab-type. The reason I previously thought I had simply picked the wrong vocab-type is this:

> I'd already been planning to quantize it a month before my holiday started

Back then the hfft vocab-type didn't exist yet, and only bpe would quantize correctly. Now converting with bpe throws an error, so I tried the options one by one, found that only hfft would convert without errors, and pushed that commit.

And then you opened this issue.
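In concrete terms, the two conversion attempts looked roughly like this. This is only a sketch, not the exact commands I ran: the paths are placeholders, and the vocab flag spelling differs between llama.cpp revisions (older convert.py used --vocabtype, newer ones --vocab-type), so check your local script.

```sh
# Convert the original AquilaChat2-7B-16K HF checkpoint on (then-)current llama.cpp master.
# Paths and flag spelling are illustrative.
python convert.py ./AquilaChat2-7B-16K --vocab-type bpe  --outtype f16 --outfile aquila-f16.gguf   # fails with an error
python convert.py ./AquilaChat2-7B-16K --vocab-type hfft --outtype f16 --outfile aquila-f16.gguf   # succeeds, but the resulting GGUF crashes at load time (_Map_base::at)
```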
My final solution: run git checkout 0a7c980 in llama.cpp to roll back to a commit that predates hfft but still supports the latest gguf format, then quantize with bpe. This time it finally worked.
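If you want to reproduce the conversion yourself, the whole recipe is roughly the following. Again just a sketch: the local paths are placeholders, and at that commit the vocab flag may be spelled --vocabtype rather than --vocab-type.

```sh
# Build llama.cpp at the commit that predates the hfft vocab-type but still writes current GGUF files.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout 0a7c980
make

# Convert the original AquilaChat2-7B-16K checkpoint with the bpe vocab type, then quantize to Q2_K.
python convert.py /path/to/AquilaChat2-7B-16K --vocab-type bpe --outtype f16 --outfile AquilaChat2-7B-16K.f16.gguf
./quantize AquilaChat2-7B-16K.f16.gguf AquilaChat2-7B-16K.Q2_K.gguf Q2_K
```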
So if you had downloaded the original model and converted it yourself, who knows how much trouble you'd have run into (I went through llama.cpp's GitHub issues and couldn't find anyone reporting this problem).
Thanks! Tested it, no problems! Happy Spring Festival!

> Thanks! Tested it, no problems! Happy Spring Festival!

Happy Spring Festival!