BOS token as 1 seriously hurts these GGUF Yi models

#2
by wolfram - opened

Same issue as reported by KerfuffleV2 here:

TheBloke/Nous-Capybara-34B-GGUF · BOS token as 1 seriously hurts these GGUF Yi models

Confirmed the issue and the workaround. @TheBloke Would it be worth it to apply the workaround on the repo so future downloads get the fixed version? Otherwise most downloaders probably won't know the model could be even better with this fix.

Updated GGUFs with the change to BOS token are uploading now.

As per @KerfuffleV2 's script and instructions, method used to create them was:

for file in *.gguf ; do /workspace/git/gguf-llama/gguf-py/scripts/gguf-set-metadata.py --force "$file" tokenizer.ggml.bos_token_id 144 ; done
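For anyone who would rather review the commands before touching their files, here is a minimal dry-run sketch of the same loop in Python. The key name, the value 144 and the --force flag are taken from the command above; the script path and the file names are placeholders, and the helper name is made up for illustration.

```python
from pathlib import Path


def build_set_bos_commands(gguf_files, script_path, bos_token_id=144, force=True):
    """Return one argv list per GGUF file; nothing is executed here."""
    commands = []
    for gguf in gguf_files:
        argv = ["python", str(script_path)]
        if force:
            argv.append("--force")  # same flag as in the shell loop above
        argv += [str(gguf), "tokenizer.ggml.bos_token_id", str(bos_token_id)]
        commands.append(argv)
    return commands


# Hypothetical file names, for illustration only.
files = [Path("model.Q4_K_M.gguf"), Path("model.Q5_K_M.gguf")]
for argv in build_set_bos_commands(files, "gguf-py/scripts/gguf-set-metadata.py"):
    print(" ".join(argv))
```

Once the printed commands look right, each argv list could be handed to subprocess.run(argv, check=True) to actually apply the change.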

Sample Q4_K_M generation

<|im_start|>system
You are an author who writes excellent stories about llamas<|im_end|>
<|im_start|>user
Write a story about llamas<|im_end|>
<|im_start|>assistant
Once upon a time, in the heart of the Andes Mountains, there lived a community of llamas. These llamas were not just any ordinary animals; they possessed magical powers that allowed them to communicate with one another and even with humans.

The llamas were led by an ancient wise llama named Inca, who had been around for centuries. Inca was revered by the other llamas as well as the villagers in the nearby town. The villagers often sought Inca's advice on various matters, including love, health, and prosperity.

One day, a group of explorers arrived at the village. They were searching for new lands to discover and map out. However, they had no idea about the magical powers of the llamas. As soon as they saw Inca, they were amazed by his wisdom and knowledge.

The leader of the explorers approached Inca and asked him if he knew any hidden treasures or secret paths that could lead them to undiscovered lands. Inca looked into the future and saw that these explorers would eventually discover America and bring back many great inventions and ideas. He decided to help them on their journey by sharing some of his knowledge with them.

Inca told the leader of the explorers about a hidden path through the mountains, which led to an ancient city filled with treasures beyond imagination. Intrigued, the leader asked Inca how he knew this information. Inca replied that it was part of his magical powers as a wise llama.

The explorers were thrilled and decided to follow Inca's instructions. They began their journey through the treacherous mountains, facing many challenges along the way. However, thanks to Inca's guidance, they eventually reached the ancient city filled with treasures.

As the explorers continued on their journey, they discovered many new lands and brought back valuable information that helped shape history. Their success can be attributed in part to the magical powers of the wise llama, Inca, who had shared his knowledge with them.

Years later, when people learned about the llamas' magical powers, they came from far and wide to seek guidance from Inca. The community of llamas grew stronger and more united, with each new member contributing their unique talents and abilities.

And so, through the wisdom and magic of Inca, the llamas continued to thrive in harmony with nature and help humans navigate their way through life's challenges.<|im_end|> [end of text]

Has anybody actually tested with bos_token_id set to 1, 2, or 144 and found any substantial evidence of which one the fine-tuned model works better with? I guess the original discussion, https://github.com/01-ai/Yi/discussions/5, was mainly about how the base model had been trained, and token 1 was likely unknown to the base model. But for the fine-tune here, wouldn't the actual BOS id depend on how it was fine-tuned? Perhaps we need to ask @ehartford how or if axolotl sets BOS before the <|im_start|> token, and what BOS that is.

Updated GGUFs are uploaded.

Note there appears to be a problem with CUDA acceleration at the moment. Not related to this BOS change, just in general:

...................................................................................................
llama_new_context_with_model: n_ctx      = 4096
llama_new_context_with_model: freq_base  = 5000000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: offloading v cache to GPU
llama_kv_cache_init: offloading k cache to GPU
llama_kv_cache_init: VRAM kv self = 960.00 MB
llama_new_context_with_model: kv self size  =  960.00 MB
llama_build_graph: non-view tensors processed: 1384/1384
llama_new_context_with_model: compute buffer total size = 499.57 MB
llama_new_context_with_model: VRAM scratch buffer: 498.00 MB
llama_new_context_with_model: total VRAM used: 20912.15 MB (model: 19454.15 MB, context: 1458.00 MB)

system_info: n_threads = 64 / 128 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 |
sampling:
    repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
    top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
    mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
generate: n_ctx = 4096, n_batch = 512, n_predict = -1, n_keep = 0



<|im_start|>system
You are an author who writes excellent stories about llamas<|im_end|>
<|im_start|>user
Write a story about llamas<|im_end|>
<|im_start|>assistant<h3>
CUDA error 716 at ggml-cuda.cu:7104: misaligned address
current device: 0

I will raise this with the llama.cpp team.

@mljxy we can see in Eric's model that his tokenizer_config.json is the same as the original Yi model with regard to add_bos_token, ie it's set to False

The issue, as I understand it, is that the Yi models are set to not add the BOS token by default, which is still quite rare amongst models. llama.cpp has it hardcoded to always add the BOS token, so currently llama.cpp adds an extra token that was not there during either base training or fine-tuning. Eric's fine-tuning doesn't change the BOS to <|im_start|>; it's still set to <|startoftext|>. The user is expected to add <|im_start|> manually as part of the prompt template.

@KerfuffleV2 's workaround is to change BOS to a newline, so that even though it's still added erroneously, it'll effectively be ignored by the model as newlines tend to form part of normal text that doesn't have a huge effect on generation.
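To make the workaround concrete, here is a toy sketch of what the model ends up seeing in each case. No real tokenizer is involved: the ids 1 and 144 are the ones mentioned in this thread, while the prompt token ids and the function name are invented for illustration.

```python
ORIGINAL_BOS_ID = 1   # BOS id stored in the original GGUF metadata
NEWLINE_ID = 144      # id reported in this thread as a newline token


def build_input(prompt_ids, bos_id, always_add_bos=True):
    """Mimic a loader that prepends BOS whenever always_add_bos is set."""
    prefix = [bos_id] if always_add_bos else []
    return prefix + list(prompt_ids)


prompt_ids = [6, 1328, 59568]  # hypothetical ids standing in for a prompt

# What the model saw in training (add_bos_token is False for Yi).
seen_in_training = build_input(prompt_ids, ORIGINAL_BOS_ID, always_add_bos=False)
# Old GGUF: an unfamiliar token 1 is forced in front of the prompt.
old_gguf = build_input(prompt_ids, ORIGINAL_BOS_ID)
# Patched GGUF: a token is still forced in, but it is only a newline.
patched_gguf = build_input(prompt_ids, NEWLINE_ID)

print("training:", seen_in_training)
print("old GGUF:", old_gguf)
print("patched :", patched_gguf)
```

The patched input still differs from the training-time input by one token, which is why this is only a stopgap until the loader stops adding BOS at all.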

The better solution is a llama.cpp PR which will fix the issue properly, so that BOS is not added in the first place, respecting the original model tokenizer_config.json setting.

As to comparing the before/after, I have briefly, and the results were:

  1. This model, Dolphin 2.2 - no discernible difference
  2. Capybara - prompts totally broken by the erroneously added BOS.

My theory for that is that Eric's use of ChatML has insulated it from extra tokens seen before the start of the prompt template, whereas Capybara's use of a simple USER: .. ASSISTANT: has not. But that's only speculation on my part.

I only tested a couple of simple single prompts, not a full chat, and even though I couldn't spot any degradation in this model, I still think it's better to update this model as well, in case it could cause issues in other circumstances.

Here's the Capybara result:

OLD GGUF:

USER: write a story about llamas ASSISTANT: write a story</s> [end of text]

UPDATED GGUF:

USER: write a story about llamas ASSISTANT: Once upon a time, in the heart of the Andes Mountains, there was a small village named Llama-land. This village was known for its beautiful scenery and its love for animals, especially llamas. The people of Llama-land had domesticated these gentle creatures centuries ago, using them for transportation, agriculture, and even as companions.

In the center of the village, there was a large llama pasture where hundreds of llamas grazed peacefully during the day. They were cared for by expert herders who understood their behaviors and needs perfectly. The villagers believed that these animals brought good luck and prosperity to their community.

Among all the llamas in Llama-land, there was one named Lucky. He was a young male with beautiful brown fur, bright eyes, and an extremely friendly personality. Everyone who met him fell in love instantly, including a little girl named Mariana. She visited the llama pasture every day after school to spend time with her favorite friend, Lucky.

One sunny afternoon, while Mariana was playing with Lucky, she noticed something strange on his back - a small white patch shaped like a heart. The villagers had never seen anything like it before and soon word spread about this unusual marking on the beloved llama's fur. They all gathered around to see this amazing sight for themselves, marveling at the beauty of nature.

As days turned into weeks, the heart-shaped mark became more visible, and people from faraway places started visiting Llama-land just to meet Lucky. The villagers were proud of their special llama and took great care of him, ensuring he remained happy and healthy.

One day, a group of researchers arrived in Llama-land hoping to study the unique heart-shaped marking on Lucky's back. They believed that this rare occurrence could provide valuable insights into the genetics and behavior of llamas. The villagers welcomed them with open arms, eager to share their knowledge and love for these amazing animals.

After months of studying Lucky and his genetic makeup, the researchers discovered something incredible – the heart-shaped marking was not just a coincidence but rather a result of specific gene combinations that occurred very rarely among llamas. They also found that Lucky had unique personality traits compared to other llamas, making him even more special.

The news about Lucky's scientific significance spread across the globe, and Llama-land became a popular destination for tourists who wanted to meet the famous heart-shaped llama. The villagers took advantage of this opportunity by starting businesses related to tourism, such as guided tours, souvenir shops, and traditional food stalls.

Despite all the attention, Lucky remained humble and true to his nature. He continued spending his days grazing peacefully alongside Mariana, who never forgot how special their friendship was. The people of Llama-land cherished their bond with these amazing creatures even more than before, knowing that their love for llamas had made a significant impact on the world.

And so, the story of Lucky, the heart-shaped llama, lived on in the hearts and minds of everyone who visited Llama-land. His unique marking served as a reminder of the wonders of nature and the deep connection between humans and animals, inspiring people to appreciate and protect these incredible creatures for generations to come.</s> [end of text]

In my tests, I didn't see a noticeable change in this model's performance - which has been really good even before the fix. But it's good that the workaround was applied, just to be sure, until the proper fix is implemented inside llama.cpp and thus KoboldCpp.

Let me know if there's any action required on my part

@TheBloke

You're going to hate me, but correctly adding the add_bos_token flag metadata requires using this pull: https://github.com/ggerganov/llama.cpp/pull/4040

So you'd need to make at least this model and the Capybara one again using that. (The problem is that they don't have a tokenizer.json; I don't think the original Yi models are affected.)

I mentioned that here but you may have missed it: https://huggingface.co/TheBloke/Nous-Capybara-34B-GGUF/discussions/1#65530a38a69abe5afba6e9d9

You should see output like this when you use convert.py:

gguf: Setting special token type bos to 1
gguf: Setting special token type eos to 2
gguf: Setting special token type pad to 0
gguf: Setting add_bos_token to False
gguf: Setting add_eos_token to False

If you don't see the gguf: Setting add_bos_token to False part then something went wrong.
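If you capture the conversion output to a file or a string, the check can be automated. A minimal sketch: the expected line is copied from the output above, and the function name and the idea of checking captured text are my own additions, not part of convert.py.

```python
EXPECTED_LINE = "gguf: Setting add_bos_token to False"


def conversion_wrote_add_bos(log_text, expected=EXPECTED_LINE):
    """Return True if any line of the captured output matches exactly."""
    return any(line.strip() == expected for line in log_text.splitlines())


# Sample output copied from the post above.
sample_log = """\
gguf: Setting special token type bos to 1
gguf: Setting special token type eos to 2
gguf: Setting special token type pad to 0
gguf: Setting add_bos_token to False
gguf: Setting add_eos_token to False
"""

if conversion_wrote_add_bos(sample_log):
    print("add_bos_token metadata was written")
else:
    print("add_bos_token line missing: something went wrong")
```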

edit: If you want to @ me on Discord or something to have me check that it's working or whatever before regenerating a whole bunch of models, feel free.

Yeah I saw that, but I just did the workaround for now - changing the BOS token to newline, which fixes Capybara.

I didn't want to do the other change until the PR is merged. If I understand correctly, it doesn't even do anything yet - add_bos_token I mean? Like it has no effect on inference yet. So presumably even with that PR I'd still need to set BOS to newline as I've done here.

So I figured I'd wait for the PR to be merged, and once it is I can remake affected models with it

Is there some reason I'm missing to do it before it's merged?

@TheBloke

That patch started out as making inference respect the add_bos_token metadata, but the other day I found there was a case where metadata didn't get added during conversion. So that fix is also included - that's what I'm talking about.

The reason to worry about it now is all the models created without that won't have the correct metadata. So for it to actually get respected correctly, you'll not only need to generate the new models (with that metadata) but every user will also have to download the models again to have a version with the metadata.

So you're correct that it has no effect on inference right now but ideally people will already have models that include the metadata so when it does get merged they can get the benefits. Hopefully that makes sense.

I just tested both versions of Dolphin, and the updated version appears to give better output, but it also doesn't output any newlines.

@HiroseKoichi Huh? It seems to have no trouble outputting newlines. With the prompt "Once upon a time, in a dark forest, there lived a little fox":

llm_load_print_meta: BOS token = 144 '
'  <--- Note the newline
llm_load_print_meta: EOS token = 7 '<|im_end|>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token  = 315 '<0x0A>'
[...]
generate: n_ctx = 512, n_batch = 512, n_predict = 100, n_keep = 0



 Once upon a time, in a dark forest, there lived a little fox named Fluffy. Fluffy was curious about everything and loved going on adventures. One day, she decided to explore the depths of the forest and find new friends.

As she wandered deeper into the woods, she came across a beautiful crystal waterfall. The water shimmered with all colors of the rainbow, and Fluffy couldn't help but be drawn closer to it. As she approached, she noticed something peculiar – tiny, glowing creatures dancing around the waterfall!

"Who are

I'm using a fresh install of the latest version of TextGen Webui, and I used Q4_K_M for both models.

New version with the prompt 'Write a simple story about a man buying groceries.':

Sure, here's a simple story about a man buying groceries:John was feeling rather peckish after a long day at work. He decided to head to the local grocery store to pick up some food for dinner. As he perused the aisles, he couldn't help but feel overwhelmed by all the options available to him.After much deliberation, John finally settled on a selection of items that would make for a delicious meal. He grabbed a few fresh vegetables, some chicken, and a bottle of wine to wash it all down. As he stood in line to pay, he couldn't help but feel a sense of satisfaction at the thought of enjoying a home-cooked meal after such a long day.

Old version with the prompt 'Write a simple story about a man buying groceries.':

Sure, here's a simple story about a man buying groceries:

John was walking through the aisles of his local grocery store, pushing his cart filled with food items he needed for the week. He picked up some fresh produce, including apples, bananas, and avocados. He also stocked up on grains, pasta, and canned goods.

As he made his way to the checkout counter, John noticed that the line was quite long. He patiently waited his turn, taking a moment to look at the items in his cart and make sure he hadn't forgotten anything. When it was finally his turn, John unloaded his groceries onto the conveyor belt and the cashier scanned each item.

I can't recreate that issue with llama.cpp:

Test1:

<|im_start|>system
You are an author who writes stories<|im_end|>
<|im_start|>user
Write a simple story about a man buying groceries.<|im_end|>
<|im_start|>assistant
John was a middle-aged man with a wife and two children, living in the suburbs of New York City. Every Friday evening, it was his responsibility to go to the local supermarket and buy groceries for the weekend and the upcoming week.

As John made his way through the aisles of the grocery store, he noticed how busy it was. He could see that there were many people shopping this time of day, as they probably had the same idea as him - wanting to get their errands done before the weekend began.

John picked up his usual items: milk, bread, eggs, and juice for the kids. As he continued down each aisle, he made a mental list of everything else his family might need. He grabbed some fruits, vegetables, and meats for dinner, as well as snacks and cereals for breakfast. He also added some cleaning supplies to his cart, knowing that they were running low on laundry detergent and dishwasher pods.

As John neared the checkout lines, he noticed that there was a long waiting time due to the busy hour. He patiently stood in line with his cart full of groceries while observing all the other shoppers around him. It struck him how different everyone's shopping experience was - some people had only a few items, while others seemed to be restocking their entire pantry.

Finally, it was John's turn at the checkout counter. The cashier greeted him with a friendly smile and began scanning his items. As he paid for his groceries, he couldn't help but feel a sense of accomplishment and satisfaction knowing that his family would have everything they needed for the upcoming week.<|im_end|> [end of text]

Test2:

<|im_start|>system
You are an author who writes stories<|im_end|>
<|im_start|>user
Write a simple story about a man buying groceries.<|im_end|>
<|im_start|>assistant
John was a middle-aged man with a passion for cooking. Every weekend, he would visit the local grocery store to buy fresh ingredients for his elaborate meals. This particular Saturday morning, as he pushed his cart through the aisles filled with colorful produce and neatly stacked packages, John couldn't help but feel excited about what culinary delights he would create that evening.

First on his list was fresh vegetables. He picked up crisp lettuce leaves, juicy tomatoes, and tender bell peppers - all perfect for his famous salad. Next, he moved to the meat section where he carefully selected a succulent chicken breast, which would be cooked until it had just enough crunch on the outside while staying tender and juicy inside.

As John continued shopping, he couldn't ignore the sweet smell of baking that wafted from the bakery section. He decided to treat himself and picked up a warm loaf of crusty bread and a few buttery croissants for dessert. Finally, he made his way over to the spice aisle, adding herbs and spices that would give his dishes an extra burst of flavor - garlic powder, paprika, cayenne pepper, and rosemary.

With bags loaded with ingredients, John headed home eager to start cooking. He knew that tonight's dinner would be one to remember, not only because of the delicious food but also because it brought him joy in creating something special for those he loved. And so, as he began chopping vegetables and searing chicken, John smiled, knowing that every ingredient and step taken was an essential part of crafting a truly unforgettable meal.<|im_end|> [end of text]

Test3:

<|im_start|>system
You are an author who writes stories<|im_end|>
<|im_start|>user
Write a simple story about a man buying groceries.<|im_end|>
<|im_start|>assistant
John was a busy man, always running from one thing to the next. One day he realized that his fridge was empty and decided it was time to buy groceries. He made a list of all the items he needed: milk, bread, eggs, fruits, vegetables, and some snacks for his kids.

As John walked into the store, he noticed how crowded it was with people pushing their carts around. He took a deep breath and began collecting everything on his list. First, he found himself in front of the dairy section where he picked up two jugs of milk and four loaves of bread because you can never have too much bread at home!

Next, he moved onto the produce aisle where colorful fruits and vegetables filled every corner. He carefully selected apples, bananas, carrots, broccoli, and lettuce for his family's meals this week. Finally, he made sure to grab some cookies and chips for those late-night cravings that often sneak up on them.

John loaded all these items into his cart and headed towards the checkout lines. There were so many people waiting in line, but John patiently waited his turn. Once it was finally his turn, he unloaded all of his groceries onto the conveyor belt and watched as they went through the barcode scanner.

After paying for his purchases, John loaded everything into his car and drove back home with a full cart. He couldn't wait to see the look on his kids' faces when they saw all the delicious treats he had brought back!<|im_end|> [end of text]

Apparently in Dolphin, men are always called John :)

Test4:

<|im_start|>system
You are an author who writes stories<|im_end|>
<|im_start|>user
Write a long story about a woman buying shopping.<|im_end|>
<|im_start|>assistant
Once upon a time, in a quaint little village nestled between the rolling hills and the bustling city, there lived a woman named Margaret. Margaret was known for her generosity, kindness, and warm smile that could brighten anyone's day. She owned a small but thriving flower shop at the heart of the village, where she created beautiful arrangements to celebrate life's special moments.

One morning, as the sun began to rise, casting a golden hue on the cobblestone streets, Margaret decided it was time to restock her beloved flower shop. With a list of fresh flowers and supplies in hand, she embarked on her shopping journey. As she approached the entrance of the village market, she couldn't help but notice that something seemed different.

The marketplace was alive with activity - vendors called out their specials, children played games, and the air smelled like freshly baked bread. Margaret felt a sense of excitement building inside her as she began to explore the market. She passed by colorful stalls filled with ripe fruits, plump vegetables, and succulent meats. The aromas wafted through the air, making her stomach growl with anticipation.

As she continued walking, she stumbled upon a small, unassuming stall that caught her eye. It was lined with delicate fabrics in shades of ivory and gold, and displayed an array of handcrafted jewelry, each piece more exquisite than the last. Margaret's heart skipped a beat as she approached the stall. She had always been fascinated by artistry, and these unique creations were no exception.

The artist behind the stall was a middle-aged woman with salt-and-pepper hair and kind eyes that sparkled like the morning dew. Her name was Eleanor, and she greeted Margaret warmly. They exchanged pleasantries before delving into their shared love for creativity and beauty.

As Margaret perused the intricate designs, her hands touched a delicate necklace of gold beads adorned with tiny opalescent flowers. It felt as though the piece had been crafted just for her. She couldn't help but ask Eleanor about it, who revealed that it was an heirloom from her grandmother, passed down through generations.

Margaret knew in her heart that she needed to have this necklace. With a smile, she asked Eleanor if it was available for purchase. The artist hesitated, but after a moment's pause, she agreed to sell the precious piece to Margaret for a modest sum.

Overwhelmed with joy and gratitude, Margaret placed the necklace around her neck, feeling its warmth and energy pulsate through her body. She knew that this treasure would always hold special meaning in her life.

As their conversation continued, Eleanor confided that she was planning to close her stall and retire soon. She had been searching for someone who could carry on her grandmother's legacy of artistry, and Margaret's genuine love for the craft made her stand out as the perfect candidate.

Margaret hesitated at first, unsure if she could handle such a responsibility. But upon seeing the pride in Eleanor's eyes and the beauty of her creations, she couldn't deny her own passion and drive to learn more about this captivating art form. With a smile, they shook hands, sealing their newfound partnership.

The days turned into weeks, and soon enough, Margaret found herself immersed in the world of artistry. She spent long hours learning from Eleanor, absorbing every detail about the history, materials, and techniques used to create such stunning masterpieces. They worked side by side, forging not only an unbreakable bond but also a beautiful collection of unique pieces that captured their shared passion for beauty.

As the season changed, so did Margaret's life. Her flower shop began to incorporate Eleanor's exquisite jewelry and accessories, becoming a haven for both the artistry of nature and the artistry of human hands. The village flourished as word spread about this one-of-a-kind establishment, drawing in customers from far and wide.

Through it all, Margaret never forgot the simple shopping trip that changed her life forever. And as she stood in her now thriving business, surrounded by vibrant flowers and intricate artistry, she couldn't help but smile at the serendipity of it all. Little did she know that this seemingly ordinary shopping excursion would be the catalyst for a new chapter in both her own life and the lives of countless others who found solace, beauty, and inspiration within those precious walls.<|im_end|> [end of text]
Apparently in Dolphin, men are always called John :)

Just like I always see Lilly being used for women :). Do you know why this is?

@HiroseKoichi

TextGen Webui might be doing something like setting logit bias for the BOS token. You can try using 2 as the BOS token id instead and see how that works. You will need to have the llama.cpp repo checked out, be in that directory, and have at least the numpy Python dependency installed. Then run:

python gguf-py/scripts/gguf-set-metadata.py /path/whatever.gguf tokenizer.ggml.bos_token_id 2

Or maybe there's some way to control that behavior in TextGen Webui. I'm not familiar with it.

By the way, if anyone wants to download a pre-update GGUF, i.e. from before I set BOS to newline, you can do that using Hugging Face's revision system (HF is Git-based under the hood).

Here for example is the link to manually download the original Q4_K_M:
https://huggingface.co/TheBloke/dolphin-2_2-yi-34b-GGUF/blob/4d998f91545661e6813083b3d7cc1df2e0bd5eac/dolphin-2_2-yi-34b.Q4_K_M.gguf

It's a lot quicker just to re-set the metadata on your already-downloaded GGUF like Kerfuffle just described. But if anyone isn't comfortable doing that for any reason, or is doing a first-time download, then the original files are also still available.
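For scripted downloads, a pinned URL can be assembled from the repo id, the commit hash and the file name. A minimal sketch, assuming the Hub's usual /resolve/ path for raw files (the link above uses /blob/, which opens the file page instead); the function name is made up, and the repo, commit and file name are the ones from that link.

```python
def revision_download_url(repo_id, revision, filename):
    """Build a direct-download URL pinned to one commit of a Hub repo."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


url = revision_download_url(
    "TheBloke/dolphin-2_2-yi-34b-GGUF",
    "4d998f91545661e6813083b3d7cc1df2e0bd5eac",  # commit from the link above
    "dolphin-2_2-yi-34b.Q4_K_M.gguf",
)
print(url)
```

The sketch only builds the string; it does not download anything.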

(Not that I've personally seen any reason to make any change - the updated files seem to work great for me, with llama.cpp.)

OK understood, thanks. I'll do them soon with the PR.

Changing the BOS token id to 2, as @KerfuffleV2 suggested above, does fix the issue for TextGen WebUI. I don't know if this is the cause, but I tried adding \n to the custom stopping strings in TextGen WebUI, and the new version just kept generating while the old version stopped at a newline like it should, so I think TextGen WebUI automatically bans the BOS token if add_bos_token is false in the metadata. I also tested the new version with llama.cpp, and the issue wasn't present, so it's definitely something to do with TextGen WebUI and not the model itself.
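To illustrate why banning BOS would swallow newlines once BOS points at token 144, here is a toy greedy sampler. Whether TextGen WebUI really bans the BOS token is only a guess made in this thread; the scores and the non-newline token ids below are invented.

```python
import math

NEWLINE_ID = 144  # the id that the patched GGUF uses as BOS


def pick_next(logits, banned_ids=()):
    """Greedy pick after forcing every banned id to negative infinity."""
    masked = {
        token: (-math.inf if token in banned_ids else score)
        for token, score in logits.items()
    }
    return max(masked, key=masked.get)


# Hypothetical scores at a point where a newline is the most likely token.
logits = {NEWLINE_ID: 5.0, 59568: 3.2, 7: 0.1}

print(pick_next(logits))                           # newline wins
print(pick_next(logits, banned_ids={NEWLINE_ID}))  # newline can never win
```

With BOS set back to an id the model never emits as text, a ban on BOS would no longer affect the output, which would fit the fix reported above.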

@TheBloke do I need to change anything in my config json?

Nope, you're fine as far as I know. The issues described in this thread relate to llama.cpp not respecting the config set in the JSON files. I've had no reports of issues with the GPTQ or AWQs which use the JSON files directly.

Could this be an issue that people only run into on Windows, maybe? I also have the problem that it omits any newline characters.
Is there anything I can do with the prompt or a grammar file? (Sorry, I'm a bit of a newbie with this stuff...)
Edit: I just tried it on runpod.io, which is definitely a Linux environment, and I run into exactly the same issue: all newlines are dropped.

If you're using TextGen WebUI, then you'll either need to download the first version that was uploaded or manually change the BOS token ID using the instructions above. Here's a link to the previous versions: https://huggingface.co/TheBloke/dolphin-2_2-yi-34b-GGUF/tree/20692fcd351cb6046ce1aed26d410a5430a28206
If you want to know how to get there manually for future reference, it's 'files and versions' -> 'history' -> select the old branch -> 'browse files' -> download the quantization you want to use.

Edit: The models got updated, so just re-download the latest version.

By the way, the changes to respect the add_bos_token metadata got merged into llama.cpp a couple days ago. So as long as you have a .gguf with the metadata in it and are using a recent enough llama.cpp version, you shouldn't have to worry about this stuff anymore.
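Roughly, the merged behavior can be pictured like this. This is a toy sketch, not llama.cpp's actual code: the add_bos_token key name and the fallback for files without the flag are assumptions, and the prompt token ids are invented.

```python
ADD_BOS_KEY = "tokenizer.ggml.add_bos_token"  # assumed metadata key name
BOS_ID_KEY = "tokenizer.ggml.bos_token_id"    # key name used earlier in the thread


def prepare_tokens(metadata, prompt_ids):
    """Prepend BOS only when the file's metadata asks for it."""
    # Assumption: files without the flag keep the old always-add behavior.
    add_bos = metadata.get(ADD_BOS_KEY, True)
    prefix = [metadata[BOS_ID_KEY]] if add_bos else []
    return prefix + list(prompt_ids)


prompt_ids = [6, 1328, 59568]  # hypothetical prompt token ids

old_file = {BOS_ID_KEY: 1}                              # converted without the flag
requantized_file = {BOS_ID_KEY: 1, ADD_BOS_KEY: False}  # converted with the fix

print(prepare_tokens(old_file, prompt_ids))
print(prepare_tokens(requantized_file, prompt_ids))
```

This is why an older .gguf without the metadata needs to be downloaded again or re-converted to get the benefit.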

Thanks Kerfuffle. I've just started a re-quant of this model and Capybara 34B, such that they'll be re-done with the latest llama.cpp code and therefore should have all the right metadata.

I was wondering what the new uploads for those were. You also shouldn't need to mess around with the BOS token id anymore. (Also, sorry that my suggestion of using newline as BOS seems to have caused problems for some people that aren't using llama.cpp directly.)

How can I fix this if I use https://github.com/turboderp/exllamav2 for inference? I really can't follow.

I'll retrain it on the new llama compatible base

I just made a new class but replaced <|im_end|> with </s> in the stop conditions. Now it works fine:

# Assumes PromptFormat from exllamav2's examples/chat_prompts.py is in scope.
class PromptFormat_capychatml(PromptFormat):

    description = "ChatML format, as used by e.g. (Mistral)Orca"

    def __init__(self):
        super().__init__()

    def default_system_prompt(self):
        return \
            f"""You are {self.botname}, a large language model. Answer as concisely as possible."""

    # Template for the first turn, including the system prompt.
    def first_prompt(self):
        return \
            """<|im_start|>system\n""" + \
            """<|system_prompt|>\n""" + \
            """<|im_end|>\n""" + \
            """<|im_start|>user\n""" + \
            """<|user_prompt|><|im_end|>\n""" + \
            """<|im_start|>assistant\n"""

    # Template for every later turn: close the previous reply, then add the new user turn.
    def subs_prompt(self):
        return \
            """<|im_end|>\n""" + \
            """<|im_start|>user\n""" + \
            """<|user_prompt|><|im_end|>\n""" + \
            """<|im_start|>assistant\n"""

    # Stop on the EOS token id or the literal </s> string instead of <|im_end|>.
    def stop_conditions(self, tokenizer):
        return \
            [tokenizer.eos_token_id,
             """</s>"""]

    # Assumed order: (add_bos, add_eos, encode_special_tokens).
    def encoding_options(self):
        return False, False, True

    def print_extra_newline(self):
        return True

I'm currently running into general errors with any GGUF model. I'm guessing it's some Python library mismatch.
