---
license: apache-2.0
---

GPTQ 4-bit no-actor version, provided for compatibility so that it works in textgen-webui.

Generated by using scripts from https://gitee.com/yhyu13/llama_-tools

Merged weights: https://huggingface.co/Yhyu13/oasst-rlhf-2-llama-30b-7k-steps-hf

Converted LLaMA weights: https://huggingface.co/Yhyu13/llama-30B-hf-openassitant

Delta weights: https://huggingface.co/OpenAssistant/oasst-rlhf-2-llama-30b-7k-steps-xor
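
The `-xor` suffix on the delta repository name suggests that the deltas are distributed as an XOR against the original LLaMA weights. As a rough illustration of that idea only, here is a minimal sketch in plain Python. The `xor_bytes` helper and the sample byte strings are hypothetical stand-ins, not part of OpenAssistant's tooling or the llama_-tools scripts, and real checkpoints would be processed file by file rather than as short strings.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings (hypothetical helper for illustration)."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Stand-ins for raw weight bytes; real files are many gigabytes.
base = b"base-weights"
tuned = b"tuned-weight"

# A publisher could ship only the XOR delta of the two byte streams.
delta = xor_bytes(base, tuned)

# Anyone holding the base bytes can recover the tuned bytes,
# because XOR is its own inverse: base ^ (base ^ tuned) == tuned.
recovered = xor_bytes(base, delta)
assert recovered == tuned
```

The delta by itself does not reveal the tuned weights without the matching base bytes, which is presumably why this kind of encoding is used for models derived from LLaMA.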

---

OA has done a great job applying RLHF to their pre-trained weights. I must say the model is tuned to produce CoT-style, step-by-step thinking without you actively prompting it to do so, which is a behavior we also observe in ChatGPT and GPT-4.

Note, however, that it still fails at logical paradox tasks such as "era of time" and "bird shot". None of the LLaMA-based models, nor any other available model besides GPT-4 and Claude+, can answer paradox questions correctly anyway. So OA RLHF is expected to fail at these tasks, but I do like the RLHF-ed tone, which makes OA's responses sound professional and proficient.

![img1](./img/sample1)


![img2](./img/sample2)


![img3](./img/sample3)