Model thinks it's Claude

#2 · opened by nonetrix

I assume this model was trained on synthetic data generated by Claude, because when asked who created it, it says Anthropic and that its name is Claude. Maybe this could be fixed in the next version. Thanks!

Hi, did you use it with Ollama?

Confirmed. No matter how I phrase the question, the 72B/32B/14B GGUFs provided by Qwen always identify themselves as Claude when run in GPT4All.

Example:
"Name: Claude
Company: Anthropic
Feel free to ask me anything else or let me know if there's another topic you'd like to explore!"

Perhaps relying too heavily on synthetic data from Anthropic's models, which are known for often strange and excessive censorship, is partly responsible for the strong censorship some users have reported (e.g., @sunnyyy).

I wonder if this also played a role in the HUGE drop in world knowledge between Qwen2 and Qwen2.5. Synthetic data is often narrowly targeted at a few domains (e.g., code, math, and science). That would explain why Qwen2.5 knows far less about music, movies, games, sports, and other very popular areas of knowledge, which tend to be overlooked when training relies on synthetic data rather than human-generated sources (e.g., Wikipedia and web dumps).
