CausalLM/14B

@@ -44,7 +44,7 @@ Thanks TheBloke for GGUF quants: [https://huggingface.co/TheBloke/CausalLM-14B-G
 Also see [7B Version](https://huggingface.co/CausalLM/7B)
-This model was trained based on the model weights of Qwen (and LLaMA2 was used, yes, for calculating some initial weights), you may also need to comply with the commercial use restrictions of these two models depending on the situation. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Relative Positional Encoding (RoPE).
+This model was trained based on the model weights of Qwen (and LLaMA2 was used, yes, for calculating some initial weights), you may also need to comply with the commercial use restrictions of these two models depending on the situation. The training process utilized a model structure that was identical to LLaMA2, using the same attention calculation method as the original MHA LLaMA2 models, and no additional scaling applied to the Rotary Positional Encoding (RoPE).
 We manually curated a SFT dataset of 1.3B tokens for training, utilizing open source datasets from Hugging Face. For most of these sentences, we performed manual or synthetic rewrites and generated alternate language versions using larger language models. Additionally, we conducted augmented text training using carefully selected entries from Wikipedia, as well as featured entries from Fandom and filtered entries from Moegirlpedia. In order to strike a balance between efficiency and quality, 100% of the data used for training was synthetic data, no direct use of text from the internet or original texts from publicly available datasets was employed for fine-tuning.
@@ -105,7 +105,7 @@ GPT2Tokenizer 支持由 [Kerfuffle](https://github.com/KerfuffleV2) 修复于 [h
 另请参阅[7B版本](https://huggingface.co/CausalLM/7B)
-该模型是基于Qwen的权重（并使用了LLaMA2权重，是的，用于计算一些权重初始化），您根据情况可能还需要遵守这两个模型的商业使用限制。训练过程中使用了与LLaMA2相同的模型结构，使用原始MHA LLaMA2模型的相同注意力计算方法，对相对位置编码（RoPE）没有进行额外的缩放。
+该模型是基于Qwen的权重（并使用了LLaMA2权重，是的，用于计算一些权重初始化），您根据情况可能还需要遵守这两个模型的商业使用限制。训练过程中使用了与LLaMA2相同的模型结构，使用原始MHA LLaMA2模型的相同注意力计算方法，对旋转位置编码（RoPE）没有进行额外的缩放。
 我们手动筛选了一个包含13亿个标记的SFT数据集进行训练，利用了Hugging Face的开源数据集。对于大多数句子，我们进行了手动或合成改写，并使用更大的语言模型生成了其他语言版本。此外，我们还使用了精心挑选的来自维基百科的条目、来自Fandom的精选条目以及来自萌娘百科的过滤条目进行增强文本训练。为了在效率和质量之间取得平衡，训练所使用的100%数据都是合成数据，没有直接使用来自互联网或公开可用数据集的原始文本进行微调。
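
Note on the wording fix: RoPE rotates each query/key vector by a position-dependent angle, which is why "Rotary" is the accurate term. The sketch below is illustrative only and is not taken from this repository. It assumes the half-split pairing convention (dimension `i` paired with `i + d/2`) and a base of 10000, and the function names `rope_rotate` and `dot` and the `scale` parameter are hypothetical. `scale=1.0` corresponds to "no additional scaling"; a value above 1 would be one possible form of scaling (linear position interpolation).

```python
import math


def rope_rotate(vec, position, base=10000.0, scale=1.0):
    """Rotate one query/key head vector by its position (RoPE sketch).

    Assumptions (not from the model card): half-split pairing, base 10000.
    scale=1.0 means positions are used as-is, i.e. no additional scaling.
    """
    d = len(vec)
    if d % 2 != 0:
        raise ValueError("head dimension must be even")
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        # Each pair (i, i + half) gets its own rotation frequency.
        theta = base ** (-2.0 * i / d)
        angle = (position / scale) * theta
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = vec[i], vec[i + half]
        # Standard 2D rotation of the pair (x1, x2) by `angle`.
        out[i] = x1 * c - x2 * s
        out[i + half] = x1 * s + x2 * c
    return out


def dot(a, b):
    """Plain dot product, standing in for one attention score."""
    return sum(x * y for x, y in zip(a, b))
```

Because every pair is rotated by an angle proportional to the position, the score `dot(rope_rotate(q, m), rope_rotate(k, n))` depends only on the offset `m - n`, which is the relative-position behaviour the original wording was presumably reaching for.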