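This project is built on the llama3-8B-Instruct model released by Meta. Each MLP is copied 8 times to form the experts, a randomly initialized router is created, and all other parameter weights are left unchanged, which yields a warm-started MoE model. This greatly reduces the cost of training an MoE model from scratch and makes it easy to fine-tune quickly on downstream tasks.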
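Here, router_warmboot denotes the version that uses the router parameters from the chinese-mixtral-Instruct release to initialize llama3-MoE-Instruct, while router_random is the version whose router is randomly initialized.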
For details, see the GitHub repository: https://github.com/cooper12121/llama3-8x8b-MoE
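The warm-start construction described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the toy MLP (ReLU instead of Llama's gated MLP), the helper names (`make_mlp`, `upcycle`, `moe_forward`), the tiny dimensions, and the top-2 softmax routing are all assumptions made for illustration. It shows one useful property of this kind of initialization: because every expert starts as an identical copy of the dense MLP and the gate weights sum to 1, the MoE layer reproduces the dense MLP's output at initialization, whatever the random router picks.

```python
import copy
import math
import random


def make_mlp(d_model, d_ff, rng):
    """Toy dense MLP (hypothetical stand-in for one Llama MLP block)."""
    return {
        "w_in": [[rng.gauss(0, 0.1) for _ in range(d_model)] for _ in range(d_ff)],
        "w_out": [[rng.gauss(0, 0.1) for _ in range(d_ff)] for _ in range(d_model)],
    }


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def mlp_forward(mlp, x):
    # ReLU keeps the sketch short; the real model uses a gated activation.
    hidden = [max(0.0, v) for v in matvec(mlp["w_in"], x)]
    return matvec(mlp["w_out"], hidden)


def upcycle(mlp, num_experts, d_model, rng):
    """Copy the dense MLP into num_experts experts and add a random router."""
    experts = [copy.deepcopy(mlp) for _ in range(num_experts)]
    router = [[rng.gauss(0, 0.02) for _ in range(d_model)] for _ in range(num_experts)]
    return {"experts": experts, "router": router}


def moe_forward(moe, x, top_k=2):
    """Route x to the top_k experts (assumed top-2) and mix their outputs."""
    logits = matvec(moe["router"], x)
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the selected logits only, so the gate weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    gates = [e / sum(exps) for e in exps]
    out = [0.0] * len(x)
    for gate, idx in zip(gates, top):
        expert_out = mlp_forward(moe["experts"][idx], x)
        out = [o + gate * v for o, v in zip(out, expert_out)]
    return out


rng = random.Random(0)
dense = make_mlp(d_model=4, d_ff=8, rng=rng)
moe = upcycle(dense, num_experts=8, d_model=4, rng=rng)

x = [rng.gauss(0, 1) for _ in range(4)]
dense_out = mlp_forward(dense, x)
moe_out = moe_forward(moe, x)
max_diff = max(abs(a - b) for a, b in zip(dense_out, moe_out))
print("max |dense - moe| at initialization:", max_diff)
```

The experts only start to differ once fine-tuning updates them separately, which is why the router, being the only untrained component, is the part that benefits from a warm-boot variant.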
generate
import sys

# Make the repository's modeling_file package importable (adjust the path to your checkout).
sys.path.append("/apdcephfs_qy3/share_301372554/share_info/qianggao/")
from modeling_file.llama3_moe.modeling_llama_moe import LlamaMoEForCausalLM
from modeling_file.llama3_moe.tokenization_llama_fast import LlamaTokenizerFast

model_ckpt = "/apdcephfs_qy3/share_301372554/share_info/qianggao/ckpt/llama3-8x8b-MoE-base"
tokenizer = LlamaTokenizerFast.from_pretrained(model_ckpt)
# Reuse EOS as the pad token, and pad on the left so that generation
# continues directly from the end of each prompt in the batch.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"
model = LlamaMoEForCausalLM.from_pretrained(model_ckpt, device_map="auto", use_cache=False)

text_list = ["hello,what is your name?", "你好,你叫什么名字"]
inputs = tokenizer(text_list, return_tensors="pt", padding=True).to("cuda")
output = model.generate(**inputs, pad_token_id=tokenizer.eos_token_id, max_new_tokens=100)
print(tokenizer.batch_decode(output))
The modeling_file files used above can be obtained from the GitHub repository.