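Product Information Extraction Model
This model is based on Qwen2.5-3B-Instruct and was fine-tuned with LoRA. It can be run for inference directly with the transformers library. Given a product name, it extracts the brand, model number, and main product, and outputs them as JSON.
The model has not been through RL training, so extraction errors or malformed JSON may occur. Please test it yourself; accuracy is not guaranteed.
As a generative model, it may hallucinate.
transformers versions below 4.37.0 raise KeyError: qwen2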
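The version requirement above can be checked before loading the model. The following is a minimal sketch; the helper names `version_tuple`, `is_supported`, and `installed_transformers_ok` are hypothetical and not part of this model card or of transformers.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = "4.37.0"  # minimum version stated in this card


def version_tuple(version_string: str) -> tuple:
    """Convert '4.37.0' or '4.38.0.dev0' into a tuple of leading integers."""
    parts = []
    for piece in version_string.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric piece such as 'dev0'
        parts.append(int(match.group()))
    # Pad so that '4.37' compares equal to '4.37.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_supported(installed: str, minimum: str = MIN_TRANSFORMERS) -> bool:
    """Return True if the installed version is at least the minimum."""
    return version_tuple(installed) >= version_tuple(minimum)


def installed_transformers_ok() -> bool:
    """Check the locally installed transformers package (hypothetical helper)."""
    try:
        return is_supported(version("transformers"))
    except PackageNotFoundError:
        return False
```

Calling `installed_transformers_ok()` before `from_pretrained` gives a clearer failure than the `KeyError: qwen2` raised by older versions.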
Usage demo:
# encoding: utf-8
import time
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
pretrained_model = "ykallan/SkuInfo-Qwen2.5-3B-Instruct"
# model = AutoModelForCausalLM.from_pretrained(pretrained_model, low_cpu_mem_usage=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_model)
tokenizer = AutoTokenizer.from_pretrained(pretrained_model)
model.to(device)
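def ner(_name: str) -> str:
    # The system message carries the extraction instruction; the user message is the product name.
    messages = [
        {"role": "system", "content": "在以下商品名称中抽取出品牌、型号、主商品,并以JSON格式返回。"},
        {"role": "user", "content": _name}
    ]
    # With tokenize=False, apply_chat_template returns the prompt as a string, not token ids.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([prompt], return_tensors="pt", padding=True).to(device)
    generate_config = {
        "max_new_tokens": 128
    }
    # Unpack model_inputs so the attention mask is passed along with the input ids.
    generated_ids = model.generate(**model_inputs, **generate_config)
    # Drop the prompt tokens so that only the newly generated tokens are decoded.
    generated_ids = [
        output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
    return response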
sku_name_list = [
    "大嘴猴(paul frank)花漾柔护洗衣液",
    "得力(deli)珊瑚海A3打印纸 70g500张*5包一箱 双面绘图纸复印纸 手抄报画纸 整箱2500张7365【销冠系列】",
    "优露清 多功能全能清洁剂去污除垢厨房油污不锈钢瓷砖玻璃地板清洗剂 1瓶装",
    "滴露(Dettol)消毒液消毒水1.2L衣物除菌液家居宠物环境地板杀菌除螨 非84酒精",
    "得力(deli)佳铂A4打印纸 70g500张 高档单包复印纸 合同标书彩打纸 打印书写3584【纸中贵族】",
    "得力(deli)1mm加厚高档磨砂档案盒A4塑料PP文件夹文件盒 财会文件收纳资料盒资料册 加厚款35mm蓝色-岩石纹理 5605",
    "心相印DT25150经典系列抽纸2层150抽面巾纸餐巾纸卫生纸家用厕纸擦手纸 8包",
    "FUJIFILM富士施乐CT203435原装墨粉盒适用于施乐AP4570/5570墨盒(约印36000张)",
    "艾贝丽家用柔音豆浆机 ABL-DJ31",
    "得力财务装订机 凭证装订机 GB506 全自动热熔胶装 电动打孔机档案记账财务 企业业务",
    "得力(deli)1只55mmA4文件盒批发 加厚党建档案盒 桌面考试收纳 财会用品 5603黑色",
]
for sku_name in sku_name_list:
    start = time.time()
    result = ner(sku_name)
    end = time.time()
    # Print elapsed seconds, the input product name, and the extracted JSON.
    print(round(end - start, 2), sku_name, "==>>", result.strip())
Output:
1.13 大嘴猴(paul frank)花漾柔护洗衣液 ==>> {"品牌": "大嘴猴", "型号": "花漾柔护洗衣液", "主商品": "花漾柔护洗衣液"}
0.74 得力(deli)珊瑚海A3打印纸 70g500张*5包一箱 双面绘图纸复印纸 手抄报画纸 整箱2500张7365【销冠系列】 ==>> {"品牌": "得力", "型号": "7365", "主商品": "打印纸"}
0.78 优露清 多功能全能清洁剂去污除垢厨房油污不锈钢瓷砖玻璃地板清洗剂 1瓶装 ==>> {"品牌": "优露清", "型号": "1瓶装", "主商品": "多功能全能清洁剂"}
0.73 滴露(Dettol)消毒液消毒水1.2L衣物除菌液家居宠物环境地板杀菌除螨 非84酒精 ==>> {"品牌": "滴露", "型号": "1.2L", "主商品": "消毒液"}
0.86 得力(deli)佳铂A4打印纸 70g500张 高档单包复印纸 合同标书彩打纸 打印书写3584【纸中贵族】 ==>> {"品牌": "得力", "型号": "3584", "主商品": "佳铂A4打印纸"}
0.74 得力(deli)1mm加厚高档磨砂档案盒A4塑料PP文件夹文件盒 财会文件收纳资料盒资料册 加厚款35mm蓝色-岩石纹理 5605 ==>> {"品牌": "得力", "型号": "5605", "主商品": "文件盒"}
0.83 心相印DT25150经典系列抽纸2层150抽面巾纸餐巾纸卫生纸家用厕纸擦手纸 8包 ==>> {"品牌": "心相印", "型号": "DT25150", "主商品": "抽纸"}
0.97 FUJIFILM富士施乐CT203435原装墨粉盒适用于施乐AP4570/5570墨盒(约印36000张) ==>> {"品牌": "富士施乐", "型号": "CT203435", "主商品": "原装墨粉盒"}
0.87 艾贝丽家用柔音豆浆机 ABL-DJ31 ==>> {"品牌": "艾贝丽", "型号": "ABL-DJ31", "主商品": "家用柔音豆浆机"}
0.8 得力财务装订机 凭证装订机 GB506 全自动热熔胶装 电动打孔机档案记账财务 企业业务 ==>> {"品牌": "得力", "型号": "GB506", "主商品": "财务装订机"}
0.74 得力(deli)1只55mmA4文件盒批发 加厚党建档案盒 桌面考试收纳 财会用品 5603黑色 ==>> {"品牌": "得力", "型号": "5603", "主商品": "文件盒"}
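Because the model may return malformed JSON, downstream code may want to validate each response before using it. The following is a minimal sketch of one way to do that. The function name `parse_sku_json` is hypothetical, and the expected keys are an assumption taken from the sample output above.

```python
import json
import re
from typing import Optional

# Keys observed in the sample output above (assumed to be the full schema).
EXPECTED_KEYS = ("品牌", "型号", "主商品")


def parse_sku_json(response: str) -> Optional[dict]:
    """Return the parsed dict, or None if the response is not usable JSON."""
    # Take the outermost {...} span in case the model adds text around the JSON.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group())
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    # Reject responses that are missing any of the expected fields.
    if any(key not in data for key in EXPECTED_KEYS):
        return None
    return data
```

In the demo loop, `parse_sku_json(result)` could replace the raw `result.strip()`, with a `None` return treated as a failed extraction to retry or log.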