Product Information Extraction Model

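Based on Qwen2.5-3B-Instruct and fine-tuned with LoRA, this model can be run for inference directly with the transformers library. Given a product name, it extracts the brand, model number, main product, and similar fields, and returns them in JSON format.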

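The model has not been trained with reinforcement learning (RL), so extraction errors or malformed JSON may occur. Please test it yourself; accuracy is not guaranteed.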
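Since the output may not always be valid JSON, it can help to parse it defensively before using it. The following is a minimal sketch and not part of the original demo: `parse_sku_json` and `EXPECTED_KEYS` are hypothetical names, and the expected key set is an assumption taken from the sample output in this card.

```python
import json
import re
from typing import Optional

# Assumption: the three keys seen in this card's sample output form the schema.
EXPECTED_KEYS = {"品牌", "型号", "主商品"}


def parse_sku_json(response: str) -> Optional[dict]:
    """Return the parsed dict, or None if the response is not usable JSON."""
    # Take the span from the first '{' to the last '}' in case extra text surrounds it.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        data = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Reject anything that is not an object with exactly the expected keys.
    if not isinstance(data, dict) or set(data) != EXPECTED_KEYS:
        return None
    return data
```

A caller could treat a `None` result as a signal to retry the request or to route the product name to manual review.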

As a generative model, it may hallucinate.
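One rough way to spot possible hallucinations is to check that each extracted value actually occurs in the input product name. The sketch below is a heuristic under that assumption, not a method described by this card; `ungrounded_fields` is a hypothetical helper, and it would also flag legitimate values that the model has normalized or reworded.

```python
def ungrounded_fields(sku_name: str, extracted: dict) -> list:
    """Return the keys whose values do not appear verbatim in the product name."""
    haystack = sku_name.lower()
    suspicious = []
    for key, value in extracted.items():
        text = str(value).strip().lower()
        # Empty values are skipped; only non-empty values missing from the name are flagged.
        if text and text not in haystack:
            suspicious.append(key)
    return suspicious
```

An empty list means every non-empty value was found in the name (compared case-insensitively); any returned key is worth a second look.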

With transformers versions below 4.37.0, loading the model fails with KeyError: qwen2.
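To fail early with a clearer message, the installed version can be checked before loading the model. This is a minimal sketch using only the standard library; `version_tuple` and `transformers_is_new_enough` are hypothetical helpers, and the simplified parsing ignores pre-release ordering (for example, it treats 4.37.0rc1 the same as 4.37.0).

```python
from importlib import metadata

MIN_VERSION = (4, 37, 0)  # minimum version stated in this card


def version_tuple(version: str) -> tuple:
    """Turn '4.37.0' or '4.38.0.dev0' into a tuple of its leading integers."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(digits))
    # Pad short versions such as '4.37' so tuple comparison behaves as expected.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def transformers_is_new_enough() -> bool:
    """Return False if transformers is missing or older than MIN_VERSION."""
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return False
    return version_tuple(installed) >= MIN_VERSION
```

If `transformers_is_new_enough()` returns `False`, upgrading transformers before running the demo should avoid the error.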

Usage demo:

# encoding: utf-8

import time
from transformers import AutoModelForCausalLM, AutoTokenizer

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

pretrained_model = "ykallan/SkuInfo-Qwen2.5-3B-Instruct"

# model = AutoModelForCausalLM.from_pretrained(pretrained_model, low_cpu_mem_usage=True)
model = AutoModelForCausalLM.from_pretrained(pretrained_model)

tokenizer = AutoTokenizer.from_pretrained(pretrained_model)

model.to(device)


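def ner(_name: str) -> str:
    # The system prompt is kept in Chinese, the language of the product names. In English:
    # "Extract the brand, model number, and main product from the following product name and return them as JSON."
    messages = [
        {"role": "system", "content": "在以下商品名称中抽取出品牌、型号、主商品,并以JSON格式返回。"},
        {"role": "user", "content": _name}
    ]

    # Render the chat template to a prompt string, then tokenize it.
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    model_inputs = tokenizer([prompt], return_tensors="pt", padding=True).to(device)
    generate_config = {
        "max_new_tokens": 128
    }

    # Pass input_ids together with attention_mask.
    generated_ids = model.generate(**model_inputs, **generate_config)
    # Drop the prompt tokens so that only the newly generated tokens are decoded.
    generated_ids = [
        output_ids[len(prompt_ids):] for prompt_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
    ]
    response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

    return response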


sku_name_list = ["大嘴猴(paul frank)花漾柔护洗衣液",
                 "得力(deli)珊瑚海A3打印纸 70g500张*5包一箱 双面绘图纸复印纸 手抄报画纸 整箱2500张7365【销冠系列】",
                 "优露清 多功能全能清洁剂去污除垢厨房油污不锈钢瓷砖玻璃地板清洗剂 1瓶装",
                 "滴露(Dettol)消毒液消毒水1.2L衣物除菌液家居宠物环境地板杀菌除螨 非84酒精",
                 "得力(deli)佳铂A4打印纸 70g500张 高档单包复印纸 合同标书彩打纸 打印书写3584【纸中贵族】",
                 "得力(deli)1mm加厚高档磨砂档案盒A4塑料PP文件夹文件盒 财会文件收纳资料盒资料册 加厚款35mm蓝色-岩石纹理 5605",
                 "心相印DT25150经典系列抽纸2层150抽面巾纸餐巾纸卫生纸家用厕纸擦手纸 8包",
                 "FUJIFILM富士施乐CT203435原装墨粉盒适用于施乐AP4570/5570墨盒(约印36000张)",
                 "艾贝丽家用柔音豆浆机 ABL-DJ31",
                 "得力财务装订机 凭证装订机 GB506 全自动热熔胶装 电动打孔机档案记账财务 企业业务",
                 "得力(deli)1只55mmA4文件盒批发 加厚党建档案盒 桌面考试收纳 财会用品 5603黑色",

                 ]
for sku_name in sku_name_list:
    start = time.time()
    result = ner(sku_name)
    end = time.time()
    print(round(end-start, 2), sku_name, "==>>", result.strip())


Output (the JSON keys 品牌, 型号, and 主商品 mean brand, model number, and main product):

1.13 大嘴猴(paul frank)花漾柔护洗衣液 ==>> {"品牌": "大嘴猴", "型号": "花漾柔护洗衣液", "主商品": "花漾柔护洗衣液"}
0.74 得力(deli)珊瑚海A3打印纸 70g500张*5包一箱 双面绘图纸复印纸 手抄报画纸 整箱2500张7365【销冠系列】 ==>> {"品牌": "得力", "型号": "7365", "主商品": "打印纸"}
0.78 优露清 多功能全能清洁剂去污除垢厨房油污不锈钢瓷砖玻璃地板清洗剂 1瓶装 ==>> {"品牌": "优露清", "型号": "1瓶装", "主商品": "多功能全能清洁剂"}
0.73 滴露(Dettol)消毒液消毒水1.2L衣物除菌液家居宠物环境地板杀菌除螨 非84酒精 ==>> {"品牌": "滴露", "型号": "1.2L", "主商品": "消毒液"}
0.86 得力(deli)佳铂A4打印纸 70g500张 高档单包复印纸 合同标书彩打纸 打印书写3584【纸中贵族】 ==>> {"品牌": "得力", "型号": "3584", "主商品": "佳铂A4打印纸"}
0.74 得力(deli)1mm加厚高档磨砂档案盒A4塑料PP文件夹文件盒 财会文件收纳资料盒资料册 加厚款35mm蓝色-岩石纹理 5605 ==>> {"品牌": "得力", "型号": "5605", "主商品": "文件盒"}
0.83 心相印DT25150经典系列抽纸2层150抽面巾纸餐巾纸卫生纸家用厕纸擦手纸 8包 ==>> {"品牌": "心相印", "型号": "DT25150", "主商品": "抽纸"}
0.97 FUJIFILM富士施乐CT203435原装墨粉盒适用于施乐AP4570/5570墨盒(约印36000张) ==>> {"品牌": "富士施乐", "型号": "CT203435", "主商品": "原装墨粉盒"}
0.87 艾贝丽家用柔音豆浆机 ABL-DJ31 ==>> {"品牌": "艾贝丽", "型号": "ABL-DJ31", "主商品": "家用柔音豆浆机"}
0.8 得力财务装订机 凭证装订机 GB506 全自动热熔胶装 电动打孔机档案记账财务 企业业务 ==>> {"品牌": "得力", "型号": "GB506", "主商品": "财务装订机"}
0.74 得力(deli)1只55mmA4文件盒批发 加厚党建档案盒 桌面考试收纳 财会用品 5603黑色 ==>> {"品牌": "得力", "型号": "5603", "主商品": "文件盒"}

Model size: 3.09B params (Safetensors, tensor type F32)