---
library_name: peft
base_model: DavidLanz/Llama2-tw-7B-v2.0.1-chat
inference: false
language:
- en
license: llama2
model_creator: Meta Llama 2
model_name: Llama 2 7B Chat
model_type: llama
pipeline_tag: text-generation
quantized_by: QLoRA
tags:
- facebook
- meta
- pytorch
- llama
- llama-2
---

# Model Card for Model ID

This PEFT adapter was fine-tuned on McDonald's invoice data from Taiwan to surface insights about customer purchasing behavior.

Disclaimer: This model is an experiment in applying an LLM to a time-series problem. It is not investment advice, and its predictions must not be used as a basis for investment decisions.

## Model Details

### Model Description

This repo contains QLoRA adapter files for [Meta's Llama 2 7B-chat](https://huggingface.co/DavidLanz/Llama2-tw-7B-v2.0.1-chat).

## Uses

```python
import torch
from peft import PeftModel
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    pipeline,
)

device_map = {"": 0}

# 4-bit quantization settings (these match the training configuration below).
use_4bit = True
bnb_4bit_compute_dtype = "float16"
bnb_4bit_quant_type = "nf4"
use_nested_quant = False

compute_dtype = getattr(torch, bnb_4bit_compute_dtype)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=use_4bit,
    bnb_4bit_quant_type=bnb_4bit_quant_type,
    bnb_4bit_compute_dtype=compute_dtype,
    bnb_4bit_use_double_quant=use_nested_quant,
)

base_model_path = "DavidLanz/Llama2-tw-7B-v2.0.1-chat"
adapter_path = "DavidLanz/llama2_7b_taiwan_invoice_qlora"

# Load the 4-bit quantized base model, then attach the QLoRA adapter.
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_path,
    low_cpu_mem_usage=True,
    return_dict=True,
    quantization_config=bnb_config,
    torch_dtype=torch.float16,
    device_map=device_map,
)
model = PeftModel.from_pretrained(base_model, adapter_path)

tokenizer = AutoTokenizer.from_pretrained(base_model_path, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)

messages = [
    {
        "role": "system",
        # English: "You are an expert in analyzing McDonald's fast-food invoices.
        # You have detailed invoice data for February 2024, including consumer IDs,
        # purchased items, total amounts, customer demographics such as gender and
        # age, and purchase dates and times. Based on this invoice data, provide
        # insights or predict trends in customer behavior."
        "content": "你是分析麥當勞速食店發票的專家。你已有 2024 年 2 月份的詳細發票資料,其中包含消費者 ID、消費者購買的商品項目、總金額、顧客的性別與年齡等人口統計資料,以及購買日期及時間。根據這些發票數據,請提供見解或預測顧客行為的趨勢。",
    },
    # English: "What was McDonald's best-selling item in Taipei's Zhongzheng District in February 2024?"
    {"role": "user", "content": "2024年2月麥當勞在台北市中正區最熱賣的商品是什麼?"},
]

prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
```

Two optional variations, streaming generation and merging the adapter into the base model, are sketched at the end of this card.

## Training procedure

The following `bitsandbytes` quantization config was used during training:

- quant_method: bitsandbytes
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: False
- bnb_4bit_compute_dtype: float16

### Framework versions

- PEFT 0.10
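
The import list in the original example also pulled in `TextStreamer`. If you want tokens printed as they are generated rather than after the full completion, the sketch below reuses the `model`, `tokenizer`, and `prompt` built in the Uses section. It is a minimal sketch of the standard `transformers` streaming API, not part of the released example.

```python
from transformers import TextStreamer

# Stream decoded tokens to stdout as they are generated; skip echoing the prompt.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(
    **inputs,
    streamer=streamer,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
    top_k=50,
    top_p=0.95,
)
```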
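
If you prefer a standalone checkpoint instead of loading the base model and adapter separately, the adapter can be folded into an unquantized copy of the base model with `peft`'s `merge_and_unload`. This is a minimal sketch: the output directory name is illustrative, and you need enough memory to hold the fp16 base weights.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model unquantized so the LoRA deltas can be folded into its weights.
base = AutoModelForCausalLM.from_pretrained(
    "DavidLanz/Llama2-tw-7B-v2.0.1-chat",
    torch_dtype=torch.float16,
    device_map="auto",
)
merged = PeftModel.from_pretrained(
    base, "DavidLanz/llama2_7b_taiwan_invoice_qlora"
).merge_and_unload()

merged.save_pretrained("llama2-tw-invoice-merged")  # illustrative output path
```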