---
license: bigscience-bloom-rail-1.0
datasets:
- jslin09/Fraud_Case_Verdicts
language:
- zh
metrics:
- accuracy
pipeline_tag: text-generation
text-generation:
  parameters:
    max_length: 400
    do_sample: true
    temperature: 0.75
    top_k: 50
    top_p: 0.9
tags:
- legal
widget:
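- text: 王大明意圖為自己不法所有，基於竊盜之犯意，
  example_title: Generate criminal facts for a theft offense
- text: 騙人布意圖為自己不法所有，基於詐欺取財之犯意，
  example_title: Generate criminal facts for a fraud offense
- text: 梅友乾明知其無資力支付酒店消費，亦無付款意願，竟意圖為自己不法之所有，
  example_title: Generate criminal facts for dine-and-dash fraud
- text: 闕很大明知金融帳戶之存摺、提款卡及密碼係供自己使用之重要理財工具，
  example_title: Generate criminal facts for aiding fraud by selling a bank account
- text: 趙甲王基於行使偽造特種文書及詐欺取財之犯意，
  example_title: Fraud using forged special documents (contracts, license plates, etc.)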
---

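# Automatic Generation of Judgment Drafts
This model was fine-tuned from [BLOOM 560m](https://huggingface.co/bigscience/bloomz-560m) on a dataset built from fraud case judgments published by the Judicial Yuan. It automatically generates draft "criminal facts" paragraphs for fraud and theft cases. The dataset covers January 1, 2011 through December 31, 2021 (ROC years 100 to 110). The raw data comprises 74,823 documents (judgments and rulings), from which we keep only the "criminal facts" field of each judgment. The raw data is split three ways: 59,858 documents (about 80%) for training, and the remaining 20% divided evenly between a validation set (7,482 documents, 10%) and a test set (7,483 documents, 10%). To try the model on this page, wait until it has loaded and produced its first short clause, then keep pressing the Compute button to continue generating text. You can also type your own test input into the text box.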
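The split sizes quoted above can be reproduced with integer arithmetic. The following is a minimal sketch, assuming the training and validation counts are rounded down and the test set takes the remainder; the card does not state how the split was actually computed, and `split_sizes` is a hypothetical helper name.

```python
def split_sizes(total, train_pct=80, val_pct=10):
    """Return (train, validation, test) counts for a percentage split.

    Assumption: train and validation counts are floored, and the test set
    receives whatever remains, so the three counts always sum to `total`.
    """
    n_train = total * train_pct // 100
    n_val = total * val_pct // 100
    n_test = total - n_train - n_val
    return n_train, n_val, n_test


# 74,823 is the number of raw documents reported in this card.
print(split_sizes(74823))
```

Under these assumptions the result matches the reported 59,858 / 7,482 / 7,483 split.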

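# Usage Examples
To call this model from your own program, refer to the Python code below, which uses the API to generate the content of the "criminal facts" section of a criminal judgment.
<details>
  <summary> Click to expand </summary>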
<pre>
  <code>
import requests
from time import sleep
from tqdm.auto import trange

API_URL = "https://api-inference.huggingface.co/models/jslin09/bloom-560m-finetuned-fraud"
API_TOKEN = 'XXXXXXXXXXXXXXX' # API token used to call the model
headers = {"Authorization": f"Bearer {API_TOKEN}"}

def query(payload):
    # POST the payload to the API and return the decoded JSON response.
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

prompt = "森上梅前明知其無資力支付酒店消費，亦無付款意願，竟意圖為自己不法之所有，"
query_dict = {
    "inputs": prompt,
}
text_len = 300
t = trange(text_len, desc='Generating draft', leave=True)
for i in t:
    response = query(query_dict)
    try:
        response_text = response[0]['generated_text']
        # Feed the generated text back in as the next prompt.
        query_dict["inputs"] = response_text
        t.set_description(f"{i}: {response_text}")
        t.refresh()
    except (KeyError, IndexError, TypeError):
        # An error payload is a dict, not a list of generations.
        sleep(30) # If the server is too busy to respond, wait 30 seconds and retry.
# Print the last successfully generated text, even if the final request failed.
print(query_dict["inputs"])
</code>
</pre>
</details>

Alternatively, to build your program with the transformers package and run the model on your local machine, refer to the following code:
<details>
  <summary> Click to expand </summary>
<pre>
  <code>
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("jslin09/bloom-560m-finetuned-fraud")
model = AutoModelForCausalLM.from_pretrained("jslin09/bloom-560m-finetuned-fraud")
</code>
</pre>
</details>
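The snippet above only loads the tokenizer and model. A minimal sketch of the generation step might look like the following, assuming a PyTorch backend and reusing the sampling parameters listed in this card's metadata (`max_length`, `do_sample`, `temperature`, `top_k`, `top_p`). The helper name `generate_draft` and the `GENERATION_KWARGS` dictionary are hypothetical and not part of the original card.

```python
# Sampling settings copied from the `text-generation.parameters` block
# in this card's metadata.
GENERATION_KWARGS = {
    "max_length": 400,
    "do_sample": True,
    "temperature": 0.75,
    "top_k": 50,
    "top_p": 0.9,
}


def generate_draft(prompt, tokenizer, model, **overrides):
    """Hypothetical helper: continue `prompt` and return the decoded text.

    `tokenizer` and `model` are the objects returned by
    AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained.
    Keyword arguments in `overrides` replace the defaults above.
    """
    # Encode the prompt as PyTorch tensors (assumes torch is installed).
    inputs = tokenizer(prompt, return_tensors="pt")
    settings = {**GENERATION_KWARGS, **overrides}
    output_ids = model.generate(**inputs, **settings)
    # Decode the first (and only) returned sequence, prompt included.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as in the snippet above, calling `generate_draft("王大明意圖為自己不法所有，基於竊盜之犯意，", tokenizer, model)` would return the prompt followed by a sampled continuation. Because sampling is enabled, the output differs from run to run.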