
tFINE-680m-e32-d16-gqa-flan

FLAN-tuned variant of a tFINE (T5) model with grouped-query attention (GQA).

  • 32 encoder layers
  • 16 decoder layers
  • hidden size: 1024
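GQA lets several query heads share one key/value head. The minimal pure-Python sketch below illustrates the general technique. The card does not list attention head counts, so the sizes used here (16 query heads of dimension 64, 4 key/value heads) are hypothetical. `gqa_attention`, `kv_head_for`, and `kv_proj_params` are illustrative helpers, not functions from the transformers fork. Following T5, scores are plain dot products with no 1/sqrt(d) scaling.

```python
import math


def softmax(xs):
    # numerically stable softmax over a list of floats
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kv_head_for(q_head, n_q_heads, n_kv_heads):
    # consecutive groups of n_q_heads // n_kv_heads query heads share one K/V head
    return q_head // (n_q_heads // n_kv_heads)


def gqa_attention(q, k, v):
    """q: [n_q_heads][q_len][d]; k, v: [n_kv_heads][kv_len][d]."""
    n_q_heads, n_kv_heads = len(q), len(k)
    if n_q_heads % n_kv_heads:
        raise ValueError("number of query heads must be a multiple of KV heads")
    out = []
    for h in range(n_q_heads):
        g = kv_head_for(h, n_q_heads, n_kv_heads)  # shared K/V head for this query head
        head_out = []
        for q_vec in q[h]:
            # T5-style dot-product scores (no 1/sqrt(d) scaling)
            scores = [sum(a * b for a, b in zip(q_vec, k_vec)) for k_vec in k[g]]
            weights = softmax(scores)
            dim = len(v[g][0])
            # attention-weighted sum of the shared value vectors
            head_out.append(
                [sum(w * v_vec[c] for w, v_vec in zip(weights, v[g])) for c in range(dim)]
            )
        out.append(head_out)
    return out


def kv_proj_params(hidden, n_kv_heads, head_dim):
    # weights in the K and V projections (T5 attention projections have no bias)
    return 2 * hidden * n_kv_heads * head_dim


# hypothetical sizes: hidden 1024, 16 query heads of dim 64, 4 KV heads
print(kv_proj_params(1024, 16, 64))  # full multi-head attention: 2097152
print(kv_proj_params(1024, 4, 64))   # GQA with 4 KV heads: 524288
```

Because consecutive query heads read the same K/V head, the K/V projection weights and the decoder's cached keys and values shrink by the ratio of query heads to KV heads (4x in this hypothetical configuration).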

Testing

Install the transformers fork with GQA updates for T5 (⚠️ WIP 🚧):

```sh
pip install -U git+https://github.com/pszemraj/transformers.git@t5-gqa
```

Then load the model and generate:

```python
# pip install -U git+https://github.com/pszemraj/transformers.git@t5-gqa
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan")
model = AutoModelForSeq2SeqLM.from_pretrained(
    "BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan"
)

prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=64, no_repeat_ngram_size=3)
print(
    tokenizer.batch_decode(
        generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True
    )[0]
)
```

Quick eval

Quick eval for: BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan

hf (pretrained=BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan,trust_remote_code=True,dtype=bfloat16,trust_remote_code=True), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8

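| Tasks         | Version | Filter | n-shot | Metric   | Value  | Stderr   |
|---------------|--------:|--------|-------:|----------|-------:|----------|
| boolq         |       2 | none   |      0 | acc      | 0.7040 | ± 0.0080 |
| openbookqa    |       1 | none   |      0 | acc      | 0.1580 | ± 0.0163 |
|               |         | none   |      0 | acc_norm | 0.2420 | ± 0.0192 |
| piqa          |       1 | none   |      0 | acc      | 0.6132 | ± 0.0114 |
|               |         | none   |      0 | acc_norm | 0.6159 | ± 0.0113 |
| social_iqa    |       0 | none   |      0 | acc      | 0.4319 | ± 0.0112 |
| tinyArc       |       0 | none   |     25 | acc_norm | 0.2898 | N/A      |
| tinyHellaswag |       0 | none   |     10 | acc_norm | 0.3295 | N/A      |
| tinyMMLU      |       0 | none   |      0 | acc_norm | 0.2980 | N/A      |
| winogrande    |       1 | none   |      0 | acc      | 0.5020 | ± 0.0141 |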
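The Stderr column can be sanity-checked from the accuracies alone. A minimal sketch, assuming the harness reports the standard error of the mean over per-example 0/1 scores using the sample (n - 1) variance, and assuming the standard evaluation split sizes listed below (these counts are not stated in this card). For 0/1 scores with mean p, that standard error reduces to sqrt(p(1 - p) / (n - 1)). `mean_stderr_binary` is a hypothetical helper, not a harness function.

```python
import math


def mean_stderr_binary(p, n):
    # standard error of the mean of n 0/1 scores with mean p,
    # using the sample (n - 1) variance: sqrt(p * (1 - p) / (n - 1))
    return math.sqrt(p * (1 - p) / (n - 1))


# (task, metric, reported value, ASSUMED number of evaluated examples)
ROWS = [
    ("boolq", "acc", 0.7040, 3270),
    ("openbookqa", "acc", 0.1580, 500),
    ("openbookqa", "acc_norm", 0.2420, 500),
    ("piqa", "acc", 0.6132, 1838),
    ("piqa", "acc_norm", 0.6159, 1838),
    ("social_iqa", "acc", 0.4319, 1954),
    ("winogrande", "acc", 0.5020, 1267),
]

for task, metric, value, n in ROWS:
    se = mean_stderr_binary(value, n)
    print(f"{task:<11} {metric:<8} {value:.4f} ± {se:.4f}")
```

Under these assumptions each printed standard error matches the table to four decimals. The tiny* tasks report N/A, so they are left out.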

Training and evaluation data

Used the dataset's 'all' config.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 8e-05
  • train_batch_size: 4
  • eval_batch_size: 2
  • seed: 17868
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 32
  • total_train_batch_size: 256
  • total_eval_batch_size: 4
  • optimizer: paged_ademamix_32bit (no additional optimizer arguments)
  • lr_scheduler_type: constant_with_warmup
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 1.0
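The batch-size and learning-rate settings above fit together as sketched below. The effective batch size is per-device batch × devices × gradient-accumulation steps (4 × 2 × 32 = 256). The schedule sketch assumes the Hugging Face Trainer convention of converting `lr_scheduler_warmup_ratio` into ceil(total_steps × ratio) warmup steps, then ramping linearly from 0 to the base rate and holding it constant. The total step count is not given in this card, so `total_steps = 2000` is purely hypothetical, as are the helper names.

```python
import math


def effective_batch_size(per_device, num_devices, grad_accum):
    # examples contributing to each optimizer step
    return per_device * num_devices * grad_accum


def warmup_steps(total_steps, warmup_ratio):
    # warmup ratio converted to a whole number of steps (rounded up)
    return math.ceil(total_steps * warmup_ratio)


def constant_with_warmup_lr(step, base_lr, n_warmup):
    # linear ramp from 0 to base_lr over n_warmup steps, then constant
    if step < n_warmup:
        return base_lr * step / max(1, n_warmup)
    return base_lr


BASE_LR = 8e-5
total_steps = 2000  # HYPOTHETICAL: the real number of optimizer steps is not stated
n_warmup = warmup_steps(total_steps, 0.05)

print(effective_batch_size(4, 2, 32))  # total_train_batch_size: 256
print(effective_batch_size(2, 2, 1))   # total_eval_batch_size: 4
for step in (0, n_warmup // 2, n_warmup, total_steps - 1):
    print(step, constant_with_warmup_lr(step, BASE_LR, n_warmup))
```

With the hypothetical 2000 steps, warmup lasts 100 steps, after which the learning rate stays at 8e-05 for the rest of the single epoch.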
Model size: 680M params (Safetensors, tensor type F32)