This model serves as a proof of concept. You will very likely have better captioning results using SmilingWolf/wd-eva02-large-tagger-v3.

Trained with Florence-2ner using this config and 316K images from the animelover/danbooru2022 dataset (data-0880.zip to data-0943.zip).

{
    "model_name": "microsoft/Florence-2-base",
    "dataset_path": "./0000_Datasets/danbooru2022",
    "run_name": "Florence-2-base-danbooru2022-316k-run1",
    "epochs": 1,
    "learning_rate": 1e-5,
    "gradient_checkpointing": true,
    "freeze_vision": false,
    "freeze_language": false,
    "freeze_other": false,
    "train_batch_size": 8,
    "eval_batch_size": 16,
    "gradient_accumulation_steps": 32,
    "clip_grad_norm": 1,
    "weight_decay": 1e-5,
    "save_total_limit": 3,
    "save_steps": 50,
    "eval_steps": 50,
    "warmup_steps": 50,
    "eval_split_ratio": 0.01,
    "seed": 42,
    "filtering_processes": 128,
    "attn_implementation": "sdpa"
}

val_loss

Downloads last month
29
Safetensors
Model size
271M params
Tensor type
BF16
·
Inference API
Unable to determine this model's library. Check the docs .

Model tree for PJMixers-Dev/Florence-2-base-danbooru2022-316k

Finetuned
(6)
this model

Dataset used to train PJMixers-Dev/Florence-2-base-danbooru2022-316k