xzuyn's picture
Update README.md
567db1d verified
---
datasets:
- animelover/danbooru2022
base_model:
- microsoft/Florence-2-base
---
This model serves as a proof of concept. You will *very likely* have better captioning results using [`SmilingWolf/wd-eva02-large-tagger-v3`](https://huggingface.co/SmilingWolf/wd-eva02-large-tagger-v3).
Trained with [Florence-2ner](https://github.com/xzuyn/Florence-2ner) using this config and 316K images from the [`animelover/danbooru2022` dataset](https://huggingface.co/datasets/animelover/danbooru2022) (`data-0880.zip` to `data-0943.zip`).
```json
{
"model_name": "microsoft/Florence-2-base",
"dataset_path": "./0000_Datasets/danbooru2022",
"run_name": "Florence-2-base-danbooru2022-316k-run1",
"epochs": 1,
"learning_rate": 1e-5,
"gradient_checkpointing": true,
"freeze_vision": false,
"freeze_language": false,
"freeze_other": false,
"train_batch_size": 8,
"eval_batch_size": 16,
"gradient_accumulation_steps": 32,
"clip_grad_norm": 1,
"weight_decay": 1e-5,
"save_total_limit": 3,
"save_steps": 50,
"eval_steps": 50,
"warmup_steps": 50,
"eval_split_ratio": 0.01,
"seed": 42,
"filtering_processes": 128,
"attn_implementation": "sdpa"
}
```
![val_loss](https://huggingface.co/PJMixers-Dev/Florence-2-base-danbooru2022-316k/raw/main/val_loss.png)