Training details
Hi,
I'm impressed by your amazing work.
Could you describe the training details (e.g., batch size, learning rate, scheduler, etc.) for the lcm-lora-sdxl model?
When I tried to train an lcm-lora-sdxl model with the official diffusers training script, the intermediate validation images were not as good as yours.
Thanks in advance.
Did you try the exact same training setup? Dataset, hyperparameters, etc?
Thank you for your quick response.
Yes, except for the training data.
I used a subset of the laion-aesthetic dataset (11K text-image pairs) provided by BK-SDM.
I have shared the validation images generated at 700 iterations.
These are the hyperparameters:
--train_data_dir=./data/laion_aes/preprocessed_11k
--pretrained_teacher_model=stabilityai/stable-diffusion-xl-base-1.0
--pretrained_vae_model_name_or_path=madebyollin/sdxl-vae-fp16-fix
--output_dir=./results/TOY_LCM_LORA_LAION/lcm_lora_sdxl_base_24x1x1_lr_1e-4
--tracker_project_name=TOY_LCM_LORA_LAION
--tracker_output_name=lcm_lora_sdxl_base_24x1x1_lr_1e-4
--mixed_precision=fp16
--resolution=1024
--train_batch_size=24
--gradient_accumulation_steps=1
--gradient_checkpointing
--use_8bit_adam
--lora_rank=64
--learning_rate=1e-4
--lr_scheduler=constant
--lr_warmup_steps=0
--max_train_steps=100000
--checkpointing_steps=2000
--validation_steps=20
--seed=0
--report_to=wandb
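For reference, this is roughly how I sanity-check the distilled LoRA outside of the wandb validation logs. It is only a sketch: the local path below just mirrors my --output_dir above, and the exact filename/layout depends on how the training script saves its LoRA weights, so adjust it to your run.

import torch
from diffusers import AutoPipelineForText2Image, LCMScheduler

# Load the SDXL teacher in fp16 and swap in the LCM scheduler.
pipe = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# Load the LoRA weights saved by the training run (illustrative path; point it at
# the folder that actually contains pytorch_lora_weights.safetensors for your run).
pipe.load_lora_weights("./results/TOY_LCM_LORA_LAION/lcm_lora_sdxl_base_24x1x1_lr_1e-4")
pipe.fuse_lora()

image = pipe(prompt="a photo of an astronaut riding a horse", num_inference_steps=4, guidance_scale=1).images[0]
image.save("lcm_lora_validation.png")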
I have another question.
Could you let me know what data was used to train the lcm-lora-ssd-1b and lcm-lora-sdxl models?
When I generated some samples, the result of lcm-lora-ssd-1b showed better quality than that of lcm-lora-sdxl.
I wonder if this difference in generation quality is caused by differences in the data used for training.
For the sake of the community, it would be very helpful if you could share the training details of the lcm-lora-sdxl and lcm-lora-ssd-1b models.
In my case, I'm trying to create an lcm-lora version of the koala model, which is a lightweight T2I model like ssd-1b.
Thanks in advance.
For lcm-lora-ssd-1b:
import torch
from diffusers import AutoPipelineForText2Image, LCMScheduler

model_id = "segmind/SSD-1B"
adapter_id = "latent-consistency/lcm-lora-ssd-1b"

# load the SSD-1B base pipeline in fp16 and switch to the LCM scheduler
pipe = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")

# load and fuse the LCM LoRA
pipe.load_lora_weights(adapter_id)
pipe.fuse_lora()

prompt = "Portrait photo of a standing girl, photograph, golden hair, depth of field, moody light, golden hour, centered, extremely detailed, award winning photography, realistic."
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=1).images[0]
For lcm-lora-sdxl:
# note: no torch_dtype is passed here, so this pipeline loads in float32 by default
pipe2 = AutoPipelineForText2Image.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0")
pipe2.scheduler = LCMScheduler.from_config(pipe2.scheduler.config)
pipe2.to("cuda:4")

# load and fuse the LCM LoRA
pipe2.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe2.fuse_lora()

prompt = "Portrait photo of a standing girl, photograph, golden hair, depth of field, moody light, golden hour, centered, extremely detailed, award winning photography, realistic."
image2 = pipe2(prompt=prompt, num_inference_steps=4, guidance_scale=1).images[0]
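Side note: to make this a like-for-like comparison, the same seed can be passed to both pipelines through a generator. A minimal sketch (the seed value 0 is arbitrary):

import torch

# Reuse the same fixed seed for both pipelines so that differences come from
# the adapters rather than from the sampled initial noise.
image = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=1, generator=torch.Generator().manual_seed(0)).images[0]
image2 = pipe2(prompt=prompt, num_inference_steps=4, guidance_scale=1, generator=torch.Generator().manual_seed(0)).images[0]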
How many inference steps were used for the example images you shared?