---
base_model: THUDM/CogVideoX-5b
datasets: finetrainers/cakeify-smol
library_name: diffusers
license: other
license_link: https://huggingface.co/THUDM/CogVideoX-5b/blob/main/LICENSE
instance_prompt: PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
widget:
- text: PIKA_CAKEIFY A blue soap is placed on a modern table. Suddenly, a knife appears and slices through the soap, revealing a cake inside. The soap turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
  output:
    url: "./assets/output_0.mp4"
- text: PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
  output:
    url: "./assets/output_1.mp4"
- text: PIKA_CAKEIFY A red tea cup is placed on a wooden surface. Suddenly, a knife appears and slices through the cup, revealing a cake inside. The cake turns into a hyper-realistic prop cake, showcasing the creative transformation of everyday objects into something unexpected and delightful.
  output:
    url: "./assets/output_2.mp4"
tags:
- text-to-video
- diffusers-training
- diffusers
- cogvideox
- cogvideox-diffusers
- template:sd-lora
---

This is a fine-tune of the [THUDM/CogVideoX-5b](https://huggingface.co/THUDM/CogVideoX-5b) model on the [finetrainers/cakeify-smol](https://huggingface.co/datasets/finetrainers/cakeify-smol) dataset. We also provide a LoRA variant of the params. Check it out [here](#lora).

Code: https://github.com/a-r-r-o-w/finetrainers

> [!IMPORTANT]
> This is an experimental checkpoint and is known to generalize poorly.

Inference code:

```py
from diffusers import CogVideoXTransformer3DModel, DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the fine-tuned transformer and swap it into the base CogVideoX-5b pipeline.
transformer = CogVideoXTransformer3DModel.from_pretrained(
    "finetrainers/cakeify-v0", torch_dtype=torch.bfloat16
)
pipeline = DiffusionPipeline.from_pretrained(
    "THUDM/CogVideoX-5b", transformer=transformer, torch_dtype=torch.bfloat16
).to("cuda")

# Prompts start with the PIKA_CAKEIFY trigger token.
prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]
export_to_video(video, "output.mp4", fps=25)
```

Training logs are available on WandB [here](https://wandb.ai/diffusion-guidance/finetrainers-cogvideox/runs/q7z660f3/).

## LoRA

We extracted a 64-rank LoRA from the finetuned checkpoint (script here). This LoRA can be used to emulate the same kind of effect:

```py
from diffusers import DiffusionPipeline
from diffusers.utils import export_to_video
import torch

# Load the base pipeline, then apply the extracted rank-64 LoRA on top of it.
pipeline = DiffusionPipeline.from_pretrained("THUDM/CogVideoX-5b", torch_dtype=torch.bfloat16).to("cuda")
pipeline.load_lora_weights("finetrainers/cakeify-v0", weight_name="extracted_cakeify_lora_64.safetensors")

prompt = """
PIKA_CAKEIFY On a gleaming glass display stand, a sleek black purse quietly commands attention. Suddenly, a knife appears and slices through the shoe, revealing a fluffy vanilla sponge at its core. Immediately, it turns into a hyper-realistic prop cake, delighting the senses with its playful juxtaposition of the everyday and the extraordinary.
"""
negative_prompt = "inconsistent motion, blurry motion, worse quality, degenerate outputs, deformed outputs"

video = pipeline(
    prompt=prompt,
    negative_prompt=negative_prompt,
    num_frames=81,
    height=512,
    width=768,
    num_inference_steps=50
).frames[0]
export_to_video(video, "output_lora.mp4", fps=25)
```
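
The example prompts in this card share a common shape: the `PIKA_CAKEIFY` trigger token, an object on a surface, a knife slicing through it, and the reveal of a prop cake. The sketch below fills that shape in for a new object. The helper `build_cakeify_prompt` and its parameter names are hypothetical and not part of finetrainers or diffusers; the wording is copied from the "blue soap" example prompt, on the assumption that staying close to the example phrasing is a reasonable starting point.

```python
# Hypothetical helper (not part of finetrainers or diffusers): builds a prompt
# that follows the wording of the "blue soap" example prompt in this card.
TRIGGER = "PIKA_CAKEIFY"  # trigger token that starts every example prompt


def build_cakeify_prompt(subject: str, short_name: str, surface: str) -> str:
    """Return a cakeify prompt.

    subject:    the object with its article, e.g. "A blue soap"
    short_name: how the object is referred to afterwards, e.g. "soap"
    surface:    where the object sits, with its article, e.g. "a modern table"
    """
    return (
        f"{TRIGGER} {subject} is placed on {surface}. "
        f"Suddenly, a knife appears and slices through the {short_name}, "
        "revealing a cake inside. "
        f"The {short_name} turns into a hyper-realistic prop cake, showcasing the "
        "creative transformation of everyday objects into something unexpected "
        "and delightful."
    )


if __name__ == "__main__":
    print(build_cakeify_prompt("A green candle", "candle", "a marble counter"))
```

The returned string can be passed as `prompt=` to either pipeline above. Since the checkpoint is experimental, results for objects far from the example prompts may vary.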