Burning ray

adarksky

aeryskyB

adarksky's activity

updated a model 3 days ago

adarksky/Qwen2.5-0.5B-sft-lora-rel-therapy

Text2Text Generation • Updated 3 days ago

published a model 4 days ago

adarksky/Qwen2.5-0.5B-sft-lora-rel-therapy

Text2Text Generation • Updated 3 days ago

liked a model 8 days ago

openai/whisper-tiny

Automatic Speech Recognition • Updated Feb 29, 2024 • 474k • 278

upvoted a paper 8 days ago

SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training

Paper • 2501.17161 • Published 10 days ago • 100

upvoted a paper 11 days ago

Humanity's Last Exam

Paper • 2501.14249 • Published 15 days ago • 54

liked a model 12 days ago

deepseek-ai/Janus-Pro-1B

Any-to-Any • Updated 7 days ago • 71.6k • 340

liked a model 19 days ago

deepseek-ai/DeepSeek-R1

Text Generation • Updated 7 days ago • 1.86M • 7.63k

updated a model 23 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 6 days ago • 209k • 2.9k

New activity in hexgrad/Kokoro-82M 23 days ago

Update kokoro.py

#43 opened 23 days ago by adarksky

liked a model 25 days ago

hexgrad/Kokoro-82M

Text-to-Speech • Updated 6 days ago • 209k • 2.9k

liked a model about 1 month ago

deepseek-ai/Janus-1.3B

Any-to-Any • Updated 12 days ago • 82.8k • 568

reacted to merve's post with 🔥 2 months ago

Post

2675

small but mighty 🔥
you can fine-tune SmolVLM on an L4 with batch size of 4 and it will only take 16.4 GB VRAM 🫰🏻 also with gradient accumulation simulated batch size is 16 ✨
I made a notebook that includes all the goodies: QLoRA, gradient accumulation, gradient checkpointing with explanations on how they work 💝 https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
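
The batch-size arithmetic in the post can be sketched in a few lines of plain Python. This is not code from the linked notebook: the accumulation step count of 4 is only inferred from the stated numbers (16 / 4), and the one-parameter linear model, `grad_mse`, and `accumulated_grad` are hypothetical names chosen for illustration. The sketch shows that averaging gradients over equal-sized micro-batches reproduces the gradient of the full simulated batch.

```python
def effective_batch_size(micro_batch_size, accumulation_steps):
    """Simulated batch size when gradients are accumulated before each update."""
    return micro_batch_size * accumulation_steps


def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y ~ w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, data, micro_batch_size):
    """Accumulate scaled micro-batch gradients instead of one large batch."""
    steps = len(data) // micro_batch_size
    total = 0.0
    for i in range(steps):
        micro = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        # Divide by the number of steps so the sum is a mean over all samples.
        total += grad_mse(w, micro) / steps
    return total


# Assumption: 4 accumulation steps, inferred from 16 / 4 in the post.
print(effective_batch_size(4, 4))

# Toy data (hypothetical): 16 samples, split into 4 micro-batches of 4.
data = [(float(i), 3.0 * i + 1.0) for i in range(16)]
print(accumulated_grad(0.5, data, 4), grad_mse(0.5, data))
```

Only one micro-batch of 4 has to be processed at a time, which is why the memory footprint follows the micro-batch size rather than the simulated batch size of 16.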

liked a model 3 months ago

Qwen/Qwen2.5-Coder-32B-Instruct

Text Generation • Updated 27 days ago • 120k • 1.56k

updated a model 3 months ago

adarksky/pokemon-DDPM

Unconditional Image Generation • Updated Nov 11, 2024 • 57

liked a model 3 months ago

tencent/Tencent-Hunyuan-Large

Text Generation • Updated 20 days ago • 462 • 560

updated a model 3 months ago

adarksky/bart-base-rel-therapy

Text2Text Generation • Updated Nov 11, 2024 • 103

liked 4 datasets 3 months ago