Oleksii Maryshchenko

omaryshchenko

AI & ML interests

None yet

Recent Activity

Organizations

None yet

omaryshchenko's activity

upvoted an article about 9 hours ago
Article

SmolVLM Grows Smaller – Introducing the 250M & 500M Models!

• 49
reacted to JingzeShi's post with 🤯 4 days ago
upvoted an article 7 days ago
Article

Timm ❤️ Transformers: Use any timm model with transformers

• 33
reacted to mlabonne's post with 🤗 8 days ago
Post
3066
🆕 LLM Course 2025 edition!

I updated the LLM Scientist roadmap and added a ton of new information and references. It covers training, datasets, evaluation, quantization, and new trends like test-time compute scaling.

The LLM Course has been incredibly popular (41.3k stars!) and I've been touched to receive many, many messages about how it helped people in their careers.

I know how difficult this stuff can be, so I'm super proud of the impact it had. I want to keep updating it in 2025, especially with the LLM Engineer roadmap.

Thanks everyone, hope you'll enjoy it!

💻 LLM Course: https://huggingface.co/blog/mlabonne/llm-course
upvoted an article 8 days ago
reacted to clem's post with 🚀 about 1 month ago
Post
1948
Coming back to Paris Friday to open our new Hugging Face office!

We're at capacity for the party, but add your name to the waiting list, as we're trying to privatize the passage du Caire for extra space for robots 🤖🦾🦿

https://t.co/enkFXjWndJ
• 1 reply
reacted to csabakecskemeti's post with 👍 about 1 month ago
reacted to merve's post with 🔥 about 2 months ago
Post
3928
Small yet mighty! 💫

We are releasing SmolVLM: a new small 2B vision language model made for on-device use, fine-tunable on a consumer GPU, and immensely memory efficient 🤠

We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base HuggingFaceTB/smolvlm-6740bd584b2dcbf51ecb1f39

Learn more from our blog here: huggingface.co/blog/smolvlm
This release comes with a demo, fine-tuning code, MLX integration and TRL integration for DPO
Try the demo: HuggingFaceTB/SmolVLM
Fine-tuning Recipe: https://github.com/huggingface/smollm/blob/main/finetuning/Smol_VLM_FT.ipynb
Also TRL integration for DPO 💗
upvoted an article 3 months ago
Article

Transformers.js v3: WebGPU support, new models & tasks, and more…

• 66
reacted to Xenova's post with 🚀 7 months ago
Post
6042
Florence-2, the new vision foundation model by Microsoft, can now run 100% locally in your browser on WebGPU, thanks to Transformers.js! 🤗🤯

It supports tasks like image captioning, optical character recognition, object detection, and many more! 😍 WOW!
- Demo: Xenova/florence2-webgpu
- Models: https://huggingface.co/models?library=transformers.js&other=florence2
- Source code: https://github.com/xenova/transformers.js/tree/v3/examples/florence2-webgpu
reacted to merve's post with 🤗 7 months ago
Post
6068
Fine-tune Florence-2 on any task 🔥

Today we release a notebook and a walkthrough blog on fine-tuning Florence-2 on the DocVQA dataset @andito @SkalskiP

Blog: https://huggingface.co/blog 📕
Notebook: https://colab.research.google.com/drive/1hKDrJ5AH_o7I95PtZ9__VlCTNAo1Gjpf?usp=sharing 📖
Florence-2 is a great vision-language model thanks to its massive dataset and small size!

This model requires conditioning through task prefixes and is not as generalist, so it needs fine-tuning for a new task such as DocVQA 📝

We have fine-tuned the model on an A100 (one can also use a smaller GPU with a smaller batch size) and saw that the model picks up new tasks 🥹

See below what it looks like before and after fine-tuning 🤩
Play with the demo here: andito/Florence-2-DocVQA 🏄‍♀️
reacted to merve's post with 👀 7 months ago
Post
4350
Florence-2 is a new vision foundation model capable of a wide variety of tasks 🤯
Demo 👉🏻 gokaygokay/Florence-2
Collection 👉🏻 microsoft/florence-6669f44df0d87d9c3bfb76de

This model can handle tasks that vary from OCR to semantic segmentation.

The difference from previous models is that the authors have compiled a dataset of 126M images with 5.4B annotations, labelled with their own data engine and pseudo-labelled by smaller specialized models and APIs.

The model has a similar architecture to previous models: an image encoder and a multimodality encoder with a text decoder. The authors have compiled the multitask dataset with prompts for each task.

You can also fine-tune this model on any task of choice. The authors also released results on downstream tasks and reported them with the vision encoder both frozen and unfrozen 🤓📉
They have released fine-tuned models too; you can find them in the collection above 🤗
New activity in Xenova/Phi-3-mini-4k-instruct 9 months ago

Awww yes!

25
#2 opened 9 months ago by
BoscoTheDog
liked a Space 9 months ago