Mariusz Kurman PRO
mkurman



AI & ML interests
AI Tech Lead | MD
Recent Activity
updated
a model
3 days ago
mkurman/llama-3.2-MEDIT-3B-o1-GRPO-LLM-Eval
published
a model
3 days ago
mkurman/llama-3.2-MEDIT-3B-o1-GRPO-LLM-Eval
Organizations
mkurman's activity

reacted to
JingzeShi's
post
3 days ago

reacted to
CultriX's
post
12 days ago
Post
2313
Final upgrade to the Multi-Agent Task Completion Space:
CultriX/MultiAgent-CodeTask.
It now includes:
- A live stream of the progress being made on the task (see the included video)
- The following components (a minimal orchestration sketch follows after the list):
1. Automatic prompt optimization
2. An orchestrator that dynamically decides which agent to call, incorporating feedback from a human (human-in-the-loop)
3. A coding agent to complete the task
4. A code-reviewing agent that iteratively provides feedback to improve the code generated by the coding agent until it meets the required criteria, after which it is approved.
5. A testing agent that tests the approved code or provides information on how to test it.
6. A documentation agent that provides documentation and a help message for the approved and tested code.
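If it helps to picture the flow, here is a minimal orchestration sketch covering the same roles; the orchestrate function, the agent names, and the approval check are hypothetical illustrations, not the Space's actual code:

```python
from typing import Callable, Dict

# Hypothetical sketch of the multi-agent loop described above; each agent is
# modeled as a simple prompt -> reply callable (e.g. a wrapper around an LLM call).
Agent = Callable[[str], str]

def orchestrate(task: str, agents: Dict[str, Agent], max_review_rounds: int = 3) -> Dict[str, str]:
    prompt = agents["prompt_optimizer"](task)               # 1. automatic prompt optimization
    code = agents["coder"](prompt)                          # 3. coding agent drafts a solution
    for _ in range(max_review_rounds):                      # 2./4. orchestrated review loop
        feedback = agents["reviewer"](code)                 #      (human feedback could be injected here)
        if feedback.strip().upper().startswith("APPROVED"):
            break
        code = agents["coder"](f"{prompt}\n\nReviewer feedback:\n{feedback}")
    tests = agents["tester"](code)                          # 5. testing agent
    docs = agents["documenter"](code)                       # 6. documentation agent
    return {"code": code, "tests": tests, "docs": docs}
```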

posted
an
update
13 days ago
Post
2011
I've been working on something cool: a GRPO trainer with an LLM evaluator that can also perform SFT on the feedback data, if you want. Check it out:
Any stars are more than welcome!
https://github.com/mkurman/grpo-llm-evaluator
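As a rough illustration of the idea (not the repo's actual API; the helper names and score scale are assumptions), the core of a GRPO-with-LLM-evaluator loop is turning judge scores into group-relative advantages, and optionally recycling the best-rated completions as SFT data:

```python
import torch

# Illustrative sketch only: assumed helper names, not the repo's real interface.
def group_relative_advantages(scores: list[float]) -> torch.Tensor:
    """Normalize per-group LLM-judge scores into GRPO-style advantages."""
    s = torch.tensor(scores, dtype=torch.float32)
    return (s - s.mean()) / (s.std() + 1e-6)

def select_sft_examples(prompt: str, completions: list[str], scores: list[float],
                        threshold: float = 0.8) -> list[dict]:
    """Optionally keep highly rated completions as extra SFT data."""
    return [
        {"prompt": prompt, "completion": c}
        for c, s in zip(completions, scores)
        if s >= threshold
    ]
```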

posted
an
update
18 days ago
Post
1579
Blurred-Thoughts Supervised-Finetuning (BT-SFT)
After hours of working with GitHub Copilot to organize the code, I'm excited to announce the release of Blurred-Thoughts Supervised-Finetuning (BT-SFT), a new method for fine-tuning LLMs to produce more diverse and creative responses.
BT-SFT introduces:
- A smart tokenization method that randomly masks tokens within <think> ... </think> tags, encouraging the model to generate diverse responses that align better with its own probability distribution instead of memorizing the thought process from distilled data.
- A reward function that ensures responses are well-structured.
Explore and contribute to the project in my GitHub repository:
https://github.com/mkurman/blurred-thoughts-SFT
Keep me updated on your experiments with BT-SFT!
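For intuition, here is a minimal sketch of the masking idea, assuming the usual causal-LM convention that label positions set to -100 are ignored by the loss; the tag handling, offsets-based lookup, and mask ratio below are illustrative, not the repo's exact implementation:

```python
import random

THINK_START, THINK_END = "<think>", "</think>"

def blur_think_labels(input_ids, offsets, text, mask_ratio=0.3, ignore_index=-100):
    """Randomly drop tokens inside <think>...</think> from the SFT loss.

    `offsets` are (start, end) character spans per token, e.g. obtained from a
    fast tokenizer called with return_offsets_mapping=True.
    """
    labels = list(input_ids)
    start = text.find(THINK_START)
    end = text.find(THINK_END)
    if start == -1 or end == -1:
        return labels  # no thought section, keep all labels
    for i, (tok_start, tok_end) in enumerate(offsets):
        inside = tok_start >= start + len(THINK_START) and tok_end <= end
        if inside and random.random() < mask_ratio:
            labels[i] = ignore_index  # this thought token no longer contributes to the loss
    return labels
```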
Issue with Padding
#1 opened 21 days ago by akashD22

reacted to
nicolay-r's
post
23 days ago
Post
1619
The LLaMA-3.1-8B distilled version of DeepSeek R1 is now available, alongside the one based on Qwen.
Notebook for using it for reasoning over a series of data:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb
Loading using the pipeline API of the transformers library:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
GPU usage: 12.3 GB (FP16/FP32 mode), which fits on a T4 (about 1.5 GB less than the Qwen-distilled version).
Performance: on a T4 instance, ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow?
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
Framework: https://github.com/nicolay-r/bulk-chain
Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
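For reference, loading this checkpoint via the transformers pipeline API looks roughly like this (the prompt and generation settings are just example values):

```python
import torch
from transformers import pipeline

# Load the distilled Llama-3.1-8B checkpoint; FP16 keeps it within a T4's 16 GB.
pipe = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Think step by step: what is 17 * 23?"}]
out = pipe(messages, max_new_tokens=512)
print(out[0]["generated_text"])
```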

reacted to
fuzzy-mittenz's
post
24 days ago
Post
2620
Not many seemed to notice, but what was probably meant to be a win for artists' rights at the US Copyright Office has also solved some fundamental issues for the community.
In our recent article I outline how companies like Suno, OpenAI, Midjourney, etc. can no longer claim any right to copy the work you create with their platforms.
We also look at other ways this study and the new rules for AI will fundamentally affect creators who use it, and how companies' incentives to give them control over certain aspects might change because of this. It's broken down pretty well here: https://huggingface.co/blog/fuzzy-mittenz/copyright-in-ai