MedIT Solutions

company
Verified

AI & ML interests

None defined yet.

Recent Activity

meditsolutions's activity

mkurman posted an update about 8 hours ago
I’ve simplified things for the AI OS community!

Check out Qwen-2.5-14B-DeepSeek-R1-1M! This one's a cool blend of the latest Qwen 2.5 (14 billion parameters, with a massive 1 million token context window) and the DeepSeek R1 version of the Qwen 2.5 14B base model.

Enjoy! πŸš€

mkurman/Qwen2.5-14B-DeepSeek-R1-1M
mkurman posted an update 9 days ago
mkurman posted an update 22 days ago
I kindly invite you to try my experimental Llama 3.2 3B with o1-like thinking.

It uses Thoughts only when needed, so don't be surprised when it skips them. It also has a minor bug that requires further fine-tuning (sometimes it starts with <|python_tag|> instead of <Thought>).

Enjoy!

Give some likes and whatever to make me feel better and motivated to keep going πŸ˜‚

mkurman/llama-3.2-MEDIT-3B-o1
mkurman posted an update about 2 months ago
How Do I Contribute (HDIC)

Exciting times to come? We are working on a layer "self-esteem" technique that scores each layer's contribution to the final prediction. For now, it already unlocks a lot of knowledge stored in the weights that we couldn't force the model to extract through further fine-tuning!
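The post does not describe how the scoring works, so here is only a minimal, hypothetical sketch of the general idea of scoring each layer's contribution to the final prediction. Everything below (the cosine-against-final-state scoring, the toy hidden states, the function name `layer_contribution`) is an assumption for illustration, not MedIT's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: residual-stream states after an embedding and 4 "layers",
# with the sequence dimension collapsed to a single 8-dim vector.
hidden = [rng.normal(size=8) for _ in range(5)]  # hidden[0] = embedding output

# The final state is what the prediction head would read.
final = hidden[-1]

def layer_contribution(prev, curr, final):
    """Hypothetical score: cosine similarity between a layer's update
    (curr - prev) and the final representation, i.e. how much the layer
    moved the residual stream toward the eventual prediction."""
    delta = curr - prev
    return float(delta @ final / (np.linalg.norm(delta) * np.linalg.norm(final)))

scores = [layer_contribution(hidden[i], hidden[i + 1], final) for i in range(4)]
```

A real implementation would compute this over many tokens and aggregate, but the per-layer score shape is the same.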
mkurman posted an update about 2 months ago
What AI-enhanced research tools would you recommend for searching and analyzing scientific papers?
  • 5 replies
mkurman posted an update about 2 months ago
We built a new small language model, SmolLM2-MedIT-Upscale-2B, based on SmolLM2-1.7B-Instruct from Hugging Face. The premise was simple: increasing the vector dimension in the attention layers should positively impact the model's capabilities.

What did we prove?
In total, not much really, since we don't have a version of the original trained under the same conditions as our upscale. However...

1. We scaled up the model without losing its quality
2. We confirmed that the method we devised works
3. After extremely short fine-tuning, the model achieved much better results in IFEval compared to the original (53.68 vs 64.29) and a higher overall average score in Open LLM Leaderboard (14.75 vs 15.17)

I consider this a big success πŸ˜‡, since surpassing the original in metrics is often very time-consuming, generates high costs, and doesn't always work out.

Meanwhile, we're moving forward, training SmolLM2 400M Instruct as an upscale of 136M.

We're curious about how increasing the base and intermediate vectors will affect the model's quality. We'll compare it to the original and the 360M Instruct version released by Hugging Face.

License: Apache 2.0

meditsolutions/SmolLM2-MedIT-Upscale-2B
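The post only says the attention vectors were widened, not how. One common, function-preserving way to do such an upscale is to embed the original projection weights in a larger zero-initialized matrix, so the widened layer initially reproduces the original outputs; the sketch below illustrates that scheme with numpy. The dimensions and the zero-padding strategy are assumptions for illustration, not necessarily what was done for SmolLM2-MedIT-Upscale-2B.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 16, 16        # original attention projection size (toy numbers)
d_in_up, d_out_up = 24, 24  # upscaled size

W = rng.normal(size=(d_out, d_in))  # original projection weights

# Function-preserving upscale: place W in the top-left block of a larger
# zero matrix, so zero-padded inputs yield the original outputs in the
# first d_out dimensions. Fine-tuning then trains the new capacity.
W_up = np.zeros((d_out_up, d_in_up))
W_up[:d_out, :d_in] = W

x = rng.normal(size=d_in)
x_up = np.concatenate([x, np.zeros(d_in_up - d_in)])

y = W @ x
y_up = W_up @ x_up

assert np.allclose(y_up[:d_out], y)  # original behavior preserved at init
```

Starting from a model that matches the original exactly is what makes the "extremely short fine-tuning" mentioned above plausible: training only has to exploit the new capacity, not recover lost behavior.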

Adding Evaluation Results
#1 opened about 2 months ago by mkurman
mkurman posted an update 3 months ago