primeLine Research Community


AI & ML interests

None defined yet.

Recent Activity


DavidGF
posted an update 3 months ago
🎉 Celebrating One Year of #SauerkrautLM with Two Groundbreaking Releases!

We're thrilled to announce the release of SauerkrautLM-v2-14b in two specialized versions: VAGOsolutions/SauerkrautLM-v2-14b-SFT and VAGOsolutions/SauerkrautLM-v2-14b-DPO. Built on the robust Qwen2.5-14B foundation, these models represent a significant leap forward in multilingual AI capabilities.
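
For anyone who wants to try the models right away, here is a minimal loading sketch with the Hugging Face transformers library (the prompt and generation settings are illustrative, not taken from the model card):

```python
# Minimal sketch: load the DPO variant with transformers and run a short German prompt.
# dtype, device_map and generation settings are illustrative, not tuned recommendations.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "VAGOsolutions/SauerkrautLM-v2-14b-DPO"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "Erkläre kurz den Unterschied zwischen SFT und DPO."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```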

🔬 Technical Breakthroughs:
💠 Innovative three-phase fine-tuning approach
💠 Two-step Spectrum SFT + one-step Spectrum DPO optimization phase for enhanced performance
💠 Balance of German and English language capabilities
💠 Advanced function calling - almost on par with Claude-3.5-Sonnet-20240620

🇩🇪 German Language Excellence:
What sets this release apart is our unique achievement in simultaneously improving both German and English capabilities. Through our specialized training approach with over 1.2B tokens across two phases, we've managed to:
💠 Enhance German language understanding and generation (SFT version > DPO version)
💠 Maintain authentic German linguistic nuances
💠 Improve cross-lingual capabilities
💠 Preserve cultural context awareness

📊 Training Innovation:
Our three-phase approach targeted specific layer percentages (15%, 20% and 25%; sketched in the snippet below) with carefully curated datasets, including:
💠 Mathematics-focused content (proprietary classifier-selected)
💠 High-quality German training data
💠 Specialized function calling datasets
💠 Premium multilingual content
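
To make "targeting a layer percentage" concrete, here is a simplified stand-in sketch. The actual Spectrum method ranks layers by signal-to-noise ratio and unfreezes the best-scoring ones; this sketch simply unfreezes the last fraction of decoder layers of a Qwen2.5-style model:

```python
# Simplified stand-in for Spectrum-style layer targeting: freeze everything,
# then re-enable gradients for a chosen percentage of decoder layers.
# Spectrum itself selects layers by signal-to-noise ratio, not by position.
from transformers import AutoModelForCausalLM

def unfreeze_layer_percentage(model, pct: float):
    for param in model.parameters():
        param.requires_grad = False
    layers = model.model.layers                      # decoder blocks (Qwen2.5-style)
    n_trainable = max(1, int(len(layers) * pct))
    for layer in layers[-n_trainable:]:              # positional stand-in for SNR ranking
        for param in layer.parameters():
            param.requires_grad = True
    return model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-14B")
model = unfreeze_layer_percentage(model, pct=0.25)   # e.g. the 25% phase
```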

๐ŸŽ Community Contribution:
We're also releasing two new datasets in a few days:
1๏ธโƒฃ SauerkrautLM-Fermented-GER-DPO: 3,300 high-quality German training samples
2๏ธโƒฃ SauerkrautLM-Fermented-Irrelevance-GER-DPO: 2,000 specialized samples for optimized function call irrelevance handling

Thank you to our incredible community and partners who have supported us throughout this journey. Here's to another year of AI innovation! 🚀
flozi00
posted an update 7 months ago
🌟 Progress in the German FineWeb-Edu reproduction 🌟

We're delighted to share the launch of our new Data Quality Classification Model, designed specifically for evaluating educational content in German. This tool uses advanced machine learning techniques to assess texts across all educational levels, from primary school to university.

๐Ÿ” Inspired by Huggingface's fine web edu dataset, we've worked hard to refine data classification methods ensuring educators and learners access top-quality resources.
We're excited about the future as we continue improving our models and expanding our datasets.

Access the model here: pL-Community/GermanEduScorer-Qwen2-1.5b
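
A hedged usage sketch, assuming the scorer exposes a sequence-classification head like the original FineWeb-Edu classifier (please check the model card for the exact interface and score scale):

```python
# Assumption: the scorer behaves like the FineWeb-Edu classifier, i.e. a regression
# head whose single logit is the educational-quality score. Verify on the model card.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "pL-Community/GermanEduScorer-Qwen2-1.5b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

text = "Die Photosynthese wandelt Lichtenergie in chemische Energie um."
inputs = tokenizer(text, return_tensors="pt", truncation=True)
score = model(**inputs).logits.squeeze().item()
print(f"Educational quality score: {score:.2f}")
```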

๐Ÿ™ A huge thank you to David and Daryoush from Vago Solutions; Bjรถrn and Jan from Ellamind / DiscoResearch for their expert insights throughout this project. Your support has been crucial.
This project was made possible by the support of PrimeLine AI.
flozi00
updated a Space 8 months ago
DavidGF
posted an update 8 months ago
Introducing Kraken-LoRA: a lightweight version of Kraken that uses LoRA adapters as experts on top of the base model.

@fernandofernandes, @Crystalcareai, @ehartford and I created Kraken-LoRA!

๐Ÿ” Whatโ€™s the big deal?

✅ Size Consistency: While Kraken's size increases with more experts, Kraken-LoRA remains as compact as the base model (e.g., 8B if you use Meta-Llama3-8b-Instruct).
✅ VRAM Efficiency: Kraken-LoRA is highly VRAM-efficient, maintaining the power of all experts without the bloat.
✅ Dynamic Adaptation: LoRA adapters are applied dynamically at runtime, following the routing process (see the sketch below).
✅ High Efficiency: Enjoy increased efficiency without compromising performance, as long as the LoRA adapters match the base model.
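
A minimal sketch of the dynamic-adaptation idea using the peft library (adapter names and paths are hypothetical placeholders; the released Kraken-LoRA code is linked below):

```python
# Sketch of runtime adapter switching with peft: one base model stays in VRAM,
# and the router's decision selects which LoRA "expert" is active for generation.
# The adapter paths and names here are hypothetical placeholders.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

model = PeftModel.from_pretrained(base, "path/to/code-expert-lora", adapter_name="code")
model.load_adapter("path/to/math-expert-lora", adapter_name="math")

def generate_with_expert(expert_name, input_ids):
    model.set_adapter(expert_name)          # activate the adapter chosen by the router
    return model.generate(input_ids, max_new_tokens=128)
```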

💡 Conclusion: Kraken-LoRA empowers businesses to experience enhanced flexibility and performance from our architecture, enabling further scalability without sacrificing performance.

Check out the model here: VAGOsolutions/Kraken-LoRA
Explore the code here: https://github.com/cognitivecomputations/kraken/tree/main/Kraken-LoRA

Have fun with Kraken-LoRA! 🐙
DavidGF
posted an update 8 months ago
The kraken has awakened!
A Game-Changer in LLM Flexibility and Performance!

Over the past few weeks, VAGO solutions teamed up with Cognitive Computations and HyperSpace to develop a groundbreaking architecture that redefines flexibility in combining different LLMs into one model.

@fernandofernandes, @Crystalcareai, @ehartford and I created the Kraken!

What Can It Do? 🐙
✅ Versatile Architecture: Kraken allows the seamless combination of LLMs with varying sizes, quantizations, and model architectures. It currently supports 4-bit, 8-bit, and AWQ quantization, with more on the way, and it runs on Hugging Face Transformers 4.40+.

✅ Kraken Router: Utilizing a custom sequence-classification model with a context length of 32k tokens, the Kraken Router directs inputs to the most suitable expert based on their characteristics (a simplified sketch follows this list).

✅ Adaptability: Enhanced input formatting supports the model's adaptability to diverse conversational contexts.

✅ Extreme Versatility: Easily swap experts within Kraken for your specific use cases without retraining the entire model. For example, if you've built a Kraken for coding in Python, you can upgrade your Python model without retraining the router, or add a C# model by retraining only the router.

✅ Open Source Pipeline: We're sharing the entire pipeline, including router creation, training, architecture setup, and Kraken inference, as Jupyter notebooks: https://github.com/cognitivecomputations/kraken
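
To illustrate the routing idea from the list above, here is a very small sketch (model paths are hypothetical; the real pipeline lives in the notebooks linked above):

```python
# Hypothetical routing sketch: a sequence-classification model labels the prompt,
# and the label decides which expert LLM generates the answer.
from transformers import pipeline

router = pipeline("text-classification", model="path/to/kraken-router")
experts = {
    "code":   pipeline("text-generation", model="path/to/code-expert"),
    "german": pipeline("text-generation", model="path/to/german-expert"),
}

def kraken_generate(prompt: str) -> str:
    label = router(prompt)[0]["label"]                # e.g. "code" or "german"
    expert = experts.get(label, experts["german"])    # fall back to a default expert
    return expert(prompt, max_new_tokens=128)[0]["generated_text"]
```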

Kraken marks the beginning of an exciting new journey in #OpenSource LLMs. Why? Because it empowers the open-source community to accelerate the catch-up with proprietary LLMs like #GPT and #Claude 🤩

We proudly introduce the very first two Kraken models, which integrate top-tier LLM and multilingual capabilities:
cognitivecomputations/Kraken
VAGOsolutions/Kraken-Multilingual
Right now it's supported by the Hugging Face Transformers library. We would love to see integration into vLLM and TGWI!
DavidGF
posted an update 9 months ago
Please... feed this Llama some Sauerkraut! 🍲

Said and done. Here it is: our Sauerkraut version of Meta's strong Llama3-8b, released from HANNOVER MESSE, right in front of the Meta booth.
VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

According to benchmarks (LM-Evaluation-Harness 0.4.2), our #SauerkrautLM dataset and fine-tuning pipeline improved the model noticeably (average score = 74.57), especially its reasoning and common-sense capabilities.

Again, we provide some more detail on the whole process:
✅ Original model: Llama-3-8b-Instruct
✅ Training duration: 12 hours
✅ Training procedure: 2-staged DPO (a minimal sketch follows this list)
✅ Training data: 70k samples (first stage) and 20k samples (second stage)
✅ GPU: 4x RTX 6000 Ada
✅ New model: Llama-3-SauerkrautLM-8b-Instruct
✅ Total training cost: $54.72 💴 - RunPod FTW (excluding data synthesis, data curation, benchmarks, error handling, and testing)
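
For orientation, a hedged sketch of what one DPO stage can look like with the trl library (dataset name and hyperparameters are placeholders, not the values used for SauerkrautLM; exact DPOTrainer arguments vary between trl versions):

```python
# Placeholder sketch of a single DPO stage with trl; the dataset and hyperparameters
# are illustrative, not the SauerkrautLM training configuration.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Preference data with "prompt", "chosen" and "rejected" columns (hypothetical dataset).
train_dataset = load_dataset("my-org/german-preference-pairs", split="train")

args = DPOConfig(
    output_dir="llama3-sauerkraut-dpo-stage1",
    beta=0.1,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = DPOTrainer(model=model, args=args, train_dataset=train_dataset,
                     processing_class=tokenizer)
trainer.train()
```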

See our model card on Hugging Face for more details: VAGOsolutions/Llama-3-SauerkrautLM-8b-Instruct

There will be more details on benchmarks over the next few days.
DavidGF
posted an update 10 months ago
"How expensive is it actually to teach a #LanguageModel German through #finetuning ๐Ÿ’ฐ๐Ÿ’ฐ๐Ÿ’ฐ? We get asked this quite often.

There is no one-size-fits-all answer to this question, as among other factors:
⏹ each fine-tune is different,
⏹ the hardware used can be a major cost driver,
⏹ the amount and type of training data can extend the process,
⏹ and the skills to be trained can increase the difficulty of fine-tuning.

However, we have broken down the costs incurred for our latest fine-tune (VAGOsolutions/SauerkrautLM-Qwen-32b):


Base model: Qwen/Qwen1.5-32B
Fine-Tuning Goal: Train German language
Training dataset size: 160,000 SFT data / 110,000 DPO data
Training duration: 72.5 hours (2 epochs SFT / 1 epoch DPO)
GPU: 2x A100 SXM
New model: VAGOsolutions/SauerkrautLM-Qwen-32b

Total cost: 312 euros 💶
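
A quick back-of-the-envelope check of these numbers (assuming the 312 euros cover GPU rental for the full 72.5 hours on both GPUs):

```python
# Implied GPU rental rate from the figures above.
total_cost_eur = 312
num_gpus = 2
hours = 72.5

rate = total_cost_eur / (num_gpus * hours)
print(f"Implied rate: {rate:.2f} EUR per GPU-hour")   # roughly 2.15 EUR per GPU-hour
```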

These are quite reasonable training costs, considering the model now speaks passable German (it was previously very broken). Depending on the use case and process requirements, this can even be a real alternative to the costly continued pre-training of foreign-language models.