Post 497
How to deploy compressed ML models in your pipeline? We wrote a series of blog posts on these topics; hope they are helpful:
- Standard Model Compression in ML Pipeline: https://www.pruna.ai/blog/standard-model-compression-ml-pipeline
- Boost Your Replicate Models with Pruna AI: A Step-by-Step Guide: https://www.pruna.ai/blog/guide-replicate-pruna-ai
- Pruna + Triton: A Winning Combination for High-Performance AI Deployments: https://www.pruna.ai/blog/pruna-triton-combination
Feel free to join our Discord (https://discord.com/invite/rskEr4BZJx) if you have questions ;)
Post 1882
We compressed SmolLMs into 135 variants (see https://huggingface.co/PrunaAI?search_models=smolLM) with different quantization configurations using pruna (https://docs.pruna.ai/en/latest/). We wrote a blog post summarizing our findings (see https://www.pruna.ai/blog/smollm2-smaller-faster): small LMs can be made even smaller! :)
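For readers unfamiliar with what a quantization configuration actually controls, here is a minimal sketch of symmetric int8 weight quantization in plain Python. This illustrates only the general idea behind such configurations (bit width and scaling); it is not pruna's implementation, which supports several quantizers and fuses the math into optimized kernels — see the docs linked above for the real API.

```python
# Illustrative sketch only (NOT pruna's code): symmetric int8 weight
# quantization. Floats are mapped to integers in [-127, 127] plus one
# shared scale factor, shrinking storage roughly 4x vs float32.

def quantize_int8(weights):
    """Map a list of floats to int8 values plus a dequantization scale."""
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0  # one quantization step in float units
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.0, 0.635]
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)
# Rounding error is bounded by the size of one quantization step.
assert all(abs(a - w) <= scale for a, w in zip(approx, weights))
```

Fewer bits (int4, int2, ...) shrink the model further but widen the quantization step, which is exactly the size/quality trade-off the blog post measures across the 135 SmolLM variants.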