Simon Pagezy

pagezyhf

AI & ML interests

Healthcare ML

Recent Activity

upvoted a collection 6 days ago
Cosmos
updated a model 6 days ago
pagezyhf/qwen

Organizations

Hugging Face, AWS Inferentia and Trainium, Hugging Face Optimum, Hugging Test Lab, Hugging Face OSS Metrics, Core ML Projects, Blog-explorers, Enterprise Explorers, Paris AI Running Club, Google Cloud 🤝🏻 Hugging Face, PagezyTest

pagezyhf's activity

replied to singhsidhukuldeep's post 10 days ago
reacted to singhsidhukuldeep's post with 🤯 10 days ago
Excited to share insights from Walmart's groundbreaking semantic search system that revolutionizes e-commerce product discovery!

The team at Walmart Global Technology (the team that I am a part of 😬) has developed a hybrid retrieval system that combines traditional inverted index search with neural embedding-based search to tackle the challenging problem of tail queries in e-commerce.

Key Technical Highlights:

• The system uses a two-tower BERT architecture where one tower processes queries and another processes product information, generating dense vector representations for semantic matching.

• Product information is enriched by combining titles with key attributes like category, brand, color, and gender using special prefix tokens to help the model distinguish different attribute types.

• The neural model leverages DistilBERT with 6 layers and projects the 768-dimensional embeddings down to 256 dimensions using a linear layer, achieving optimal performance while reducing storage and computation costs.

• To improve model training, they implemented innovative negative sampling techniques combining product category matching and token overlap filtering to identify challenging negative examples.
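
To make the enrichment and projection steps above concrete, here is a minimal pure-Python sketch. It is not Walmart's implementation: the prefix token strings (`[CAT]`, `[BRAND]`, ...), the bias-free linear layer, the toy 4-to-2 dimensions, and the use of cosine similarity for matching are all assumptions chosen for illustration.

```python
import math

# Hypothetical prefix tokens; the post does not list the actual token strings.
PREFIXES = {"category": "[CAT]", "brand": "[BRAND]", "color": "[COLOR]", "gender": "[GENDER]"}

def build_product_text(title, attributes):
    """Join the title with prefixed attributes so each attribute type is marked."""
    parts = [title]
    for name, token in PREFIXES.items():
        value = attributes.get(name)
        if value:
            parts.append(f"{token} {value}")
    return " ".join(parts)

def project(vector, weights):
    """Apply a bias-free linear layer: one output value per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

text = build_product_text("Running shoes", {"brand": "Acme", "color": "blue"})

# Toy 4-d "tower outputs" projected to 2-d; the post describes 768 -> 256.
W = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
query_vec = project([1.0, 2.0, 3.0, 4.0], W)
product_vec = project([2.0, 4.0, 0.0, 0.0], W)
score = cosine(query_vec, product_vec)
```

In a real two-tower setup the input vectors would come from the query and product encoders, and the projection weights would be learned during training.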

Production Implementation Details:

• The system uses a managed ANN (Approximate Nearest Neighbor) service to enable fast retrieval, achieving 99% recall@20 with just 13ms latency.

• Query embeddings are cached with preset TTL (Time-To-Live) to reduce latency and costs in production.

• The model is exported to ONNX format and served in Java, with custom optimizations like fixed input shapes and GPU acceleration using NVIDIA T4 processors.
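
The TTL caching of query embeddings can be sketched as follows. This is a minimal in-memory illustration, not the production design described in the post: the class name `TTLCache`, the `get_or_compute` method, and the injectable clock are hypothetical.

```python
import time

class TTLCache:
    """Minimal in-memory cache whose entries expire after `ttl_seconds`."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable so expiry can be tested without sleeping
        self._store = {}     # query -> (expiry_time, embedding)

    def get_or_compute(self, query, compute):
        now = self.clock()
        entry = self._store.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]           # still fresh: skip the model call
        embedding = compute(query)    # missing or expired: recompute
        self._store[query] = (now + self.ttl, embedding)
        return embedding
```

A repeated query within the TTL window returns the stored embedding without invoking the encoder, which is where the latency and cost savings would come from.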

Results:
The system showed significant improvements in both offline metrics and live experiments, with:
- +2.84% improvement in NDCG@10 for human evaluation
- +0.54% lift in Add-to-Cart rates in live A/B testing
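
For readers unfamiliar with the metrics quoted above, here is a small sketch of recall@k and NDCG@k. The post does not say which NDCG variant was used; this version assumes linear gains with a log2 position discount, and for ANN recall the `relevant` list would be the exact nearest neighbors.

```python
import math

def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant items that appear in the top-k retrieved list."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant)

def dcg_at_k(gains, k):
    """Discounted cumulative gain: linear gains, log2 position discount."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains[:k], start=1))

def ndcg_at_k(gains, k):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal else 0.0
```

Here `gains` is the list of graded relevance labels in ranked order, so a perfectly ordered list scores 1.0.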

This is a fantastic example of how modern NLP techniques can be successfully deployed at scale to solve real-world e-
reacted to julien-c's post with 🔥❤️ about 1 month ago
After some heated discussion 🔥, we clarify our intent re. storage limits on the Hub

TL;DR:
- public storage is free, and (unless blatant abuse) unlimited. We do ask that you consider upgrading to PRO and/or Enterprise Hub if possible
- private storage is paid above a significant free tier (1TB if you have a paid account, 100GB otherwise)
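
As a quick illustration of the rule above, the sketch below computes how much private storage would fall outside the free tier. The function name is hypothetical, 1 TB is assumed to mean 1000 GB, and actual billing details are in the linked documentation.

```python
# Free private-storage tiers from the TL;DR (assumes 1 TB = 1000 GB).
FREE_TIER_GB = {"paid": 1000, "free": 100}

def billable_private_gb(private_gb, paid_account):
    """Private storage above the free tier; public storage is not counted."""
    tier = FREE_TIER_GB["paid" if paid_account else "free"]
    return max(0, private_gb - tier)
```

For example, 250 GB of private data on a free account would leave 150 GB above the tier, while the same data on a paid account stays within it.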

docs: https://huggingface.co/docs/hub/storage-limits

We continuously optimize our infrastructure to scale our storage for the coming years of growth in machine learning, to the benefit of the community 🔥

cc: @reach-vb @pierric @victor and the HF team
reacted to merve's post with ❤️ about 1 month ago
This week in open-source AI was insane 🤠 A small recap 🕺🏻 merve/dec-6-releases-67545caebe9fc4776faac0a3

Multimodal 🖼️
> Google shipped PaliGemma 2, a new iteration of PaliGemma with more sizes: 3B, 10B and 28B, with pre-trained and captioning variants 👏
> OpenGVLab released InternVL2.5, seven new vision LMs in different sizes, with a SOTA checkpoint under the MIT license ✨
> Qwen team at Alibaba released the base models of Qwen2VL models with 2B, 7B and 72B ckpts

LLMs 💬
> Meta released a new iteration of Llama 70B, Llama3.3-70B trained further
> EuroLLM-9B-Instruct is a new multilingual LLM for European languages with Apache 2.0 license 🔥
> Dataset: CohereForAI released GlobalMMLU, multilingual version of MMLU with 42 languages with Apache 2.0 license
> Dataset: QwQ-LongCoT-130K is a new dataset to train reasoning models
> Dataset: FineWeb2 just landed with multilinguality update! 🔥 nearly 8TB pretraining data in many languages!

Image/Video Generation 🖼️
> Tencent released HunyuanVideo, a new photorealistic video generation model
> OminiControl is a new editing/control framework for image generation models like Flux

Audio 🔊
> Indic-Parler-TTS is a new text2speech model made by the community
posted an update about 1 month ago
reacted to fdaudens's post with ❤️ about 1 month ago
Keeping up with open-source AI in 2024 = overwhelming.

Here's help: We're launching our Year in Review on what actually matters, starting today!

Fresh content dropping daily until year end. Come along for the ride - first piece out now with @clem 's predictions for 2025.

Think of it as your end-of-year AI chocolate calendar.

Kudos to @BrigitteTousi @clefourrier @Wauplin @thomwolf for making it happen. We teamed up with aiworld.eu for awesome visualizations to make this digestible; it's a charm to work with their team.

Check it out: huggingface/open-source-ai-year-in-review-2024
posted an update about 1 month ago
It's the 2nd of December, here's your Cyber Monday present 🎁!

We're cutting our prices on Hugging Face Inference Endpoints and Spaces!

Our folks at Google Cloud are treating us to a 40% price cut on GCP Nvidia A100 GPUs for the next 3️⃣ months. We have other reductions on all instances, ranging from 20 to 50%.

Sounds like the time to give Inference Endpoints a try? Get started today and find the full pricing details in our documentation.
https://ui.endpoints.huggingface.co/
https://huggingface.co/pricing
reacted to Xenova's post with 🚀 about 2 months ago
We just released Transformers.js v3.1 and you're not going to believe what's now possible in the browser w/ WebGPU! 🤯 Let's take a look:
🔤 Janus from Deepseek for unified multimodal understanding and generation (Text-to-Image and Image-Text-to-Text)
👁️ Qwen2-VL from Qwen for dynamic-resolution image understanding
🔢 JinaCLIP from Jina AI for general-purpose multilingual multimodal embeddings
🌋 LLaVA-OneVision from ByteDance for Image-Text-to-Text generation
🤸‍♀️ ViTPose for pose estimation
📄 MGP-STR for optical character recognition (OCR)
📈 PatchTST & PatchTSMixer for time series forecasting

That's right, everything running 100% locally in your browser (no data sent to a server)! 🔥 Huge for privacy!

Check out the release notes for more information. 👇
https://github.com/huggingface/transformers.js/releases/tag/3.1.0

Demo link (+ source code): webml-community/Janus-1.3B-WebGPU
reacted to AdinaY's post with 🔥 about 2 months ago
🌊 The wave of reasoning models from the Chinese community has arrived!

🚀 Marco-o1 by AIDC, Alibaba
👉 AIDC-AI/Marco-o1

✨ QwQ by Qwen, Alibaba
👉 Qwen/qwq-674762b79b75eac01735070a

🌟 Skywork-o1 by Kunlun Tech
👉 Skywork/skywork-o1-open-67453df58e12f6c3934738d0

🔥 Xkev/Llama-3.2V-11B-cot by PKU Yuan group
👉 Xkev/Llama-3.2V-11B-cot

💡 DeepSeek-R1-Lite-Preview by DeepSeek AI
👉 https://chat.deepseek.com/

🔍 InternThinker Preview by Shanghai AI Lab
👉 https://sso.openxlab.org.cn/login?redirect=https://internlm-chat.intern-ai.org.cn/&clientId=ebmrvod6yo0nlzaek1yp

📘 k0-math by Moonshot AI
🚀 https://kimi.moonshot.cn/ ( coming soon! )

Who's next? 👀
zh-ai-community/reasoning-models-67409fb3aa1ed78f10087cd7
posted an update about 2 months ago
Hello Hugging Face Community,

If you use Google Kubernetes Engine to host your ML workloads, I think this series of videos is a great way to kickstart your journey of deploying LLMs, in less than 10 minutes! Thank you @wietse-venema-demo !

To watch in this order:
1. Learn what Hugging Face Deep Learning Containers are
https://youtu.be/aWMp_hUUa0c?si=t-LPRkRNfD3DDNfr

2. Learn how to deploy an LLM with our Deep Learning Container using Text Generation Inference
https://youtu.be/Q3oyTOU1TMc?si=V6Dv-U1jt1SR97fj

3. Learn how to scale your inference endpoint based on traffic
https://youtu.be/QjLZ5eteDds?si=nDIAirh1r6h2dQMD
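
Once a Text Generation Inference container is running, it can be queried over HTTP through its `/generate` route. The sketch below uses only the Python standard library; the `http://localhost:8080` address is a hypothetical example (for instance, a locally port-forwarded GKE service), and the helper names are illustrative.

```python
import json
import urllib.request

def build_generate_request(base_url, prompt, max_new_tokens=64):
    """Build a POST request for TGI's /generate route (nothing is sent here)."""
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": max_new_tokens}}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(base_url, prompt, max_new_tokens=64, timeout=30):
    """Send the request and return the generated text from the JSON response."""
    request = build_generate_request(base_url, prompt, max_new_tokens)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["generated_text"]

# Hypothetical address; replace with the address of your own deployment.
req = build_generate_request("http://localhost:8080", "What is GKE?")
```

Calling `generate("http://localhost:8080", "What is GKE?")` would perform the actual request against a live endpoint.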

If you want more of these small tutorials and have any theme in mind, let me know!
posted an update about 2 months ago
Hello Hugging Face Community,

I'd like to share a bit more about the Deep Learning Containers (DLCs) we built with Google Cloud, to transform the way you build AI with open models on this platform!

With pre-configured, optimized environments for PyTorch Training (GPU) and Inference (CPU/GPU), Text Generation Inference (GPU), and Text Embeddings Inference (CPU/GPU), the Hugging Face DLCs offer:

⚡ Optimized performance on Google Cloud's infrastructure, with TGI, TEI, and PyTorch acceleration.
🛠️ Hassle-free environment setup, no more dependency issues.
🔄 Seamless updates to the latest stable versions.
💼 Streamlined workflow, reducing dev and maintenance overheads.
🔒 Robust security features of Google Cloud.
☁️ Fine-tuned for optimal performance, integrated with GKE and Vertex AI.
📦 Community examples for easy experimentation and implementation.
🔜 TPU support for PyTorch Training/Inference and Text Generation Inference is coming soon!

Find the documentation at https://huggingface.co/docs/google-cloud/en/index
If you need support, open a conversation on the forum: https://discuss.huggingface.co/c/google-cloud/69
reacted to fdaudens's post with 🔥 4 months ago
IBM & NASA just released an open-source AI model for weather & climate on Hugging Face.

Prithvi WxC offers insights beyond forecasting, tackling challenges from local weather to global climate. Potential apps: targeted forecasts, severe weather detection & more. https://huggingface.co/Prithvi-WxC

This is impressive. Check out this comparison of the Ida hurricane between ground truth and the AI model's prediction.
reacted to alvarobartt's post with 🔥 4 months ago
🤗 Serving Meta Llama 3.1 405B on Google Cloud is now possible via the Hugging Face Deep Learning Containers (DLCs) for Text Generation Inference (TGI)

In this post, we showcase how to deploy https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8 on an A3 instance with 8 x H100 GPUs on Vertex AI
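
A rough back-of-the-envelope check shows why the FP8 checkpoint is a natural fit for a single 8 x H100 node. The sketch below counts weights only, assumes 80 GB of memory per H100, and ignores the KV cache, activations, and runtime overhead, so it is a lower bound rather than a sizing guide.

```python
def weights_gb(num_params, bytes_per_param):
    """Memory needed for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9

PARAMS = 405e9            # Llama 3.1 405B
GPU_MEMORY_GB = 80        # assumption: 80 GB per H100
total_gpu_gb = 8 * GPU_MEMORY_GB

fp8_gb = weights_gb(PARAMS, 1)    # 1 byte per parameter in FP8
bf16_gb = weights_gb(PARAMS, 2)   # 2 bytes per parameter in BF16

fits_fp8 = fp8_gb < total_gpu_gb      # 405 GB vs 640 GB
fits_bf16 = bf16_gb < total_gpu_gb    # 810 GB vs 640 GB
```

Under these assumptions the FP8 weights leave headroom on one node, while 16-bit weights alone would already exceed its combined GPU memory.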

Thanks to the Hugging Face DLCs for TGI and Google Cloud Vertex AI, deploying a high-performance text generation container for serving Large Language Models (LLMs) has never been easier. And we're not going to stop here – stay tuned as we enable more experiences to build AI with open models on Google Cloud!

Read the full post at https://huggingface.co/blog/llama31-on-vertex-ai
reacted to clem's post with 🔥 4 months ago
Just crossed 200,000 free public AI datasets shared by the community on Hugging Face! Text, image, video, audio, time-series & many more... Thanks everyone!

http://hf.co/datasets