Kuldeep Singh Sidhu

singhsidhukuldeep

https://singhsidhukuldeep.github.io

AI & ML interests

😃 TOP 3 on HuggingFace for posts 🤗 Seeking contributors for a completely open-source 🚀 Data Science platform! singhsidhukuldeep.github.io

Recent Activity

posted an update 1 day ago

Exciting news in AI: Jina AI releases jina-clip-v2!

The team at Jina AI has just released a groundbreaking multilingual multimodal embedding model that pushes the boundaries of text-image understanding. Here's why this is a big deal:

🚀 Technical Highlights:
- Dual-encoder architecture combining a 561M-parameter Jina XLM-RoBERTa text encoder and a 304M-parameter EVA02-L14 vision encoder
- Supports 89 languages with an 8,192-token context length
- Processes images up to 512×512 pixels with a 14×14 patch size
- Uses FlashAttention2 for the text encoder and xFormers for the vision encoder
- Uses Matryoshka Representation Learning, so embeddings can be truncated for more efficient vector storage

⚡️ Under The Hood:
- Multi-stage training with progressive resolution scaling (224→384→512)
- Contrastive learning with an InfoNCE loss applied in both directions (text-to-image and image-to-text)
- Trained on a massive dataset that includes 400M English and 400M multilingual image-caption pairs
- Incorporates specialized datasets for document understanding, scientific graphs, and infographics
- Uses hard negative mining with 7 negatives per positive sample

📊 Performance:
- Outperforms previous models on visual document retrieval (52.65% nDCG@5)
- Achieves 89.73% image-to-text and 79.09% text-to-image retrieval on the CLIP benchmark
- Strong multilingual performance across 30 languages
- Maintains strong performance even with a 75% dimension reduction (256D vs. 1024D)

🎯 Key Innovation:
The model tackles the long-standing challenge of unifying text-only and multimodal retrieval in a single system while adding robust multilingual support. Perfect for building cross-lingual visual search systems!

Kudos to the research team at Jina AI for this impressive advancement in multimodal AI!
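To make "InfoNCE loss in both directions" concrete, here is a minimal pure-Python sketch of a symmetric contrastive loss over a batch of matched text/image pairs. It is not Jina AI's training code: the temperature of 0.05 is an assumption, and only in-batch negatives are used (the hard negatives mentioned in the post, 7 per positive, are not modeled).

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def log_softmax_at(logits, idx):
    """log(softmax(logits)[idx]), computed stably via log-sum-exp."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[idx] - lse

def symmetric_info_nce(text_embs, image_embs, temperature=0.05):
    """Average of text->image and image->text InfoNCE over a batch.

    Pair k is (text_embs[k], image_embs[k]); every other item in the batch
    acts as a negative. The temperature value is an assumption.
    """
    texts = [l2_normalize(v) for v in text_embs]
    images = [l2_normalize(v) for v in image_embs]
    n = len(texts)
    # sim[a][b]: cosine similarity of text a and image b, scaled by 1/temperature
    sim = [[sum(x * y for x, y in zip(texts[a], images[b])) / temperature
            for b in range(n)] for a in range(n)]
    # text -> image: each row is a classification over images; target is the diagonal
    t2i = -sum(log_softmax_at(sim[a], a) for a in range(n)) / n
    # image -> text: each column is a classification over texts
    i2t = -sum(log_softmax_at([sim[a][b] for a in range(n)], b) for b in range(n)) / n
    return (t2i + i2t) / 2

texts = [[1.0, 0.0], [0.0, 1.0]]
aligned = symmetric_info_nce(texts, [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_info_nce(texts, [[0.0, 1.0], [1.0, 0.0]])
print(aligned, shuffled)
```

With correctly matched pairs the loss is close to zero; swapping the images makes every positive the least similar item in its row and column, so the loss jumps to roughly 1/temperature. Averaging both directions is what keeps text-to-image and image-to-text retrieval balanced.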

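The Matryoshka point in the Jina post (256D vs. 1024D) boils down to slicing a prefix of the embedding and re-normalizing it. The sketch below assumes a model trained with Matryoshka Representation Learning (plain truncation of an ordinary embedding would lose much more quality); the helper names and the `math.sin` stand-in vector are hypothetical, and the storage figure ignores any vector-index overhead.

```python
import math

def truncate_embedding(vec, dim):
    """Matryoshka-style truncation: keep the first `dim` components, then re-normalize.

    Only meaningful for models trained so that leading prefixes are useful on their own.
    """
    if not 0 < dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero prefix")
    return [x / norm for x in head]

def float32_storage_bytes(dim, num_vectors=1):
    """Raw float32 storage (4 bytes per component), ignoring index overhead."""
    return 4 * dim * num_vectors

full = [math.sin(i + 1) for i in range(1024)]  # hypothetical stand-in for a 1024D embedding
short = truncate_embedding(full, 256)
print(len(short), float32_storage_bytes(1024), float32_storage_bytes(256))  # 256 4096 1024
```

Going from 4,096 to 1,024 bytes per float32 vector is exactly the 75% reduction the post mentions; for a hypothetical index of 10M vectors that is roughly 41 GB vs. 10 GB of raw storage.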
posted an update 2 days ago

Fascinating insights from @Pinterest's latest research on improving feature interactions in recommendation systems!

Pinterest's engineering team has tackled a critical challenge in their Homefeed ranking system, which serves 500M+ monthly active users. Here's what makes their approach remarkable:

>> Technical Deep Dive

Architecture Overview
• The ranking model combines dense features, sparse features, and embedding features to represent users, Pins, and context
• Sparse features are processed using learnable embeddings whose size is based on feature cardinality
• User sequence embeddings are generated by a transformer that processes past engagements

Feature Processing Pipeline
• Dense features undergo normalization for numerical stability
• Sparse and embedding features receive L2 normalization
• All features are concatenated into a single feature embedding

Key Innovations
• Implemented parallel MaskNet layers with 3 blocks
• Used a projection ratio of 2.0 and an output dimension of 512
• Stacked 4 DCNv2 layers on top for higher-order interactions

Performance Improvements
• Achieved a +1.42% increase in Homefeed Save Volume
• Boosted Overall Time Spent by +0.39%
• Kept the memory consumption increase to just 5%

>> Industry Constraints Addressed

Memory Management
• Optimized for 60% GPU memory utilization
• Prevented OOM errors while maintaining batch size efficiency

Latency Optimization
• Removed the input-output concatenation before the MLP
• Reduced hidden layer sizes in the MLP
• Achieved zero latency increase while improving performance

System Stability
• Ensured reproducible results across retraining
• Maintained model stability across different data distributions
• Successfully deployed in production

This work demonstrates how to balance academic innovations with real-world industrial constraints. Kudos to the Pinterest team!
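The feature processing pipeline in the Pinterest post can be sketched in a few lines of plain Python. Several details are assumptions, not Pinterest's implementation: the post only says dense features are "normalized", so z-score normalization with precomputed statistics is used here; the cardinality-based sizing rule `6 * cardinality ** 0.25` is a hypothetical heuristic; and `EmbeddingTable` is a toy, randomly initialized table with no training.

```python
import math
import random

def zscore(values, means, stds):
    """Dense-feature normalization (assumed z-score; the post only says 'normalization')."""
    return [(v - m) / s for v, m, s in zip(values, means, stds)]

def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / max(norm, eps) for x in vec]

def embedding_dim_for(cardinality):
    """Hypothetical sizing rule: dimension grows with the fourth root of cardinality."""
    return max(4, round(6 * cardinality ** 0.25))

class EmbeddingTable:
    """Toy learnable embedding table (random init; no training loop here)."""
    def __init__(self, cardinality, seed=0):
        rng = random.Random(seed)
        self.dim = embedding_dim_for(cardinality)
        self.rows = [[rng.gauss(0.0, 0.1) for _ in range(self.dim)]
                     for _ in range(cardinality)]

    def lookup(self, idx):
        return self.rows[idx]

def build_feature_embedding(dense, dense_stats, sparse_ids, tables, pretrained_embs):
    """Normalize each feature group, then concatenate into one flat feature embedding."""
    means, stds = dense_stats
    parts = zscore(dense, means, stds)              # dense features
    for table, idx in zip(tables, sparse_ids):      # sparse features -> learned embeddings
        parts += l2_normalize(table.lookup(idx))
    for emb in pretrained_embs:                     # e.g. a user-sequence embedding
        parts += l2_normalize(emb)
    return parts

tables = [EmbeddingTable(100, seed=1), EmbeddingTable(16, seed=2)]
feats = build_feature_embedding(
    dense=[10.0, 0.5, 3.0],
    dense_stats=([8.0, 0.5, 1.0], [2.0, 0.25, 2.0]),
    sparse_ids=[42, 7],
    tables=tables,
    pretrained_embs=[[3.0, 4.0]],
)
print(len(feats))
```

L2-normalizing each embedding group before concatenation keeps any single high-magnitude embedding from dominating the downstream interaction layers, which is one plausible reading of why the post treats sparse and embedding features differently from dense ones.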

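The "parallel MaskNet + DCNv2" design from the Pinterest post can be illustrated with a heavily simplified sketch. Each MaskBlock follows the instance-guided mask idea from the MaskNet paper (an aggregation layer widened by the projection ratio, a projection back to the input size, the mask applied to the layer-normalized input, then a LayerNorm+ReLU hidden layer), and each cross layer follows the DCNv2 formula x_{l+1} = x_0 ⊙ (W x_l + b) + x_l. Biases inside the MaskBlock are omitted, dimensions are tiny (an output size of 4 instead of 512), the MLP is left out, and all names are hypothetical; this is not Pinterest's code.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, x) for x in v]

def layer_norm(v, eps=1e-5):
    mu = sum(v) / len(v)
    var = sum((x - mu) ** 2 for x in v) / len(v)
    return [(x - mu) / math.sqrt(var + eps) for x in v]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class MaskBlock:
    """Simplified MaskNet-style block with an instance-guided mask (no biases)."""
    def __init__(self, in_dim, out_dim, projection_ratio, rng):
        hidden = int(in_dim * projection_ratio)
        self.w_agg = rand_matrix(hidden, in_dim, rng)   # aggregation layer (widened)
        self.w_proj = rand_matrix(in_dim, hidden, rng)  # projection back to in_dim
        self.w_out = rand_matrix(out_dim, in_dim, rng)  # hidden layer after masking

    def __call__(self, x):
        mask = matvec(self.w_proj, relu(matvec(self.w_agg, x)))
        masked = [a * m for a, m in zip(layer_norm(x), mask)]
        return relu(layer_norm(matvec(self.w_out, masked)))

class CrossLayer:
    """DCNv2 cross layer: x_{l+1} = x0 * (W x_l + b) + x_l (elementwise product)."""
    def __init__(self, dim, rng):
        self.w = rand_matrix(dim, dim, rng)
        self.b = [0.0] * dim

    def __call__(self, x0, xl):
        wx = matvec(self.w, xl)
        return [a * (w + b) + c for a, w, b, c in zip(x0, wx, self.b, xl)]

def parallel_masknet_dcn(x, blocks, cross_layers):
    # Parallel MaskNet: every block sees the same input; outputs are concatenated.
    h = [v for block in blocks for v in block(x)]
    # Stack cross layers on top, each one crossing with the original h (x0).
    xl = h
    for layer in cross_layers:
        xl = layer(h, xl)
    return xl

rng = random.Random(42)
blocks = [MaskBlock(in_dim=8, out_dim=4, projection_ratio=2.0, rng=rng) for _ in range(3)]
cross_layers = [CrossLayer(dim=12, rng=rng) for _ in range(4)]
x = [rng.gauss(0.0, 1.0) for _ in range(8)]
out = parallel_masknet_dcn(x, blocks, cross_layers)
print(len(out))
```

The 3 blocks and 4 cross layers mirror the counts in the post; the residual term `+ x_l` in each cross layer is what lets stacking raise the interaction order without losing lower-order signal.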
updated a Space 3 days ago

singhsidhukuldeep/posts_leaderboard

singhsidhukuldeep's activity

New activity in maxiw/hf-posts about 1 month ago

Update Request

#2 opened about 1 month ago by singhsidhukuldeep

New activity in TechxGenus/Mistral-Large-Instruct-2407-AWQ 5 months ago

The model can be started using vllm, but no dialogue is possible.

#2 opened 5 months ago by SongXiaoMao

Adding chat_template to tokenizer_config.json file

#3 opened 5 months ago by singhsidhukuldeep

Script request

#1 opened 5 months ago by singhsidhukuldeep

New activity in casperhansen/mistral-large-instruct-2407-awq 5 months ago

Requesting script

#1 opened 5 months ago by singhsidhukuldeep

New activity in open-llm-leaderboard/open_llm_leaderboard 5 months ago

Increasing upper limit of `Select the number of parameters (B)` to support larger open-source models like `meta-llama/Meta-Llama-3.1-405B-Instruct`

#858 opened 5 months ago by singhsidhukuldeep