Paper: BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts • arXiv:2408.08274 • Published Aug 15, 2024 • 11 upvotes
Paper: DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search • arXiv:2408.08152 • Published Aug 15, 2024 • 51 upvotes
Paper: xGen-MM (BLIP-3): A Family of Open Large Multimodal Models • arXiv:2408.08872 • Published Aug 16, 2024 • 97 upvotes
Paper: LongVILA: Scaling Long-Context Visual Language Models for Long Videos • arXiv:2408.10188 • Published Aug 19, 2024 • 51 upvotes
Paper: MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models • arXiv:2408.02718 • Published Aug 5, 2024 • 60 upvotes
Paper: Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment • arXiv:2405.03594 • Published May 6, 2024 • 7 upvotes
Collection: DBRX • DBRX is a mixture-of-experts (MoE) large language model trained from scratch by Databricks • 3 items • Updated Mar 27 • 91 upvotes
Collection: Sparse Foundational Llama 2 Models • Sparse pre-trained and fine-tuned Llama models from Neural Magic and Cerebras • 27 items • Updated Sep 26 • 8 upvotes
Collection: Cerebras LLaVA • Cerebras implementation and training recipes for multimodal LLaVA models • 4 items • Updated Aug 21 • 1 upvote