Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping Paper • 2501.06589 • Published Jan 11
Granite 3.1 Language Models Collection A series of language models with 128K context length trained by IBM, licensed under the Apache 2.0 license. • 9 items • Updated 10 days ago • 58
Selective Self-Rehearsal: A Fine-Tuning Approach to Improve Generalization in Large Language Models Paper • 2409.04787 • Published Sep 7, 2024 • 1
SmolLM2 Collection State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated 14 days ago • 244
Granite 3.0 Language Models Collection A series of language models trained by IBM, licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated 10 days ago • 96
Power-LM Collection Dense & MoE LLMs trained with power learning rate scheduler. • 4 items • Updated Oct 17, 2024 • 15
Granite Code Models Collection A series of code models trained by IBM, licensed under the Apache 2.0 license. We release both the base pretrained and instruct models. • 23 items • Updated 10 days ago • 184