Nikita Arsenin

lumirey

AI & ML interests

None yet

Organizations

None yet

lumirey's activity

upvoted a paper 24 days ago

Facilitating large language model Russian adaptation with Learned Embedding Propagation

Paper • 2412.21140 • Published 25 days ago • 16

liked a model about 2 months ago

nvidia/Hymba-1.5B-Instruct

Text Generation • Updated 22 days ago • 4.3k • 222

upvoted a paper about 2 months ago

O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?

Paper • 2411.16489 • Published Nov 25, 2024 • 42

upvoted a paper 8 months ago

Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality

Paper • 2405.21060 • Published May 31, 2024 • 64

upvoted a paper 9 months ago

Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

Paper • 2404.12253 • Published Apr 18, 2024 • 55

upvoted 6 papers 11 months ago

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11, 2024 • 91

Adding NVMe SSDs to Enable and Accelerate 100B Model Fine-tuning on a Single GPU

Paper • 2403.06504 • Published Mar 11, 2024 • 53

GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6, 2024 • 184

Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29, 2024 • 53

Beyond Language Models: Byte Models are Digital World Simulators

Paper • 2402.19155 • Published Feb 29, 2024 • 50

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 137

liked a model 11 months ago

sander-wood/bgpt

Updated Mar 17, 2024 • 33

liked a Space 11 months ago

💻 BigCode - Editor • Running

upvoted 2 papers 11 months ago

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29, 2024 • 33

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16, 2024 • 80

upvoted a paper about 1 year ago

Kandinsky 3.0 Technical Report

Paper • 2312.03511 • Published Dec 6, 2023 • 44