Dmitry Ryumin

DmitryRyumin

https://dmitryryumin.github.io

AI & ML interests

Machine Learning and Applications, Multi-Modal Understanding

Recent Activity

liked a Space about 12 hours ago

nanotron/ultrascale-playbook

liked a model about 16 hours ago

yandex/YandexGPT-5-Lite-8B-pretrain

upvoted a paper 1 day ago

SurveyX: Academic Survey Automation via Large Language Models

DmitryRyumin's activity

upvoted a paper 1 day ago

SurveyX: Academic Survey Automation via Large Language Models

Paper • 2502.14776 • Published 5 days ago • 80

upvoted a collection 21 days ago

DeepSeek R1 (All Versions)

DeepSeek R1 - the most powerful reasoning open-source model - available in GGUF, original & 4-bit formats. Includes Llama & Qwen distilled models. • 29 items • Updated 18 days ago • 199

upvoted a paper 23 days ago

Tensor Product Attention Is All You Need

Paper • 2501.06425 • Published Jan 11 • 84

upvoted a collection 3 months ago

KaLM-embedding

9 items • Updated 5 days ago • 23

upvoted 5 papers 5 months ago

FAN: Fourier Analysis Networks

Paper • 2410.02675 • Published Oct 3, 2024 • 25

Differential Transformer

Paper • 2410.05258 • Published Oct 7, 2024 • 171

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Paper • 2410.01036 • Published Oct 1, 2024 • 15

HeadGAP: Few-shot 3D Head Avatar via Generalizable Gaussian Priors

Paper • 2408.06019 • Published Aug 12, 2024 • 15

Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction

Paper • 2409.18124 • Published Sep 26, 2024 • 32

upvoted a collection 5 months ago

Llama 3.2

Meta's new Llama 3.2 vision and text models including 1B, 3B, 11B and 90B. Includes GGUF, 4-bit bnb and original versions. • 27 items • Updated 20 days ago • 54

upvoted 3 articles 5 months ago

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18, 2024 • 223

Exploring the Daily Papers Page on Hugging Face

Article • Sep 23, 2024 • 50

XetHub is joining Hugging Face!

Article • Aug 8, 2024 • 82

upvoted a collection 5 months ago

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 529

upvoted 5 papers 6 months ago

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3, 2024 • 78

ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Paper • 2408.15496 • Published Aug 28, 2024 • 11

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27, 2024 • 39

The Russian-focused embedders' exploration: ruMTEB benchmark and Russian embedding model design

Paper • 2408.12503 • Published Aug 22, 2024 • 24

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22, 2024 • 64

upvoted a collection 6 months ago

Jamba-1.5

The AI21 Jamba family of models are state-of-the-art, hybrid SSM-Transformer instruction following foundation models • 2 items • Updated 8 days ago • 84