Kristaller486

kristaller486

AI & ML interests

NLP, Machine Translation

Recent Activity

updated a dataset 3 days ago

kristaller486/Nebo-T1-Russian

posted an update 4 days ago

Nebo-T1-Russian

(Probably) the first "longCoT" dataset for the Russian language, created with DeepSeek-R1.

- Prompts taken from the Sky-T1 dataset and translated with Llama3.3-70B.
- Answers and reasoning generated by DeepSeek-R1 (685B).
- 16.4K samples in total, ≈12.4K of them Russian-only (in the rest, either the answer or the reasoning is in English).
- Languages of the answers and reasoning are labeled using fasttext.

https://huggingface.co/datasets/kristaller486/Nebo-T1-Russian
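Since the post says each sample carries language labels for both the answer and the reasoning, the Russian-only subset could be selected with a simple filter. The following is a minimal sketch only: the field names `answer_lang` and `reasoning_lang` and the toy records are hypothetical, and the real dataset's column names may differ, so check the dataset card before reusing it.

```python
# Toy records standing in for dataset rows.
# NOTE: "answer_lang" and "reasoning_lang" are assumed field names,
# not the dataset's confirmed schema.
samples = [
    {"id": 1, "answer_lang": "ru", "reasoning_lang": "ru"},
    {"id": 2, "answer_lang": "ru", "reasoning_lang": "en"},
    {"id": 3, "answer_lang": "en", "reasoning_lang": "ru"},
    {"id": 4, "answer_lang": "ru", "reasoning_lang": "ru"},
]


def is_russian_only(sample, lang="ru"):
    """Return True when both the answer and the reasoning are labeled `lang`."""
    return sample["answer_lang"] == lang and sample["reasoning_lang"] == lang


def split_by_language(rows):
    """Split rows into (Russian-only, mixed-language) lists, preserving order."""
    russian_only = [row for row in rows if is_russian_only(row)]
    mixed = [row for row in rows if not is_russian_only(row)]
    return russian_only, mixed


russian_only, mixed = split_by_language(samples)
print(len(russian_only), "Russian-only;", len(mixed), "mixed")
```

Keeping the mixed-language rows in a separate list rather than discarding them makes it possible to inspect or re-generate them later.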

published a dataset 4 days ago

kristaller486/Nebo-T1-Russian

kristaller486's activity

upvoted an article 8 days ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • 8 days ago • 625

upvoted a collection 14 days ago

EvaByte

3 items • Updated 14 days ago • 3

upvoted a paper about 1 month ago

Facilitating large language model Russian adaptation with Learned Embedding Propagation

Paper • 2412.21140 • Published Dec 30, 2024 • 16

upvoted a collection about 1 month ago

DeepSeek-V3

3 items • Updated about 1 month ago • 174

upvoted a collection about 2 months ago

FineWeb2 Collaborative Annotation Sprint

5 items • Updated Dec 24, 2024 • 6

upvoted a paper 2 months ago

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis

Paper • 2412.01819 • Published Dec 2, 2024 • 35

upvoted a paper 3 months ago

Multi-Granularity Prediction for Scene Text Recognition

Paper • 2209.03592 • Published Sep 8, 2022 • 2

upvoted a collection 3 months ago

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 275

upvoted 2 papers 3 months ago

Constraint Back-translation Improves Complex Instruction Following of Large Language Models

Paper • 2410.24175 • Published Oct 31, 2024 • 17

Language Models can Self-Lengthen to Generate Long Texts

Paper • 2410.23933 • Published Oct 31, 2024 • 17

upvoted a collection 4 months ago

DocLayout-YOLO

Dataset and model for DocLayout-YOLO • 10 items • Updated 22 days ago • 12

upvoted a collection 5 months ago

Qwen2.5

Qwen2.5 language models, including pretrained and instruction-tuned models of 7 sizes, including 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Nov 28, 2024 • 501

upvoted 4 papers 5 months ago

GRIN: GRadient-INformed MoE

Paper • 2409.12136 • Published Sep 18, 2024 • 16

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Paper • 2409.08239 • Published Sep 12, 2024 • 17

PingPong: A Benchmark for Role-Playing Language Models with User Emulation and Multi-Model Evaluation

Paper • 2409.06820 • Published Sep 10, 2024 • 64

Strategic Chain-of-Thought: Guiding Accurate Reasoning in LLMs through Strategy Elicitation

Paper • 2409.03271 • Published Sep 5, 2024 • 2

upvoted a paper 7 months ago

Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients

Paper • 2407.08296 • Published Jul 11, 2024 • 31

upvoted a paper 8 months ago

Depth Anything V2

Paper • 2406.09414 • Published Jun 13, 2024 • 97

upvoted 2 papers 9 months ago

Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

Paper • 2405.13929 • Published May 22, 2024 • 54

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14, 2024 • 22