Collections
Collections including paper arxiv:2406.09900
- Compression Represents Intelligence Linearly
  Paper • 2404.09937 • Published • 27
- MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
  Paper • 2404.06395 • Published • 21
- Long-context LLMs Struggle with Long In-context Learning
  Paper • 2404.02060 • Published • 35
- Are large language models superhuman chemists?
  Paper • 2404.01475 • Published • 16

- OLMo: Accelerating the Science of Language Models
  Paper • 2402.00838 • Published • 80
- Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
  Paper • 2403.05530 • Published • 60
- StarCoder: may the source be with you!
  Paper • 2305.06161 • Published • 29
- SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling
  Paper • 2312.15166 • Published • 56

- TheBloke/Wizard-Vicuna-13B-Uncensored-GPTQ
  Text Generation • Updated • 8.43k • 319
- TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ
  Text Generation • Updated • 57k • 134
- mistralai/Mixtral-8x7B-Instruct-v0.1
  Text Generation • Updated • 742k • 4.19k
- TheBloke/MixtralOrochi8x7B-GPTQ
  Text Generation • Updated • 14 • 7

- Efficient LLM Inference on CPUs
  Paper • 2311.00502 • Published • 7
- Exponentially Faster Language Modelling
  Paper • 2311.10770 • Published • 118
- Cached Transformers: Improving Transformers with Differentiable Memory Cache
  Paper • 2312.12742 • Published • 12
- LLM in a flash: Efficient Large Language Model Inference with Limited Memory
  Paper • 2312.11514 • Published • 258

- DeepSpeed Ulysses: System Optimizations for Enabling Training of Extreme Long Sequence Transformer Models
  Paper • 2309.14509 • Published • 17
- LLM Augmented LLMs: Expanding Capabilities through Composition
  Paper • 2401.02412 • Published • 36
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 42
- Tuning Language Models by Proxy
  Paper • 2401.08565 • Published • 20

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 22
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 16
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 9
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 8