- KAN or MLP: A Fairer Comparison
  Paper • 2407.16674 • Published • 41
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15
- Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
  Paper • 2402.13064 • Published • 46
- DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
  Paper • 2208.11503 • Published

Collections
Collections including paper arxiv:2403.02884
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 69
- Improving Small Language Models' Mathematical Reasoning via Mix Thoughts Distillation
  Paper • 2401.11864 • Published • 2
- Common 7B Language Models Already Possess Strong Math Capabilities
  Paper • 2403.04706 • Published • 16

- Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models
  Paper • 2402.19427 • Published • 52
- Beyond Language Models: Byte Models are Digital World Simulators
  Paper • 2402.19155 • Published • 49
- StarCoder 2 and The Stack v2: The Next Generation
  Paper • 2402.19173 • Published • 134
- Simple linear attention language models balance the recall-throughput tradeoff
  Paper • 2402.18668 • Published • 18

- introspector/unimath
  Updated • 5.98k • 6
- MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms
  Paper • 1905.13319 • Published • 2
- Measuring Mathematical Problem Solving With the MATH Dataset
  Paper • 2103.03874 • Published • 3
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15

- Orca-Math: Unlocking the potential of SLMs in Grade School Math
  Paper • 2402.14830 • Published • 24
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15
- meta-math/MetaMath-Mistral-7B
  Text Generation • Updated • 3.38k • 90
- meta-math/MetaMath-13B-V1.0
  Text Generation • Updated • 1.24k • 13

- Chain-of-Thought Reasoning Without Prompting
  Paper • 2402.10200 • Published • 99
- Teaching Large Language Models to Reason with Reinforcement Learning
  Paper • 2403.04642 • Published • 46
- PERL: Parameter Efficient Reinforcement Learning from Human Feedback
  Paper • 2403.10704 • Published • 57
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15

- How to Train Data-Efficient LLMs
  Paper • 2402.09668 • Published • 38
- Adapting Large Language Models via Reading Comprehension
  Paper • 2309.09530 • Published • 77
- GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection
  Paper • 2403.03507 • Published • 182
- MathScale: Scaling Instruction Tuning for Mathematical Reasoning
  Paper • 2403.02884 • Published • 15

- Rethinking Optimization and Architecture for Tiny Language Models
  Paper • 2402.02791 • Published • 12
- More Agents Is All You Need
  Paper • 2402.05120 • Published • 51
- Scaling Laws for Forgetting When Fine-Tuning Large Language Models
  Paper • 2401.05605 • Published
- Aligning Large Language Models with Counterfactual DPO
  Paper • 2401.09566 • Published • 2

- System 2 Attention (is something you might need too)
  Paper • 2311.11829 • Published • 39
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 6
- Alignment for Honesty
  Paper • 2312.07000 • Published • 11
- Steering Llama 2 via Contrastive Activation Addition
  Paper • 2312.06681 • Published • 11