- Wide Residual Networks
  Paper • 1605.07146 • Published • 2
- Characterizing signal propagation to close the performance gap in unnormalized ResNets
  Paper • 2101.08692 • Published • 2
- Pareto-Optimal Quantized ResNet Is Mostly 4-bit
  Paper • 2105.03536 • Published • 2
- When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
  Paper • 2106.01548 • Published • 2
Collections including paper arxiv:2404.07129
- Resonance RoPE: Improving Context Length Generalization of Large Language Models
  Paper • 2403.00071 • Published • 22
- Scaling Laws of RoPE-based Extrapolation
  Paper • 2310.05209 • Published • 6
- Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models
  Paper • 2404.12387 • Published • 38
- OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework
  Paper • 2404.14619 • Published • 124
- Linear Transformers with Learnable Kernel Functions are Better In-Context Models
  Paper • 2402.10644 • Published • 78
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- ChunkAttention: Efficient Self-Attention with Prefix-Aware KV Cache and Two-Phase Partition
  Paper • 2402.15220 • Published • 19
- Sequence Parallelism: Long Sequence Training from System Perspective
  Paper • 2105.13120 • Published • 5
- Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
  Paper • 2410.21272 • Published • 1
- Llama Scope: Extracting Millions of Features from Llama-3.1-8B with Sparse Autoencoders
  Paper • 2410.20526 • Published • 1
- Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering
  Paper • 2410.15999 • Published • 17
- Decomposing The Dark Matter of Sparse Autoencoders
  Paper • 2410.14670 • Published • 1