Collections
Collections including paper arxiv:2305.19466

Collection 1
- Length Generalization of Causal Transformers without Position Encoding
  Paper • 2404.12224 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2

Collection 2
- LIMA: Less Is More for Alignment
  Paper • 2305.11206 • Published • 21
- Garment3DGen: 3D Garment Stylization and Texture Generation
  Paper • 2403.18816 • Published • 21
- EgoLifter: Open-world 3D Segmentation for Egocentric Perception
  Paper • 2403.18118 • Published • 10
- The Unreasonable Ineffectiveness of the Deeper Layers
  Paper • 2403.17887 • Published • 78

Collection 3
- Beyond A*: Better Planning with Transformers via Search Dynamics Bootstrapping
  Paper • 2402.14083 • Published • 47
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
  Paper • 2305.13245 • Published • 5
- Training a T5 Using Lab-sized Resources
  Paper • 2208.12097 • Published • 1
- Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
  Paper • 2212.05055 • Published • 5

Collection 4
- The Impact of Positional Encoding on Length Generalization in Transformers
  Paper • 2305.19466 • Published • 2
- Latent Positional Information is in the Self-Attention Variance of Transformer Language Models Without Positional Embeddings
  Paper • 2305.13571 • Published • 2
- Position Prediction as an Effective Pretraining Strategy
  Paper • 2207.07611 • Published • 1
- Transformer Language Models without Positional Encodings Still Learn Positional Information
  Paper • 2203.16634 • Published • 5

Collection 5
- Cure the headache of Transformers via Collinear Constrained Attention
  Paper • 2309.08646 • Published • 12
- YaRN: Efficient Context Window Extension of Large Language Models
  Paper • 2309.00071 • Published • 65
- PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
  Paper • 2309.10400 • Published • 26
- Dynamically Relative Position Encoding-Based Transformer for Automatic Code Edit
  Paper • 2205.13522 • Published • 1