m-ric's Collections
🚀 Spinning Up in LLMs
Lost in the Middle: How Language Models Use Long Contexts
Paper
•
2307.03172
•
Published
•
38
Efficient Estimation of Word Representations in Vector Space
Paper
•
1301.3781
•
Published
•
6
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper
•
1810.04805
•
Published
•
16
Attention Is All You Need
Paper
•
1706.03762
•
Published
•
50
Language Models are Few-Shot Learners
Paper
•
2005.14165
•
Published
•
12
Llama 2: Open Foundation and Fine-Tuned Chat Models
Paper
•
2307.09288
•
Published
•
244
Emergent Abilities of Large Language Models
Paper
•
2206.07682
•
Published
•
3
Scaling Laws for Neural Language Models
Paper
•
2001.08361
•
Published
•
7
Are Emergent Abilities of Large Language Models a Mirage?
Paper
•
2304.15004
•
Published
•
6
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Paper
•
2201.11903
•
Published
•
9
Judging LLM-as-a-judge with MT-Bench and Chatbot Arena
Paper
•
2306.05685
•
Published
•
33
Training Compute-Optimal Large Language Models
Paper
•
2203.15556
•
Published
•
10
Neural Machine Translation of Rare Words with Subword Units
Paper
•
1508.07909
•
Published
•
4
Jamba: A Hybrid Transformer-Mamba Language Model
Paper
•
2403.19887
•
Published
•
107
Paper
•
2401.04088
•
Published
•
158
Mixture-of-Depths: Dynamically allocating compute in transformer-based language models
Paper
•
2404.02258
•
Published
•
104
Textbooks Are All You Need
Paper
•
2306.11644
•
Published
•
143
Rho-1: Not All Tokens Are What You Need
Paper
•
2404.07965
•
Published
•
90
Large Language Models Struggle to Learn Long-Tail Knowledge
Paper
•
2211.08411
•
Published
•
3
Large Language Models are Zero-Shot Reasoners
Paper
•
2205.11916
•
Published
•
1