Models
Datasets
Spaces
Posts
Docs
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2402.09353

Rho-1: Not All Tokens Are What You Need

Paper • 2404.07965 • Published Apr 11 • 84
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 17
Instruction-tuned Language Models are Better Knowledge Learners

Paper • 2402.12847 • Published Feb 20 • 24
DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26

Applied Machine Learning Papers

Reading List (Mainly Focused of VLM's and Diffusion Models)

about 18 hours ago

Scalable Diffusion Models with Transformers

Paper • 2212.09748 • Published Dec 19, 2022 • 16
Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets

Paper • 2311.15127 • Published Nov 25, 2023 • 12
Learning Transferable Visual Models From Natural Language Supervision

Paper • 2103.00020 • Published Feb 26, 2021 • 11
U-Net: Convolutional Networks for Biomedical Image Segmentation

Paper • 1505.04597 • Published May 18, 2015 • 7

Must Reads On Transformers and Diffusers

Explore the cutting-edge of AI with our curated list of must reads on Transformers & Diffusers, driving innovation in generative-AI and beyond.

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

Paper • 1701.06538 • Published Jan 23, 2017 • 4
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 44
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

Paper • 2005.11401 • Published May 22, 2020 • 12
Language Model Evaluation Beyond Perplexity

Paper • 2106.00085 • Published May 31, 2021

Efficient Training

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 602
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

NLP Parameter Efficient Finetuning

Explore research optimizing NLP models for maximum performance with minimal computational resources.

Efficient Few-Shot Learning Without Prompts

Paper • 2209.11055 • Published Sep 22, 2022 • 3
Parameter-Efficient Transfer Learning for NLP

Paper • 1902.00751 • Published Feb 2, 2019 • 1
GPT Understands, Too

Paper • 2103.10385 • Published Mar 18, 2021 • 8
The Power of Scale for Parameter-Efficient Prompt Tuning

Paper • 2104.08691 • Published Apr 18, 2021 • 9

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 602
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection

Paper • 2403.03507 • Published Mar 6 • 182
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models

Paper • 2402.19427 • Published Feb 29 • 52
ResLoRA: Identity Residual Mapping in Low-Rank Adaption

Paper • 2402.18039 • Published Feb 28 • 11

Foundation AI Papers

Curated List of Must-Reads on LLM reasoning at Temus AI team

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 99
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases

Paper • 2402.14905 • Published Feb 22 • 126
Resonance RoPE: Improving Context Length Generalization of Large Language Models

Paper • 2403.00071 • Published Feb 29 • 22
AtP*: An efficient and scalable method for localizing LLM behaviour to components

Paper • 2403.00745 • Published Mar 1 • 11

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26

DoRA: Weight-Decomposed Low-Rank Adaptation

Paper • 2402.09353 • Published Feb 14 • 26
LESS: Selecting Influential Data for Targeted Instruction Tuning

Paper • 2402.04333 • Published Feb 6 • 3

Previous
1
2
Next

Company

© Hugging Face

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs