FastVLM: Efficient Vision Encoding for Vision Language Models • Paper • 2412.13303 • Published Dec 17, 2024
MobileCLIP Models + DataCompDR Data • Collection • MobileCLIP: mobile-friendly image-text models with SOTA zero-shot capabilities. DataCompDR: improved datasets for training SOTA image-text models. • 22 items • Updated Oct 4, 2024
DataComp-LM: In search of the next generation of training sets for language models • Paper • 2406.11794 • Published Jun 17, 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks • Paper • 2405.08911 • Published May 14, 2024
APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations • Paper • 2210.03927 • Published Oct 8, 2022
Reinforce Data, Multiply Impact: Improved Model Accuracy and Robustness with Dataset Reinforcement • Paper • 2303.08983 • Published Mar 15, 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding • Paper • 2310.15308 • Published Oct 23, 2023
CLIP meets Model Zoo Experts: Pseudo-Supervision for Visual Enhancement • Paper • 2310.14108 • Published Oct 21, 2023