AdamG012/Llama-2-13b-deepspeed-visualchat

	---
	language:
	- en
	tags:
	- deepspeed
	- visualchat
	- multi-image
	- causal
	- chat
	license: apache-2.0
	datasets:
	- openai/clip-vit-large-patch14
	---

	# Llama-2-13b-deepspeed-visualchat

	> ATTENTION: this model requires the QwenCLIP visual encoder.

	DeepSpeed-VisualChat is a scalable, efficient, and user-friendly multi-modal training pipeline that leverages a novel multi-modal causal attention mechanism for better alignment of visual and text features. It uses data blending techniques to address the scarcity of interleaved text-and-image inputs in datasets.
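
To make the idea of a causal attention mask over mixed image and text tokens more concrete, here is a minimal sketch in plain Python. It is not the DeepSpeed-VisualChat implementation: the function name `build_mask`, the `"img"`/`"txt"` labels, and the masking rule itself (image tokens look only at earlier image tokens, text tokens look at every earlier token) are assumptions made for illustration, since this card does not spell out the mechanism.

```python
from typing import List


def build_mask(modalities: List[str]) -> List[List[bool]]:
    """Return mask[i][j] == True when position i may attend to position j.

    Assumed rule (illustrative only, not taken from the model card):
      * causal: no position attends to a later position;
      * an "img" position attends only to "img" positions;
      * a "txt" position attends to every earlier position and itself.
    """
    n = len(modalities)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):  # causal: never look ahead
            if modalities[i] == "img":
                mask[i][j] = modalities[j] == "img"
            else:
                mask[i][j] = True
    return mask


if __name__ == "__main__":
    # Hypothetical sequence: two image tokens followed by two text tokens.
    tokens = ["img", "img", "txt", "txt"]
    for kind, row in zip(tokens, build_mask(tokens)):
        print(kind, "".join("x" if ok else "." for ok in row))
```

In a real model such a boolean mask would typically be turned into additive attention biases before the softmax; that step is left out here to keep the sketch dependency-free.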


	The framework trains with a 2B visual encoder from QWen-VL and a LLaMA-2 language decoder ranging from 13B to 70B parameters, which demonstrates its scalability. DeepSpeed-VisualChat is open source, and community contributions and collaborations are welcome. Visit the GitHub page to get started.