---
license: apache-2.0
datasets:
- THUdyh/Oryx-Image-Data
base_model:
- Qwen/Qwen2-7B-Instruct
pipeline_tag: text-generation
language:
- en
- zh
---
# Oryx-7B-Image

## Model Summary
The Oryx-Image models are 7B/34B-parameter models trained on [Oryx-Image-Data](https://huggingface.co/datasets/THUdyh/Oryx-Image-Data), built on the Qwen2 language model with a 32K-token context window.

Oryx offers an on-demand solution for seamlessly and efficiently processing visual inputs of arbitrary spatial sizes and temporal lengths.

- **Repository:** https://github.com/Oryx-mllm/Oryx
- **Languages:** English, Chinese
- **Paper:** https://arxiv.org/abs/2409.12961
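### Quick Start (illustrative)

The snippet below is a minimal loading sketch, not the official inference recipe: it assumes the checkpoint can be dispatched through Transformers with `trust_remote_code=True`. The supported end-to-end inference utilities live in the GitHub repository linked above and may differ.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative only: assumes the checkpoint ships remote code that
# transformers can dispatch to; see the GitHub repository for the
# officially supported inference path.
model_id = "THUdyh/Oryx-7B-Image"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the BFloat16 precision listed below
    device_map="auto",           # requires the accelerate package
    trust_remote_code=True,
)
model.eval()
```

Image preprocessing is not covered by a plain tokenizer: Oryx feeds native-resolution inputs through Oryx-ViT, so use the preprocessing and conversation utilities shipped in the GitHub repository for end-to-end visual question answering.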
### Model Architecture

- **Architecture:** Pre-trained [Oryx-ViT](https://huggingface.co/THUdyh/Oryx-ViT) + Qwen2-7B (composition sketched below)
- **Data:** a mixture of 4M image samples
- **Precision:** BFloat16
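#### Composition (illustrative)

As a rough picture of how the components above are combined, the sketch below wires a vision tower to a language model through a projection layer, in the usual visual-encoder-plus-LLM fashion. Class names, the projector design, and the call signatures are assumptions for illustration, not the repository's actual implementation.

```python
import torch
import torch.nn as nn


class VisionLanguageModel(nn.Module):
    """Illustrative two-tower composition; hypothetical names, not Oryx's real classes."""

    def __init__(self, vision_encoder: nn.Module, language_model: nn.Module,
                 vision_dim: int, lm_dim: int):
        super().__init__()
        self.vision_encoder = vision_encoder  # stands in for Oryx-ViT
        self.language_model = language_model  # stands in for Qwen2-7B
        # Projects visual features into the language model's embedding space.
        self.projector = nn.Linear(vision_dim, lm_dim)

    def forward(self, pixel_values: torch.Tensor, text_embeds: torch.Tensor):
        visual_feats = self.vision_encoder(pixel_values)  # [B, N_visual, vision_dim]
        visual_tokens = self.projector(visual_feats)      # [B, N_visual, lm_dim]
        # Visual tokens are decoded jointly with the text embeddings.
        inputs_embeds = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.language_model(inputs_embeds=inputs_embeds)
```

Oryx's distinguishing point, per the summary above, is that the visual side accepts inputs of arbitrary spatial sizes and temporal lengths rather than a fixed resolution; see the paper for the actual design.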
#### Hardware & Software

- **Hardware:** 64 × NVIDIA Tesla A100 GPUs
- **Orchestration:** Hugging Face Trainer
- **Code:** PyTorch
## Citation