# PixelBytes: Unified Multimodal Generation

Welcome to the **PixelBytes** repository! This project features models designed to generate text and images simultaneously, pixel by pixel, using a unified embedding.

## Overview

### Key Concepts

- **Image Transformer**: Generates images pixel by pixel.
- **Bi-Mamba+**: A bidirectional model for time series prediction.
- **MambaByte**: A selective state-space model without tokens.

The PixelBytes model generates mixed sequences of text and images, handling transitions with line breaks and maintaining image dimension consistency.
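The exact byte-level layout is not spelled out in this README, so the following is only a minimal sketch of one plausible encoding, assuming a newline byte (`0x0A`) marks both the text-to-image transition and the end of each pixel row; `build_sequence` and the palette-index representation are illustrative, not the repository's actual code.

```python
# Illustrative only: flatten a caption and a small "image" (rows of
# palette indices) into one byte-level sequence, using a newline byte
# to mark the text -> image transition and the end of each pixel row.
NEWLINE = 0x0A

def build_sequence(caption, image_rows):
    seq = list(caption.encode("utf-8"))   # text part, one byte per element
    seq.append(NEWLINE)                   # transition from text to image
    for row in image_rows:
        seq.extend(row)                   # pixel values for this row
        seq.append(NEWLINE)               # row break keeps dimensions consistent
    return seq

# A tiny 2x3 "image" of palette indices.
sequence = build_sequence("Pikachu", [[1, 2, 3], [4, 5, 6]])
```

With fixed-length rows, a decoder can split the generated sequence on newline bytes and recover the image's width and height.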
## Dataset

We use the **PixelBytes-Pokemon** dataset, available on Hugging Face: [PixelBytes-Pokemon](https://huggingface.co/datasets/ffurfaro/PixelBytes-Pokemon). It contains text and image sequences of Pokémon for training our model.

## Models Trained

- **8 LSTM Models**: Bidirectional + 1, 2, 3 layers (including p_embed + bi-2 layers)
- **6 Mamba Models**: Bidirectional + 1, 2, 3 layers
- **3 Transformer Models**: 1, 2, 3 layers
## Pre-test

Before training the LSTMs, we will test the p_embed-bi-2 LSTM for generation. The model generates the next central element, reconstructing a 2D structure.
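The "next central element" setup is not detailed here; as a rough sketch under that reading, generation can be pictured as sliding a small context window over the grid and predicting the value at the centre of the next position. `context_window` is a hypothetical helper for illustration, not the repository's API.

```python
# Illustrative only: build the 3x3 context a model might see around one
# grid position; positions outside the grid are padded with `pad`.
def context_window(grid, r, c, pad=0):
    h, w = len(grid), len(grid[0])
    return [
        [grid[i][j] if 0 <= i < h and 0 <= j < w else pad
         for j in (c - 1, c, c + 1)]
        for i in (r - 1, r, r + 1)
    ]

# During generation, each predicted centre value would be written back
# into the grid before the window moves to the next position.
grid = [[1, 2],
        [3, 4]]
window = context_window(grid, 0, 0)
```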

---

Thank you for exploring **PixelBytes**! We hope this model aids your multimodal generation projects.

---
license: mit
---