---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---
# Model Card for SwarmFormer-Base
SwarmFormer-Base is a compact transformer variant that achieves competitive performance on text classification tasks through a hierarchical architecture combining local swarm-based updates with cluster-level global attention.
## Model Details
### Model Description
SwarmFormer-Base consists of the following components, sketched in code at the end of this section:
- Token embedding layer with heavy dropout (0.4)
- Multiple SwarmFormer layers
- Mean pooling layer
- Final classification layer
- Comprehensive dropout throughout (0.3-0.4)
- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Not applicable; trained from scratch
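
As a rough illustration of how those components compose, the skeleton below uses assumed class and attribute names; the SwarmFormer layers themselves are detailed under Training Procedure and stubbed out here with `nn.Identity`:

```python
import torch.nn as nn


class SwarmFormerClassifier(nn.Module):
    """Illustrative composition of the components listed above (not the reference code)."""
    def __init__(self, vocab_size, d_model=192, n_layers=2, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.embed_dropout = nn.Dropout(0.4)  # heavy embedding dropout
        # Stand-ins for the SwarmFormer layers described later in this card
        self.layers = nn.ModuleList([nn.Identity() for _ in range(n_layers)])
        self.final_dropout = nn.Dropout(0.4)
        self.classifier = nn.Linear(d_model, n_classes)
```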
### Model Sources
- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: "SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations"
- **Demo**: Not available
## Uses
### Direct Use
- Text classification
- Sentiment analysis
- Document processing
### Downstream Use
- Feature extraction for NLP tasks
- Transfer learning
- Building block for larger systems
### Out-of-Scope Use
- Text generation
- Machine translation
- Tasks requiring >768 tokens
- Real-time processing without adequate hardware
## Bias, Risks, and Limitations
- Fixed cluster size (4 tokens)
- Maximum sequence length: 768 tokens
- Potential information loss in clustering
- Limited evaluation (English text classification only)
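
Given the fixed cluster size and the 768-token limit, inputs generally have to be truncated and padded so that their length divides evenly into clusters. The helper below only illustrates that constraint and is not part of the released code; the padding token id is an assumption:

```python
def prepare_length(token_ids, max_len=768, cluster_size=4, pad_id=0):
    """Truncate to max_len, then pad so the length is a multiple of cluster_size."""
    token_ids = token_ids[:max_len]
    remainder = len(token_ids) % cluster_size
    if remainder:
        token_ids = token_ids + [pad_id] * (cluster_size - remainder)
    return token_ids
```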
## Training Details
### Training Data
- Dataset: IMDB Movie Review (50k samples)
- Augmentation techniques:
- Sentence-level shuffling
- Controlled synonym replacement
- Hierarchical sample creation
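
The exact augmentation code is not reproduced in this card; a minimal sketch of the first two techniques, with a hypothetical `synonym_for` lookup and replacement probability, might look like this (hierarchical sample creation is omitted because its details are not described here):

```python
import random


def shuffle_sentences(text: str) -> str:
    """Sentence-level shuffling: split on '.', shuffle, and rejoin."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."


def replace_synonyms(tokens, synonym_for, p=0.1):
    """Controlled synonym replacement: swap a token with probability p
    when the user-supplied lookup offers an alternative."""
    out = []
    for tok in tokens:
        alt = synonym_for(tok)  # may return None when no synonym is known
        out.append(alt if alt is not None and random.random() < p else tok)
    return out
```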
### Training Procedure
#### Model Architecture Details
1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → d_model)
   - Dropout rate: 0.4
2. **Local Swarm Aggregator**:
   - Input processing dropout: 0.3
   - Local aggregation MLP:
     - Linear(d_model → d_model)
     - GELU activation
     - Dropout(0.3)
     - Linear(d_model → d_model)
   - Gate network:
     - Linear(2*d_model → d_model)
     - GELU activation
     - Linear(d_model → d_model)
     - Sigmoid activation
   - Output dropout: 0.3
3. **Clustering Mechanism**:
   - Groups tokens into fixed-size clusters (size=4)
   - Computes mean representation per cluster
4. **Global Cluster Attention**:
   - Query/Key/Value projections: Linear(d_model → d_model)
   - Scaled dot-product attention
   - Attention dropout: 0.3
   - Output dropout: 0.3
5. **Broadcast Updater**:
   - Linear projection: d_model → d_model
   - Dropout: 0.1
   - Gate network:
     - Linear(2*d_model → d_model)
     - GELU activation
     - Linear(d_model → d_model)
     - Sigmoid activation
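
Taken together, these components map onto a small set of PyTorch modules. The sketch below is a minimal illustration based only on the descriptions in this card, not the reference implementation: the class names, the exact gating formulas, and the assumption that the sequence length is a multiple of the cluster size are assumptions; dimensions, activations, and dropout rates follow the values listed above.

```python
import torch
import torch.nn as nn


class LocalSwarmAggregator(nn.Module):
    """Local swarm update: MLP aggregation gated against the original tokens."""
    def __init__(self, d_model, dropout=0.3):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        self.mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(d_model, d_model),
        )
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.GELU(),
            nn.Linear(d_model, d_model), nn.Sigmoid(),
        )
        self.output_dropout = nn.Dropout(dropout)

    def forward(self, x):  # x: (batch, seq, d_model)
        h = self.mlp(self.input_dropout(x))
        g = self.gate(torch.cat([x, h], dim=-1))
        return self.output_dropout(x + g * (h - x))  # gated blend (assumed form)


class GlobalClusterAttention(nn.Module):
    """Scaled dot-product attention over cluster representations."""
    def __init__(self, d_model, dropout=0.3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)
        self.out_dropout = nn.Dropout(dropout)

    def forward(self, c):  # c: (batch, n_clusters, d_model)
        q, k, v = self.q(c), self.k(c), self.v(c)
        scores = q @ k.transpose(-2, -1) / (c.size(-1) ** 0.5)
        attn = self.attn_dropout(scores.softmax(dim=-1))
        return self.out_dropout(attn @ v)


class BroadcastUpdater(nn.Module):
    """Write the attended cluster summaries back into their member tokens."""
    def __init__(self, d_model, dropout=0.1):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.GELU(),
            nn.Linear(d_model, d_model), nn.Sigmoid(),
        )

    def forward(self, x, clusters):  # x: (batch, seq, d), clusters: (batch, n_clusters, d)
        cluster_size = x.size(1) // clusters.size(1)
        c = self.dropout(self.proj(clusters)).repeat_interleave(cluster_size, dim=1)
        g = self.gate(torch.cat([x, c], dim=-1))
        return x + g * c  # gated residual update (assumed form)


class SwarmFormerLayer(nn.Module):
    """One layer: repeated local updates, clustering, global attention, broadcast."""
    def __init__(self, d_model, cluster_size=4, t_local=3):
        super().__init__()
        self.local = LocalSwarmAggregator(d_model)
        self.attn = GlobalClusterAttention(d_model)
        self.broadcast = BroadcastUpdater(d_model)
        self.cluster_size = cluster_size
        self.t_local = t_local

    def forward(self, x):  # assumes seq length is a multiple of cluster_size
        for _ in range(self.t_local):
            x = self.local(x)
        b, n, d = x.shape
        clusters = x.reshape(b, n // self.cluster_size, self.cluster_size, d).mean(dim=2)
        return self.broadcast(x, self.attn(clusters))
```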
#### Training Hyperparameters
- Embedding dimension: 192
- Number of layers: 2
- Local update steps (T_local): 3
- Cluster size: 4
- Batch size: 48
- Learning rate: 4.74 × 10⁻⁴
- Weight decay: 0.0381
- Dropout rates:
- Embedding: 0.4
- Local aggregation: 0.3
- Attention: 0.3
- Final: 0.4
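
Collected as a configuration, these values look as follows. The optimizer is not stated in this card, so the AdamW line is shown only as a plausible choice:

```python
config = {
    "d_model": 192,          # embedding dimension
    "n_layers": 2,
    "t_local": 3,            # local swarm update steps per layer
    "cluster_size": 4,
    "batch_size": 48,
    "learning_rate": 4.74e-4,
    "weight_decay": 0.0381,
    "dropout": {"embedding": 0.4, "local": 0.3, "attention": 0.3, "final": 0.4},
}

# Optimizer choice is an assumption; only lr and weight decay come from this card.
# optimizer = torch.optim.AdamW(model.parameters(),
#                               lr=config["learning_rate"],
#                               weight_decay=config["weight_decay"])
```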
## Evaluation
### Testing Data, Factors & Metrics
- IMDB test split (25k samples)
- Full FP32 inference
- Batch size: 256
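
The metrics reported below follow the standard binary-classification definitions and can be computed with plain tensor arithmetic; a minimal sketch (how batching and thresholding were handled in the original evaluation is not specified here):

```python
import torch


def binary_metrics(preds: torch.Tensor, labels: torch.Tensor):
    """Accuracy, precision, recall, and F1 for 0/1 prediction and label tensors."""
    preds, labels = preds.long(), labels.long()
    tp = ((preds == 1) & (labels == 1)).sum().item()
    fp = ((preds == 1) & (labels == 0)).sum().item()
    fn = ((preds == 0) & (labels == 1)).sum().item()
    accuracy = (preds == labels).float().mean().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1
```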
### Results
- Accuracy: 89.03%
- Precision: 87.22%
- Recall: 91.46%
- F1: 89.29%
- Mean batch latency: 4.83 ms
- Peak memory: 9.13 GB
## Technical Specifications
### Model Architecture and Objective
Complete architecture flow:
1. Input → Token Embedding (with dropout)
2. For each layer:
- Multiple iterations of Local Swarm Updates
- Cluster Formation
- Global Attention between clusters
- Broadcast updates back to tokens
3. Mean pooling across sequence
4. Final dropout and classification
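
A forward pass matching this flow, written against modules like those sketched earlier in this card (illustrative only), would look roughly like:

```python
def swarmformer_forward(input_ids, embedding, embed_dropout, layers,
                        final_dropout, classifier):
    """Illustrative end-to-end pass: embed -> SwarmFormer layers -> pool -> classify."""
    x = embed_dropout(embedding(input_ids))    # (batch, seq, d_model)
    for layer in layers:                       # each layer: local updates, clustering,
        x = layer(x)                           # global attention, broadcast back
    pooled = x.mean(dim=1)                     # mean pooling across the sequence
    return classifier(final_dropout(pooled))   # (batch, n_classes) logits
```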
### Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti or equivalent
- VRAM: 10GB+ recommended
- Framework: PyTorch
### Software Requirements
```python
# Only PyTorch is required; the model is built from torch.nn building blocks
import torch
import torch.nn as nn
```
## Citation
```bibtex
@article{legg2025swarmformer,
title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
journal={Takara.ai Research},
year={2025},
url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```
## Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team
## Model Card Contact
[email protected]