---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---

# Model Card for SwarmFormer-Base

SwarmFormer-Base is a compact transformer variant that achieves competitive performance on text classification tasks through a hierarchical architecture combining local swarm-based updates with cluster-level global attention.

## Model Details

### Model Description

SwarmFormer-Base consists of:
- Token embedding layer with heavy dropout (0.4)
- Multiple SwarmFormer layers
- Mean pooling layer
- Final classification layer
- Comprehensive dropout throughout (0.3-0.4)

- **Developed by**: Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by**: Takara.ai
- **Shared by**: Takara.ai
- **Model type**: Hierarchical transformer
- **Language(s)**: English
- **License**: Not specified
- **Finetuned from model**: Trained from scratch

### Model Sources

- **Repository**: https://github.com/takara-ai/SwarmFormer
- **Paper**: "SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations"
- **Demo**: Not available

## Uses

### Direct Use

- Text classification
- Sentiment analysis
- Document processing

### Downstream Use

- Feature extraction for NLP tasks
- Transfer learning
- Building block for larger systems

### Out-of-Scope Use

- Text generation
- Machine translation
- Tasks requiring >768 tokens
- Real-time processing without adequate hardware

## Bias, Risks, and Limitations

- Fixed cluster size (4 tokens)
- Maximum sequence length: 768 tokens
- Potential information loss in clustering
- Limited evaluation (English text classification only)

## Training Details

### Training Data

- Dataset: IMDB Movie Review (50k samples)
- Augmentation techniques:
  - Sentence-level shuffling
  - Controlled synonym replacement
  - Hierarchical sample creation
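
To make the first two augmentation steps concrete, here is a minimal, illustrative sketch of sentence-level shuffling and dictionary-based synonym replacement; the synonym table and the 10% replacement rate are placeholders rather than values from the paper, and hierarchical sample creation is omitted.

```python
import random

# Hypothetical toy lexicon; the actual augmentation presumably draws on a larger resource.
SYNONYMS = {"movie": ["film", "picture"], "great": ["excellent", "superb"]}

def shuffle_sentences(text: str) -> str:
    """Sentence-level shuffling: reorder sentences while leaving each one intact."""
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    random.shuffle(sentences)
    return ". ".join(sentences) + "."

def replace_synonyms(text: str, p: float = 0.1) -> str:
    """Controlled synonym replacement: swap a small fraction of known words."""
    out = []
    for word in text.split():
        options = SYNONYMS.get(word.lower())
        if options and random.random() < p:
            out.append(random.choice(options))
        else:
            out.append(word)
    return " ".join(out)

augmented = replace_synonyms(shuffle_sentences("A great movie. The plot drags at times."))
```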

### Training Procedure

#### Model Architecture Details

1. **Token Embedding Layer**:
   - Embedding layer (vocab_size → d_model)
   - Dropout rate: 0.4
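
A minimal PyTorch sketch of this stage (d_model = 192 for SwarmFormer-Base; the vocabulary size shown is a placeholder that depends on the tokenizer actually used):

```python
import torch.nn as nn

d_model = 192        # embedding dimension of SwarmFormer-Base
vocab_size = 30000   # placeholder; set to the training tokenizer's vocabulary size

# Token embedding followed by heavy dropout (0.4), as listed above.
token_embedding = nn.Sequential(
    nn.Embedding(vocab_size, d_model),
    nn.Dropout(0.4),
)
```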

2. **Local Swarm Aggregator**:
   - Input processing dropout: 0.3
   - Local aggregation MLP:
     - Linear(d_model → d_model)
     - GELU activation
     - Dropout(0.3)
     - Linear(d_model → d_model)
   - Gate network:
     - Linear(2*d_model → d_model)
     - GELU activation
     - Linear(d_model → d_model)
     - Sigmoid activation
   - Output dropout: 0.3
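
The layers above can be sketched as the following module; the exact forward computation (how neighbouring token states are mixed and how the gate blends old and new states) is defined in the paper and repository, so the `forward` here is an illustrative gated update, not the reference implementation:

```python
import torch
import torch.nn as nn

class LocalSwarmAggregator(nn.Module):
    """Layers listed above; the gated update in forward() is an assumption."""

    def __init__(self, d_model: int, dropout: float = 0.3):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        self.local_mlp = nn.Sequential(
            nn.Linear(d_model, d_model),
            nn.GELU(),
            nn.Dropout(dropout),
            nn.Linear(d_model, d_model),
        )
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
            nn.Sigmoid(),
        )
        self.output_dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        update = self.local_mlp(self.input_dropout(x))
        g = self.gate(torch.cat([x, update], dim=-1))   # per-token gate in (0, 1)
        return self.output_dropout(x + g * (update - x))
```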

3. **Clustering Mechanism**:
   - Groups tokens into fixed-size clusters (size=4)
   - Computes mean representation per cluster
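
With a fixed cluster size of 4 and sequence lengths that divide evenly (the 768-token maximum does), cluster formation reduces to a reshape and a mean:

```python
import torch

def form_clusters(x: torch.Tensor, cluster_size: int = 4) -> torch.Tensor:
    """Mean-pool consecutive tokens into clusters.

    x: (batch, seq_len, d_model), with seq_len divisible by cluster_size.
    Returns: (batch, seq_len // cluster_size, d_model)
    """
    batch, seq_len, d_model = x.shape
    clusters = x.view(batch, seq_len // cluster_size, cluster_size, d_model)
    return clusters.mean(dim=2)
```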

4. **Global Cluster Attention**:
   - Query/Key/Value projections: Linear(d_model → d_model)
   - Scaled dot-product attention
   - Attention dropout: 0.3
   - Output dropout: 0.3
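
A sketch of this component as single-head scaled dot-product attention over cluster representations (the number of heads is not stated in this card, so a single head is assumed):

```python
import math
import torch
import torch.nn as nn

class GlobalClusterAttention(nn.Module):
    """Scaled dot-product attention across clusters; single head assumed."""

    def __init__(self, d_model: int, dropout: float = 0.3):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)
        self.out_dropout = nn.Dropout(dropout)

    def forward(self, clusters: torch.Tensor) -> torch.Tensor:
        # clusters: (batch, num_clusters, d_model)
        q, k, v = self.q_proj(clusters), self.k_proj(clusters), self.v_proj(clusters)
        scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
        attn = self.attn_dropout(torch.softmax(scores, dim=-1))
        return self.out_dropout(attn @ v)
```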

5. **Broadcast Updater**:
   - Linear projection: d_model → d_model
   - Dropout: 0.1
   - Gate network:
     - Linear(2*d_model → d_model)
     - GELU activation
     - Linear(d_model → d_model)
     - Sigmoid activation
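
A sketch of the broadcast step, which writes attended cluster states back to the tokens belonging to each cluster; as with the local aggregator, the gated combination in `forward` is an illustrative assumption:

```python
import torch
import torch.nn as nn

class BroadcastUpdater(nn.Module):
    """Broadcast attended cluster states back to their member tokens (gating assumed)."""

    def __init__(self, d_model: int, cluster_size: int = 4, dropout: float = 0.1):
        super().__init__()
        self.cluster_size = cluster_size
        self.proj = nn.Linear(d_model, d_model)
        self.dropout = nn.Dropout(dropout)
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model),
            nn.GELU(),
            nn.Linear(d_model, d_model),
            nn.Sigmoid(),
        )

    def forward(self, tokens: torch.Tensor, clusters: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, d_model); clusters: (batch, seq_len // cluster_size, d_model)
        broadcast = self.dropout(self.proj(clusters))
        broadcast = broadcast.repeat_interleave(self.cluster_size, dim=1)  # expand back to seq_len
        g = self.gate(torch.cat([tokens, broadcast], dim=-1))
        return tokens + g * broadcast
```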

#### Training Hyperparameters

- Embedding dimension: 192
- Number of layers: 2
- Local update steps (T_local): 3
- Cluster size: 4
- Batch size: 48
- Learning rate: 4.74 × 10⁻⁴
- Weight decay: 0.0381
- Dropout rates:
  - Embedding: 0.4
  - Local aggregation: 0.3
  - Attention: 0.3
  - Final: 0.4
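
For reference, these hyperparameters translate into a training setup along the following lines; the optimizer is not documented in this card, so AdamW is an assumption, and the `nn.Linear` below is only a stand-in for an instantiated SwarmFormer-Base classifier:

```python
import torch
import torch.nn as nn

# Hyperparameters reported above; the optimizer choice is assumed, not documented.
config = {
    "d_model": 192,
    "num_layers": 2,
    "t_local": 3,
    "cluster_size": 4,
    "batch_size": 48,
    "learning_rate": 4.74e-4,
    "weight_decay": 0.0381,
}

model = nn.Linear(config["d_model"], 2)  # placeholder; substitute the real SwarmFormer-Base model

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=config["learning_rate"],
    weight_decay=config["weight_decay"],
)
```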

## Evaluation

### Testing Data, Factors & Metrics

- IMDB test split (25k samples)
- Full FP32 inference
- Batch size: 256

### Results

- Accuracy: 89.03%
- Precision: 87.22%
- Recall: 91.46%
- F1: 89.29%
- Mean batch latency: 4.83 ms
- Peak memory: 9.13 GB
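
The classification metrics are standard binary accuracy, precision, recall, and F1, treating label 1 as the positive class; a minimal sketch of how they can be recomputed, assuming `y_true` and `y_pred` hold the test labels and model predictions:

```python
import torch

def binary_classification_metrics(y_true: torch.Tensor, y_pred: torch.Tensor) -> dict:
    """Accuracy, precision, recall, and F1 for binary labels in {0, 1}."""
    tp = ((y_pred == 1) & (y_true == 1)).sum().item()
    fp = ((y_pred == 1) & (y_true == 0)).sum().item()
    fn = ((y_pred == 0) & (y_true == 1)).sum().item()
    accuracy = (y_pred == y_true).float().mean().item()
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Dummy example; in practice y_pred comes from batched FP32 inference (batch size 256).
y_true = torch.tensor([1, 0, 1, 1, 0])
y_pred = torch.tensor([1, 0, 1, 0, 0])
print(binary_classification_metrics(y_true, y_pred))
```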

## Technical Specifications

### Model Architecture and Objective

Complete architecture flow:

1. Input → Token Embedding (with dropout)
2. For each layer:
   - Multiple iterations of Local Swarm Updates
   - Cluster Formation
   - Global Attention between clusters
   - Broadcast updates back to tokens
3. Mean pooling across sequence
4. Final dropout and classification
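
Putting the pieces together, this flow corresponds roughly to the sketch below, which reuses the `LocalSwarmAggregator`, `form_clusters`, `GlobalClusterAttention`, and `BroadcastUpdater` sketches from the architecture section; it is illustrative only, not the reference implementation:

```python
import torch
import torch.nn as nn

class SwarmFormerClassifier(nn.Module):
    """Illustrative composition of the components sketched above (2 layers, d_model = 192)."""

    def __init__(self, vocab_size: int, d_model: int = 192, num_layers: int = 2,
                 t_local: int = 3, cluster_size: int = 4, num_classes: int = 2):
        super().__init__()
        self.embedding = nn.Sequential(nn.Embedding(vocab_size, d_model), nn.Dropout(0.4))
        self.t_local = t_local
        self.cluster_size = cluster_size
        self.local = nn.ModuleList([LocalSwarmAggregator(d_model) for _ in range(num_layers)])
        self.attn = nn.ModuleList([GlobalClusterAttention(d_model) for _ in range(num_layers)])
        self.broadcast = nn.ModuleList([BroadcastUpdater(d_model, cluster_size) for _ in range(num_layers)])
        self.head = nn.Sequential(nn.Dropout(0.4), nn.Linear(d_model, num_classes))

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(input_ids)                       # 1. token embedding with dropout
        for local, attn, broadcast in zip(self.local, self.attn, self.broadcast):
            for _ in range(self.t_local):                   # 2a. repeated local swarm updates
                x = local(x)
            clusters = form_clusters(x, self.cluster_size)  # 2b. cluster formation
            clusters = attn(clusters)                       # 2c. global attention between clusters
            x = broadcast(x, clusters)                      # 2d. broadcast back to tokens
        pooled = x.mean(dim=1)                              # 3. mean pooling across the sequence
        return self.head(pooled)                            # 4. final dropout and classification
```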

### Compute Infrastructure

- GPU: NVIDIA RTX 2080 Ti or equivalent
- VRAM: 10 GB+ recommended
- Framework: PyTorch

### Software Requirements

```python
import torch
import torch.nn as nn
```

## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```

## Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact

[email protected]