Vibes Benchmark v0.1
A tool for benchmarking different AI models by comparing their responses to custom questions.
Prerequisites
- Python 3.8 or higher
- An OpenRouter API key (Get one here)
Setup
Clone the repository:
git clone [repository-url] cd vibes-benchmark
Install dependencies:
pip install -r requirements.txt
Configure environment variables:
cp .env.example .env
Then edit
.env
and add your OpenRouter API key
Usage
- Prepare a text file with your questions (one per line)
- Run the application:
python app.py
- Upload your questions file through the web interface
- Click "Run Benchmark" to start comparing model responses
Features
- Compare responses from different AI models side by side
- Supports up to 10 questions per benchmark
- Randomly selects different models for comparison
- Real-time response generation
Supported Models
- Claude 3 Opus
- Claude 3 Sonnet
- Gemini Pro
- Mistral Medium
- Claude 2.1
- GPT-4 Turbo
- GPT-3.5 Turbo
License
[Your chosen license]
Run it with
python app.py