---
title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
---

# Vibesmark Test Suite

A benchmarking tool for comparing different language models side by side. This application allows users to:

- Upload custom test questions
- Compare responses from different language models
- Record preferences between model outputs
- Generate summary statistics of model performance

## Setup

1. Create a `.env` file with your OpenRouter API credentials:
```
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python app.py
```

## Usage

1. Select two models to compare
2. Upload a text file containing test questions (one per line)
3. Start the test and evaluate responses
4. View results summary when finished

## Deployment

This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.

## Features

- Compare responses from different AI models side by side
- Supports up to 10 questions per benchmark
- Randomly selects different models for comparison
- Real-time response generation

## Supported Models

- Claude 3 Opus
- Claude 3 Sonnet
- Gemini Pro
- Mistral Medium
- Claude 2.1
- GPT-4 Turbo
- GPT-3.5 Turbo

## License

[Your chosen license]

Run it with 
`python app.py`