VibesMark / README.md
atin121's picture
Update README.md
a0dc44c verified
|
raw
history blame
1.48 kB
---
title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
---
# Vibesmark Test Suite
A benchmarking tool for comparing different language models side by side. This application allows users to:
- Upload custom test questions
- Compare responses from different language models
- Record preferences between model outputs
- Generate summary statistics of model performance
## Setup
1. Create a `.env` file with your OpenRouter API credentials:
```
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python app.py
```
## Usage
1. Select two models to compare
2. Upload a text file containing test questions (one per line)
3. Start the test and evaluate responses
4. View results summary when finished
## Deployment
This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.
## Features
- Compare responses from different AI models side by side
- Supports up to 10 questions per benchmark
- Randomly selects different models for comparison
- Real-time response generation
## Supported Models
- Claude 3 Opus
- Claude 3 Sonnet
- Gemini Pro
- Mistral Medium
- Claude 2.1
- GPT-4 Turbo
- GPT-3.5 Turbo
## License
[Your chosen license]
Run it with
`python app.py`