A newer version of the Gradio SDK is available:
5.20.1
metadata
title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
Vibesmark Test Suite
A benchmarking tool for comparing different language models side by side. This application allows users to:
- Upload custom test questions
- Compare responses from different language models
- Record preferences between model outputs
- Generate summary statistics of model performance
Setup
- Create a
.env
file with your OpenRouter API credentials:
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
- Install dependencies:
pip install -r requirements.txt
- Run the application:
python app.py
Usage
- Select two models to compare
- Upload a text file containing test questions (one per line)
- Start the test and evaluate responses
- View results summary when finished
Deployment
This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.
Features
- Compare responses from different AI models side by side
- Supports up to 10 questions per benchmark
- Randomly selects different models for comparison
- Real-time response generation
Supported Models
- Claude 3 Opus
- Claude 3 Sonnet
- Gemini Pro
- Mistral Medium
- Claude 2.1
- GPT-4 Turbo
- GPT-3.5 Turbo
License
[Your chosen license]
Run it with
python app.py