VibesMark / README.md
atin121's picture
Update README.md
a0dc44c verified

A newer version of the Gradio SDK is available: 5.20.1

Upgrade
metadata
title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false

Vibesmark Test Suite

A benchmarking tool for comparing different language models side by side. This application allows users to:

  • Upload custom test questions
  • Compare responses from different language models
  • Record preferences between model outputs
  • Generate summary statistics of model performance

Setup

  1. Create a .env file with your OpenRouter API credentials:
OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
  1. Install dependencies:
pip install -r requirements.txt
  1. Run the application:
python app.py

Usage

  1. Select two models to compare
  2. Upload a text file containing test questions (one per line)
  3. Start the test and evaluate responses
  4. View results summary when finished

Deployment

This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.

Features

  • Compare responses from different AI models side by side
  • Supports up to 10 questions per benchmark
  • Randomly selects different models for comparison
  • Real-time response generation

Supported Models

  • Claude 3 Opus
  • Claude 3 Sonnet
  • Gemini Pro
  • Mistral Medium
  • Claude 2.1
  • GPT-4 Turbo
  • GPT-3.5 Turbo

License

[Your chosen license]

Run it with python app.py