Spaces:

atin121
/

VibesMark

Sleeping

App Files Files Community

VibesMark / README.md

atin121's picture

Update README.md

a0dc44c verified about 1 month ago

|

history blame contribute delete

1.48 kB

A newer version of the Gradio SDK is available: 5.20.1

Upgrade

metadata

title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false

Vibesmark Test Suite

A benchmarking tool for comparing different language models side by side. This application allows users to:

Upload custom test questions
Compare responses from different language models
Record preferences between model outputs
Generate summary statistics of model performance

Setup

Create a .env file with your OpenRouter API credentials:

OPENROUTER_API_KEY=your_api_key_here
OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions

Install dependencies:

pip install -r requirements.txt

Run the application:

python app.py

Usage

Select two models to compare
Upload a text file containing test questions (one per line)
Start the test and evaluate responses
View results summary when finished

Deployment

This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.

Features

Compare responses from different AI models side by side
Supports up to 10 questions per benchmark
Randomly selects different models for comparison
Real-time response generation

Supported Models

Claude 3 Opus
Claude 3 Sonnet
Gemini Pro
Mistral Medium
Claude 2.1
GPT-4 Turbo
GPT-3.5 Turbo

License

[Your chosen license]

Run it with python app.py