Spaces:

atin121
/

VibesMark

Sleeping

VibesMark / README.md

Update README.md

a0dc44c verified about 1 month ago

1.48 kB

	---
	title: Vibesmark Test Suite
	emoji: 🎯
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.13.1
	app_file: app.py
	pinned: false
	---

	# Vibesmark Test Suite

	A benchmarking tool for comparing different language models side by side. This application allows users to:

	- Upload custom test questions
	- Compare responses from different language models
	- Record preferences between model outputs
	- Generate summary statistics of model performance

	## Setup

	1. Create a `.env` file with your OpenRouter API credentials:
	```
	OPENROUTER_API_KEY=your_api_key_here
	OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
	```

	2. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	3. Run the application:
	```bash
	python app.py
	```

	## Usage

	1. Select two models to compare
	2. Upload a text file containing test questions (one per line)
	3. Start the test and evaluate responses
	4. View results summary when finished

	## Deployment

	This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.

	## Features

	- Compare responses from different AI models side by side
	- Supports up to 10 questions per benchmark
	- Randomly selects different models for comparison
	- Real-time response generation

	## Supported Models

	- Claude 3 Opus
	- Claude 3 Sonnet
	- Gemini Pro
	- Mistral Medium
	- Claude 2.1
	- GPT-4 Turbo
	- GPT-3.5 Turbo

	## License

	[Your chosen license]

	Run it with
	`python app.py`