---
title: Vibesmark Test Suite
emoji: 🎯
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
---

# Vibesmark Test Suite

A benchmarking tool for comparing different language models side by side. This application allows users to:

- Upload custom test questions
- Compare responses from different language models
- Record preferences between model outputs
- Generate summary statistics of model performance

## Setup

1. Create a `.env` file with your OpenRouter API credentials:

   ```
   OPENROUTER_API_KEY=your_api_key_here
   OPENROUTER_BASE_URL=https://openrouter.ai/api/v1/chat/completions
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python app.py
   ```

## Usage

1. Select two models to compare
2. Upload a text file containing test questions (one per line)
3. Start the test and evaluate responses
4. View results summary when finished

## Deployment

This app is ready to deploy on Hugging Face Spaces. Just add your OpenRouter API credentials as secrets in your Space settings.

## Features

- Compare responses from different AI models side by side
- Supports up to 10 questions per benchmark
- Randomly selects different models for comparison
- Real-time response generation

## Supported Models

- Claude 3 Opus
- Claude 3 Sonnet
- Gemini Pro
- Mistral Medium
- Claude 2.1
- GPT-4 Turbo
- GPT-3.5 Turbo

## License

[Your chosen license]
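
## Example: question file and preference summary

The question file format from Usage (one question per line, up to 10 per benchmark) and the summary statistics over recorded preferences can be sketched as below. This is a minimal illustration and not the code in `app.py`: the names `load_questions` and `summarize_preferences` are hypothetical, and skipping blank lines and truncating to the first 10 questions are assumptions about how the limit might be applied.

```python
from collections import Counter

MAX_QUESTIONS = 10  # the README states up to 10 questions per benchmark


def load_questions(text, limit=MAX_QUESTIONS):
    """Return the non-empty lines of `text` as questions, capped at `limit`.

    Assumption: blank lines are ignored and extra questions are dropped.
    """
    questions = [line.strip() for line in text.splitlines() if line.strip()]
    return questions[:limit]


def summarize_preferences(votes):
    """Count how often each model was preferred and compute its win rate.

    `votes` is a list of model names, one entry per evaluated question.
    An empty list yields an empty summary.
    """
    counts = Counter(votes)
    total = len(votes)
    return {
        model: {"wins": wins, "win_rate": wins / total}
        for model, wins in counts.items()
    }


if __name__ == "__main__":
    sample = "What is 2 + 2?\n\nName a prime number.\n"
    print(load_questions(sample))
    print(summarize_preferences(["Claude 3 Opus", "GPT-4 Turbo", "Claude 3 Opus"]))
```

A questions file for this sketch is plain text with one question on each line; the vote list would be filled in as each pair of responses is evaluated.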