---
license: apache-2.0
language:
- en
---

# ChessCLIP

ChessCLIP is a CLIP model trained to align (board, action) representations with natural language, so that the similarity between a chess position with its move and a text description can be computed.

## Model Details
- **Language(s)**: English
- **License**: Apache 2.0
- **Model Description**: A CLIP model for chess.

# Quick Start

```bash
git clone https://github.com/waterhorse1/ChessGPT
```
Clone our codebase and install all dependencies according to our README.

You can also refer to https://github.com/waterhorse1/ChessGPT/blob/main/chessclip_demo.ipynb for a demo notebook of ChessCLIP.

## Inference

```python
import sys
sys.path.append('./chessclip/src')
import torch
import io
import chess.pgn
import numpy as np
from chess_ai.feature_converter import get_lc0_input_planes_tf
from chess_ai.datasets.tfds.pgn_base import generate_examples_from_game_no_comment

import open_clip  # needed for open_clip.create_model below
from open_clip.factory import get_tokenizer, load_checkpoint

# init
model_name = 'chessclip-quickgelu'
model = open_clip.create_model(model_name, pretrained='openai')
tokenizer = get_tokenizer(model_name)

# load model
load_checkpoint(model, './ChessCLIP/epoch_latest.pt')

# check parameters
model.eval()
context_length = model.text.context_length
vocab_size = model.text.vocab_size

print("Model parameters:", f"{np.sum([int(np.prod(p.shape)) for p in model.parameters()]):,}")
print("Context length:", context_length)
print("Vocab size:", vocab_size)

# generate board/action embedding based on pgn string
def generate_representation_for_final(pgn):
    game = chess.pgn.read_game(io.StringIO(pgn))
    data = list(generate_examples_from_game_no_comment(game))[-1]
    for key in data.keys():
        data[key] = np.array(data[key])
    board = get_lc0_input_planes_tf(data).numpy()
    action = data['probs']
    return board, action

# Prepare input
prompt = "Black plays Sicilian Defense"
pgn_str = '1. e4 c5'
board, action = generate_representation_for_final(pgn_str)
text_tokens = tokenizer([prompt])

image_input = torch.from_numpy(np.stack([board], axis=0))
action_input = torch.from_numpy(np.stack([action], axis=0))

# infer
with torch.no_grad():
    image_features = model.encode_image((image_input, action_input)).float()
    text_features = model.encode_text(text_tokens).float()
image_features /= image_features.norm(dim=-1, keepdim=True)  # n * dim
text_features /= text_features.norm(dim=-1, keepdim=True)  # m * dim
similarity = text_features.cpu().numpy() @ image_features.cpu().numpy().T # m * n
print(similarity)
```
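
The final step above is a cosine similarity between L2-normalized text and (board, action) embeddings. The minimal sketch below isolates that computation with NumPy and shows one way to rank several candidate prompts against a single position. The embeddings are small made-up vectors, not real ChessCLIP outputs, and the helper names `cosine_similarity_matrix` and `rank_prompts` are hypothetical and do not exist in the ChessGPT codebase.

```python
import numpy as np

def cosine_similarity_matrix(text_features, board_features):
    """Return an (m, n) matrix of cosine similarities between
    m text embeddings and n board/action embeddings."""
    text = text_features / np.linalg.norm(text_features, axis=-1, keepdims=True)
    board = board_features / np.linalg.norm(board_features, axis=-1, keepdims=True)
    return text @ board.T

def rank_prompts(similarity, prompts, board_index=0):
    """Sort prompts from most to least similar for one board."""
    order = np.argsort(-similarity[:, board_index])
    return [(prompts[i], float(similarity[i, board_index])) for i in order]

# Stand-in embeddings for illustration only (assumption: 3-dimensional toy
# vectors; real embeddings come from model.encode_text / model.encode_image).
prompts = [
    "Black plays Sicilian Defense",
    "White plays the Queen's Gambit",
    "Black plays the French Defense",
]
text_features = np.array([
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.2],
    [0.5, 0.5, 0.5],
])
board_features = np.array([[1.0, 0.0, 0.0]])

similarity = cosine_similarity_matrix(text_features, board_features)  # shape (3, 1)
ranking = rank_prompts(similarity, prompts)
for prompt, score in ranking:
    print(f"{score:.3f}  {prompt}")
```

With real embeddings, `text_features.cpu().numpy()` and `image_features.cpu().numpy()` from the inference snippet could be passed in place of the toy arrays. Normalizing inside the helper means it should give the same result whether or not the inputs were already normalized.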

## Limitations
Like other CLIP-based models, ChessCLIP has limitations that should be taken into account. For instance, it may produce incorrect similarity scores, especially for complex or ambiguous inputs, or for language that falls outside its training data.

We highly appreciate contributions from individuals and organizations to enhance the model's performance and stability. Specifically, we welcome annotated data, such as annotated PGN (Portable Game Notation), which can be utilized to train a more robust and reliable CLIP model.

## Benchmark

Please refer to our [paper](https://together.xyz) and [code](https://github.com/waterhorse1/ChessGPT) for benchmark results.