# Count Tokens in Sample Prompts

---

This script is a utility for analyzing sample-prompt text files. It processes files containing positive and negative prompts, counts the tokens in each prompt, and displays the results as tables using the Rich library. It also prints a warning if a positive prompt's token count exceeds a threshold (77 in this case).

1. It imports the necessary libraries: `os` for file and directory operations, `tiktoken` for encoding and counting tokens, and `rich.console` and `rich.table` for creating a console interface and a table for displaying the results.

2. The `count_tokens` function takes a text input and returns the number of tokens in that text using the `tiktoken` library's `cl100k_base` encoding.

3. The script creates a `Console` object from the `rich` library.

4. It iterates through all files in the `E:\training_dir` directory that end with `-sample-prompts.txt`.

5. For each file, it reads all lines into memory and skips any line that starts with `#`, treating it as a comment.

6. Each line is expected to be in the format `<positive_prompt> --n <negative_prompt> --<additional_arguments>`. The script splits the line at `--n` to separate the positive and negative prompts, then cuts the negative prompt at the next ` --` so that any additional arguments are discarded.

7. It counts the number of tokens for both the positive and negative prompts using the `count_tokens` function.

8. For each prompt pair, a `Table` object from the `rich` library is created with columns for the prompt type, the prompt text, and the token count. The positive and negative prompts are added as separate rows, with the `Positive` label shown in green and the `Negative` label in red.

9. The table is printed to the console using the `Console.print` method.

10. If the positive prompt's token count exceeds 77 (75 plus a buffer of 2), a warning message is printed in bold red.

11. The script keeps track of the total number of prompts processed in the current file and prints it at the end.
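Steps 4 and 5 can be tried without a real training directory. The following is a minimal sketch, assuming a temporary directory and a hypothetical helper named `find_prompt_files`; neither is part of the original script, and the sample file contents (including the `--w` and `--h` arguments) are invented for illustration.

```python
import os
import tempfile


def find_prompt_files(directory, suffix="-sample-prompts.txt"):
    """Return sorted paths of files in `directory` whose names end with `suffix`.

    Hypothetical helper for illustration; the original script inlines this logic.
    """
    return sorted(
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    )


# Build a throwaway directory with one matching and one non-matching file.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "demo-sample-prompts.txt"), "w", encoding="utf-8") as f:
        f.write("# comment lines are skipped\n")
        f.write("a cat on a sofa --n blurry, low quality --w 512 --h 512\n")
    with open(os.path.join(tmp, "notes.txt"), "w", encoding="utf-8") as f:
        f.write("not a prompt file\n")

    # Only the file with the expected suffix should be picked up.
    found = [os.path.basename(path) for path in find_prompt_files(tmp)]
    print(found)
```

Only `demo-sample-prompts.txt` is selected; `notes.txt` is ignored because its name does not end with the suffix.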

```python
import os

import tiktoken
from rich.console import Console
from rich.table import Table

TRAINING_DIR = "E:\\training_dir"
TOKEN_LIMIT = 77  # 75 plus a buffer of 2

# Load the encoding once instead of on every call.
ENCODING = tiktoken.get_encoding("cl100k_base")


def count_tokens(text):
    """Return the number of cl100k_base tokens in the given text."""
    return len(ENCODING.encode(text))


console = Console()

for file in os.listdir(TRAINING_DIR):
    if not file.endswith("-sample-prompts.txt"):
        continue

    file_path = os.path.join(TRAINING_DIR, file)
    console.print(f"Processing file: {file_path}")

    prompt_count = 0
    with open(file_path, "r", encoding="utf-8") as f:
        lines = f.readlines()

    for line in lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue

        # Everything before "--n" is the positive prompt. The negative prompt
        # runs from there to the next " --" argument, and is empty if the
        # line has no "--n" marker at all.
        positive_part, _, rest = line.partition("--n")
        positive_prompt = positive_part.strip()
        negative_prompt = rest.strip().split(" --")[0]

        positive_token_count = count_tokens(positive_prompt)
        negative_token_count = count_tokens(negative_prompt)

        # One small table per prompt pair.
        table = Table()
        table.add_column("Prompt Type", justify="left")
        table.add_column("Prompt", justify="left")
        table.add_column("Token Count", justify="right")
        table.add_row("[green]Positive[/green]", positive_prompt, str(positive_token_count))
        table.add_row("[red]Negative[/red]", negative_prompt, str(negative_token_count))
        console.print(table)

        if positive_token_count > TOKEN_LIMIT:
            console.print(
                f"[bold red]Warning: Positive prompt token count exceeds {TOKEN_LIMIT}.[/bold red]"
            )
        prompt_count += 1

    console.print(f"[bold]Total number of prompts in {file}: {prompt_count}[/bold]")
```
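The line-parsing step can also be isolated into a small helper that is easy to test without `tiktoken` or Rich. This is a sketch only: `split_prompt` is a hypothetical name, and returning an empty negative prompt when `--n` is missing is an assumption about the desired behavior, not something the original script specifies.

```python
def split_prompt(line):
    """Split one prompt line into a (positive, negative) tuple of strings.

    Hypothetical helper: returns None for blank lines and '#' comments,
    and an empty negative prompt when the line has no '--n' marker.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    positive, _, rest = stripped.partition("--n")
    # Drop any further " --" arguments that follow the negative prompt.
    negative = rest.strip().split(" --")[0].strip()
    return positive.strip(), negative


# Invented example lines covering the three cases handled above.
examples = [
    "a cat on a sofa --n blurry, low quality --w 512 --h 512",
    "a dog in a park",
    "# a comment",
]
for example in examples:
    print(split_prompt(example))
```

The first line yields `('a cat on a sofa', 'blurry, low quality')`, the second yields `('a dog in a park', '')`, and the comment yields `None`. One limitation of this sketch: a line that has other `--` arguments but no `--n` keeps those arguments in the positive prompt.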