{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Duplicate checker\n", "----\n", "\n", "This script checks for duplicate tags in the text files (`.txt`) of a specified directory and its subdirectories. `check_duplicate_tags` reads a file, splits its content on commas, strips surrounding whitespace from each tag, and collects every tag that appears more than once. If it finds duplicates, it prints the file path followed by the duplicated tags. `check_tags_in_directory` walks the directory tree and calls `check_duplicate_tags` on each `.txt` file it finds." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "def check_duplicate_tags(file_path):\n", "    # Read the file and split it into tags. Stripping whitespace keeps a\n", "    # trailing newline or uneven spacing from hiding a duplicate.\n", "    with open(file_path, 'r') as file:\n", "        tags = [tag.strip() for tag in file.read().split(',')]\n", "    duplicates = set()\n", "    unique_tags = set()\n", "    for tag in tags:\n", "        if not tag:\n", "            continue  # skip empty entries, e.g. after a trailing comma\n", "        if tag in unique_tags:\n", "            duplicates.add(tag)\n", "        else:\n", "            unique_tags.add(tag)\n", "    if duplicates:\n", "        # Sort so the report is stable from run to run.\n", "        print(f\"Duplicate tags found in {file_path}: {', '.join(sorted(duplicates))}\")\n", "\n", "def check_tags_in_directory(directory):\n", "    # Walk the tree and check every .txt file.\n", "    for root, _, files in os.walk(directory):\n", "        for file_name in files:\n", "            if file_name.endswith('.txt'):\n", "                file_path = os.path.join(root, file_name)\n", "                check_duplicate_tags(file_path)\n", "\n", "if __name__ == \"__main__\":\n", "    directory_path = r'C:\\Users\\kade\\Desktop\\training_dir_staging'\n", "    check_tags_in_directory(directory_path)\n" ] } ], "metadata": { "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 2 }