OpenCLIP not installed
(message repeated once per rank; duplicate copies omitted)
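For context, this warning normally comes from an optional-import guard around the open_clip package (published on PyPI as open_clip_torch). A minimal sketch of such a guard, assuming the training script uses the common try/except pattern:

    # Sketch of the optional-import guard that produces "OpenCLIP not installed".
    # Whether this script guards the import exactly like this is an assumption.
    try:
        import open_clip  # pip install open_clip_torch
    except ImportError:
        open_clip = None
        print("OpenCLIP not installed")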
[2025-02-16 06:18:00,887] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cuda (auto detect)
(message repeated once per rank between 06:18:00,887 and 06:18:05,251; duplicate copies omitted)
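The real_accelerator.py lines show DeepSpeed auto-detecting CUDA. The detected accelerator can be queried through DeepSpeed's public get_accelerator API; a minimal sketch, assuming deepspeed and a CUDA build of PyTorch are installed:

    # Query the accelerator DeepSpeed auto-detected, as logged above.
    from deepspeed.accelerator import get_accelerator

    accel = get_accelerator()
    print(accel.device_name())   # "cuda" on this cluster, per the log
    print(accel.device_count())  # number of GPUs visible to this process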
[2025-02-16 06:18:06,443] [INFO] [comm.py:652:init_distributed] cdb=None
(message repeated once per rank between 06:18:06,443 and 06:18:08,306; duplicate copies omitted)
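The comm.py lines are logged while DeepSpeed brings up its communication backend; cdb appears to be DeepSpeed's global communication-backend handle, and cdb=None indicates it has not been created yet at the time of logging. A minimal sketch of the call that triggers this logging, assuming the usual NCCL backend and a torchrun/srun-style launcher that sets RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR, and MASTER_PORT:

    # Initialize torch.distributed through DeepSpeed, producing the
    # [comm.py:init_distributed] log lines above.
    import deepspeed

    deepspeed.init_distributed(dist_backend="nccl")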
[comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,856] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,856] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,856] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,856] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,856] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:06,857] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,173] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,173] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,173] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,173] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,173] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,174] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,174] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,176] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,519] [INFO] [comm.py:652:init_distributed] cdb=None [2025-02-16 06:18:07,520] [INFO] [comm.py:652:init_distributed] cdb=None ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, 
do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') 
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-165, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, 
remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], 
include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-52, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, 
dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-141, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, 
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-74, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, 
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, 
greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-179, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': 
False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-75, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) [2025-02-16 06:18:08,299] 
[2025-02-16 06:18:08,299] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,299] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,303] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,304] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,304] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,304] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,304] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:08,306] [INFO] [comm.py:652:init_distributed] cdb=None
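Dumps of this shape are what a LLaVA-style launcher prints after parsing its argument dataclasses at startup; each rank parses and prints its own copy, which is why the block repeats. A minimal sketch of that pattern, assuming custom ModelArguments/DataArguments dataclasses next to transformers.TrainingArguments (field names and defaults copied from the log; the actual training script is not shown):

from dataclasses import dataclass
from typing import Optional
import transformers

@dataclass
class ModelArguments:
    model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    version: str = "qwen"
    freeze_backbone: bool = True
    gen_vision_tower: str = "eva-clip-E-14-plus"
    mm_projector_type: str = "mlp2x_gelu"
    n_query: int = 64
    n_und_query: int = 729
    gen_pooling: str = "early_pool2d_4"

@dataclass
class DataArguments:
    data_path: Optional[str] = None
    lazy_preprocess: bool = True
    data_type: str = "mix"
    image_aspect_ratio: str = "square"

if __name__ == "__main__":
    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, transformers.TrainingArguments)
    )
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    # Every rank executes this, so the triple appears once per process in the log.
    print(model_args, data_args, training_args)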
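The run points DeepSpeed at deepspeed=./scripts/zero1.json, whose contents are not shown in the log. A minimal plausible ZeRO stage-1 configuration consistent with bf16=True and max_grad_norm=1.0, written as the Python dict that TrainingArguments(deepspeed=...) also accepts directly (the "auto" values defer to the HF Trainer, per the standard integration):

zero1_config = {
    "zero_optimization": {"stage": 1},          # shard optimizer state only
    "bf16": {"enabled": "auto"},                # follows bf16=True in TrainingArguments
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",   # follows per_device_train_batch_size=8
    "gradient_clipping": "auto",                # follows max_grad_norm=1.0
}

This is a sketch of a typical zero1.json, not the file used here; the real script may set additional communication or logging knobs.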
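mm_projector_type='mlp2x_gelu' follows the LLaVA naming convention: "mlpNx_gelu" denotes an N-layer MLP with GELU activations between linear layers, selected by regex when the projector is built. A sketch of that builder with hypothetical feature widths (the EVA-CLIP feature dimension and the LM hidden size are not printed in this log):

import re
import torch.nn as nn

def build_projector(projector_type: str, mm_hidden: int, lm_hidden: int) -> nn.Module:
    match = re.match(r"^mlp(\d+)x_gelu$", projector_type)
    if match:
        depth = int(match.group(1))
        layers = [nn.Linear(mm_hidden, lm_hidden)]       # first projection into LM space
        for _ in range(depth - 1):
            layers += [nn.GELU(), nn.Linear(lm_hidden, lm_hidden)]
        return nn.Sequential(*layers)
    raise ValueError(f"unknown projector type: {projector_type}")

# Hypothetical widths, for illustration only.
projector = build_projector("mlp2x_gelu", mm_hidden=1792, lm_hidden=3584)

With gen_projector_type='mlp2x_gelu' as well, the generation-side features presumably pass through a second projector of the same shape.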
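gen_pooling='early_pool2d_4' with n_query=64 suggests one plausible reading: the generation-side vision features are 2D average-pooled early with a factor-4 window, e.g. a 32x32 patch grid reduced to 8x8 = 64 tokens, matching n_query. The exact semantics are not in the log; a hedged sketch of that reading:

import torch
import torch.nn.functional as F

def early_pool2d(feats: torch.Tensor, factor: int = 4) -> torch.Tensor:
    """feats: (B, N, C) patch tokens on a square grid; returns (B, N/factor^2, C)."""
    b, n, c = feats.shape
    side = int(n ** 0.5)
    assert side * side == n, "expects a square patch grid"
    x = feats.transpose(1, 2).reshape(b, c, side, side)   # (B, C, H, W)
    x = F.avg_pool2d(x, kernel_size=factor)               # (B, C, H/4, W/4)
    return x.flatten(2).transpose(1, 2)                   # (B, N/16, C)

tokens = torch.randn(2, 1024, 1792)   # hypothetical 32x32 grid from the gen vision tower
print(early_pool2d(tokens).shape)     # torch.Size([2, 64, 1792]), consistent with n_query=64

The understanding side keeps a finer grid (n_und_query=729, i.e. 27x27), which is why the two query counts differ.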
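lr_scheduler_type=constant_with_warmup with warmup_steps=0 and warmup_ratio=0.01 means the Trainer derives the warmup length from the ratio (TrainingArguments.get_warmup_steps returns ceil(ratio * total_steps) when warmup_steps is 0), after which the learning rate holds flat at learning_rate=0.0001. A sketch of the resulting schedule using the public transformers helper, with a hypothetical step count since max_steps=-1 defers to num_train_epochs:

import math
import torch
from transformers import get_constant_schedule_with_warmup

total_steps = 10_000                          # hypothetical; max_steps=-1 derives this from epochs
warmup_steps = math.ceil(0.01 * total_steps)  # warmup_ratio=0.01 -> 100 warmup steps

opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)  # optim=adamw_torch
sched = get_constant_schedule_with_warmup(opt, num_warmup_steps=warmup_steps)
# LR ramps linearly 0 -> 1e-4 over the first 100 steps, then stays constant.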
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square')
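These dumps are what a transformers-style training script prints after parsing its CLI flags into dataclasses with transformers.HfArgumentParser. A minimal sketch of that pattern, with field lists abbreviated (the real dataclasses carry every field shown in this log):

    from dataclasses import dataclass
    from typing import Optional
    import transformers

    @dataclass
    class ModelArguments:
        model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
        gen_vision_tower: str = "eva-clip-E-14-plus"
        freeze_backbone: bool = True
        gen_pooling: str = "early_pool2d_4"
        # ... remaining fields exactly as printed above ...

    @dataclass
    class DataArguments:
        data_path: Optional[str] = None
        data_type: str = "mix"
        image_aspect_ratio: str = "square"
        # ...

    @dataclass
    class TrainingArguments(transformers.TrainingArguments):
        # Custom fields (bits, lora_*, mm_projector_lr, ...) extend the
        # stock class, which is why they show up in the dump below.
        model_max_length: int = 2048
        lora_enable: bool = False

    parser = transformers.HfArgumentParser(
        (ModelArguments, DataArguments, TrainingArguments))
    model_args, data_args, training_args = parser.parse_args_into_dataclasses()
    print(model_args, data_args, training_args)  # yields dumps like these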
TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=True,
bf16_full_eval=False,
bits=16,
cache_dir=None,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=4,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=./scripts/zero1.json,
disable_tqdm=False,
dispatch_batches=None,
do_eval=False,
do_predict=False,
do_train=False,
double_quant=True,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,
eval_strategy=no,
eval_use_gather_object=False,
evaluation_strategy=None,
fp16=False,
fp16_backend=auto,
fp16_full_eval=False,
fp16_opt_level=O1,
freeze_mm_mlp_adapter=False,
fsdp=[],
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
group_by_modality_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=zhaojiang/llava-clip-text-image-16-nodes,
hub_private_repo=None,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-65,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lora_alpha=16,
lora_bias=none,
lora_dropout=0.05,
lora_enable=False,
lora_r=64,
lora_weight_path=,
lr_scheduler_kwargs={},
lr_scheduler_type=constant_with_warmup,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mm_projector_lr=None,
model_max_length=2048,
mp_parameters=,
mpt_attn_impl=triton,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
quant_type=nf4,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=steps,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.01,
warmup_steps=0,
weight_decay=0.0,
)

[This ModelArguments / DataArguments / TrainingArguments triple was printed once by every rank; the remaining copies were identical except for local_rank (0-7) and the node suffix in logging_dir (h100-st-p548xlarge-52, -65, -74, -113, -206) and are omitted here.]
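One derived number worth recording is the effective global batch size. The run_name and hub_model_id both say 16 nodes, and local_rank spans 0-7 in this log, so assuming 16 x 8 = 128 ranks:

    # Assumed topology: 16 nodes x 8 GPUs ("16-nodes" in run_name;
    # local_rank runs 0-7 in this log).
    per_device_train_batch_size = 8
    gradient_accumulation_steps = 1
    world_size = 16 * 8

    global_batch_size = (per_device_train_batch_size
                         * gradient_accumulation_steps
                         * world_size)
    print(global_batch_size)  # 1024 samples per optimizer step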
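The schedule fields map directly onto the standard transformers helper: lr_scheduler_type=constant_with_warmup with warmup_ratio=0.01 and learning_rate=0.0001 means the LR climbs linearly to 1e-4 over the first 1% of training steps, then holds flat. A self-contained sketch (the model and step count are placeholders; the Trainer computes the real step count from the dataset length and num_train_epochs=3.0):

    import torch
    from transformers import get_constant_schedule_with_warmup

    model = torch.nn.Linear(8, 8)   # placeholder for the trainable parameters
    total_steps = 10_000            # placeholder; the Trainer derives this

    # Matches adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08,
    # weight_decay=0.0, learning_rate=0.0001 from the dump above.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4,
                                  betas=(0.9, 0.999), eps=1e-8,
                                  weight_decay=0.0)
    scheduler = get_constant_schedule_with_warmup(
        optimizer, num_warmup_steps=int(0.01 * total_steps))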
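deepspeed=./scripts/zero1.json points the HF Trainer at a ZeRO stage-1 config. The file itself is not reproduced in this log; a plausible minimal version, using standard DeepSpeed keys and the "auto" values the Trainer fills in from its own arguments, would look like:

    {
      "zero_optimization": { "stage": 1 },
      "bf16": { "enabled": "auto" },
      "train_micro_batch_size_per_gpu": "auto",
      "train_batch_size": "auto",
      "gradient_accumulation_steps": "auto",
      "gradient_clipping": "auto"
    }

Stage 1 shards only the optimizer states across the 128 ranks, which pairs naturally with the frozen backbone (freeze_backbone=True) and gradient_checkpointing=True seen above.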
fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False},
fsdp_min_num_params=0,
fsdp_transformer_layer_cls_to_wrap=None,
full_determinism=False,
gradient_accumulation_steps=1,
gradient_checkpointing=True,
gradient_checkpointing_kwargs=None,
greater_is_better=None,
group_by_length=False,
group_by_modality_length=False,
half_precision_backend=auto,
hub_always_push=False,
hub_model_id=zhaojiang/llava-clip-text-image-16-nodes,
hub_private_repo=None,
hub_strategy=every_save,
hub_token=,
ignore_data_skip=False,
include_for_metrics=[],
include_inputs_for_metrics=False,
include_num_input_tokens_seen=False,
include_tokens_per_second=False,
jit_mode_eval=False,
label_names=None,
label_smoothing_factor=0.0,
learning_rate=0.0001,
length_column_name=length,
load_best_model_at_end=False,
local_rank=0,
log_level=passive,
log_level_replica=warning,
log_on_each_node=True,
logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-206,
logging_first_step=False,
logging_nan_inf_filter=True,
logging_steps=1.0,
logging_strategy=steps,
lora_alpha=16,
lora_bias=none,
lora_dropout=0.05,
lora_enable=False,
lora_r=64,
lora_weight_path=,
lr_scheduler_kwargs={},
lr_scheduler_type=constant_with_warmup,
max_grad_norm=1.0,
max_steps=-1,
metric_for_best_model=None,
mm_projector_lr=None,
model_max_length=2048,
mp_parameters=,
mpt_attn_impl=triton,
neftune_noise_alpha=None,
no_cuda=False,
num_train_epochs=3.0,
optim=adamw_torch,
optim_args=None,
optim_target_modules=None,
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen,
overwrite_output_dir=False,
past_index=-1,
per_device_eval_batch_size=4,
per_device_train_batch_size=8,
prediction_loss_only=False,
push_to_hub=False,
push_to_hub_model_id=None,
push_to_hub_organization=None,
push_to_hub_token=,
quant_type=nf4,
ray_scope=last,
remove_unused_columns=False,
report_to=['wandb'],
restore_callback_states_from_checkpoint=False,
resume_from_checkpoint=None,
run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4,
save_on_each_node=False,
save_only_model=False,
save_safetensors=True,
save_steps=1000,
save_strategy=steps,
save_total_limit=1,
seed=42,
skip_memory_metrics=True,
split_batches=None,
tf32=True,
torch_compile=False,
torch_compile_backend=None,
torch_compile_mode=None,
torch_empty_cache_steps=None,
torchdynamo=None,
tpu_metrics_debug=False,
tpu_num_cores=None,
use_cpu=False,
use_ipex=False,
use_legacy_prediction_loop=False,
use_liger_kernel=False,
use_mps_device=False,
warmup_ratio=0.01,
warmup_steps=0,
weight_decay=0.0,
)
ModelArguments(
model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct',
version='qwen',
freeze_backbone=True,
tune_mm_mlp_adapter=False,
vision_tower=None,
gen_vision_tower='eva-clip-E-14-plus',
mm_vision_select_layer=-2,
pretrain_mm_mlp_adapter=None,
pretrain_gen_mlp_adapter=None,
vision_tower_pretrained=None,
mm_projector_type='mlp2x_gelu',
gen_projector_type='mlp2x_gelu',
mm_use_im_start_end=False,
mm_use_im_patch_token=False,
mm_patch_merge_type='flat',
mm_vision_select_feature='patch',
n_query=64,
n_und_query=729,
gen_pooling='early_pool2d_4'
)
DataArguments(
data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json',
lazy_preprocess=True,
is_multimodal=False,
image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K',
pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411',
datacomp_shortcaption_image_folder=None,
datacomp_longcaption_image_folder=None,
data_type='mix',
image_aspect_ratio='square'
)
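Note: every rank of the 16-node x 8-GPU job prints the same parsed arguments, which is why this dump repeats in the raw log with only local_rank and the logging_dir hostname changing. Below is a minimal sketch of the parsing step, assuming the script follows the usual LLaVA-style HfArgumentParser pattern; the ModelArguments and DataArguments stubs are hypothetical stand-ins showing only a few of the fields above. Gating the print on the main process would keep the log to a single copy.

from dataclasses import dataclass
from typing import Optional
import transformers

# Hypothetical stubs for the custom dataclasses printed above; the real
# classes carry all the fields shown in the dump.
@dataclass
class ModelArguments:
    model_name_or_path: str = "Qwen/Qwen2.5-VL-7B-Instruct"
    gen_vision_tower: str = "eva-clip-E-14-plus"
    gen_pooling: str = "early_pool2d_4"

@dataclass
class DataArguments:
    data_path: Optional[str] = None
    data_type: str = "mix"
    image_aspect_ratio: str = "square"

parser = transformers.HfArgumentParser(
    (ModelArguments, DataArguments, transformers.TrainingArguments)
)
model_args, data_args, training_args = parser.parse_args_into_dataclasses()

# Printing only on the main process avoids one copy of the dump per rank.
if training_args.local_rank in (-1, 0):
    print(model_args, data_args, training_args)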
logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, 
do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') 
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-52, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, 
remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], 
include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-52, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, 
dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-52, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, 
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, 
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, 
greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-141, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': 
False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) 
ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-113, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, 
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 
'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-141, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') 
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-170, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-65, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, 
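Each of the eight processes per node printed its own copy of the records, which is why the dump repeats and occasionally interleaves mid-field. A small guard of the following shape keeps such prints to a single process; this is a sketch, not the script's actual code, and rank0_print is a hypothetical helper name:

    import os

    import torch.distributed as dist


    def rank0_print(*args, **kwargs):
        """Print from the global rank-0 process only, so config dumps like the
        ones above appear once instead of once per GPU."""
        if dist.is_available() and dist.is_initialized():
            if dist.get_rank() != 0:
                return
        elif int(os.environ.get("RANK", "0")) != 0:
            # Before init_process_group, fall back to the launcher's RANK env var.
            return
        print(*args, **kwargs)


    # Usage:
    # rank0_print(model_args, data_args, training_args)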
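The run points DeepSpeed at ./scripts/zero1.json, whose contents are not shown in the log. As a hypothetical stand-in consistent with the flags above (bf16=True, gradient_accumulation_steps=1, and the ZeRO stage 1 the filename suggests), a typical Hugging Face-integrated stage-1 config looks like this sketch; "auto" is the documented convention that lets the Trainer substitute its own argument values, and the real file may differ:

    import json

    # Hypothetical contents for ./scripts/zero1.json (assumption, not the real file).
    zero1 = {
        "bf16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 1,                 # ZeRO-1: shard optimizer states only
            "reduce_bucket_size": "auto",
        },
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
        "train_batch_size": "auto",
        "train_micro_batch_size_per_gpu": "auto",
    }

    with open("scripts/zero1.json", "w") as f:
        json.dump(zero1, f, indent=2)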
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspload_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-172, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) eval_steps=None, 
eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-170, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', 
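These dumps come from the script's three argument dataclasses being printed by every launched process, which is why they repeat and interleave. A minimal sketch of the usual HfArgumentParser pattern that produces output like this, assuming the LLaVA-style train.py layout (field lists heavily abridged; the real script defines many more fields):

    # sketch, not the run's actual script; run as:
    #   python train_args_sketch.py --output_dir /tmp/out
    from dataclasses import dataclass, field
    from typing import Optional
    import transformers

    @dataclass
    class ModelArguments:
        # abridged: the run above also sets gen_pooling, n_query, etc.
        model_name_or_path: Optional[str] = field(default="Qwen/Qwen2.5-VL-7B-Instruct")
        gen_vision_tower: str = field(default="eva-clip-E-14-plus")

    @dataclass
    class DataArguments:
        data_path: Optional[str] = field(default=None)
        data_type: str = field(default="mix")

    @dataclass
    class TrainingArguments(transformers.TrainingArguments):
        # custom fields layered on top of the stock HF TrainingArguments
        model_max_length: int = field(default=2048)
        lora_enable: bool = field(default=False)

    if __name__ == "__main__":
        parser = transformers.HfArgumentParser(
            (ModelArguments, DataArguments, TrainingArguments))
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()
        # every launched rank prints its own copy, so a multi-node run
        # emits many interleaved dumps differing only in local_rank
        print(model_args, data_args, training_args)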
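For reference, the effective optimizer batch size implied by this config, assuming the 16-node x 8-GPU layout suggested by the "16-nodes" run name and the local_rank values (an assumption; the log never prints the world size):

    per_device_train_batch_size = 8     # from the dump
    gradient_accumulation_steps = 1     # from the dump
    nodes, gpus_per_node = 16, 8        # assumed from run name and local_rank range
    world_size = nodes * gpus_per_node  # 128 data-parallel ranks
    global_batch = per_device_train_batch_size * gradient_accumulation_steps * world_size
    print(global_batch)                 # 1024 samples per optimizer step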
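The dump points at deepspeed=./scripts/zero1.json, but the file's contents are not shown in the log. A plausible ZeRO stage-1 config consistent with bf16=True and the HF Trainer integration might look like the following sketch (an assumption, not the run's actual file; "auto" values are filled in by the Trainer):

    import json

    zero1 = {
        "bf16": {"enabled": "auto"},
        "zero_optimization": {"stage": 1},
        "train_micro_batch_size_per_gpu": "auto",
        "train_batch_size": "auto",
        "gradient_accumulation_steps": "auto",
        "gradient_clipping": "auto",
    }
    print(json.dumps(zero1, indent=2))  # what ./scripts/zero1.json might contain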
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspeval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-65, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, 
mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-183, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, 
eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-179, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', 
pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-179, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, 
save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, 
load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-179, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, 
ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-206, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, 
gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_tModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspype='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, 
eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-75, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, ect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, 
torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-179, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': 
False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-75, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, 
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-206, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, 
output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, 
greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-206, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': 
False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-206, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) 
optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-183, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, 
metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 
'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') 
TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, 
tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, 
logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', 
image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, 
torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, 
save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, 
local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], 
deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-07_h100-st-p548xlarge-112, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') 
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, 
remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], 
include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=0, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, 
gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=1, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, 
quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, 
overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, 
group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, 
vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, 
save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, 
mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, 
adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 
'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-08_h100-st-p548xlarge-42, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, 
use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=3, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, 
lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, 
fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=7, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', 
datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=6, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, 
torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=4, log_level=passive, log_level_replica=warning, log_on_each_node=True, 
logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, 
do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=2, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') 
DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], include_inputs_for_metrics=False, include_num_input_tokens_seen=False, include_tokens_per_second=False, jit_mode_eval=False, label_names=None, label_smoothing_factor=0.0, learning_rate=0.0001, length_column_name=length, load_best_model_at_end=False, local_rank=5, log_level=passive, log_level_replica=warning, log_on_each_node=True, logging_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen/runs/Feb16_06-18-06_h100-st-p548xlarge-169, logging_first_step=False, logging_nan_inf_filter=True, logging_steps=1.0, logging_strategy=steps, lora_alpha=16, lora_bias=none, lora_dropout=0.05, lora_enable=False, lora_r=64, lora_weight_path=, lr_scheduler_kwargs={}, lr_scheduler_type=constant_with_warmup, max_grad_norm=1.0, max_steps=-1, metric_for_best_model=None, mm_projector_lr=None, model_max_length=2048, mp_parameters=, mpt_attn_impl=triton, neftune_noise_alpha=None, no_cuda=False, num_train_epochs=3.0, optim=adamw_torch, optim_args=None, optim_target_modules=None, output_dir=/fsx_0/user/zhaojiang/models/qwen-vl-gen, overwrite_output_dir=False, past_index=-1, per_device_eval_batch_size=4, per_device_train_batch_size=8, prediction_loss_only=False, push_to_hub=False, push_to_hub_model_id=None, push_to_hub_organization=None, push_to_hub_token=, quant_type=nf4, ray_scope=last, 
remove_unused_columns=False, report_to=['wandb'], restore_callback_states_from_checkpoint=False, resume_from_checkpoint=None, run_name=qwen-vl-diff-clip-16-nodes_early_pool2d_4, save_on_each_node=False, save_only_model=False, save_safetensors=True, save_steps=1000, save_strategy=steps, save_total_limit=1, seed=42, skip_memory_metrics=True, split_batches=None, tf32=True, torch_compile=False, torch_compile_backend=None, torch_compile_mode=None, torch_empty_cache_steps=None, torchdynamo=None, tpu_metrics_debug=False, tpu_num_cores=None, use_cpu=False, use_ipex=False, use_legacy_prediction_loop=False, use_liger_kernel=False, use_mps_device=False, warmup_ratio=0.01, warmup_steps=0, weight_decay=0.0, ) ModelArguments(model_name_or_path='Qwen/Qwen2.5-VL-7B-Instruct', version='qwen', freeze_backbone=True, tune_mm_mlp_adapter=False, vision_tower=None, gen_vision_tower='eva-clip-E-14-plus', mm_vision_select_layer=-2, pretrain_mm_mlp_adapter=None, pretrain_gen_mlp_adapter=None, vision_tower_pretrained=None, mm_projector_type='mlp2x_gelu', gen_projector_type='mlp2x_gelu', mm_use_im_start_end=False, mm_use_im_patch_token=False, mm_patch_merge_type='flat', mm_vision_select_feature='patch', n_query=64, n_und_query=729, gen_pooling='early_pool2d_4') DataArguments(data_path='/fsx_0/user/zhaojiang/data/ShareGPT4V/pixelporse_sharegpt4v_text_image_both.json', lazy_preprocess=True, is_multimodal=False, image_folder='/fsx_0/user/zhaojiang/data/LLaVA-Instruct-150K', pixelprose_image_folder='/fsx_0/user/zhaojiang/models/hub/datasets--tomg-group-umd--pixelprose-shards/snapshots/36facc0ec7ff5ee9bdde1c2e217b3d7999b58411', datacomp_shortcaption_image_folder=None, datacomp_longcaption_image_folder=None, data_type='mix', image_aspect_ratio='square') TrainingArguments( _n_gpu=1, accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False}, adafactor=False, adam_beta1=0.9, adam_beta2=0.999, adam_epsilon=1e-08, auto_find_batch_size=False, average_tokens_across_devices=False, batch_eval_metrics=False, bf16=True, bf16_full_eval=False, bits=16, cache_dir=None, data_seed=None, dataloader_drop_last=False, dataloader_num_workers=4, dataloader_persistent_workers=False, dataloader_pin_memory=True, dataloader_prefetch_factor=None, ddp_backend=None, ddp_broadcast_buffers=None, ddp_bucket_cap_mb=None, ddp_find_unused_parameters=None, ddp_timeout=1800, debug=[], deepspeed=./scripts/zero1.json, disable_tqdm=False, dispatch_batches=None, do_eval=False, do_predict=False, do_train=False, double_quant=True, eval_accumulation_steps=None, eval_delay=0, eval_do_concat_batches=True, eval_on_start=False, eval_steps=None, eval_strategy=no, eval_use_gather_object=False, evaluation_strategy=None, fp16=False, fp16_backend=auto, fp16_full_eval=False, fp16_opt_level=O1, freeze_mm_mlp_adapter=False, fsdp=[], fsdp_config={'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}, fsdp_min_num_params=0, fsdp_transformer_layer_cls_to_wrap=None, full_determinism=False, gradient_accumulation_steps=1, gradient_checkpointing=True, gradient_checkpointing_kwargs=None, greater_is_better=None, group_by_length=False, group_by_modality_length=False, half_precision_backend=auto, hub_always_push=False, hub_model_id=zhaojiang/llava-clip-text-image-16-nodes, hub_private_repo=None, hub_strategy=every_save, hub_token=, ignore_data_skip=False, include_for_metrics=[], 
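For scale: the run_name "qwen-vl-diff-clip-16-nodes_early_pool2d_4" suggests 16 nodes, and local_rank takes the values 0-7 above, i.e. 8 GPUs per node; both are inferences from the log rather than stated facts. Under those assumptions the per-optimizer-step batch size follows directly from the dumped arguments:

```python
# Effective global batch size implied by the dumps above.
# nodes=16 is inferred from run_name "...16-nodes..."; gpus_per_node=8 is
# inferred from local_rank taking values 0..7. Both are assumptions.
nodes = 16
gpus_per_node = 8
per_device_train_batch_size = 8   # from TrainingArguments
gradient_accumulation_steps = 1   # from TrainingArguments

world_size = nodes * gpus_per_node          # 128 processes in total
global_batch = (world_size
                * per_device_train_batch_size
                * gradient_accumulation_steps)
print(world_size, global_batch)             # 128 processes, 1024 samples/step
```

Since warmup_steps=0 and warmup_ratio=0.01 with lr_scheduler_type=constant_with_warmup, the warmup length works out to 1% of the total training steps computed from this global batch size over num_train_epochs=3.0.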
[... identical ModelArguments / DataArguments / TrainingArguments dumps from the remaining ranks elided; only local_rank (0-7) and the node hostname in logging_dir vary ...]
[2025-02-16 06:18:10,566] [INFO] [comm.py:652:init_distributed] cdb=None
[2025-02-16 06:18:10,566] [INFO] [comm.py:683:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[... the cdb=None line repeats once per rank; duplicates elided ...]
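deepspeed=./scripts/zero1.json together with the "Initializing TorchBackend in DeepSpeed with backend nccl" line means the HF Trainer delegates optimizer-state sharding to DeepSpeed ZeRO stage 1. The run spans 16 nodes (per run_name and hub_model_id) with local_rank 0-7 each, i.e. 128 GPUs, so the global batch is per_device_train_batch_size 8 x 128 ranks x 1 gradient-accumulation step = 1024. The actual zero1.json is not shown in the log; the sketch below is an assumed minimal stage-1 config consistent with the logged flags, where the "auto" entries are the standard HF integration values that the Trainer fills in from TrainingArguments.

import json

# Assumed contents of ./scripts/zero1.json; written to /tmp here so the
# sketch does not pretend to be the real file.
zero1 = {
    "zero_optimization": {"stage": 1},          # shard optimizer states only
    "bf16": {"enabled": "auto"},                # follows bf16=True in TrainingArguments
    "train_micro_batch_size_per_gpu": "auto",   # follows per_device_train_batch_size=8
    "train_batch_size": "auto",                 # 8 x 128 ranks x 1 accumulation = 1024
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",                # follows max_grad_norm=1.0
}

with open("/tmp/zero1.json", "w") as f:
    json.dump(zero1, f, indent=2)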
Using conversation format: qwen
[... repeated once per rank; duplicates elided ...]
Loading EVA ViT: eva-clip-E-14-plus
Pretrained: None
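The two lines above come from constructing the generation vision tower (gen_vision_tower='eva-clip-E-14-plus') with no pretrained checkpoint path, so the tower keeps whatever weights its factory produced. The loader itself is not shown in the log; the sketch below is a guess at its shape, with build_fn standing in for the project's EVA-CLIP factory (an assumption, not the actual API).

import torch
from torch import nn

def load_vision_tower(build_fn, model_name: str, pretrained: str | None) -> nn.Module:
    # Hypothetical reconstruction of the step behind
    # "Loading EVA ViT: eva-clip-E-14-plus" / "Pretrained: None".
    print(f"Loading EVA ViT: {model_name}")
    print(f"Pretrained: {pretrained}")
    tower = build_fn(model_name)  # project-specific EVA-CLIP factory (assumed)
    if pretrained is not None:
        # With Pretrained: None, as logged here, this branch is skipped.
        state = torch.load(pretrained, map_location="cpu")
        missing, unexpected = tower.load_state_dict(state, strict=False)
        print(f"missing={len(missing)} unexpected={len(unexpected)}")
    for p in tower.parameters():  # vision towers are typically kept frozen
        p.requires_grad = False
    return tower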
[... interleaved "Loading EVA ViT: eva-clip-E-14-plus" / "Pretrained: None" lines from the remaining ranks elided ...]
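All of the duplication and interleaving elided above comes from every rank printing the same message unconditionally. A common cleanup, sketched here (rank0_print is our name for the helper, not necessarily the project's), is to gate console output on the global rank:

import torch.distributed as dist

def rank0_print(*args, **kwargs):
    # Print only on the global rank-0 process; before init_process_group,
    # or in single-process runs, fall through and print normally.
    if not dist.is_available() or not dist.is_initialized() or dist.get_rank() == 0:
        print(*args, **kwargs)

rank0_print("Using conversation format: qwen")
rank0_print("Loading EVA ViT: eva-clip-E-14-plus")
rank0_print("Pretrained: None")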